Retrieval Augmented Generation in the Wild: Anton Troynikov

3.5K views · Nov 15, 2023 · 12:19 min

Takeaway

Production RAG needs dynamic, feedback-driven memory plus careful chunking and relevance filtering; plain vector search is only the open-loop baseline.

Summary

The standard RAG loop is just "open-loop" retrieval; future agentic applications need dynamic memory that supports human feedback, self-updates by the agent, and world models.
Irrelevant "distractor" passages in the context cause performance to fall off a cliff, so relevance means returning all of the relevant information AND none of the irrelevant information.
The choice of embedding model may matter less than commonly thought: models trained with the same objective learn similar representations, up to a linear transformation.
Chunking experiments: use a small LM's next-token perplexity over a sliding window to find semantic boundaries, then fine-tune it for your data; also look for discontinuities in the embeddings of consecutive passages.
Cites Voyager (Nvidia), a Minecraft agent that learns skills through Chroma-backed retrieval, including from human demonstration.
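The "all relevant AND no irrelevant" point can be illustrated with a minimal sketch, assuming cosine similarity over precomputed embeddings. Instead of always returning a fixed top-k, it also drops any result below a similarity cutoff, so weak matches (likely distractors) never reach the context window. The function names, the `min_sim=0.75` cutoff and the toy 3-dimensional vectors are hypothetical and not taken from the talk.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def filter_relevant(
    query: Sequence[float],
    docs: List[Tuple[str, Sequence[float]]],
    k: int = 5,
    min_sim: float = 0.75,  # hypothetical cutoff; tune it on your own data
) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, similarity) pairs, dropping anything below min_sim.

    Plain top-k always returns k results even when most are weak matches;
    the cutoff lets the result set shrink (even to empty) instead.
    """
    scored = [(doc_id, cosine(query, emb)) for doc_id, emb in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(doc_id, s) for doc_id, s in scored[:k] if s >= min_sim]


docs = [
    ("chroma-intro", [0.9, 0.1, 0.0]),
    ("minecraft-skills", [0.1, 0.9, 0.2]),
    ("unrelated-recipe", [0.0, 0.2, 0.9]),
]
# Plain top-2 would also return "minecraft-skills"; the cutoff keeps only the close match.
print(filter_relevant([1.0, 0.0, 0.0], docs, k=2))
```

Vector stores such as Chroma return a distance alongside each result, so the same kind of cutoff can be applied to query results before they are placed in the prompt; a sensible value depends on the embedding model and distance metric.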
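The "same representations up to a linear transform" claim suggests a simple check: embed the same texts with two models and fit a least-squares linear map from one embedding space to the other. The sketch below only shows the fitting procedure, in dependency-free Python via the normal equations; the data is synthetic and constructed to be exactly linear, so it says nothing about real models. The helper names are hypothetical, and in practice `numpy.linalg.lstsq` does the same fit more robustly.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """(n x k) @ (k x m) -> (n x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for x (a square) by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]  # augmented matrix [a | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fit_linear_map(x: Matrix, y: Matrix) -> Matrix:
    """Least-squares W minimising ||x @ W - y|| via (x^T x) W = x^T y.

    x holds model A embeddings (one row per text), y holds model B embeddings
    of the same texts; the two dimensions may differ.
    """
    xt = transpose(x)
    return solve(matmul(xt, x), matmul(xt, y))


# Synthetic check: "model B" embeddings are an exact linear transform of "model A".
random.seed(0)
true_map = [[0.0, 1.0, 0.5], [1.0, 0.0, -0.5], [0.2, 0.3, 1.0]]
emb_a = [[random.gauss(0, 1) for _ in range(3)] for _ in range(20)]
emb_b = matmul(emb_a, true_map)
w = fit_linear_map(emb_a, emb_b)
err = max(abs(w[i][j] - true_map[i][j]) for i in range(3) for j in range(3))
print(f"max abs error in recovered map: {err:.2e}")
```

With real embeddings the interesting quantity is the residual of the fit on held-out texts: a small residual would support the idea that the two models encode similar information.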
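The perplexity-based chunking idea can be sketched as follows, assuming per-token surprisals (negative log-probabilities) have already been computed with some small causal language model; no particular model is assumed, and the numbers below are made up. The sketch computes perplexity over a sliding window and flags tokens where it jumps sharply, as candidate chunk boundaries. `windowed_perplexity`, `boundary_candidates`, `window=3` and `ratio=1.5` are hypothetical choices, and the fine-tuning step mentioned in the talk is not covered.

```python
import math
from typing import List, Sequence


def windowed_perplexity(surprisals: Sequence[float], window: int) -> List[float]:
    """Perplexity, exp(mean surprisal), of each sliding window of `window` tokens.

    surprisals[i] is -ln p(token_i | preceding tokens), as a small causal LM
    would produce; here the values are assumed to be precomputed.
    """
    return [
        math.exp(sum(surprisals[i:i + window]) / window)
        for i in range(len(surprisals) - window + 1)
    ]


def boundary_candidates(
    surprisals: Sequence[float], window: int = 3, ratio: float = 1.5
) -> List[int]:
    """Token indices where windowed perplexity rises by at least `ratio` over
    the previous window, i.e. where the text suddenly becomes hard to predict."""
    ppl = windowed_perplexity(surprisals, window)
    return [
        i + window - 1  # the token that just entered window i
        for i in range(1, len(ppl))
        if ppl[i] >= ratio * ppl[i - 1]
    ]


# Made-up surprisals: a predictable passage, then one surprising token (index 4)
# where a new topic begins.
surprisals = [1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]
print(boundary_candidates(surprisals))  # [4]
```

Comparing each window with the previous one (rather than using a global threshold) keeps the rule insensitive to how predictable the document is overall.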
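One plausible reading of the embedding-discontinuity idea is to embed each sentence and start a new chunk wherever the similarity between neighbouring sentences drops sharply. The sketch below assumes sentence embeddings are already available (the 2-dimensional vectors are toy values); `split_on_discontinuities` and the `min_sim=0.5` threshold are hypothetical, not the talk's method.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def split_on_discontinuities(
    sentences: List[str],
    embeddings: List[Sequence[float]],
    min_sim: float = 0.5,  # hypothetical threshold
) -> List[List[str]]:
    """Group consecutive sentences, starting a new chunk wherever the similarity
    between neighbouring sentence embeddings falls below min_sim."""
    if not sentences:
        return []
    chunks = [[sentences[0]]]
    for prev, cur, sent in zip(embeddings, embeddings[1:], sentences[1:]):
        if cosine(prev, cur) < min_sim:
            chunks.append([sent])  # discontinuity: start a new chunk
        else:
            chunks[-1].append(sent)
    return chunks


sentences = [
    "Chroma stores embeddings.",
    "It supports metadata filtering.",
    "Voyager plays Minecraft.",
    "It learns new skills over time.",
]
embeddings = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.95]]
for chunk in split_on_discontinuities(sentences, embeddings):
    print(chunk)
```

Here the similarity between the second and third sentences is about 0.45, below the threshold, so the four sentences split into two chunks of two.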

rag · chroma · retrieval

Original description

In the last few months, we've seen an explosion in the use of retrieval in the context of AI. Document question answering, autonomous agents, and more use embeddings-based retrieval systems in a variety of ways. This talk will cover what we've learned building for these applications, the challenges developers face, and the future of retrieval in the context of AI.

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule

About Anton Troynikov
Anton is the co-founder of Chroma. He does not believe AI will kill us all. Chroma builds an open-source embeddings store, specifically built for AI-native applications.