
Building Production-Ready RAG Applications: Jerry Liu

405.4K views · Nov 15, 2023 · 18:34 min
Takeaway

Naive top-k vector retrieval rarely survives production; treat RAG as a tunable pipeline that you evaluate component-by-component before adding complexity.
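For reference, the naive baseline the takeaway warns about fits in a few lines. This is a minimal sketch, assuming chunks have already been embedded. The toy 3-d vectors and the `cosine`/`naive_top_k` helpers are hypothetical illustrations, not LlamaIndex's API. Note that it returns exactly k chunks for every query, whether or not they are relevant, which is one way low precision creeps in.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def naive_top_k(query_vec: Sequence[float],
                chunk_vecs: Sequence[Sequence[float]],
                k: int = 2) -> List[int]:
    """Indices of the k chunks most similar to the query, best first.

    No reranking, filtering, or relevance threshold: the naive baseline.
    """
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy "embeddings" standing in for a real embedding model.
chunks = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
print(naive_top_k(query, chunks, k=2))  # → [0, 1]
```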

Summary

  • LlamaIndex CEO Jerry Liu maps the naive RAG stack (ingestion + retrieval + synthesis) and the failure modes that block production: low precision, low recall, hallucination, outdated context, and lost-in-the-middle.
  • Evaluation must run both end-to-end and at the retrieval-component level, using IR metrics (hit rate, MRR, NDCG) plus LLM-as-judge over synthetic or human-labeled QA datasets.
  • The improvement ladder starts with table-stakes techniques (smarter chunking, better embeddings, hybrid retrieval) and only then moves to harder fixes such as agentic query decomposition and routing across data sources.
  • Advocates using LLMs to reason over data, not just to generate answers: query planning, sub-question decomposition, and data-source routing.
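The retrieval-level metrics named in the summary can be computed directly from ranked results. This is a minimal sketch, assuming binary relevance (a chunk either is or is not a correct source for the question) and one ranked list of chunk ids per query. The function names and the toy eval set are hypothetical, not LlamaIndex's evaluator API.

```python
import math
from typing import Sequence, Set


def hit_rate(results: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Fraction of queries with at least one relevant id in the top-k results."""
    hits = sum(1 for ids, rel in zip(results, relevant)
               if any(doc_id in rel for doc_id in ids[:k]))
    return hits / len(results)


def mrr(results: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Mean reciprocal rank of the first relevant id within the top-k (0 if none)."""
    total = 0.0
    for ids, rel in zip(results, relevant):
        for rank, doc_id in enumerate(ids[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(results)


def ndcg(results: Sequence[Sequence[str]], relevant: Sequence[Set[str]], k: int) -> float:
    """Mean NDCG@k with binary gains discounted by log2(rank + 1)."""
    scores = []
    for ids, rel in zip(results, relevant):
        dcg = sum(1.0 / math.log2(rank + 1)
                  for rank, doc_id in enumerate(ids[:k], start=1) if doc_id in rel)
        # Ideal ranking: every relevant id placed first, up to k of them.
        ideal_hits = min(len(rel), k)
        idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
        scores.append(dcg / idcg if idcg else 0.0)
    return sum(scores) / len(scores)


# Hypothetical eval set: retrieved chunk ids per query, and the ground-truth ids.
retrieved = [["c3", "c1", "c7"], ["c2", "c5", "c9"], ["c4", "c8", "c6"]]
expected = [{"c1"}, {"c2"}, {"c0"}]

print(hit_rate(retrieved, expected, k=3))  # 2 of 3 queries hit
print(mrr(retrieved, expected, k=3))       # (1/2 + 1 + 0) / 3 = 0.5
print(ndcg(retrieved, expected, k=3))      # ≈ 0.544
```

Hit rate only asks whether a correct chunk appeared at all; MRR and NDCG also reward ranking it near the top, which matters when only the first few chunks reach the prompt.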
rag · llamaindex · evals
Original description
Large Language Models (LLMs) are starting to revolutionize how users search for, interact with, and generate new content. Recent stacks and toolkits for Retrieval Augmented Generation (RAG) let users build applications such as chatbots that use LLMs over their own private data, which opens the door to a vast array of applications. However, while setting up a naive RAG stack is easy, productionizing it is hard. In this talk, we cover core techniques for evaluating and improving your retrieval system for better-performing RAG.

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair

About Jerry Liu
Jerry Liu is the co-founder and CEO of LlamaIndex, with a career spanning ML engineering, AI research, and startups. Before LlamaIndex, he was an ML engineer at Quora and did AI research at Uber's ATG. A Princeton alumnus, he has published research including, most recently, Deep Structured Reactive Planning and MuSCLE: Multi Sweep Compression of LiDAR using Deep Entropy Models.