Layering every technique in RAG, one query at a time - David Karam, Pi Labs (fmr. Google Search)

17.7K views · Jul 29, 2025 · 20:22 min

Takeaway

Improve RAG by inspecting failing real queries and adding only the technique that fixes that class, layer by layer.
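Here is a minimal sketch of that loop. It assumes a small hand-labeled set of real queries, each tagged with the document it should retrieve and a failure class you assign yourself. The names `EvalCase`, `recall_at_k_by_class`, and `worst_class` are hypothetical. The talk does not prescribe recall@k as the metric; it is used here only because it is simple to compute.

```python
from collections import defaultdict
from typing import Callable

# Hypothetical eval record: (real user query, doc id a good answer needs, failure class).
EvalCase = tuple[str, str, str]


def recall_at_k_by_class(
    cases: list[EvalCase], retrieve: Callable[[str], list[str]], k: int = 5
) -> dict[str, float]:
    """Per query class, the fraction of cases whose expected doc is in the top k."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for query, expected_doc, query_class in cases:
        totals[query_class] += 1
        if expected_doc in retrieve(query)[:k]:
            hits[query_class] += 1
    return {c: hits[c] / totals[c] for c in totals}


def worst_class(scores: dict[str, float]) -> str:
    """The class with the lowest recall is the next one to fix."""
    return min(scores, key=lambda c: scores[c])


# Toy usage: a stand-in retriever that ignores the query entirely.
cases = [
    ("falafel", "doc_recipe", "ambiguous"),
    ("error E42 on login", "doc_e42", "exact_code"),
    ("reset my password", "doc_reset", "navigational"),
]
scores = recall_at_k_by_class(cases, lambda q: ["doc_reset", "doc_recipe"], k=2)
print(scores)               # {'ambiguous': 1.0, 'exact_code': 0.0, 'navigational': 1.0}
print(worst_class(scores))  # exact_code
```

The lowest-scoring class is where the next technique should go. Once a fix lands, rerun the same cases to confirm it helped that class without regressing the others.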

Summary

David Karam (ex-Google Search, Pi Labs) frames RAG improvement as outcome-driven 'quality engineering' rather than chasing techniques.
Walks through a layered stack: query understanding, retrieval (sparse+dense hybrid), reranking, generation grounding and post-hoc checks.
Argues evals on real cases — not benchmarks — drive what to add next; every technique should be justified by a failing query class.
Covers when to add query rewriting, HyDE, multi-vector retrieval, reranking, and Self-RAG-style validation in production search systems.
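The summary mentions sparse+dense hybrid retrieval but not how the two result lists are merged. One common choice is reciprocal rank fusion (RRF), which is assumed here rather than taken from the talk. RRF uses only ranks, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales. The doc ids are made up, and `k = 60` is the constant usually quoted for RRF.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank), rank 1-based."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


sparse = ["d3", "d1", "d7"]  # hypothetical BM25 order
dense = ["d1", "d4", "d3"]   # hypothetical embedding order
print(reciprocal_rank_fusion([sparse, dense]))  # ['d1', 'd3', 'd4', 'd7']
```

Documents found by both retrievers (`d1`, `d3`) rise above those found by only one (`d4`, `d7`). A reranker, the next layer in the stack above, would then rescore this fused shortlist.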

rag · retrieval · evals

Original description

Start with the simplest Search: in-memory embeddings with relevance ranking. End with the most complex, planet-scale Search: a mix of 70+ corpora spanning token, embedding, and knowledge-graph indexes, all jointly retrieved, custom-ranked, jointly re-ranked, and then LLM-processed, at 160,000 queries per second in under 200 ms.
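The starting point named above, in-memory embeddings with relevance ranking, can be as small as a brute-force cosine-similarity scan. In this sketch the three-dimensional vectors are invented for illustration; a real system would get them from an embedding model. `search` is a hypothetical helper, not a library API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: list[float], index: dict[str, list[float]], top_k: int = 3
) -> list[tuple[str, float]]:
    """Score every stored vector against the query and keep the top_k."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy index: made-up embeddings keyed by document id.
index = {
    "falafel_recipe": [0.9, 0.1, 0.0],
    "falafel_restaurants": [0.7, 0.0, 0.7],
    "hummus_recipe": [0.6, 0.4, 0.0],
}
for doc_id, score in search([1.0, 0.0, 0.2], index, top_k=2):
    print(f"{doc_id}\t{score:.3f}")
```

The scan costs O(N) per query, which is fine for thousands of vectors. At larger scale, an approximate nearest-neighbor index replaces it.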

This talk will be a fun “one query at a time” survey of all techniques in RAG in order of increasing complexity, showing the limits of each technique and what the next layer opens up in terms of capabilities for handling ever more complex queries. You’ll learn why queries like [falafel] are notoriously hard to Search over, why chunking your documents can be disastrous, how you can sometimes get away with a simple BM25, and why some Search problems are so hard that you’re better off punting them to the LLM or the UX. Brought to you by the team that worked on 50+ Search products, on Google.com and in custom Enterprise Search.
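For the simple BM25 case, here is a minimal Okapi BM25 scorer. It uses the common defaults `k1 = 1.5` and `b = 0.75` and the non-negative IDF variant used by Lucene. The whitespace tokenizer and the three toy documents are assumptions for illustration. Exact tokens such as the error code `E42` are the kind of query where term matching tends to do well.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Naive lowercase whitespace tokenizer (an assumption; real systems do more)."""
    return text.lower().split()


class BM25:
    def __init__(self, docs: dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
        self.lengths = {doc_id: sum(tf.values()) for doc_id, tf in self.tfs.items()}
        self.avgdl = sum(self.lengths.values()) / len(self.lengths)
        # Document frequency: iterating a Counter yields each distinct term once.
        self.df = Counter(term for tf in self.tfs.values() for term in tf)
        self.n = len(docs)

    def idf(self, term: str) -> float:
        n_t = self.df.get(term, 0)
        return math.log((self.n - n_t + 0.5) / (n_t + 0.5) + 1.0)

    def score(self, query: str, doc_id: str) -> float:
        tf = self.tfs[doc_id]
        # Length normalization: longer-than-average docs need more matches.
        norm = self.k1 * (1 - self.b + self.b * self.lengths[doc_id] / self.avgdl)
        total = 0.0
        for term in tokenize(query):
            f = tf.get(term, 0)
            if f:
                total += self.idf(term) * f * (self.k1 + 1) / (f + norm)
        return total

    def search(self, query: str, top_k: int = 3) -> list[str]:
        ranked = sorted(self.tfs, key=lambda d: self.score(query, d), reverse=True)
        return ranked[:top_k]


docs = {
    "d1": "Error code E42 means the license expired",
    "d2": "How to renew a license",
    "d3": "Troubleshooting login errors",
}
bm25 = BM25(docs)
print(bm25.search("E42", top_k=1))
print(bm25.search("renew license"))
```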

About David Karam
I'm David K. I love straddling the line between deep tech research and application development. I’ve spent a decade at Google as Product Director working on Search’s core AI and NLU systems, helping Search’s own version of “AI Engineers” develop magical applications. Around a year ago I left with my cofounder to start Pi Labs where we’re trying to bring that same spirit to the rest of the industry. Outside work I love to read, cook, and spend time in nature.

Recorded at the AI Engineer World's Fair in San Francisco.

Timestamps:
00:00 Introduction and Context
01:41 Quality Engineering Loop and Mindset
04:09 In-Memory Retrieval
04:50 Term-Based Retrieval (BM25)
05:18 Relevance Embeddings (Vector Search)
06:15 Re-Rankers (Cross Encoders)
07:59 Custom Embeddings
09:40 Domain-Specific Ranking Signals
11:09 User Preference Signals
12:17 Query Orchestration (Fan Out)
14:26 Supplementary Retrieval
16:09 Distillation
17:14 Punting the Problem and Graceful Degradation
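The last timestamps, query fan-out and punting, can be combined in one small sketch. It sends the query to several retrievers in parallel and merges the hits. When the best score falls below a threshold, it returns nothing rather than a weak answer, so the caller can fall back to the LLM or the UX. The retrievers, the 0.7 threshold, and the assumption that every retriever returns scores on a comparable 0 to 1 scale are all hypothetical. In practice, raw scores from different systems usually need calibration before they can be compared.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional

# Hypothetical retriever: query -> list of (doc_id, score), scores assumed in [0, 1].
Retriever = Callable[[str], list[tuple[str, float]]]


def fan_out(query: str, retrievers: dict[str, Retriever]) -> list[tuple[str, float, str]]:
    """Query every retriever concurrently; return (doc_id, score, source), best first."""
    with ThreadPoolExecutor(max_workers=max(1, len(retrievers))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in retrievers.items()}
        hits = [
            (doc_id, score, name)
            for name, future in futures.items()
            for doc_id, score in future.result()
        ]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


def answer_or_punt(
    query: str, retrievers: dict[str, Retriever], min_score: float = 0.7
) -> Optional[str]:
    """Return the best doc id, or None to signal a graceful fallback."""
    hits = fan_out(query, retrievers)
    if not hits or hits[0][1] < min_score:
        return None  # punt: e.g. ask a clarifying question or show a browse UI
    return hits[0][0]


retrievers: dict[str, Retriever] = {
    "docs": lambda q: [("doc_refunds", 0.8)] if "refund" in q else [],
    "faq": lambda q: [("faq_general", 0.6)],
}
print(answer_or_punt("refund policy", retrievers))  # doc_refunds
print(answer_or_punt("falafel", retrievers))        # None
```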