Building Blocks for LLM Systems & Products: Eugene Yan
Takeaway
Production LLM systems hinge on task-specific evals, position-aware retrieval, NLI-based hallucination guardrails, and user-feedback loops, not on bigger context windows.
Summary
- Four building blocks: evals, retrieval-augmented generation, guardrails, and collecting user feedback
- Evals: start small (Teknium's 40-question domain set); where possible, simplify the task so outputs can be checked deterministically (does the generated SQL execute, does the JSON contain the expected keys); keep eyeballing outputs even after evals are automated
- RAG pitfalls: 'lost in the middle', where accuracy can fall below the no-retrieval baseline when the document containing the answer sits in the middle of the context; LLMs also can't tell when retrieved context is irrelevant (Twilight/sci-fi example)
- Guardrails: use NLI-based factual-consistency models from summarization literature to detect hallucinations, plus moderation APIs
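A minimal sketch of the "deterministic checks" idea from the evals bullet, assuming the model is asked to return either a JSON object or a SQLite query. The function names, the `required_keys` parameter, and comparing result rows as a multiset are illustrative choices, not code from the talk.

```python
import json
import sqlite3
from collections import Counter
from typing import Set


def check_json_keys(output: str, required_keys: Set[str]) -> bool:
    """Pass if the output parses as a JSON object that contains every required key."""
    try:
        parsed = json.loads(output)
    except json.JSONDecodeError:
        return False
    # A JSON array or scalar fails even though it parses.
    return isinstance(parsed, dict) and required_keys.issubset(parsed)


def check_sql_execution(generated_sql: str, reference_sql: str,
                        conn: sqlite3.Connection) -> bool:
    """Pass if the generated query runs and returns the same rows as the reference."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        # Syntax errors, unknown columns, etc. count as a failed eval.
        return False
    # The reference query is assumed to be valid.
    expected = conn.execute(reference_sql).fetchall()
    # Compare rows as a multiset so row order alone does not fail the check.
    return Counter(got) == Counter(expected)
```

Because both checks return a plain pass/fail, they can run over every eval example cheaply, which leaves human attention for the outputs that fail or look odd.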
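One common mitigation for both RAG pitfalls, sketched under stated assumptions: retriever scores are comparable across documents, and `min_score` is a hypothetical cutoff you would tune on your own evals. Low-scoring documents are dropped before prompting (since the model won't reliably ignore them), and the remainder are arranged so the strongest sit at the start and end of the context, leaving the weakest in the middle.

```python
from typing import List, Tuple


def order_for_context(scored_docs: List[Tuple[str, float]],
                      min_score: float = 0.5) -> List[str]:
    """Drop low-relevance docs, then put the strongest at the edges of the prompt.

    scored_docs: (text, relevance) pairs, e.g. retriever similarity scores.
    min_score: hypothetical relevance cutoff; not a value from the talk.
    """
    # Filter first: the LLM itself is poor at ignoring irrelevant context.
    kept = [d for d in scored_docs if d[1] >= min_score]
    kept.sort(key=lambda d: d[1], reverse=True)

    front, back = [], []
    for i, (text, _) in enumerate(kept):
        # Alternate by rank: 1st -> front, 2nd -> back, 3rd -> front, ...
        (front if i % 2 == 0 else back).append(text)
    # Reverse the back half so the second-best doc ends up last.
    return front + back[::-1]
```

For five kept documents ranked a > b > c > d > e, the returned order is a, c, e, d, b: the best document opens the context, the second-best closes it, and the weakest lands in the middle.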
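A sketch of the NLI-style hallucination guardrail: split the response into sentences and flag any that the source does not appear to entail. The `entail` callable stands in for a real NLI classifier (for example, one fine-tuned on MNLI); `toy_overlap_scorer` is a word-overlap placeholder so the example runs without a model, and it is not an entailment model. The 0.5 threshold and the regex sentence splitter are assumptions.

```python
import re
from typing import Callable, List

# (premise, hypothesis) -> probability that the premise entails the hypothesis.
EntailmentFn = Callable[[str, str], float]


def flag_unsupported(source: str, response: str, entail: EntailmentFn,
                     threshold: float = 0.5) -> List[str]:
    """Return response sentences the source does not appear to entail."""
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", response) if s.strip()]
    return [s for s in sentences if entail(source, s) < threshold]


def _words(text: str) -> set:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_overlap_scorer(premise: str, hypothesis: str) -> float:
    """Placeholder, NOT an NLI model: share of hypothesis words found in the premise."""
    hyp = _words(hypothesis)
    if not hyp:
        return 1.0
    return len(hyp & _words(premise)) / len(hyp)
```

With source "The Eiffel Tower is in Paris. It opened in 1889." and response "The Eiffel Tower is in Paris. It was built on the moon.", only the second sentence is flagged. Scoring per sentence rather than per response localizes the unsupported claim, so it can be removed or sent for review.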
rag, evals, guardrails
Original description
“There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.” - Andrej Karpathy

This talk is about practical patterns for integrating large language models (LLMs) into systems and products. We’ll draw from academic research, industry resources, and practitioner know-how, and try to distill them into key ideas and practices.

There are seven key patterns. I’ve also organized them along the spectrum of improving performance vs. reducing cost/risk, and closer to the data vs. closer to the user.
- Evals: To measure performance
- RAG: To add recent, external knowledge
- Fine-tuning: To get better at specific tasks
- Caching: To reduce latency & cost
- Guardrails: To ensure output quality
- Defensive UX: To anticipate & manage errors gracefully
- Collect user feedback: To build our data flywheel

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair

About Eugene Yan
Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently a Senior Applied Scientist at Amazon. Previously, he led machine learning at Lazada (acquired by Alibaba) and a Healthtech Series A. He writes & speaks about ML systems, engineering, and career at eugeneyan.com and https://ApplyingML.com