Building Blocks for LLM Systems & Products: Eugene Yan

5.8K views · Nov 02, 2023 · 17:23 min

Takeaway

Production LLM systems hinge on task-specific evals, position-aware retrieval, NLI-based hallucination guardrails, and feedback loops — not bigger context windows.

Summary

Four building blocks: evals, retrieval-augmented generation (RAG), guardrails, and collecting user feedback
Evals: start small (for example, Teknium's 40-question domain set), reduce the task to deterministic checks where possible (does the generated SQL execute, does the JSON contain the expected keys), and keep eyeballing outputs even when evaluation is automated
RAG pitfalls: 'lost in the middle', where accuracy can fall below the no-retrieval baseline when the answer sits in the middle of the retrieved context; LLMs also can't reliably tell when retrieved context is irrelevant (Twilight/sci-fi example)
Guardrails: use factual-consistency models based on natural language inference (NLI), borrowed from the summarization literature, to detect hallucinations, plus moderation APIs
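
The "deterministic checks" idea in the evals point can be sketched in a few lines of Python. This is a minimal illustration, not code from the talk: the function names `has_required_keys` and `sql_matches_reference`, the `orders` table, and the choice of an in-memory SQLite database are all assumptions made for the example.

```python
import json
import sqlite3
from typing import Iterable


def has_required_keys(raw_output: str, required_keys: Iterable[str]) -> bool:
    """Return True if raw_output parses as a JSON object with every required key."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and set(required_keys).issubset(parsed.keys())


def sql_matches_reference(conn: sqlite3.Connection,
                          generated_sql: str,
                          reference_sql: str) -> bool:
    """Execute both queries and compare their rows, ignoring row order."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        # The generated query does not even run: a clear, deterministic failure.
        return False
    expected = conn.execute(reference_sql).fetchall()
    return sorted(got, key=repr) == sorted(expected, key=repr)


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(has_required_keys('{"title": "x", "tags": []}', ["title", "tags"]))
    print(sql_matches_reference(
        conn,
        "SELECT id FROM orders WHERE total > 20",
        "SELECT id FROM orders WHERE total >= 25",
    ))
```

Checks like these give a pass/fail signal without a judge model, which is why they pair well with a small hand-built question set. They do not replace reading the outputs yourself.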

rag · evals · guardrails

Original description

“There is a large class of problems that are easy to imagine and build demos for, but extremely hard to make products out of. For example, self-driving: It’s easy to demo a car self-driving around a block, but making it into a product takes a decade.” - Andrej Karpathy

This talk is about practical patterns for integrating large language models (LLMs) into systems and products. We’ll draw from academic research, industry resources, and practitioner know-how, and try to distill them into key ideas and practices. There are seven key patterns. I’ve also organized them along the spectrum of improving performance vs. reducing cost/risk, and closer to the data vs. closer to the user.

Evals: To measure performance
RAG: To add recent, external knowledge
Fine-tuning: To get better at specific tasks
Caching: To reduce latency & cost
Guardrails: To ensure output quality
Defensive UX: To anticipate & manage errors gracefully
Collect user feedback: To build our data flywheel

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair

About Eugene Yan
Eugene Yan designs, builds, and operates machine learning systems that serve customers at scale. He's currently a Senior Applied Scientist at Amazon. Previously, he led machine learning at Lazada (acquired by Alibaba) and a Healthtech Series A. He writes & speaks about ML systems, engineering, and career at eugeneyan.com and https://ApplyingML.com