Lessons From A Year Building With LLMs

14.5K views · Jul 19, 2024 · 35:20 min

Takeaway

The durable LLM-app playbook is the same iterative-improvement loop the industry has used for decades — anchored on evals, data, and putting real product in front of real users.

Summary

Six co-authors (Bryan Bischof, Hamel Husain, Jason Liu, Charles Frye, Shreya Shankar, Eugene Yan) deliver the keynote version of their O'Reilly white paper across strategic, operational, and tactical lessons.
Strategy: for almost everyone, the model is not the moat. Build in your zone of genius, treat models like SaaS and switch when a competitor offers a better one, and remember that a high MMLU score is not a product.
The unifying theme is iterative improvement centered on evals and looking at data — same loop as MLOps, DevOps, Lean Startup, and the Toyota Production System (genchi genbutsu).
Tactical advice: ship in beta to real users, capture binary feedback first, treat user requests as PMF signal, and remember 'value is only created when metal gets bent'.

llm-apps · evals · mlops

Original description

Special double-feature closing keynote from the 6 authors of the hit O'Reilly article on Applied LLMs.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Eugene Yan
I build ML systems to serve customers at scale, and write to learn and teach.

About Shreya Shankar
I'm Shreya Shankar. I am a machine learning (ML) engineer and computer scientist in the Bay Area.
I am completing my PhD in data management systems for ML, with a human-centered focus. I am fortunate to be advised by Dr. Aditya Parameswaran at UC Berkeley. Go Bears! 🐻
I also consult on ML engineering and production AI strategy for enterprises. Prior to my PhD, I was the first ML engineer at a startup, did research engineering at Google Brain, and engineering at Facebook. Before all of that, I did my BS and MS in computer science at Stanford. Go Trees! 🌲

About Hamel Husain
Hamel Husain started working with language models five years ago when he led the team that created CodeSearchNet, a precursor to GitHub Copilot. Since then, he has seen many successful and unsuccessful approaches to building LLM products. Hamel is also an active open source maintainer and contributor to a wide range of ML/AI projects. Hamel is currently an independent consultant.

About Jason Liu
Jason is an independent AI consultant, advisor, writer, and educator. His main interests are structured outputs, search and retrieval for RAG as well as understanding how to leverage AI to build scalable and valuable businesses.

About Bryan Bischof
Bryan Bischof is the Head of AI at Hex, where he leads the team of engineers building Magic—the data science and analytics copilot. Bryan has worked all over the data stack leading teams in analytics, machine learning engineering, data platform engineering, and AI engineering. He started the data team at Blue Bottle Coffee, led several projects at Stitch Fix, and built the data teams at Weights and Biases. Bryan previously co-authored the book Building Production Recommendation Systems with O’Reilly, and teaches Data Science and Analytics in the graduate school at Rutgers. His Ph.D. is in pure mathematics.

About Charles Frye
AI Engineer at Modal Labs. Building useful technology with large neural networks.

00:00 Introduction
03:22 Strategic: Bryan Bischof & Charles Frye
14:47 Operational: Hamel Husain & Jason Liu
23:51 Tactical: Eugene Yan & Shreya Shankar