💬 LLM Apps
End-to-end LLM-powered applications. Prompt + context plumbing, structured outputs, retry & repair, user feedback loops.
The workflow
flowchart LR
A[User intent] --> B[Prompt + context<br/>assembly]
B --> C[LLM call<br/>structured output]
C --> D{Validation?}
D -->|Fail| E[Retry / repair]
E --> C
D -->|Pass| F[App action<br/>render, write, call]
F --> G[Telemetry<br/>+ user feedback]
Most "AI apps" are this loop with retries and validation. Streaming UX matters as much as model quality.
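As a rough illustration of the validate, retry, and repair loop in the flowchart, here is a minimal sketch using only the Python standard library. Everything named here is an assumption made for illustration: the `Ticket` schema, the `parse_ticket` validator, the `run_with_repair` helper, and the `make_fake_llm` stand-in that replaces a real model client. A production app would more likely use a schema library and a real API call.

```python
import json
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Ticket:
    """Hypothetical structured output the app expects from the model."""
    title: str
    priority: int  # 1 (low) to 3 (high)


def parse_ticket(raw: str) -> Ticket:
    """Validate raw model text; raise ValueError with a message usable for repair."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    title, priority = data.get("title"), data.get("priority")
    if not isinstance(title, str) or not title.strip():
        raise ValueError("'title' must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(priority, bool) or not isinstance(priority, int) or not 1 <= priority <= 3:
        raise ValueError("'priority' must be an integer from 1 to 3")
    return Ticket(title=title.strip(), priority=priority)


def run_with_repair(call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> Ticket:
    """Call the model, validate the reply, and feed validation errors back on failure."""
    current_prompt = prompt
    last_error = "no attempts made"
    for _ in range(max_attempts):
        raw = call_llm(current_prompt)
        try:
            return parse_ticket(raw)  # "Pass" branch: hand the typed object to the app
        except ValueError as exc:
            last_error = str(exc)
            # "Fail" branch: repair by telling the model what was wrong.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was rejected: {last_error}\n"
                "Reply again with only a JSON object."
            )
    raise RuntimeError(f"validation failed after {max_attempts} attempts: {last_error}")


def make_fake_llm(replies: Iterable[str]) -> Callable[[str], str]:
    """Stand-in for a real model client: returns canned replies in order."""
    it = iter(replies)
    return lambda _prompt: next(it)


# First reply fails validation, second one passes.
fake_llm = make_fake_llm(["Sure! Here you go", '{"title": "Fix login", "priority": 2}'])
ticket = run_with_repair(fake_llm, "Extract a ticket as JSON with 'title' and 'priority'.")
print(ticket)
```

Capping the number of attempts and surfacing the last validation error keeps failures visible to telemetry instead of looping silently.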
Key takeaways
Videos (47)
The New Code — Sean Grove, OpenAI
Treat natural-language specifications as the real source code; let models compile them into TypeScript, docs, or tests.
Pydantic is all you need: Jason Liu
Use Pydantic schemas plus function calling (via Instructor) so LLM outputs are typed, validatable Python objects rather than hand-parsed strings.
Make your LLM app a Domain Expert: How to Build an Expert System — Christopher Lovejoy, Anterior
In vertical AI, the surrounding system encoding domain expertise — not the model — determines who wins the last mile from 95% to 99%.
POC to PROD: Hard Lessons from 200+ Enterprise GenAI Deployments - Randall Hunt, Caylent
In enterprise GenAI, frame annotation, multimodal pooled embeddings, and boring infra (pgvector, Elasticsearch) beat exotic approaches.
Designing AI-Intensive Applications - swyx
AI engineering still lacks a canonical architectural pattern; evals, orchestration, and security are where production work lives beyond commoditized LLM calls.
Building an Agentic Platform — Ben Kus, CTO Box
For enterprises sitting on mountains of unstructured content, off-the-shelf generative models plus light preprocessing have already beaten the legacy IDP industry at structured extraction.
The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil
Don't engineer friction out — friction is where human judgment lives; treat agent-written code as fast but emotionally indifferent.
Pydantic is STILL all you need: Jason Liu
Pydantic plus structured outputs remains the durable abstraction for LLM apps; validators with retries do the heavy lifting that prompt tricks cannot.
How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock
BlackRock scales internal AI by standardizing extraction, workflow, Q&A, and agentic patterns into a shared platform investment-ops teams reuse for each new domain app.
Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+DAU - Devansh Tandon
Adapting a frontier LLM with semantically-quantized video tokens delivered novel cold-start recommendations at YouTube scale once serving cost was driven down 95%+.
Lessons From A Year Building With LLMs
The durable LLM-app playbook is the same iterative-improvement loop the industry has used for decades — anchored on evals, data, and putting real product in front of real users.
The 4 Patterns of AI Native Development — Patrick Debois
AI-native dev isn't sprinkled AI — it restructures the engineer's role into manager-of-agents, intent-author, idea-discoverer, and knowledge-curator.
No more bad outputs with structured generation: Remi Louf
Constrained decoding via logit masking gives you 99.9% valid JSON at zero inference cost — there is no reason left to YOLO unstructured outputs in production.
From Software Developer to AI Engineer: Antje Barth
Software devs become AI engineers by combining FM fundamentals with productivity assistants (Amazon Q) and managed model access (Bedrock Converse API).
Open Challenges for AI Engineering: Simon Willison
The GPT-4 moat has fallen but using these models well remains a hard, undocumented power-user skill that AI engineers must help others navigate.
How to Become an AI Engineer from a Fullstack Background - Reid Mayo
A focused 7-section curriculum (LLM overview → prompting → OpenAI → LangChain → evals → fine-tuning → advanced) plus Socratic ChatGPT tutoring lets full-stack engineers reach professional AI engineering quickly.
Using OSS models to build AI apps with millions of users — Hassan El Mghari
Million-user AI apps come from shipping single-API-call prototypes fast on OSS models, not from clever architecture.
[Workshop] AI Engineering 101
The minimum AI-engineer base layer is calling LLM APIs, embeddings, code gen, image gen, and STT — wired into a real product like a Telegram bot.
Building Blocks for LLM Systems & Products: Eugene Yan
Production LLM systems hinge on task-specific evals, position-aware retrieval, NLI-based hallucination guardrails, and feedback loops — not bigger context windows.
Open Questions for AI Engineering: Simon Willison
AI engineering's open questions — interface, safety, evals, and rapidly falling cost curves — define the discipline more than any single tool does.
Realtime Data Connectivity for AI: Tanmai Gopal
Give LLMs a unified SQL-like query interface, declarative authorization tied to data schema, and Python-as-plan execution rather than wrapping each API ad hoc.
Transforming search and discovery using LLMs — Tejaswi & Vinesh, Instacart
Plug LLMs in on top of engagement-derived candidates rather than letting them classify cold — the seeded prompt aligns LLM commonsense with real user-conversion data.
The Weekend AI Engineer: Hassan El Mghari
Ship trivial-but-useful AI apps as open source on weekends — simplicity, distribution, and being open-source compound far more than audience size.
What It Actually Takes to Deploy GenAI Applications to Enterprises: Arjun Bansal and Trey Doig
Enterprise GenAI deployment is mostly an accuracy and trust problem — you need configurable per-customer eval pipelines plus a 7-day path to ~95% accuracy before customers will renew.
Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase
Context delivery method and reasoning pattern (chain vs router vs agent loop) are the two design axes you choose deliberately when building LLM apps.
AI Engineering 201: The Rest of the Owl
Most LLM apps fail because teams over-invest in inference and under-invest in the LUI patterns, monitoring, and evals that turn a model into a product.
Human seeded Evals — Samuel Colvin, Pydantic
Static type safety + Pydantic structured-output validation + retry-on-validation-error makes agentic LLM apps refactorable and reliable.
360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI
One large LLM trained on promptified user behavior and distilled small can replace many specialized recommenders with strong cold-start gains.
Analyzing 10,000 Sales Calls With AI In 2 Weeks — Charlie Guo
AI-driven analysis of unstructured customer data is now a fortnight's solo project, if you spend on the smartest model and add prompt caching, structured outputs, and verifiable citations.
Scaling AI in Education: A Khanmigo case study: Shawn Jansepar
Khanmigo shows AI tutoring succeeds by reshaping UX around Socratic constraints, math-aware grading, and integrating teacher feedback loops rather than dumping chat interfaces on students.
Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner
Use Microsoft's free Azure pass plus the 'build-with-AI' template gallery to ship chat and RAG apps in minutes without spending your own money.
"Data readiness" is a Myth: Reliable AI with an Agentic Semantic Layer — Anushrut Gupta, PromptQL
Stop waiting for clean data — couple LLM plan generation with deterministic DSL execution and a steerable agentic semantic layer that learns tribal business knowledge over time.
What We Learned from Using LLMs in Pinterest — Mukuntha Narayanan, Han Wang, Pinterest
Fine-tuned LLMs on rich pin text (incl. VLM captions and user-engagement annotations) lift Pinterest search relevance 12-20%, productionized via knowledge distillation.
AI Didn't Kill the Web, It Moved in! — Olivier Leplus (AWS) & Yohan Lasorsa (Microsoft)
AI is permeating the entire web lifecycle from coding agents with skills, to browser-local inference, to apps designed for agent users.
AI Engineering Without Borders — swyx
Stop treating AI engineering tracks as silos — look for invariants and trends (context, cost, latency) that hold regardless of subspecialty.
Automating Escrow with USDC and AI - Corey Cooper, Circle
Marrying AI agents with USDC's programmable money rails enables automated, instantly-settling escrow that legacy T+2 payment systems can't match.
Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy
Enterprise LLM deployment lives or dies on RAG + scalable infra; auto-regressive models won't be AGI, so focus on operationalizing today's risks and unstructured data.
Building a Chess Coach — Anant Dole and Asbjorn Steinskog, Take Take Take
For domain skills LLMs can't do (chess), let dedicated systems compute truth and use the LLM purely as a grounded translator into natural language — and let Claude Code self-improve the pipeline via feedback.
Cognitive Exhaust Fumes, or: Read-Only AI Is Underrated — Šimon Podhajský, Head of AI, Waypoint
Read-only personal AI that only analyzes your data — never acts — produces deeper self-knowledge with bounded risk, and deserves to stand apart from agentic systems.
The LLM Triangle: Engineering Principles for Robust AI Applications - Almog Baku
Robust LLM apps come from translating expert SOPs into agent+code pipelines, treating the model as a procedurally-instructed intern rather than a magic black box.
How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit
At Intuit scale, regulated-domain LLM apps rely on multi-model GenOS, RAG/GraphRAG, fine-tuning, and tax-analyst-driven evals to stay accurate and upgradeable.
The Adversarial Path to the Personal Assistant: Sumit Agarwal
True personal AI requires aggregating a user's full data footprint via adversarial ETL — generic LLMs without that data give Google-plus answers.
The Robots are coming for your job, and that's okay - Elmer Thomas and Maria Bermudez
A small team can scale docs work by composing many narrow agents behind a lint + CI + human-review pipeline rather than betting on one mega-bot.
Lessons from building GenAI based applications — Juan Peredo
Building GenAI apps adds a whole AI architecture (model, hosting, eval, guardrails) on top of your stack — and ongoing model churn means evaluation never stops.
Buy Now, Maybe Pay Later: Dealing with Prompt-Tax While Staying at the Frontier - Andrew Thompson
At the frontier you pay 'prompt tax' to migrate hundreds of domain prompts; favor prompting, embed domain experts, and rigorously experiment before each model switch.
My AI Thinks I'm Eating My Feelings (and Other Nutritional Insights) - Rami Alhamad
Real-time user feedback toasts beat formal evals for early consumer AI, and persistent per-user memory plus task chunking are what make interactions feel fast and personal.
Creating and scaling your own custom copilots with Azure AI Studio: Hanchi Wang
Azure AI Studio + Prompt Flow gives enterprises framework-agnostic tracing (OpenTelemetry), evals, and monitoring to ship and maintain LLM copilots end-to-end.