💬 LLM Apps

End-to-end LLM-powered applications. Prompt + context plumbing, structured outputs, retry & repair, user feedback loops.

47 videos · agents · llm-apps · evals · rag · ai-engineering · structured-outputs

The workflow

flowchart LR
    A[User intent] --> B[Prompt + context<br/>assembly]
    B --> C[LLM call<br/>structured output]
    C --> D{Validation?}
    D -->|Fail| E[Retry / repair]
    E --> C
    D -->|Pass| F[App action<br/>render, write, call]
    F --> G[Telemetry<br/>+ user feedback]

Most "AI apps" are this loop with retries and validation. Streaming UX matters as much as model quality.
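
The loop in the flowchart can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the schema (`REQUIRED_KEYS`), the helper names, and the scripted `fake_llm_factory` stand-in for a real model client are all assumptions made up for the example.

```python
import json
from typing import Callable

REQUIRED_KEYS = {"title", "priority"}  # hypothetical schema, for illustration only


def validate(raw: str) -> dict:
    """Parse the model output and check it against the toy schema."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return data


def run_with_repair(call_llm: Callable[[str], str], prompt: str, max_attempts: int = 3) -> dict:
    """Call the model, validate, and on failure re-prompt with the error message."""
    current_prompt = prompt
    for _ in range(max_attempts):
        raw = call_llm(current_prompt)
        try:
            return validate(raw)
        except ValueError as err:
            # Repair step: feed the validation error back to the model.
            current_prompt = (
                f"{prompt}\n\nYour previous reply was invalid ({err}). "
                "Return only valid JSON."
            )
    raise RuntimeError(f"no valid output after {max_attempts} attempts")


def fake_llm_factory() -> Callable[[str], str]:
    """Stand-in for a real model client: fails once, then succeeds."""
    replies = iter(['{"title": "Fix login"}', '{"title": "Fix login", "priority": 2}'])
    return lambda prompt: next(replies)


result = run_with_repair(fake_llm_factory(), "Extract a ticket as JSON.")
print(result)
```

Capping `max_attempts` keeps a persistently failing model from looping forever; the app action and telemetry steps would hang off the returned dict.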

Key takeaways

Treat natural-language specifications as the real source code; let models compile them into TypeScript, docs, or tests.

Use Pydantic schemas plus function calling (via Instructor) so LLM outputs are typed, validatable Python objects rather than hand-parsed strings.

In vertical AI, the surrounding system encoding domain expertise — not the model — determines who wins the last mile from 95% to 99%.

In enterprise GenAI, frame annotation, multimodal pooled embeddings, and boring infra (pgvector, Elasticsearch) beat exotic approaches.

AI engineering still lacks a canonical architectural pattern; evals, orchestration, and security are where production work lives beyond commoditized LLM calls.

For enterprises sitting on mountains of unstructured content, off-the-shelf generative models plus light preprocessing have already beaten the legacy IDP industry at structured extraction.

Videos (47)

The New Code — Sean Grove, OpenAI

Treat natural-language specifications as the real source code; let models compile them into TypeScript, docs, or tests.

1.1M views · Jul 11, 2025

Pydantic is all you need: Jason Liu

Use Pydantic schemas plus function calling (via Instructor) so LLM outputs are typed, validatable Python objects rather than hand-parsed strings.

227.2K views · Nov 01, 2023
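
A rough sketch of the validation half of this idea, assuming Pydantic v2 is installed. The `UserDetail` schema and the two hard-coded replies are hypothetical; Instructor and the function-calling request itself are not shown, only what happens once a JSON reply comes back.

```python
from pydantic import BaseModel, Field, ValidationError


class UserDetail(BaseModel):
    """Hypothetical target schema for an extraction task."""

    name: str
    age: int = Field(ge=0, le=130)


# Stand-ins for raw model replies; a real app would get these from an LLM call.
good_reply = '{"name": "Ada", "age": 36}'
bad_reply = '{"name": "Ada", "age": "unknown"}'

# A valid reply becomes a typed Python object instead of a hand-parsed string.
user = UserDetail.model_validate_json(good_reply)
print(user.name, user.age)

try:
    UserDetail.model_validate_json(bad_reply)
    error_count = 0
except ValidationError as err:
    # The structured error can be logged or sent back to the model for a retry.
    error_count = err.error_count()
    print(f"{error_count} validation error(s)")
```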

Make your LLM app a Domain Expert: How to Build an Expert System — Christopher Lovejoy, Anterior

In vertical AI, the surrounding system encoding domain expertise — not the model — determines who wins the last mile from 95% to 99%.

85.9K views · Jul 28, 2025

POC to PROD: Hard Lessons from 200+ Enterprise GenAI Deployments - Randall Hunt, Caylent

In enterprise GenAI, frame annotation, multimodal pooled embeddings, and boring infra (pgvector, Elasticsearch) beat exotic approaches.

39.5K views · Jul 23, 2025

Designing AI-Intensive Applications - swyx

AI engineering still lacks a canonical architectural pattern; evals, orchestration, and security are where production work lives beyond commoditized LLM calls.

27.3K views · Aug 09, 2025

Building an Agentic Platform — Ben Kus, CTO Box

For enterprises sitting on mountains of unstructured content, off-the-shelf generative models plus light preprocessing have already beaten the legacy IDP industry at structured extraction.

26.4K views · Aug 24, 2025

The Friction is Your Judgment — Armin Ronacher & Cristina Poncela Cubeiro, Earendil

Don't engineer friction out — friction is where human judgment lives; treat agent-written code as fast but emotionally indifferent.

25.6K views · Apr 18, 2026

Pydantic is STILL all you need: Jason Liu

Pydantic plus structured outputs remains the durable abstraction for LLM apps; validators with retries do the heavy lifting that prompt tricks cannot.

22.6K views · Sep 06, 2024

How BlackRock Builds Custom Knowledge Apps at Scale — Vaibhav Page & Infant Vasanth, BlackRock

BlackRock scales internal AI by standardizing extraction, workflow, Q&A, and agentic patterns into a shared platform investment-ops teams reuse for each new domain app.

18.4K views · Aug 23, 2025

Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+ DAU - Devansh Tandon

Adapting a frontier LLM with semantically-quantized video tokens delivered novel cold-start recommendations at YouTube scale once serving cost was driven down 95%+.

16.6K views · Jul 16, 2025

Lessons From A Year Building With LLMs

The durable LLM-app playbook is the same iterative-improvement loop the industry has used for decades — anchored on evals, data, and putting real product in front of real users.

14.5K views · Jul 19, 2024

The 4 Patterns of AI Native Development — Patrick Debois

AI-native dev isn't sprinkled AI — it restructures the engineer's role into manager-of-agents, intent-author, idea-discoverer, and knowledge-curator.

11.3K views · Jun 04, 2025

No more bad outputs with structured generation: Remi Louf

Constrained decoding via logit masking gives you 99.9% valid JSON at zero inference cost — there is no reason left to YOLO unstructured outputs in production.

10.7K views · Oct 14, 2024
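
To make "logit masking" concrete, here is a toy sketch. Everything in it is an assumption for illustration: the seven-token vocabulary, the fixed scores, and the hand-written `allowed_next` table that stands in for a real grammar or finite-state machine. It does not reproduce any particular library's implementation.

```python
import math

# Toy vocabulary and raw scores; a real decoder works over tens of thousands of tokens.
VOCAB = ["{", "}", '"name"', ":", '"Ada"', "hello", ","]
logits = [0.1, 0.3, 0.2, 0.0, 0.4, 2.5, 0.1]  # the model "prefers" the invalid token "hello"


def allowed_next(prefix: list[str]) -> set[str]:
    """Hand-written stand-in for a grammar: which tokens keep the output valid."""
    if not prefix:
        return {"{"}
    transitions = {"{": {'"name"'}, '"name"': {":"}, ":": {'"Ada"'}, '"Ada"': {"}"}}
    return transitions.get(prefix[-1], set())


def mask_logits(scores: list[float], allowed: set[str]) -> list[float]:
    """Set the score of every disallowed token to -inf so it can never be picked."""
    return [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, scores)]


def greedy_decode(max_steps: int = 10) -> list[str]:
    """Pick the highest-scoring allowed token at each step."""
    out: list[str] = []
    for _ in range(max_steps):
        allowed = allowed_next(out)
        if not allowed:  # the toy grammar has reached its end state
            break
        masked = mask_logits(logits, allowed)
        out.append(VOCAB[masked.index(max(masked))])
    return out


tokens = greedy_decode()
print("".join(tokens))
```

Even though `hello` has the highest raw score at every step, it is never emitted, because the mask removes it before the token is chosen.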

From Software Developer to AI Engineer: Antje Barth

Software devs become AI engineers by combining foundation-model (FM) fundamentals with productivity assistants (Amazon Q) and managed model access (Bedrock Converse API).

9.3K views · Jul 24, 2024

Open Challenges for AI Engineering: Simon Willison

The GPT-4 moat has fallen but using these models well remains a hard, undocumented power-user skill that AI engineers must help others navigate.

8.9K views · Jul 17, 2024

How to Become an AI Engineer from a Fullstack Background - Reid Mayo

A focused 7-section curriculum (LLM overview → prompting → OpenAI → LangChain → evals → fine-tuning → advanced) plus Socratic ChatGPT tutoring lets full-stack engineers reach professional AI engineering quickly.

7.7K views · Feb 02, 2024

Using OSS models to build AI apps with millions of users — Hassan El Mghari

Million-user AI apps come from shipping single-API-call prototypes fast on OSS models, not from clever architecture.

6.4K views · Jul 15, 2025

[Workshop] AI Engineering 101

The minimum AI-engineer base layer is calling LLM APIs, embeddings, code gen, image gen, and STT — wired into a real product like a Telegram bot.

5.9K views · Nov 06, 2023

Building Blocks for LLM Systems & Products: Eugene Yan

Production LLM systems hinge on task-specific evals, position-aware retrieval, NLI-based hallucination guardrails, and feedback loops — not bigger context windows.

5.8K views · Nov 02, 2023

Open Questions for AI Engineering: Simon Willison

AI engineering's open questions — interface, safety, evals, and rapidly falling cost curves — define the discipline more than any single tool does.

5.5K views · Nov 25, 2023

Realtime Data Connectivity for AI: Tanmai Gopal

Give LLMs a unified SQL-like query interface, declarative authorization tied to data schema, and Python-as-plan execution rather than wrapping each API ad hoc.

5.1K views · Oct 11, 2024

Transforming search and discovery using LLMs — Tejaswi & Vinesh, Instacart

Plug LLMs in on top of engagement-derived candidates rather than letting them classify cold — the seeded prompt aligns LLM commonsense with real user-conversion data.

4.6K views · Jul 16, 2025

The Weekend AI Engineer: Hassan El Mghari

Ship trivial-but-useful AI apps as open source on weekends — simplicity, distribution, and being open-source compound far more than audience size.

4.2K views · Nov 22, 2023

What It Actually Takes to Deploy GenAI Applications to Enterprises: Arjun Bansal and Trey Doig

Enterprise GenAI deployment is mostly an accuracy and trust problem — you need configurable per-customer eval pipelines plus a 7-day path to ~95% accuracy before customers will renew.

3.5K views · Nov 04, 2024

Building Context-Aware Reasoning Applications with LangChain and LangSmith: Harrison Chase

Context delivery method and reasoning pattern (chain vs router vs agent loop) are the two design axes you choose deliberately when building LLM apps.

3.5K views · Nov 01, 2023
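
The three reasoning patterns can be contrasted in plain Python. This is a simplified sketch that does not use LangChain's API: the function names, the prompt prefixes, and the `scripted_llm` stand-in are assumptions chosen so the example runs offline.

```python
from typing import Callable

LLM = Callable[[str], str]
Tool = Callable[[str], str]


def chain(llm: LLM, question: str) -> str:
    """Chain: a fixed sequence of calls, each feeding the next."""
    outline = llm(f"outline: {question}")
    return llm(f"answer using outline [{outline}]: {question}")


def router(llm: LLM, question: str, routes: dict[str, Tool]) -> str:
    """Router: one call picks a branch; the branch itself is ordinary code."""
    choice = llm(f"route: {question}").strip()
    return routes.get(choice, routes["default"])(question)


def agent_loop(llm: LLM, question: str, tools: dict[str, Tool], max_steps: int = 5) -> str:
    """Agent loop: the model repeatedly chooses a tool until it answers FINAL."""
    scratchpad = question
    for _ in range(max_steps):
        decision = llm(f"next: {scratchpad}")
        if decision.startswith("FINAL:"):
            return decision[len("FINAL:"):].strip()
        tool_name, _sep, arg = decision.partition(" ")
        observation = tools[tool_name](arg)
        scratchpad += f"\n{decision} -> {observation}"
    return "gave up"


def scripted_llm(prompt: str) -> str:
    """Scripted stand-in for a model, keyed on the prompt prefix."""
    if prompt.startswith("outline:"):
        return "1) parse 2) compute"
    if prompt.startswith("answer"):
        return "4"
    if prompt.startswith("route:"):
        return "math"
    if prompt.startswith("next:"):
        return "FINAL: 4" if "->" in prompt else "calc 2+2"
    return ""


tools = {"calc": lambda expr: str(sum(int(x) for x in expr.split("+")))}
routes = {"math": lambda q: "calculator branch", "default": lambda q: "general branch"}

chain_answer = chain(scripted_llm, "What is 2+2?")
route_answer = router(scripted_llm, "What is 2+2?", routes)
agent_answer = agent_loop(scripted_llm, "What is 2+2?", tools)
print(chain_answer, route_answer, agent_answer)
```

The difference is who controls the flow: code in the chain, one model decision in the router, and the model at every step in the agent loop.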

AI Engineering 201: The Rest of the Owl

Most LLM apps fail because teams over-invest in inference and under-invest in the LUI patterns, monitoring, and evals that turn a model into a product.

3.3K views · Nov 08, 2023

Human seeded Evals — Samuel Colvin, Pydantic

Static type safety + Pydantic structured-output validation + retry-on-validation-error makes agentic LLM apps refactorable and reliable.

3.2K views · Jul 25, 2025

360Brew: LLM-based Personalized Ranking and Recommendation - Hamed and Maziar, LinkedIn AI

One large LLM trained on promptified user behavior and distilled small can replace many specialized recommenders with strong cold-start gains.

2.8K views · Jul 16, 2025

Analyzing 10,000 Sales Calls With AI In 2 Weeks — Charlie Guo

AI-driven analysis of unstructured customer data is now a fortnight's solo project, provided you spend on the smartest model and add prompt caching, structured outputs, and verifiable citations.

2.5K views · Jun 03, 2025
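
One plausible reading of "verifiable citations" is that every finding carries a quote that can be checked against the source transcript. The sketch below assumes that reading; the `Claim` shape, the normalization rule, and the sample data are invented for illustration and are not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One extracted finding plus the quote the model says supports it."""

    summary: str
    quote: str


def normalize(text: str) -> str:
    """Collapse whitespace and case so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def verify_citations(transcript: str, claims: list[Claim]) -> tuple[list[Claim], list[Claim]]:
    """Split claims into those whose quote appears in the transcript and those that do not."""
    haystack = normalize(transcript)
    verified: list[Claim] = []
    rejected: list[Claim] = []
    for claim in claims:
        quote = normalize(claim.quote)
        # An empty quote supports nothing, so it is rejected rather than matched.
        if quote and quote in haystack:
            verified.append(claim)
        else:
            rejected.append(claim)
    return verified, rejected


transcript = "Customer: The export feature is too slow.\nRep: We are working on faster exports."
claims = [
    Claim("Export speed is a pain point", "the export feature is too slow"),
    Claim("Customer asked for a discount", "can we get a discount"),
]
verified, rejected = verify_citations(transcript, claims)
print(len(verified), len(rejected))
```

Exact substring matching is deliberately strict: a paraphrased quote is rejected, which is the point when the goal is to catch unsupported claims.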

Scaling AI in Education: A Khanmigo case study: Shawn Jansepar

Khanmigo shows AI tutoring succeeds by reshaping UX around Socratic constraints, math-aware grading, and integrating teacher feedback loops rather than dumping chat interfaces on students.

2.4K views · Feb 05, 2025

Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

Use Microsoft's free Azure pass plus the 'build-with-AI' template gallery to ship chat and RAG apps in minutes without spending your own money.

2.3K views · Aug 14, 2024

"Data readiness" is a Myth: Reliable AI with an Agentic Semantic Layer — Anushrut Gupta, PromptQL

Stop waiting for clean data — couple LLM plan generation with deterministic DSL execution and a steerable agentic semantic layer that learns tribal business knowledge over time.

2.2K views · Jun 27, 2025
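
The split between model-generated plans and deterministic execution might look roughly like this. The JSON plan format, the two operations, and the sample rows are hypothetical and far simpler than a real DSL; the plan string stands in for what a model would return.

```python
import json

ROWS = [  # hypothetical in-memory table
    {"region": "EU", "revenue": 120},
    {"region": "US", "revenue": 300},
    {"region": "EU", "revenue": 80},
]


def op_filter(rows, field, equals):
    """Keep rows where `field` equals the given value."""
    return [r for r in rows if r[field] == equals]


def op_sum(rows, field):
    """Add up `field` across all rows."""
    return sum(r[field] for r in rows)


OPS = {"filter": op_filter, "sum": op_sum}


def execute(plan_json: str, rows):
    """Run a plan step by step; unknown operations are rejected, never guessed."""
    value = rows
    for step in json.loads(plan_json):
        op = step["op"]
        if op not in OPS:
            raise ValueError(f"unknown op: {op}")
        value = OPS[op](value, **step["args"])
    return value


# Stand-in for a plan the model would generate from a natural-language question.
plan = (
    '[{"op": "filter", "args": {"field": "region", "equals": "EU"}},'
    ' {"op": "sum", "args": {"field": "revenue"}}]'
)
total = execute(plan, ROWS)
print(total)
```

The model only chooses which steps to run; the arithmetic itself happens in ordinary code, so the same plan always gives the same answer.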

What We Learned from Using LLMs in Pinterest — Mukuntha Narayanan, Han Wang, Pinterest

Fine-tuned LLMs on rich pin text (incl. VLM captions and user-engagement annotations) lift Pinterest search relevance 12-20%, productionized via knowledge distillation.

2.1K views · Jul 16, 2025

AI Didn't Kill the Web, It Moved in! — Olivier Leplus (AWS) & Yohan Lasorsa (Microsoft)

AI is permeating the entire web lifecycle from coding agents with skills, to browser-local inference, to apps designed for agent users.

2.1K views · Apr 10, 2026

AI Engineering Without Borders — swyx

Stop treating AI engineering tracks as silos — look for invariants and trends (context, cost, latency) that hold regardless of subspecialty.

2.1K views · Oct 30, 2024

Automating Escrow with USDC and AI - Corey Cooper, Circle

Marrying AI agents with USDC's programmable money rails enables automated, instantly-settling escrow that legacy T+2 payment systems can't match.

2.0K views · Jul 14, 2025

Navigating Challenges and Technical Debt in LLMs Deployment: Ahmed Menshawy

Enterprise LLM deployment lives or dies on RAG + scalable infra; auto-regressive models won't be AGI, so focus on operationalizing today's risks and unstructured data.

2.0K views · Dec 31, 2024

Building a Chess Coach — Anant Dole and Asbjorn Steinskog, Take Take Take

For domain skills LLMs can't do (chess), let dedicated systems compute truth and use the LLM purely as a grounded translator into natural language — and let Claude Code self-improve the pipeline via feedback.

1.9K views · May 13, 2026

Cognitive Exhaust Fumes, or: Read-Only AI Is Underrated — Šimon Podhajský, Head of AI, Waypoint

Read-only personal AI that only analyzes your data — never acts — produces deeper self-knowledge with bounded risk, and deserves to stand apart from agentic systems.

1.8K views · Apr 08, 2026

The LLM Triangle: Engineering Principles for Robust AI Applications - Almog Baku

Robust LLM apps come from translating expert SOPs into agent+code pipelines, treating the model as a procedurally-instructed intern rather than a magic black box.

1.5K views · Feb 22, 2025

How Intuit uses LLMs to explain taxes to millions of taxpayers - Jaspreet Singh, Intuit

At Intuit scale, regulated-domain LLM apps rely on multi-model GenOS, RAG/GraphRAG, fine-tuning, and tax-analyst-driven evals to stay accurate and upgradeable.

1.1K views · Jul 23, 2025

The Adversarial Path to the Personal Assistant: Sumit Agarwal

True personal AI requires aggregating a user's full data footprint via adversarial ETL — generic LLMs without that data give Google-plus answers.

977 views · Feb 15, 2025

The Robots are coming for your job, and that's okay - Elmer Thomas and Maria Bermudez

A small team can scale docs work by composing many narrow agents behind a lint + CI + human-review pipeline rather than betting on one mega-bot.

919 views · Jun 03, 2025

Lessons from building GenAI based applications — Juan Peredo

Building GenAI apps adds a whole AI architecture (model, hosting, eval, guardrails) on top of your stack — and ongoing model churn means evaluation never stops.

831 views · Feb 22, 2025

Buy Now, Maybe Pay Later: Dealing with Prompt-Tax While Staying at the Frontier - Andrew Thompson

At the frontier you pay 'prompt tax' to migrate hundreds of domain prompts; favor prompting, embed domain experts, and rigorously experiment before each model switch.

391 views · Jun 03, 2025

My AI Thinks I'm Eating My Feelings (and Other Nutritional Insights) - Rami Alhamad

Real-time user feedback toasts beat formal evals for early consumer AI, and persistent per-user memory plus task chunking are what make interactions feel fast and personal.

365 views · Jun 03, 2025

Creating and scaling your own custom copilots with Azure AI Studio: Hanchi Wang

Azure AI Studio + Prompt Flow gives enterprises framework-agnostic tracing (OpenTelemetry), evals, and monitoring to ship and maintain LLM copilots end-to-end.

303 views · Feb 06, 2025