Agent Optimization with Pydantic AI: GEPA, Evals, Feedback Loops — Samuel Colvin, Pydantic

3.9K views · May 07, 2026 · 80:40 min

Takeaway

Treat prompts as managed variables and use GEPA-style genetic optimization with golden-set evals to systematically improve agent reliability.

Summary

Combines GEPA (a Genetic-Pareto optimization library that evolves prompts or JSON by keeping a Pareto frontier of the best candidates) with Pydantic Logfire managed variables.
Real task: detecting ancestral political relations of UK MPs from Wikipedia HTML using Pydantic AI structured outputs — models reliably over-include spouses/children.
Workflow: define structured output schema → golden dataset (relations curated with Opus 4.6) → Logfire managed variables hold optimizable prompt → GEPA breeds toward higher eval score.
Argues 'AI observability' is a feature of generic observability (OpenTelemetry logs/metrics/traces), not a category.
Demonstrates a manual loop today that platform-level autonomous agent optimization will eventually subsume.
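
The eval-and-select step in the workflow above can be illustrated with a small, standard-library-only sketch. It is not the code from the talk and does not use the GEPA, Pydantic AI, or Logfire APIs. The `Relation` type, the `f1` scorer, the simplified `pareto_front` filter, and all names and data are hypothetical, chosen only to show why over-including spouses or children lowers a candidate's score and how a Pareto frontier keeps candidates that are best on at least some golden cases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical structured-output item: one relative and how they relate."""
    name: str
    kind: str  # e.g. "father", "grandfather", "spouse"


def f1(predicted: set, expected: set) -> float:
    """F1 of a predicted relation set against the golden set for one case."""
    if not predicted and not expected:
        return 1.0
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def pareto_front(scores: dict) -> list:
    """Keep candidates whose per-case scores are not dominated by another's.

    Simplified non-dominated filter (an assumption for illustration);
    GEPA's own selection logic is more involved.
    """
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

    return [
        name
        for name, own in scores.items()
        if not any(dominates(other, own) for key, other in scores.items() if key != name)
    ]


# Hypothetical golden set: expected ancestral relations for two cases.
father = Relation("Parent A", "father")
mother = Relation("Parent B", "mother")
grandfather = Relation("Grandparent C", "grandfather")
spouse = Relation("Spouse D", "spouse")
child = Relation("Child E", "child")

golden = [{father}, {grandfather, mother}]

# Hypothetical outputs produced by three prompt candidates on the same cases.
outputs = {
    "baseline": [{father, spouse}, {grandfather, mother}],  # over-includes a spouse
    "strict": [{father}, {grandfather}],                    # misses one ancestor
    "noisy": [{father, spouse, child}, {grandfather, child}],
}

scores = {
    name: [f1(pred, exp) for pred, exp in zip(preds, golden)]
    for name, preds in outputs.items()
}
front = pareto_front(scores)
print(scores)
print(front)
```

In this toy data the "noisy" candidate is dominated on both cases and drops out, while "baseline" and "strict" each win one case and both stay on the frontier as parents for the next round of prompt variants.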

pydantic-ai · gepa · prompt-optimization

Original description

Deploying an agent is only the start. In this workshop, Samuel Colvin shows how to improve agents after they are already live, using Pydantic AI and Logfire to change prompts, models, and other parameters in production without redeploying or restarting services.

The session covers managed variables for live prompt and model updates, how to run evals and compare prompt variants against real datasets, and how GEPA can be used to evolve better prompts from production traces and feedback signals. If you're building agents in production and want a practical path from manual tuning to continuous optimization, this is a strong hands-on walkthrough.

Speaker info:
- https://x.com/samuelcolvin
- https://www.linkedin.com/in/samuel-colvin/
- https://github.com/samuelcolvin

Timestamps:
0:00 Introduction to Samuel Colvin and the Pydantic ecosystem
1:29 Overview of GEPA for prompt optimization
3:02 Introduction to Logfire managed variables
3:55 Case study: Analyzing political dynasties using Wikipedia data
10:04 Getting started: Setting up the environment and API keys
16:55 Running the initial evaluation (evals) against a golden dataset
25:16 Comparing different prompt performance
34:00 Running the full GEPA optimization process
43:43 Q&A: Handling prompt size and systemic errors
57:01 Demonstrating managed variables in a FastAPI web server
1:11:06 Discussing implicit user feedback collection
1:15:42 Q&A: Real-world internal use cases and context engineering