Human-seeded Evals — Samuel Colvin, Pydantic

3.2K views · Jul 25, 2025 · 12:02 min

Takeaway

Static type safety, Pydantic structured-output validation, and retry on validation error together make agentic LLM apps refactorable and reliable.

Summary

Defines an agent as an LLM running in a loop with tools, a system prompt, and an environment, and stresses that deciding when to exit the loop is still an unsolved problem (candidate exit signals: a plain-text reply, a final-result tool call, a structured output).
Pydantic AI combines structured-output validation with automatic retries: validation errors are fed back to the model as instructions so that it can correct its own output.
Agents are generic over their types: Agent[deps_type, output_type] gives both static and runtime guarantees, which makes refactoring with coding agents safer.
The memory example uses record_memory and retrieve_memory tools, with RunContext.deps giving the tools typed access to the database.
Logfire (observability built on OpenTelemetry) traces each LLM call and tool exchange for debugging.
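The validate-and-retry loop and the two-parameter generic agent described above can be sketched with the standard library alone. This is an illustration of the pattern, not Pydantic AI's API: MiniAgent, OutputValidationError, parse_city, and fake_model are hypothetical names, the "model" is a stub function rather than a real LLM call, and the hand-written parser stands in for Pydantic validation.

```python
import json
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

DepsT = TypeVar("DepsT")      # type of the dependencies handed to each run
OutputT = TypeVar("OutputT")  # type of the validated final output


class OutputValidationError(Exception):
    """Raised by a parser when the model's raw text is not a valid output."""


@dataclass
class MiniAgent(Generic[DepsT, OutputT]):
    # Hypothetical stand-in for an LLM call: (message history, deps) -> raw text.
    model: Callable[[list[str], DepsT], str]
    # Turns raw text into OutputT, or raises OutputValidationError.
    parse_output: Callable[[str], OutputT]
    max_retries: int = 2

    def run(self, prompt: str, deps: DepsT) -> OutputT:
        messages = [prompt]
        for _ in range(self.max_retries + 1):
            raw = self.model(messages, deps)
            try:
                # A valid structured output is the signal to exit the loop.
                return self.parse_output(raw)
            except OutputValidationError as exc:
                # Feed the validation error back so the next attempt can fix it.
                messages.append(f"Validation failed: {exc}. Please fix and retry.")
        raise RuntimeError("model did not produce a valid output")


@dataclass
class City:
    name: str
    population: int


def parse_city(raw: str) -> City:
    """Hand-rolled validator standing in for Pydantic model validation."""
    try:
        data = json.loads(raw)
        name, population = data["name"], data["population"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        raise OutputValidationError(f"expected JSON with name and population: {exc!r}") from exc
    if not isinstance(name, str) or not isinstance(population, int):
        raise OutputValidationError("name must be a string and population an integer")
    return City(name=name, population=population)


def fake_model(messages: list[str], deps: dict[str, int]) -> str:
    # Stub: the first attempt is invalid; once feedback is present, it answers correctly.
    if len(messages) == 1:
        return '{"name": "Paris", "population": "about two million"}'
    return json.dumps({"name": "Paris", "population": deps["Paris"]})


# The annotation lets a static type checker know both the deps and output types.
agent: MiniAgent[dict[str, int], City] = MiniAgent(model=fake_model, parse_output=parse_city)
result = agent.run("Describe Paris", deps={"Paris": 2_100_000})
print(result)
```

Because the agent is parameterised by both types, a type checker can flag a call site that passes the wrong deps or treats the result as something other than City, which is the property the summary credits with making refactoring safer.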

pydantic-ai · type-safety · structured-outputs

Original description

In this talk I'll introduce the concept of Human-seeded Evals, explain the principle and demo them with Pydantic Logfire.

---related links---

https://x.com/samuel_colvin
https://www.linkedin.com/in/samuel-colvin/
https://github.com/samuelcolvin
https://pydantic.dev/