Human-seeded Evals — Samuel Colvin, Pydantic

3.2K views · Jul 25, 2025 · 12:02 min

Takeaway

Static type safety, Pydantic structured-output validation, and retrying on validation errors together make agentic LLM apps refactorable and reliable.
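
The validate-then-retry idea can be sketched in a few lines. This is a standard-library illustration only, not Pydantic or Pydantic AI code: `City`, `validate_city`, `run_with_retries`, and `fake_model` are hypothetical names, and the "model" is a stub that returns a wrong type on the first attempt.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class City:
    name: str
    population: int


def validate_city(raw: str) -> City:
    """Parse model output and raise ValueError with a readable message on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc.msg}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    name = data.get("name")
    population = data.get("population")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(population, int) or isinstance(population, bool):
        raise ValueError("field 'population' must be an integer")
    return City(name=name, population=population)


def run_with_retries(
    model: Callable[[list[str]], str], prompt: str, max_retries: int = 2
) -> City:
    """Call the model, validate its output, and retry with the error appended."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        raw = model(messages)
        try:
            return validate_city(raw)
        except ValueError as exc:
            # Feed the validation error back so the next attempt can self-correct.
            messages.append(f"Validation failed: {exc}. Please fix and try again.")
    raise RuntimeError("model did not produce valid output within the retry budget")


def fake_model(messages: list[str]) -> str:
    # Stand-in for an LLM (assumption): wrong type first, corrected after one error.
    if len(messages) == 1:
        return '{"name": "London", "population": "about 9 million"}'
    return '{"name": "London", "population": 9000000}'


city = run_with_retries(fake_model, "Describe London as JSON.")
```

Here the first response fails validation, the error text is appended to the conversation, and the second response passes. A real library would derive both the schema and the error messages from the output type rather than from hand-written checks.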

Summary

  • Defines an agent as an LLM running in a loop with tools, a system prompt, and an environment, and stresses that deciding when to exit the loop is still unsolved (options: a plain-text reply, a final-result tool call, or structured output).
  • Pydantic AI combines structured-output validation with automatic retry: validation errors are fed back to the model as instructions so it can correct itself.
  • Type-safe agent generics: Agent[deps_type, output_type] gives both static and runtime guarantees, which makes refactoring with coding agents safer.
  • The memory example uses record_memory and retrieve_memory tools, with RunContext.deps giving typed access to the database.
  • Logfire (OpenTelemetry-based observability) traces each LLM call and tool exchange for debugging.
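
To make the generics and typed-deps bullets concrete, here is a minimal standard-library mock. It borrows the names `Agent`, `RunContext`, `record_memory`, and `retrieve_memory` from the summary, but it is not Pydantic AI's implementation: `MemoryStore`, `Deps`, and `call_tool` are hypothetical, and an in-memory dict stands in for the database.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

DepsT = TypeVar("DepsT")
OutputT = TypeVar("OutputT")


@dataclass
class RunContext(Generic[DepsT]):
    # Tools receive dependencies through a context parameterised on their type.
    deps: DepsT


@dataclass
class MemoryStore:
    """Hypothetical stand-in for a database connection."""
    rows: dict[int, list[str]] = field(default_factory=dict)


@dataclass
class Deps:
    user_id: int
    store: MemoryStore


@dataclass
class Agent(Generic[DepsT, OutputT]):
    output_type: type[OutputT]
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def tool(self, func: Callable[..., str]) -> Callable[..., str]:
        """Register a tool under its function name."""
        self.tools[func.__name__] = func
        return func

    def call_tool(self, name: str, deps: DepsT, *args: str) -> str:
        """Invoke a registered tool, wrapping deps in a RunContext."""
        return self.tools[name](RunContext(deps), *args)


# A type checker can now verify every ctx.deps access against Deps.
agent: Agent[Deps, str] = Agent(output_type=str)


@agent.tool
def record_memory(ctx: RunContext[Deps], value: str) -> str:
    ctx.deps.store.rows.setdefault(ctx.deps.user_id, []).append(value)
    return "Value recorded"


@agent.tool
def retrieve_memory(ctx: RunContext[Deps], query: str) -> str:
    memories = ctx.deps.store.rows.get(ctx.deps.user_id, [])
    matches = [m for m in memories if query.lower() in m.lower()]
    return "\n".join(matches) or "No memories found"


deps = Deps(user_id=1, store=MemoryStore())
agent.call_tool("record_memory", deps, "Favourite colour is blue")
found = agent.call_tool("retrieve_memory", deps, "colour")
```

Because the tools are annotated with `RunContext[Deps]`, renaming a field on `Deps` surfaces every affected tool in a static type checker, which is the refactoring benefit the summary describes.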

pydantic-ai · type-safety · structured-outputs

Original description
In this talk I'll introduce the concept of Human-seeded Evals, explain the principle and demo them with Pydantic Logfire.

---related links---

https://x.com/samuel_colvin
https://www.linkedin.com/in/samuel-colvin/
https://github.com/samuelcolvin
https://pydantic.dev/