Human-seeded Evals - Samuel Colvin, Pydantic
Takeaway
Static type safety + Pydantic structured-output validation + retry-on-validation-error makes agentic LLM apps refactorable and reliable.
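The retry-on-validation-error idea can be sketched without any library. The example below is a minimal illustration using only the standard library, not Pydantic AI's actual implementation: `Person`, `parse_person`, `run_with_retries`, and `fake_model` are all hypothetical names, and `fake_model` stands in for a real LLM call.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Person:
    name: str
    age: int


def parse_person(raw: str) -> Person:
    """Validate raw model output; raise ValueError with a readable message."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    name, age = data.get("name"), data.get("age")
    if not isinstance(name, str):
        raise ValueError("field 'name' must be a string")
    if not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("field 'age' must be an integer")
    return Person(name=name, age=age)


def run_with_retries(
    model: Callable[[list[str]], str], prompt: str, max_retries: int = 2
) -> Person:
    """Call the model, validate its output, and retry with the error as feedback."""
    messages = [prompt]
    for _ in range(max_retries + 1):
        raw = model(messages)
        try:
            return parse_person(raw)
        except ValueError as exc:
            # Feed the validation error back so the model can correct itself.
            messages.append(raw)
            messages.append(f"Validation failed: {exc}. Please fix and try again.")
    raise RuntimeError("model did not produce valid output")


def fake_model(messages: list[str]) -> str:
    # Hypothetical stand-in for an LLM: wrong at first, correct after feedback.
    if any(m.startswith("Validation failed") for m in messages):
        return '{"name": "Ada", "age": 36}'
    return '{"name": "Ada", "age": "thirty-six"}'


person = run_with_retries(fake_model, "Extract the person from the text.")
```

Here the first response fails validation because `age` is a string. The error message is appended to the conversation, and the second response passes. A real setup would replace the hand-written checks with a Pydantic model and the fake with an actual model call.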
Summary
- Defines an agent as an LLM running in a loop with tools, a system prompt, and an environment, and stresses the unsolved problem of when to exit the loop (plain-text reply, final-result tool, or structured output).
- Pydantic AI uses structured-output validation + automatic retry: validation errors are fed back to the model as instructions so it self-corrects.
- Type-safe agent generics: Agent[deps_type, output_type] is generic in both its dependency type and its output type, giving static and runtime guarantees and enabling confident refactoring with coding agents.
- Memory example uses record_memory/retrieve_memory tools with typed RunContext.deps for typed DB access.
- Logfire (OpenTelemetry-based observability) traces each LLM call and tool exchange for debugging.
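The generic-agent and typed-dependency points above can be illustrated with a toy model built on `typing.Generic`. This is a sketch of the pattern only: `ToyAgent`, `ToyRunContext`, and `MemoryStore` are hypothetical stand-ins rather than Pydantic AI classes, and the in-memory list replaces a real database.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, TypeVar

DepsT = TypeVar("DepsT")
OutputT = TypeVar("OutputT")


@dataclass
class ToyRunContext(Generic[DepsT]):
    """Carries typed dependencies into each tool call."""
    deps: DepsT


@dataclass
class MemoryStore:
    """Hypothetical in-memory stand-in for a database connection."""
    rows: list[str] = field(default_factory=list)


@dataclass
class ToyAgent(Generic[DepsT, OutputT]):
    """Toy agent parameterized by dependency type and output type."""
    output_type: type[OutputT]
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def tool(
        self, func: Callable[[ToyRunContext[DepsT], str], str]
    ) -> Callable[[ToyRunContext[DepsT], str], str]:
        # Register the function under its own name and return it unchanged.
        self.tools[func.__name__] = func
        return func


agent: ToyAgent[MemoryStore, str] = ToyAgent(output_type=str)


@agent.tool
def record_memory(ctx: ToyRunContext[MemoryStore], memory: str) -> str:
    # ctx.deps is statically known to be a MemoryStore.
    ctx.deps.rows.append(memory)
    return "recorded"


@agent.tool
def retrieve_memory(ctx: ToyRunContext[MemoryStore], query: str) -> str:
    matches = [row for row in ctx.deps.rows if query.lower() in row.lower()]
    return "\n".join(matches) or "no memories found"


ctx = ToyRunContext(deps=MemoryStore())
record_result = agent.tools["record_memory"](ctx, "User likes tea")
lookup_result = agent.tools["retrieve_memory"](ctx, "tea")
```

Because the tools are annotated with `ToyRunContext[MemoryStore]`, a type checker can flag a tool that expects a different dependency type or misuses `ctx.deps`, which is the kind of guarantee that makes refactoring safer.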
pydantic-ai, type-safety, structured-outputs
Original description
In this talk I'll introduce the concept of Human-seeded Evals, explain the principle and demo them with Pydantic Logfire.
Related links:
https://x.com/samuel_colvin
https://www.linkedin.com/in/samuel-colvin/
https://github.com/samuelcolvin
https://pydantic.dev/