From Self-driving to Autonomous Voice Agents — Brooke Hopkins, Coval

1.8K views · Jul 31, 2025 · 19:32 min

Takeaway

Borrow self-driving's large-scale probabilistic simulation playbook to escape voice-agent POC purgatory and ship reliable conversational systems.

Summary

Hopkins (founder of Coval, previously on evaluation infrastructure at Waymo) maps Waymo's eval evolution onto voice agents: manual drives → brittle scenario tests → large-scale simulation that measures event probabilities across thousands of runs.
A trust gap (overestimating how much call volume can be automated while underestimating quality) keeps voice agents in POC hell. Large-scale simulation is what let Waymo expand to new cities, and the same approach works for voice.
Advocates reference-free probabilistic evals over input/output golden sets: measure aggregate metrics such as 'resolves user inquiry' or 'repeats itself' across simulated conversations rather than working through checklists.
Replicates Waymo's continuous virtuous loop: simulate → fix → add to regression set → presubmit/postsubmit CI → release → live monitoring → feed back into simulation. Manual evals are reserved for cases that need human judgment.
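
To make the idea of reference-free, probabilistic metrics concrete, here is a minimal Python sketch. It is not Coval's or Waymo's implementation. Every name in it (`Conversation`, `repeats_itself`, `event_rate`, `simulate`) is hypothetical, the simulator is a random stand-in for a real agent-plus-simulated-user run, and the `resolved` field stands in for whatever judge would decide 'resolves user inquiry'. The Wilson score interval is one common way to put error bars on a rate and is an assumption here, not something the talk specifies.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Conversation:
    agent_turns: list[str]
    resolved: bool  # stand-in for a judge's "resolves user inquiry" verdict


def repeats_itself(conv: Conversation) -> bool:
    """Reference-free check: did the agent say the same normalized line twice?"""
    seen = set()
    for turn in conv.agent_turns:
        key = " ".join(turn.lower().split())  # ignore case and extra spaces
        if key in seen:
            return True
        seen.add(key)
    return False


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def event_rate(convs, event):
    """Fraction of conversations where `event` fires, plus its interval."""
    hits = sum(1 for c in convs if event(c))
    return hits / len(convs), wilson_interval(hits, len(convs))


def simulate(n: int, seed: int = 0) -> list[Conversation]:
    """Hypothetical simulator: random agent lines, ~80% resolved by design."""
    rng = random.Random(seed)
    lines = ["How can I help?", "Let me check that.",
             "Your order has shipped.", "Anything else?"]
    convs = []
    for _ in range(n):
        turns = [rng.choice(lines) for _ in range(3)]
        convs.append(Conversation(turns, resolved=rng.random() < 0.8))
    return convs


if __name__ == "__main__":
    runs = simulate(1000)
    for name, event in [("resolves user inquiry", lambda c: c.resolved),
                        ("repeats itself", repeats_itself)]:
        rate, (lo, hi) = event_rate(runs, event)
        print(f"{name}: {rate:.3f} (95% interval {lo:.3f} to {hi:.3f})")
```

No conversation is compared against a golden transcript. Each check inspects a run on its own terms, and the number that matters is the rate across many runs, which a presubmit gate could compare against the previous release.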

voice-agents · evals · simulation

Original description

The reliability challenges facing voice & chat AI deployment today mirror those that the autonomous vehicle industry confronted years ago. This talk explores how evaluation methodologies developed for self-driving cars can be transferred to create autonomous, self-improving evaluation systems for conversational AI. Drawing from my experience building evaluation infrastructure at Waymo and now developing Coval, an enterprise-grade reliability platform for conversational agents, I'll demonstrate how systematic testing infrastructure is not just a technical requirement but a competitive advantage in the rapidly evolving AI landscape.

--

 
Brooke Hopkins is the Founder at Coval, where her team builds the enterprise-grade reliability infrastructure for conversational AI. Previously, she built evaluation systems at Waymo that helped enable safe autonomous driving. With experience spanning both physical and digital AI domains, Brooke brings unique insights into creating robust testing frameworks that can scale with AI's rapid development.