2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI
Takeaway
Agentic systems taking real actions are finally forcing evaluation tooling out of CIO-only sales and into board-level enterprise budgets in 2025.
Summary
- John Dickerson (CEO of Mozilla AI, formerly of Arthur AI) argues 2025 is finally the year of evals because three forces converge: ChatGPT making AI legible to CFOs and CEOs, enterprise budgets unfreezing after 2022 for genAI pet projects, and agentic systems now acting autonomously.
- Pre-ChatGPT, ML monitoring (H2O, Algorithmia, Seldon, Fiddler, Arize, Arthur, WhyLabs, Galileo) had only lip-service ROI tied to KPIs and sold mainly into CIOs.
- Now evaluation is mandatory because agents take real actions; players like Braintrust, Arize, Galileo, Arthur are seeing hockey-stick growth.
- Dickerson frames monitoring and evaluation as two sides of the same sword: you can't observe without measuring.
Tags: evals, observability, enterprise-ai
Original description
AI is getting deployed without guardrails, without governance, without due diligence. Surely this is the year we’ll see a Fortune 500 CEO fired because of a preventable AI incident. Surely this is the year we’ll see enterprises wake up to pre-deployment evaluation and post-deployment monitoring being an urgent need. This story hasn’t changed for a decade, but surely this is the year it will.

In this talk, I’ll cover what enterprise-level AI/ML evaluation has looked like for the last decade - what’s changed, what hasn’t, what sells, what doesn’t, and where I see things going from here on out. Evaluation matters - we all know this - but using my experience in the trenches over the last decade or so I hope to bridge the gap between what practitioners need and what the C-suite pays for in the space of AI evaluations.

---related links---
https://x.com/johnpdickerson
https://www.linkedin.com/in/john-dickerson/
https://jpdickerson.com/
https://www.mozilla.ai/

Timestamps:
00:00 Introduction to Arthur AI and Mozilla AI
00:46 2025: The Year of Evals
01:15 AI/ML monitoring and evaluation
02:48 The Year of the Agent
03:26 The need for 'evals' wasn't obvious to the C-suite
04:15 Pre-ChatGPT launch
06:06 Venture capitalists' predictions
07:03 Macroeconomic side of things
08:06 OpenAI launching ChatGPT
09:15 2023: The Year of GenAI
09:39 2024: GenAI applications in production
10:22 2025: Scaling and autonomy
11:35 Definition of an agent
12:06 Connecting to downstream business KPIs
14:40 Shift to multi-agent systems monitoring
15:42 Q&A
16:16 Discussion on domain expertise in evaluations
18:13 Discussion on LLMs as judges