
2025 is the Year of Evals! Just like 2024, and 2023, and … — John Dickerson, CEO Mozilla AI

5.0K views · Aug 06, 2025 · 19:13 min
Takeaway

Agentic systems taking real actions are finally forcing evaluation tooling out of CIO-only sales and into board-level enterprise budgets in 2025.

Summary

  • John Dickerson (CEO Mozilla AI, ex-Arthur AI) argues 2025 is finally the year of evals because three forces converge: ChatGPT making AI legible to CFOs/CEOs, post-2022 enterprise budget unfreeze for genAI pet projects, and agentic systems now acting autonomously.
  • Pre-ChatGPT, ML monitoring vendors (H2O, Algorithmia, Seldon, Fiddler, Arize, Arthur, WhyLabs, Galileo) could tie ROI to business KPIs only in a lip-service way and sold mainly to CIOs.
  • Now evaluation is mandatory because agents take real actions; players like Braintrust, Arize, Galileo, and Arthur are seeing hockey-stick growth.
  • Frames monitoring and evaluation as two sides of the same sword: you can't observe without measurement.
evals · observability · enterprise-ai
Original description
AI is getting deployed without guardrails, without governance, without due diligence.  Surely this is the year we’ll see a Fortune 500 CEO fired because of a preventable AI incident.  Surely this is the year we’ll see enterprises wake up to pre-deployment evaluation and post-deployment monitoring being an urgent need.  This story hasn’t changed for a decade, but surely this is the year it will.

In this talk, I’ll cover what enterprise-level AI/ML evaluation has looked like for the last decade - what’s changed, what hasn’t, what sells, what doesn’t, and where I see things going from here on out.  Evaluation matters - we all know this - but using my experience in the trenches over the last decade or so I hope to bridge the gap between what practitioners need and what the C-suite pays for in the space of AI evaluations.



---related links---

https://x.com/johnpdickerson
https://www.linkedin.com/in/john-dickerson/
https://jpdickerson.com/
https://www.mozilla.ai/

Timestamps:

00:00 Introduction to Arthur AI and Mozilla AI
00:46 2025: The Year of Evals
01:15 AI/ML monitoring and evaluation
02:48 The Year of the Agent
03:26 The need for 'evals' wasn't obvious to the C-suite
04:15 Pre-ChatGPT launch
06:06 Venture capitalists' predictions
07:03 Macroeconomic side of things
08:06 OpenAI launching ChatGPT
09:15 2023: The Year of GenAI
09:39 2024: GenAI applications in production
10:22 2025: Scaling and autonomy
11:35 Definition of an agent
12:06 Connecting to downstream business KPIs
14:40 Shift to multi-agent systems monitoring
15:42 Q&A
16:16 Discussion on domain expertise in evaluations
18:13 Discussion on LLMs as judges