Will Agent Evaluation via MCP Stabilize Agent Networks? - Ari Heljakka

457 views · Jun 03, 2025 · 14:11 min
Takeaway

Exposing evaluators via MCP lets any agent receive scored, explained feedback inline, turning evals from a one-off harness into a continuous stabilization loop for agent networks.

Summary

  • Ari Heljakka (CEO of Root Signals) asks whether exposing evaluation engines via MCP can stabilize agent swarms attempting complex tasks.
  • Simply adding an eval stack doesn't help: the evaluations themselves must be continuously improved alongside the agents so they stay aligned with business requirements.
  • The evaluation landscape covers reality-grounding (left side) and behavioral aspects (right side: goal inference, progress, tool selection). Example: a hotel reservation agent gets evaluators for policy adherence, output accuracy, and behavior.
  • Demos Root Signals' open-source MCP server: from Cursor's UI you list judges/evaluators, score an agent's output, get a numeric score plus an explanation, and feed that back to drive a stabilization loop.
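The score-explain-feed-back loop from the last bullet can be sketched in Python. This is a minimal, self-contained illustration under stated assumptions, not Root Signals' API: the evaluators are hypothetical rule-based stand-ins for the LLM-based evaluators the talk exposes over MCP, and every name here (`EvalResult`, `policy_adherence`, `output_accuracy`, `toy_agent`, `stabilize`, the `threshold` and `max_rounds` parameters) is invented for the example. What it tries to show is the shape of the loop: each evaluator returns a numeric score plus an explanation, and the explanations of failing evaluators are fed back to the agent until every score clears a threshold.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    """Numeric score in [0, 1] plus a natural-language explanation."""
    evaluator: str
    score: float
    explanation: str

# Hypothetical interfaces: an evaluator scores (request, response); an agent
# answers a request given the feedback accumulated so far.
Evaluator = Callable[[str, str], EvalResult]
Agent = Callable[[str, list[str]], str]

def requested_city(request: str) -> str:
    # Toy parser: assumes requests look like "Book a room in <City>".
    return request.split(" in ")[-1].strip(". ")

def policy_adherence(request: str, response: str) -> EvalResult:
    # Stand-in for an LLM judge; the policy (confirmations must state
    # cancellation terms) is invented for this example.
    ok = "cancellation" in response.lower()
    why = "States cancellation terms." if ok else "Missing cancellation terms required by policy."
    return EvalResult("policy_adherence", 1.0 if ok else 0.0, why)

def output_accuracy(request: str, response: str) -> EvalResult:
    # Stand-in check: the reply must confirm the requested city.
    city = requested_city(request)
    ok = city.lower() in response.lower()
    why = f"Confirms {city}." if ok else f"Does not confirm {city}."
    return EvalResult("output_accuracy", 1.0 if ok else 0.0, why)

def toy_agent(request: str, feedback: list[str]) -> str:
    # Toy agent: starts with a bare reply and patches it from evaluator feedback.
    reply = "Your room is booked."
    if any("cancellation" in f.lower() for f in feedback):
        reply += " Free cancellation until 24h before check-in."
    if any("does not confirm" in f.lower() for f in feedback):
        reply += f" Hotel location: {requested_city(request)}."
    return reply

def stabilize(agent: Agent, request: str, evaluators: list[Evaluator],
              threshold: float = 0.8, max_rounds: int = 5):
    """Re-run the agent until every evaluator scores >= threshold (or rounds run out)."""
    feedback: list[str] = []
    history = []  # (round, response, results) per iteration
    response = ""
    for round_no in range(1, max_rounds + 1):
        response = agent(request, feedback)
        results = [ev(request, response) for ev in evaluators]
        history.append((round_no, response, results))
        failing = [r for r in results if r.score < threshold]
        if not failing:
            break
        # Feed the explanations, not just the scores, back to the agent.
        feedback.extend(f"{r.evaluator}: {r.explanation}" for r in failing)
    return response, history

response, history = stabilize(toy_agent, "Book a room in Helsinki",
                              [policy_adherence, output_accuracy])
print(len(history))  # → 2
print(response)
```

In a real setup the evaluator calls would presumably go through an MCP client to the evaluation server instead of being local functions; that wiring is deliberately left out. Because the agent is just a callable that accepts feedback, the loop itself does not depend on how the agent is built.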
evals · mcp · agent-networks
Original description
Exposing complex AI evaluation frameworks to AI agents via MCP enables a new paradigm in which agents self-improve in a controllable manner. Unlike straightforward self-criticism loops, which are often unstable, MCP-accessible evaluation frameworks can provide the persistence layer that stabilizes and standardizes how agents' progress toward plan fulfillment is measured.

In this talk, we show how an MCP-enabled evaluation engine already allows agents to self-improve in a way that is independent of agent architectures and frameworks, and how it holds promise as a cornerstone of rigorous agent development.
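The "persistence layer" idea in the description can be illustrated with a small Python sketch. It is an assumption about what such a layer might record, not a description of Root Signals' implementation; `ProgressLog`, `record`, `progress`, and `regressions` are hypothetical names. The sketch shows one way a fixed set of evaluators, logged across iterations, could yield a comparable progress number per round and surface regressions, which an ad hoc self-critique loop that re-decides its own criteria each time may not provide.

```python
from dataclasses import dataclass, field
from statistics import mean

@dataclass
class ProgressLog:
    """Hypothetical persistence layer: an append-only record of scores from a
    fixed evaluator set, so every iteration is measured with the same yardstick."""
    evaluators: tuple[str, ...]
    rounds: list[dict[str, float]] = field(default_factory=list)

    def record(self, scores: dict[str, float]) -> None:
        # Reject results from a different evaluator set: a shifting yardstick
        # would make rounds incomparable.
        if set(scores) != set(self.evaluators):
            raise ValueError(f"expected scores for {self.evaluators}, got {sorted(scores)}")
        self.rounds.append(dict(scores))

    def progress(self) -> list[float]:
        # One aggregate per round: the mean score across the fixed evaluators.
        return [mean(r[name] for name in self.evaluators) for r in self.rounds]

    def regressions(self) -> list[str]:
        # Evaluators whose score dropped in the latest round versus the one before.
        if len(self.rounds) < 2:
            return []
        prev, last = self.rounds[-2], self.rounds[-1]
        return [name for name in self.evaluators if last[name] < prev[name]]

log = ProgressLog(("policy_adherence", "output_accuracy"))
log.record({"policy_adherence": 0.25, "output_accuracy": 0.75})
log.record({"policy_adherence": 1.0, "output_accuracy": 0.5})
print(log.progress())     # → [0.5, 0.75]
print(log.regressions())  # → ['output_accuracy']
```

The mean is only one possible aggregate; the design choice that matters here is that the evaluator set is fixed and the log is append-only, so the aggregate improves while a single evaluator can still be flagged as regressing.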