Measuring AGI: Interactive Reasoning Benchmarks for ARC-AGI-3 — Greg Kamradt, ARC Prize Foundation
Takeaway
AGI measurement needs interactive game-based benchmarks with hidden test sets so model intelligence can't be confused with memorized training data or developer-injected priors.
Summary
- Greg Kamradt (ARC Prize Foundation president) explains why Claude and Gemini beating Pokemon doesn't signal AGI: the agents got stuck for 3 days, hallucinated, and benefited from Pokemon knowledge in their training data.
- ARC defines intelligence as 'skill acquisition efficiency' (Chollet). ARC-AGI-2 has 1000+ unique tasks, each requiring a skill that is not reused in any other task, and all validated by 400+ in-person human testers in San Diego.
- Single-turn benchmarks miss interactive intelligence; ARC-AGI-3 will use games as the medium, supplying controlled environments with defined rules and sparse rewards.
- Atari-era game benchmarks failed: dense rewards, inconsistent reporting, no hidden test set, and developers knew the games (developer intelligence leaks into the model).
- Goal: 100 games that neither the model nor its developers have seen, enabling capability claims that current benchmarks can't support.
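The contrast between sparse and dense rewards mentioned above can be made concrete with a small sketch. It assumes a hypothetical one-dimensional game (`LineWorld`) and two illustrative reward functions (`sparse_reward`, `dense_reward`). None of these names come from ARC-AGI-3, and this is not how the benchmark is implemented. The dense function gives the agent a hint about the goal on every step, while the sparse function only signals completion, so the agent has to work out the objective on its own.

```python
from dataclasses import dataclass

# Hypothetical 1-D "reach the goal" game, used only to contrast reward schemes.
# Not ARC-AGI-3 code; every name here is an illustrative assumption.


@dataclass
class LineWorld:
    goal: int = 5
    pos: int = 0

    def step(self, action: int) -> tuple[int, bool]:
        """Move left (-1) or right (+1); return (new position, done)."""
        if action not in (-1, 1):
            raise ValueError("action must be -1 or +1")
        self.pos += action
        return self.pos, self.pos == self.goal


def sparse_reward(prev_pos: int, pos: int, goal: int) -> float:
    # Reward only on success: no hint about which direction is "good".
    return 1.0 if pos == goal else 0.0


def dense_reward(prev_pos: int, pos: int, goal: int) -> float:
    # Reward every step by how much closer the agent got to the goal:
    # the designer's notion of progress is handed to the agent directly.
    return float(abs(goal - prev_pos) - abs(goal - pos))


def rollout(actions: list[int], reward_fn, goal: int = 5) -> list[float]:
    """Play a fixed action sequence and collect the per-step rewards."""
    env = LineWorld(goal=goal)
    rewards = []
    for action in actions:
        prev = env.pos
        pos, done = env.step(action)
        rewards.append(reward_fn(prev, pos, goal))
        if done:
            break  # stop once the goal is reached
    return rewards


if __name__ == "__main__":
    actions = [1, 1, -1, 1, 1, 1, 1]
    print("sparse:", rollout(actions, sparse_reward))
    print("dense: ", rollout(actions, dense_reward))
```

For the same action sequence, the sparse run is all zeros until the final step. The dense run scores every move, including a negative reward for the step back, so much of the problem-solving is effectively done by whoever wrote the reward function.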
benchmarks, agi, arc
Original description
ARC Prize Foundation is building the North Star for AGI: rigorous, open benchmarks that track reasoning progress in modern AI. We'll show why static AGI evaluations are useful but fall short when comparing models to human intelligence, and give a sneak peek of ARC-AGI-3, a dynamic, game-like benchmark launching in Q1 '26.

About Greg Kamradt
Greg Kamradt is President of the ARC Prize Foundation, which runs the ARC-AGI benchmark series that challenges frontier AI models on out-of-distribution reasoning tasks. He has taught thousands of developers to build production AI applications.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter