Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai
Takeaway
There are multiple AI frontiers (intelligence, open-weights, cost, speed) and choosing the right one for your app matters more than always reaching for the smartest model.
Summary
- Artificial Analysis benchmarks 150+ models; current intelligence leaders are o3, o4-mini high, DeepSeek R1, Grok 3 mini reasoning high, Gemini 2.5 Pro, Claude 4 Opus Thinking.
- Reasoning models are an order of magnitude more verbose: GPT-4.1 used 7M tokens to run the intelligence index vs 130M for Gemini 2.5 Pro (roughly 19x).
- Latency: GPT-4.1 returns in ~4.7s median vs ~40s for o4-mini-high; at those medians, a 30-step sequential agent takes roughly 2.4 minutes vs 20 minutes.
- Open-weights gap is the smallest ever — DeepSeek R1 within a few points of leaders; leading open models all come from China (DeepSeek, Alibaba Qwen 3).
- Cost has fallen >100x at GPT-4-level intelligence since mid-2023; running the index on o3 cost $2,000 vs $4 on GPT-4.1 nano (~500x cheaper).
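The latency, token, and cost comparisons in the bullets above are simple arithmetic, and a small sketch makes it easy to redo them for other models. The figures are the ones quoted in the summary; the 30-step agent is assumed to make its calls strictly one after another, and the names `sequential_latency_s` and `STEPS` are hypothetical helpers for illustration, not anything from Artificial Analysis.

```python
# Figures quoted in the summary above.
MEDIAN_LATENCY_S = {"GPT-4.1": 4.7, "o4-mini-high": 40.0}
INDEX_TOKENS_M = {"GPT-4.1": 7, "Gemini 2.5 Pro": 130}
INDEX_COST_USD = {"o3": 2000, "GPT-4.1 nano": 4}

STEPS = 30  # assumption: an agent making 30 model calls in sequence


def sequential_latency_s(per_call_s: float, steps: int) -> float:
    """Total wall-clock time when each step waits for the previous one."""
    return per_call_s * steps


for model, latency in MEDIAN_LATENCY_S.items():
    total = sequential_latency_s(latency, STEPS)
    print(f"{model}: {total:.0f} s for {STEPS} sequential calls")

# How many times more tokens / dollars the heavier model needed.
token_ratio = INDEX_TOKENS_M["Gemini 2.5 Pro"] / INDEX_TOKENS_M["GPT-4.1"]
cost_ratio = INDEX_COST_USD["o3"] / INDEX_COST_USD["GPT-4.1 nano"]
print(f"token ratio: {token_ratio:.1f}x, cost ratio: {cost_ratio:.0f}x")
```

With the quoted medians, the sequential totals come to about 141 s (a little over 2 minutes) for GPT-4.1 and 1,200 s (20 minutes) for o4-mini-high. Real agents may parallelize calls or produce shorter responses per step, so treat this as an upper-bound style estimate, not a measurement.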
benchmarks, reasoning-models, frontier
Original description
The entire AI stack is developing faster than ever, from chips to infrastructure to models. How do you sort the signal from the noise? Artificial Analysis is an independent benchmarking and insights company dedicated to helping developers and companies pick the right models and technologies for building applications. This talk will walk through the state of the frontier across the AI stack.

About George Cameron
CPO of Artificial Analysis

About Micah Hill-Smith
I'm Micah, co-founder and CEO of Artificial Analysis, an independent AI benchmarking company. We help developers understand AI capabilities and make critical decisions about models and technologies. We publish extensive benchmarking results on our public website (including intelligence, performance, cost and more), and develop reports to inform key strategic decisions. I became obsessed with benchmarking AI models initially as an AI engineer building applications, and have previously spent time as a strategy consultant with McKinsey & Company.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

Timestamps
[00:00] Introduction to Artificial Analysis: An overview of the company's work in benchmarking AI models across various modalities and metrics.
[01:54] The State of AI Progress: A look at the rapid advancements in AI since the launch of ChatGPT, with a focus on the current leaders in AI intelligence.
[04:06] The Reasoning Models Frontier: An exploration of the trade-offs between the enhanced intelligence of reasoning models and their increased latency and cost.
[08:25] The Open Weights Frontier: A discussion of the closing intelligence gap between open-weights and proprietary models, with a nod to the significant contributions from China-based AI labs.
[10:26] The Cost Frontier: An analysis of the dramatic decrease in the cost of accessing high-level AI intelligence and the implications for application development.
[14:09] The Speed Frontier: A look at the remarkable increase in the output speed of AI models and the technological advancements driving this trend.
[16:34] The Future of Compute Demand: A concluding perspective on why the demand for compute will likely continue to rise despite efficiency gains, driven by larger models, the quest for greater intelligence, and the rise of AI agents.