Can you prove AI ROI in Software Eng? (Stanford 120k Devs Study) – Yegor Denisov-Blanch, Stanford

34.5K views · Dec 11, 2025 · 16:40 min

Takeaway

AI productivity gains in software are real but modest (median ~10%) and bimodal: top adopters pull away, while heavy token-spenders without good practices regress.

Summary

The Stanford research covers 120k developers. In the case study presented, 46 AI-using teams were matched against 46 non-AI teams, with output scored by a panel-of-experts ML model trained on millions of code-commit evaluations covering implementation time, maintainability, and complexity.
Median net productivity gain from AI is ~10% as of July 2025, but variance is widening: top performers compound their gains while bottom performers stall, suggesting a 'rich get richer' divergence.
Token spend correlates only loosely with productivity (r≈0.20) and shows a 'death valley' around 10M tokens/engineer/month where heavy users underperform moderate users.
Without measuring cohort placement, leaders can't course-correct on AI investment.
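
To make the two statistics in this summary concrete, here is a minimal Python sketch of how a median gain over matched team pairs and a Pearson correlation between token spend and output could be computed. It is not the study's method or data: the function names (`median_gain`, `pearson_r`), the idea of a single numeric "output score" per team, and every number below are assumptions invented for illustration.

```python
from math import sqrt
from statistics import mean, median


def median_gain(ai_scores, control_scores):
    """Median relative gain across matched (AI team, control team) pairs."""
    gains = [(a - c) / c for a, c in zip(ai_scores, control_scores)]
    return median(gains)


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical output scores for five matched pairs (not study data).
ai_scores = [110.0, 125.0, 98.0, 112.0, 104.0]
control_scores = [100.0, 100.0, 100.0, 100.0, 100.0]
print(f"median gain: {median_gain(ai_scores, control_scores):.0%}")

# Hypothetical token spend (millions per engineer per month) vs. output score.
tokens_m = [1.0, 3.0, 5.0, 10.0, 20.0]
output = [100.0, 108.0, 112.0, 101.0, 109.0]
print(f"token/output correlation: r = {pearson_r(tokens_m, output):.2f}")
```

A median is used rather than a mean because a few teams with very large gains would otherwise hide the stalled majority. A low r means only that the linear relationship is weak; it cannot by itself reveal a dip such as the one reported around 10M tokens, which requires comparing cohorts by spend level.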

productivity · evals · ai-business

Original description

You’re investing millions in AI for software engineering. Can you prove it’s paying off?

Benchmarks show models can write code, but in enterprise deployments ROI is hard to measure, easy to bias, and often distorted by activity metrics (PR counts, DORA) that say “more” without proving “better.”

Drawing on field data from 120k+ developers across 600+ companies, I’ll show exactly where AI helps the most and how to measure the ROI of your software engineering AI deployment.

We’ll unpack why identical tools deliver ~0% lift in some orgs and 25%+ in others.

You’ll leave with a step-by-step ROI playbook: what to track, the traps to avoid, and the habits top-quartile teams use to make the most from AI.

Speaker: Yegor Denisov-Blanch  |  Researcher, Stanford
https://x.com/yegordb
https://www.linkedin.com/in/ydenisov/

Timestamps

00:00 Introduction & Methodology: ML Panels of Experts 
00:21 The Research Approach: Time Series & Cross-Sectional Data 
01:38 Four Key Topics Overview 
02:01 Case Study: 10% Productivity Gain & The Widening Gap 
03:16 Factors Driving Performance: Usage vs. Quality 
04:02 The Environment Cleanliness Index 
05:30 Managing Codebase Entropy & AI Trust 
06:17 AI Engineering Practices Benchmark & Fingerprinting 
07:38 Case Study: Unequal Adoption Across Business Units 
08:31 Challenges in Measuring AI ROI via Business Outcomes 
10:28 Proposed Measurement Framework: Usage & Outcomes 
11:59 Metric Framework: Primary Output vs. Guardrails 
12:54 Case Study: AI Adoption's Negative Impact on Quality 
14:04 Rework, Refactoring, and Effective Output Analysis 
15:43 Conclusion & Call for Research Participation