AI powered entomology: Lessons from millions of AI code reviews — Tomas Reimers, Graphite


4.6K views · Jul 22, 2025 · 10:21 min

Takeaway

AI code review only works when comments stay in the 'LLM-can-catch AND human-wants-it' quadrant — measured by downvotes plus whether developers actually act on them.

Summary

Graphite's Diamond reviewer surfaces bugs in PRs by reading the code, but comments that strayed beyond actual logic bugs quickly frustrated developers.
The team studied 10k human and LLM comments along two axes: can the LLM catch this kind of bug, and do humans want the comment from an LLM?
Sweet-spot quadrant: bugs, accidentally committed code, performance/security issues, documentation mismatches. Outside it: tribal knowledge (LLM can't see) and style/best-practice nits (humans tolerate from humans but hate from bots).
Success is measured with upvotes/downvotes (now <4% downvotes), plus whether a comment actually led to a code change, used as a proxy for usefulness.
Lessons: model-version upgrades (Sonnet→Opus 4) and context expansion shift the quadrant, requiring continuous re-validation.
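
The two-axis filter and the two success measures above can be sketched in a few lines of Python. This is a minimal illustration only, not Graphite's implementation: the `ReviewComment` type, its field names, and the helper functions are all hypothetical, and the sample comments are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class ReviewComment:
    """Hypothetical record for one AI review comment (all fields are assumptions)."""
    category: str            # e.g. "logic bug", "style nit"
    llm_can_catch: bool      # axis 1: can the LLM reliably catch this kind of issue?
    human_wants_it: bool     # axis 2: do developers want this comment from an LLM?
    downvoted: bool = False  # explicit negative feedback from a developer
    led_to_change: bool = False  # the PR was changed in response to the comment


def in_sweet_spot(comment: ReviewComment) -> bool:
    """A comment is worth posting only when both axes hold."""
    return comment.llm_can_catch and comment.human_wants_it


def downvote_rate(comments: list[ReviewComment]) -> float:
    """Fraction of comments that developers downvoted."""
    if not comments:
        return 0.0
    return sum(c.downvoted for c in comments) / len(comments)


def acted_on_rate(comments: list[ReviewComment]) -> float:
    """Fraction of comments followed by a code change (proxy for usefulness)."""
    if not comments:
        return 0.0
    return sum(c.led_to_change for c in comments) / len(comments)


# Made-up sample data, one comment per kind mentioned in the summary.
comments = [
    ReviewComment("logic bug", True, True, led_to_change=True),
    ReviewComment("style nit", True, False, downvoted=True),
    ReviewComment("tribal knowledge", False, True),
    ReviewComment("security issue", True, True),
]

# Keep only comments in the 'LLM-can-catch AND human-wants-it' quadrant.
posted = [c for c in comments if in_sweet_spot(c)]
```

Tracking both rates on the posted set, and re-running the classification after a model or context change, is one way to implement the continuous re-validation the last lesson calls for.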

code-review · graphite · evals

Original description

This talk will explore insights from millions of automated code reviews, revealing trends in bugs, vulnerabilities, and code health that Graphite’s AI code review agent has uncovered. This talk will also provide meta commentary on the types of bugs AI code review agents are great at spotting, and how far the field of AI code review has come in the last year alone.

