RL for Autonomous Coding — Aakanksha Chowdhery, Reflection.ai

7.3K views · Jul 16, 2025 · 19:27 min

Takeaway

Reinforcement learning with automated verifiers (tests, compilers) is the unlock for autonomous coding because correctness can be checked cheaply at scale.
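To make "automated verifier" concrete, here is a minimal sketch of a test-based reward, assuming a candidate solution and its tests are both plain Python source strings. The function name `verifier_reward`, the binary 0/1 reward, and the timeout are illustrative assumptions, not details from the talk.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def verifier_reward(candidate_source: str, test_source: str, timeout_s: float = 10.0) -> float:
    """Hypothetical binary reward: 1.0 if the tests pass against the candidate, else 0.0."""
    with tempfile.TemporaryDirectory() as workdir:
        # Write the candidate followed by its assert-based tests into one script.
        script = Path(workdir) / "check.py"
        script.write_text(candidate_source + "\n\n" + test_source + "\n")
        try:
            # Run in a separate interpreter so a crash or hang cannot affect the caller.
            result = subprocess.run(
                [sys.executable, str(script)],
                capture_output=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # A candidate that does not finish in time counts as a failure.
            return 0.0
    # Exit code 0 means every assertion held.
    return 1.0 if result.returncode == 0 else 0.0
```

A real training setup would sandbox execution far more strictly. The point here is only that the check is automatic and cheap enough to repeat across many samples.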

Summary

Former PaLM/Gemini researcher recaps the journey from 2020 scaling laws through emergent chain-of-thought (PaLM 540B) to RLHF-driven instruction following.
Inference-time scaling techniques covered: majority voting (self-consistency) over many samples, and sequential revision guided by verifiers. Both are reliable in domains where outputs can be verified automatically.
Coding is the prime RL frontier because unit tests, compilers, and PyTorch act as objective verifiers, unlike open-domain QA where sampling 10k times doesn't pay off.
Pass@k on SWE-bench Verified with an open DeepSeek model reaches ~80% by scaling the number of samples alone.
Training is expensive (tens of millions of dollars) while inference is cheap, so pushing intelligence at inference time via verifier-driven RL is the path to autonomous-coding super-intelligence.
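Two of the ideas in the summary can be sketched in a few lines of Python: majority voting over sampled answers, and a pass@k estimate from n samples of which c passed a verifier. The helper names are hypothetical, and the pass@k formula is the commonly used unbiased estimator, 1 - C(n-c, k) / C(n, k). The talk does not state how its curves were computed, so treat this as an assumption.

```python
from collections import Counter
from math import comb


def majority_vote(answers):
    """Self-consistency: return the most frequent final answer among the samples."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    # most_common(1) yields [(answer, count)]; ties resolve to the answer seen first.
    return Counter(answers).most_common(1)[0][0]


def pass_at_k(n, c, k):
    """Estimate the probability that at least one of k samples passes.

    n: total samples drawn, c: samples that passed the verifier, k: sample budget.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # comb(n - c, k) counts the k-subsets containing only failing samples;
    # it is 0 when fewer than k samples failed.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, `majority_vote(["42", "41", "42"])` returns `"42"`, and for a fixed pass count `pass_at_k(n, c, k)` never decreases as `k` grows, which is why pass@k curves climb with more sampling.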

rl · autonomous-coding · verifiers

Original description

The models and techniques to build fully autonomous coding agents - not just coding copilots - are already here. In this talk, former Google DeepMind staff research scientist, now CEO of Reflection Misha Laskin will present new research on post-training open weight LLMs for autonomous SWE tasks. He’ll focus on how scaling LLMs with Reinforcement Learning improves the autonomous coding capabilities of LLMs, and provide insight on the technical challenges required to train such systems at scale.

About Aakanksha Chowdhery
Aakanksha Chowdhery is a Research Leader at Reflection AI pushing the frontier of reasoning for coding agents. She is also adjunct faculty at Stanford. Before her startup journey, she was the technical lead of the 540B PaLM model and a lead researcher at Google in pre-training, scaling, and post-training of large language models. She was a lead researcher on Gemini, PaLM-E, MedPaLM, and the Pathways project at Google. Prior to joining Google, she led interdisciplinary research initiatives at Microsoft Research and Princeton University across machine learning and distributed systems.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

Timestamps:
[00:00:00] Introduction to LLMs and Scaling Laws

[00:01:41] Emergent Behavior in LLMs

[00:04:00] Reinforcement Learning from Human Feedback (RLHF)

[00:06:11] Inference-Time Scaling and Verification

[00:10:33] Challenges with Inference-Time Scaling

[00:11:16] The Next Frontier: Reinforcement Learning for Correct Generation

[00:13:20] Challenges in Scaling RL

[00:14:58] Autonomous Coding as a Prime Domain for RL

[00:15:53] Reflection.ai's Mission