The Unreasonable Effectiveness of Prompt Learning – Aparna Dhinakaran, Arize

16.0K views · Dec 23, 2025 · 10:56 min

Takeaway

For most agent teams, eval-driven prompt learning with natural-language feedback beats RL on data efficiency and engineering cost.

Summary

Arize co-founder and CPO frames 'system prompt learning' (à la Karpathy) as RL's sample-efficient cousin: use natural-language critique as feedback to iterate on prompts, not just scalar rewards.
Production coding agents (Claude Code, Cursor, Cline) ship enormous system prompts that are continuously iterated — these prompts are the product.
Tested prompt-learning loops against the Claude Code and Cline coding agents, showing that eval-driven prompt mutation can outperform RL-style training for smaller agent-building teams.
Advocates eval-then-mutate-prompt loops where LLM-judge feedback drives the next prompt version.
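
The eval-then-mutate loop described in the summary can be sketched roughly as follows. This is a minimal illustration, not the implementation shown in the talk: `prompt_learning_loop`, `Critique`, and the `toy_*` stand-ins are hypothetical names, and a real setup would replace the stand-ins with calls to an actual agent, an LLM judge, and an LLM that rewrites the prompt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Critique:
    passed: bool
    feedback: str  # natural-language explanation from the judge, not a scalar


def prompt_learning_loop(
    prompt: str,
    tasks: List[str],
    run_agent: Callable[[str, str], str],      # (prompt, task) -> agent output
    judge: Callable[[str, str], Critique],     # (task, output) -> critique
    mutate: Callable[[str, List[str]], str],   # (prompt, feedback) -> new prompt
    max_rounds: int = 5,
) -> Tuple[str, List[float]]:
    """Evaluate, collect critiques, rewrite the prompt; keep the best version."""
    if not tasks:
        raise ValueError("need at least one eval task")
    best_prompt, best_score = prompt, -1.0
    history: List[float] = []
    for _ in range(max_rounds):
        critiques = [judge(task, run_agent(prompt, task)) for task in tasks]
        score = sum(c.passed for c in critiques) / len(tasks)
        history.append(score)
        if score > best_score:
            best_prompt, best_score = prompt, score
        failures = [c.feedback for c in critiques if not c.passed]
        if not failures:
            break  # every eval passed, nothing left to learn from
        prompt = mutate(prompt, failures)
    return best_prompt, history


# Hypothetical stand-ins so the sketch runs without any model calls.
def toy_agent(prompt: str, task: str) -> str:
    # Pretends to follow the prompt: adds a return annotation only if told to.
    if "type hints" in prompt:
        return f"def {task}() -> None: ..."
    return f"def {task}(): ..."


def toy_judge(task: str, output: str) -> Critique:
    if "->" in output:
        return Critique(True, "ok")
    return Critique(False, "Always add type hints to function signatures.")


def toy_mutate(prompt: str, feedback: List[str]) -> str:
    # Appends each distinct critique as a rule; a real loop might ask an LLM
    # to merge the feedback into the prompt instead.
    rules = sorted(set(feedback))
    return prompt + "\n" + "\n".join(f"- {rule}" for rule in rules)


if __name__ == "__main__":
    best, scores = prompt_learning_loop(
        "You are a coding agent.", ["load", "save"],
        toy_agent, toy_judge, toy_mutate,
    )
    print(scores)
    print(best)
```

The design point is that the judge returns text rather than only a pass/fail bit, so a single failed eval can carry a concrete instruction into the next prompt version. Tracking the best-scoring prompt guards against a mutation that makes things worse.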

prompt-engineering · evals · agents

Original description

Your coding agent writes code, but not like your team. RL has boosted base models, but it’s opaque and hard to scale across enterprises. Most agents still rely on brittle, hand-edited system prompts or style guides (e.g., agents.md). What if your agent learned from your reviews and updated them automatically? In this talk, I’ll show a system-prompt learning loop (RL techniques applied to prompts, not model weights) that continually tunes an agents.md, so the agent learns instructions from your PRs, feedback, and evaluations. You’ll leave with a concrete recipe to capture runtime signals and auto-tune system prompts, applicable to any type of agent you’re building.

Speakers: 
Aparna Dhinakaran  |  Co-founder & CPO, Arize
https://x.com/aparnadhinak
https://www.linkedin.com/in/aparnadhinakaran/