How to Train Your Agent: Building Reliable Agents with RL — Kyle Corbitt, OpenPipe

61.4K views · Jul 19, 2025 · 19:48 min

Takeaway

Always prompt your way to a working agent first; reach for RL only when prompting plateaus, and design rewards carefully to avoid hacks.

Summary

OpenPipe's Kyle Corbitt presents the open-source ART-E case study: an email-inbox question-answering agent trained with reinforcement learning.
Start with a prompted baseline before any RL: building one debugs the tools and environment and gives a meaningful point of comparison.
RL improved reliability over prompting on this multi-step search-and-answer task with tool use (search, read_email, answer); the talk description cites a success rate rising from 74% to 94%.
Concrete lessons on reward design and avoiding reward hacking when training agents with tool calls.
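
To make the reward-design point concrete, here is a minimal sketch of a reward function for an agent of this kind. It is not the reward used in the talk or in ART-E: the Trajectory fields, the max_turns cap, and all numeric weights are hypothetical, chosen only to illustrate one way to let correctness dominate while adding a small "extra reward" for efficiency.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Outcome of one agent rollout (hypothetical fields, for illustration)."""
    answered: bool        # did the agent call answer() at all?
    answer_correct: bool  # did a grader judge the final answer correct?
    num_turns: int        # number of tool calls used in the rollout


def reward(traj: Trajectory, max_turns: int = 10) -> float:
    """Score one rollout; all weights are illustrative assumptions."""
    if not traj.answered:
        return 0.0   # gave up or ran out of turns
    if not traj.answer_correct:
        return -1.0  # a confident wrong answer scores below no answer
    # Small bonus for finishing in fewer turns. It is only granted on
    # correct answers and is capped at 0.1, so the agent cannot raise
    # its reward by answering quickly but wrongly.
    turns_used = min(traj.num_turns, max_turns)
    efficiency_bonus = 0.1 * (1 - turns_used / max_turns)
    return 1.0 + efficiency_bonus
```

Under these assumed weights, a correct answer in few turns scores highest, a correct but slow answer scores slightly lower, no answer scores zero, and a wrong answer scores lowest. Gating the bonus on correctness is one simple guard against the agent optimizing the extra reward at the expense of the main objective.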

agents · rl · openpipe

Original description

Have you ever launched an awesome agentic demo, only to realize no amount of prompting will make it reliable enough to deploy in production? Agent reliability is a famously difficult problem to solve!

In this talk we’ll learn how to use GRPO to help your agent learn from its successes and failures and improve over time. We’ve seen dramatic results with this technique, such as an email assistant agent whose success rate jumped from 74% to 94% after replacing o4-mini with an open source model optimized using GRPO.
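
As a rough illustration of how GRPO lets an agent "learn from its successes and failures", the sketch below computes group-relative advantages: several rollouts of the same task are scored, and each rollout's reward is normalized against the mean and standard deviation of its group. This shows only the advantage step of GRPO in its commonly described form, not a full training loop and not necessarily what OpenPipe's tooling does. The function name and the eps parameter are assumptions for the example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize each reward against its own group of rollouts.

    rewards: scores for several rollouts of the same task.
    eps: small constant (an assumption here) that avoids division by
         zero when every rollout in the group got the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four rollouts of one task: two succeeded (1.0) and two failed (0.0).
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Here the successful rollouts get advantages close to +1 and the failed ones close to -1, so training pushes the policy toward the behavior of the better rollouts. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.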

We’ll share case studies as well as practical lessons learned around the types of problems this works well for and the unexpected pitfalls to avoid.

About Kyle Corbitt
Kyle Corbitt is the co-founder and CEO of OpenPipe, the RL post-training company. OpenPipe has trained thousands of customer models for both enterprises and tech-forward startups.

Before founding OpenPipe, Kyle led the Startup School team at Y Combinator, which was responsible for the product and content that YC produces for early-stage companies. Prior to that he worked as an engineer at Google and studied ML at school.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter

Timestamps:

[00:00] - Introduction to building reliable agents with RL.

[00:49] - Case Study: ART-E, an AI email assistant.

[02:19] - The importance of starting with prompted models before moving to RL.

[03:17] - Performance improvements of RL over prompted models.

[05:18] - Cost and latency benefits of the RL approach.

[08:02] - The two hardest problems in modern RL: realistic environments and reward functions.

[13:13] - Optimizing agent behavior with "extra rewards."

[15:25] - The problem of "reward hacking" and how to address it.

[18:37] - The solution to reward hacking: