
Agent Reinforcement Fine Tuning – Will Hang & Cathy Zhou, OpenAI

21.4K views · Dec 09, 2025 · 16:55 min
Takeaway

Agent RFT lets you train reasoning models end-to-end against your real tools and your own reward function. It is the next lever once prompt and task optimization are exhausted.

Summary

  • OpenAI's Agent RFT extends Reinforcement Fine-Tuning so the model can call user-hosted tools over the public internet during training and receive a custom reward signal from a user endpoint per rollout.
  • This is the first time OpenAI has let models interact with the outside world during training; the agent explores many tool-call trajectories per task.
  • Built on top of reasoning models and designed for cases where prompt engineering, task simplification, and tool tweaks are exhausted.
  • Used internally to train Codex; the intended path for squeezing more performance out of frontier reasoning agents.
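
To make the "custom reward signal from a user endpoint" idea concrete, here is a minimal sketch of what such a grader could look like. It is not OpenAI's actual request or response schema: the JSON field names (`final_answer`, `reference`, `tool_calls`, `score`), the tool-call budget, and the penalty weight are all hypothetical choices made for illustration, using only the Python standard library.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def grade_rollout(final_answer: str, reference: str,
                  tool_calls: int, max_calls: int = 8) -> float:
    """Score one rollout in [0, 1].

    Hypothetical rule: full credit for an exact match with the reference,
    minus 0.05 for every tool call beyond the budget.
    """
    correct = 1.0 if final_answer.strip() == reference.strip() else 0.0
    extra_calls = max(0, tool_calls - max_calls)
    return max(0.0, correct - 0.05 * extra_calls)


class RewardHandler(BaseHTTPRequestHandler):
    """Accepts a POSTed rollout (assumed JSON shape) and replies with a score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        score = grade_rollout(
            payload.get("final_answer", ""),
            payload.get("reference", ""),
            int(payload.get("tool_calls", 0)),
        )
        body = json.dumps({"score": score}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Run the grader until interrupted (blocking call)."""
    HTTPServer(("0.0.0.0", port), RewardHandler).serve_forever()
```

Calling `serve()` starts the endpoint locally. The grading rule is kept as a plain function so it can be unit-tested without the HTTP layer.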
fine-tuning · agents · rl
Original description
Deep dive into OpenAI's approach to reinforcement fine-tuning for code models.

https://x.com/willhang_
https://x.com/cathyzhou

AIE is coming to London and SF! see dates and sign up to be notified of sponsorships, CFPs, and tickets: https://ai.engineer

Timestamps:

00:00 Introduction to Agent RFT & What Defines an Agent 
01:45 Hierarchy of Agent Optimization (Prompting - Task Opt - RFT) 
02:53 New RFT Features: Public Endpoints & Custom Rewards 
03:55 Addressing Domain Shift & Latency via Exploration 
05:41 Recommended Workflow: Baseline First 
06:54 Case Study: Cognition (Code Editing & Parallelism) 
08:53 Case Study: Qodo (Deep Research & Tail Latency)
10:33 Case Study: Cosine (Enterprise Code & Strict Grading) 
12:50 Case Study: Mako (GPU Kernels & Reward Hacking)
14:46 Four Principles for RFT Success