Agent Reinforcement Fine Tuning – Will Hang & Cathy Zhou, OpenAI
Takeaway
Agent RFT lets you train reasoning models end-to-end against your real tools and reward function: the next lever after prompt and task optimization.
Summary
- OpenAI's Agent RFT extends Reinforcement Fine-Tuning so the model can call user-hosted tools over the public internet during training and receive a custom reward signal from a user endpoint per rollout.
- This is the first time OpenAI lets models interact with the real outside world during training; the agent explores many tool-call trajectories per task.
- Built on top of reasoning models; designed for cases where prompt engineering, task simplification, and tool tweaks are exhausted.
- Used internally to train Codex; the intended path for squeezing more performance out of frontier reasoning agents.
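The per-rollout reward described above comes from an endpoint you host. Below is a minimal sketch of the grading logic such an endpoint might run. It assumes a hypothetical rollout record carrying the agent's final answer and the tool calls it made. `Rollout`, `grade_rollout`, `reward_spread`, the exact-match check, and the tool-call penalty are all illustrative choices, not OpenAI's request schema or API. The HTTP serving layer is omitted.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev


# Hypothetical rollout record: these fields are assumptions for illustration,
# not the schema OpenAI actually sends to a grader endpoint.
@dataclass
class Rollout:
    final_answer: str
    tool_calls: list = field(default_factory=list)


def grade_rollout(rollout: Rollout, reference: str, max_tool_calls: int = 8) -> float:
    """Return a reward in [0, 1] for one rollout (hypothetical grading rule)."""
    # Strict correctness check: case-insensitive exact match after trimming.
    if rollout.final_answer.strip().lower() != reference.strip().lower():
        return 0.0
    # Assumed shaping term: each tool call beyond the budget costs 0.1,
    # nudging the agent toward shorter, cheaper trajectories.
    extra_calls = max(0, len(rollout.tool_calls) - max_tool_calls)
    return max(0.0, 1.0 - 0.1 * extra_calls)


def reward_spread(rollouts: list, reference: str) -> tuple:
    """Mean and population std. dev. of rewards across rollouts of one task."""
    rewards = [grade_rollout(r, reference) for r in rollouts]
    return mean(rewards), pstdev(rewards)


if __name__ == "__main__":
    # Several sampled trajectories for the same task, graded independently.
    samples = [
        Rollout("42", ["search"]),
        Rollout("41", []),
        Rollout(" 42 ", ["search"] * 10),
    ]
    print([grade_rollout(r, "42") for r in samples])
    print(reward_spread(samples, "42"))
```

Scoring several trajectories of the same task shows why the spread matters. If every rollout of a task receives the same reward, the grader gives the optimizer little basis for preferring one trajectory over another. Checking that spread on a baseline run is a cheap sanity check before training.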
fine-tuning · agents · rl
Original description
Deep dive into OpenAI's approach to reinforcement fine-tuning for code models.
https://x.com/willhang_
https://x.com/cathyzhou
AIE is coming to London and SF! See dates and sign up to be notified of sponsorships, CFPs, and tickets: https://ai.engineer
Timestamps:
- 00:00 Introduction to Agent RFT & What Defines an Agent
- 01:45 Hierarchy of Agent Optimization (Prompting - Task Opt - RFT)
- 02:53 New RFT Features: Public Endpoints & Custom Rewards
- 03:55 Addressing Domain Shift & Latency via Exploration
- 05:41 Recommended Workflow: Baseline First
- 06:54 Case Study: Cognition (Code Editing & Parallelism)
- 08:53 Case Study: Codto (Deep Research & Tail Latency)
- 10:33 Case Study: Cosine (Enterprise Code & Strict Grading)
- 12:50 Case Study: Macco (GPU Kernels & Reward Hacking)
- 14:46 Four Principles for RFT Success