RFT, DPO, SFT: Fine-tuning with OpenAI — Ilan Bigio, OpenAI

16.9K views · Jun 23, 2025 · 106:14 min · Watch on YouTube ↗

Takeaway

Pick fine-tuning by what the model needs to learn: imitation (SFT), preference deltas (DPO), or reasoning behaviour through a grader (RFT).

Summary

Ilan Bigio (OpenAI DevEx) frames an LLM app as input + model + scaffolding, then focuses on optimizing the model via three fine-tuning offerings: SFT, DPO and RFT.
SFT (supervised fine-tuning): show input/output pairs; the model learns a soft mapping — best for classification, formatting, structured extraction and distillation into smaller models.
DPO (direct preference optimization): paired preferred/non-preferred outputs; the model learns the delta vector between them, useful for stylistic and subjective preferences.
RFT (reinforcement fine-tuning): the same algorithm behind o1/o3/o4-mini reasoning models — provide inputs plus a grader (optionally with reference answers) and the model tunes its chain-of-thought to maximize the grader's score, pushing beyond what prompting can achieve.
Recommends prompting first ('low barrier hammer') and reserving fine-tuning ('CNC machine') for cases that justify the upfront data curation and multi-hour iteration loop.

fine-tuningopenairft

Original description

Full workshop covering all forms of fine-tuning and prompt engineering, like SFT, DPO, RFT, prompt engineering / optimization, and agent scaffolding.

About Ilan Bigio
Ilan Bigio is a founding member of OpenAI’s Developer Experience team where he explores model capabilities, builds demos and developer tools, and shares his learnings through talks and docs.

His work includes creating the AI phone ordering demo showcased at DevDay 2024, leading technical development for Swarm, the precursor to the Agents SDK, and contributing to Codex CLI. Prior to that, he was a Solutions Architect at OpenAI, partnering with companies like Cursor, Khan Academy, and Klarna to shape their AI products. Before OpenAI, he was a full-stack Software Engineer at Google, building for YouTube at scale.

Ilan’s journey started as a hobby hacker, diving into operating systems and reverse engineering, before shifting to language models in 2020. He created projects like ShellAI—an open-source, AI-powered terminal assistant—and is passionate about sharing knowledge. With a multidisciplinary background spanning web development, AI/ML, and operating systems, he’s designed and taught courses at Brown and continues to share his expertise through in-depth technical OpenAI guides on topics like Function Calling, Latency Optimization, and Agent Orchestration.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter