🎯 Fine-Tuning

Adapting pre-trained models — full SFT, LoRA/QLoRA, DPO, preference tuning. When fine-tuning beats prompting + RAG.

20 videos · fine-tuningrlunslothloraagentsreasoning

The workflow

flowchart LR
    A[Base model] --> B{Method}
    B -->|Full SFT| C[All weights<br/>high compute]
    B -->|LoRA / QLoRA| D[Adapter layers<br/>parameter-efficient]
    B -->|DPO / RLHF| E[Preference pairs<br/>align behavior]
    C --> F[Eval on held-out]
    D --> F
    E --> F
    F --> G[Serve adapter<br/>or merged]

Fine-tune only after exhausting prompting + RAG. LoRA wins on cost; DPO wins on alignment.

Key takeaways

Unsloth's open-source stack lets you fine-tune, quantize and RL-train modern LLMs on free or consumer GPUs with bug-fixed open-source models.

Fine-tune only when prompt engineering plateaus, then layer SFT + DPO + LoRA/QLoRA and consider model merging to combine specialized variants cheaply.

At high volume, fine-tuned small open-source models — fed by production signals via something like OpenPipe — beat frontier APIs on cost, latency and even error rate.

Agentic reasoning isn't a separate research thread — it's the same RL-on-tool-use scaling recipe that powers o3 and DeepSeek, and it's becoming accessible outside frontier labs.

Agent RFT lets you train reasoning models end-to-end on your real tools and reward function — the next lever after prompt and task optimization.

A small GPT-2-style model can be trained from scratch on a laptop, and post-training — not architecture — is where modern frontier gains come from.

Videos (20)

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents — Daniel Han

Unsloth's open-source stack lets you fine-tune, quantize and RL-train modern LLMs on free or consumer GPUs with bug-fixed open-source models.

113.9K views · Jul 19, 2025

Everything you need to know about Fine-tuning and Merging LLMs: Maxime Labonne

Fine-tune only when prompt engineering plateaus, then layer SFT + DPO + LoRA/QLoRA and consider model merging to combine specialized variants cheaply.

30.3K views · Sep 25, 2024

Finetuning: 500m AI agents in production with 2 engineers — Mustafa Ali & Kyle Corbitt

At high volume, fine-tuned small open-source models — fed by production signals via something like OpenPipe — beat frontier APIs on cost, latency and even error rate.

23.2K views · Apr 12, 2025

Training Agentic Reasoners — Will Brown, Prime Intellect

Agentic reasoning isn't a separate research thread — it's the same RL-on-tool-use scaling recipe that powers o3 and DeepSeek, and it's becoming accessible outside frontier labs.

21.8K views · Jul 07, 2025

Agent Reinforcement Fine Tuning – Will Hang & Cathy Zhou, OpenAI

Agent RFT lets you train reasoning models end-to-end on your real tools and reward function — the next lever after prompt and task optimization.

21.4K views · Dec 09, 2025

Training an LLM from Scratch, Locally — Angelos Perivolaropoulos, ElevenLabs

A small GPT-2-style model can be trained from scratch on a laptop, and post-training — not architecture — is where modern frontier gains come from.

18.6K views · May 04, 2026

RFT, DPO, SFT: Fine-tuning with OpenAI — Ilan Bigio, OpenAI

Pick fine-tuning by what the model needs to learn: imitation (SFT), preference deltas (DPO), or reasoning behaviour through a grader (RFT).

16.9K views · Jun 23, 2025

Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan

Pick adapter tuning for new domains, prefix tuning for behavior shaping, and LoRA/QLoRA when compute/memory is the binding constraint.

9.5K views · Nov 14, 2023

Let LLMs Wander: Engineering RL Environments — Stefano Fiorucci

RL environments built with libraries like Verifiers let small models discover strategies beyond what SFT examples can teach.

5.8K views · Apr 08, 2026

Lessons from Trillion Token Deployments at Fortune 500s — Alessandro Cappelli, Adaptive ML

Treat reinforcement learning as the missing 'last mile' for enterprise agents — it gives you ownership, latency-friendly small models, and a synthetic-data flywheel that SFT and prompting can't.

4.3K views · May 12, 2026

OpenThoughts: Data Recipes for Reasoning Models — Ryan Marten, Bespoke Labs

Post-training reasoning data recipes invert pre-training intuition: fewer high-quality synthetic sources, multiple samples per question, and weaker-but-clearer teachers outperform.

3.5K views · Jul 19, 2025

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

Open-weight model releases routinely ship with subtle bugs (BOS, GELU variants, sliding windows) that silently degrade fine-tunes — verify before training.

3.3K views · Jul 31, 2024

Training Albatross An Expert Finance LLM: Leo Pekelis

Building finance-expert LLMs requires both domain pretraining (with membership-inference-based data curation) and long-context extension because finance work demands both depth and document length.

1.9K views · Feb 13, 2025

The GenAI Maturity Curve or You Probably Don't Need Fine Tuning: Kyle Corbitt

Fine-tune only after prompting, RAG and evals have plateaued — premature fine-tuning bakes in bad specs and loses you flexibility.

1.6K views · Feb 09, 2025

Text-to-Speech Data Preparation and Fine-tuning Workshop - Ronan McGovern

You can fine-tune a token-based TTS model like Sesame CSM-1B to mimic a target voice with ~50 30-second YouTube clips and a free Colab GPU.

1.5K views · Jun 03, 2025

Fine tune 20 Llama Models in 5 Minutes: Santosh Radha

With Python decorators, Covalent lets you fan out fine-tuning and deploy auto-scaling inference endpoints across heterogeneous GPUs without touching Docker or Kubernetes.

1.3K views · Feb 09, 2025

LLM Quality Optimization Bootcamp: Thierry Moreau and Pedro Torruella

Fine-tune only after prompt-eng and RAG plateau; small Llama 3 + OpenPipe can deliver 47% accuracy gains and 200x cost cuts on narrow tasks like PII redaction.

1.1K views · Feb 08, 2025

Using AI to Build an Infinite Game: Jeff Schomay

Cheap fine-tuning ($1-2) can replace long prompts and produce stylistically consistent generative game content at runtime.

981 views · Feb 01, 2024

No-code fine-tuning: Mark Hennings

Fine-tuning a smaller model wins on speed, cost (~90%), and prompt-injection resistance for bounded tasks — but it needs to be no-code to reach non-dev teams.

608 views · Feb 05, 2025

Insights from Snorkel AI running Azure AI Infrastructure: Humza Iqbal and Lachlan Ainley

Enterprise fine-tuning needs SME-in-the-loop programmatic data development and domain-specific benchmarks — and PyTorch+Horovod+NFS on Azure scales it from one node to dozens.

208 views · Feb 08, 2025