🎯 Fine-Tuning

Adapting pre-trained models: full SFT, LoRA/QLoRA, DPO, preference tuning. When fine-tuning beats prompting + RAG.

20 videos · fine-tuning, rl, unsloth, lora, agents, reasoning

The workflow

flowchart LR
    A[Base model] --> B{Method}
    B -->|Full SFT| C[All weights<br/>high compute]
    B -->|LoRA / QLoRA| D[Adapter layers<br/>parameter-efficient]
    B -->|DPO / RLHF| E[Preference pairs<br/>align behavior]
    C --> F[Eval on held-out]
    D --> F
    E --> F
    F --> G[Serve adapter<br/>or merged]

Fine-tune only after exhausting prompting + RAG. LoRA wins on cost; DPO wins on alignment.
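
To make the cost claim for LoRA concrete, here is a minimal sketch in plain Python (no ML framework) of the underlying arithmetic: a frozen weight W plus a low-rank update B @ A scaled by alpha / rank, and the merge step that folds the adapter back into W for serving. The matrix sizes, rank, and alpha are illustrative assumptions, not values recommended in the videos.

```python
# Minimal LoRA arithmetic in plain Python; all sizes and values are illustrative.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_trainable_params(d_in, d_out, rank):
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

def merge_lora(w, b, a, alpha, rank):
    """Fold the adapter into the base weight: W' = W + (alpha / rank) * B @ A."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

# A 4096 x 4096 projection: full fine-tuning vs. a rank-16 adapter.
full = 4096 * 4096
lora = lora_trainable_params(4096, 4096, rank=16)
print(full, lora, f"{lora / full:.2%}")  # 16777216 131072 0.78%

# Tiny worked merge: 2 x 2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]   # d_out x rank
A = [[0.5, -0.5]]    # rank x d_in
merged = merge_lora(W, B, A, alpha=2.0, rank=1)
print(merged)  # [[2.0, -1.0], [2.0, -1.0]]
```

Serving the merged matrix adds no inference overhead, while keeping the adapter separate lets one base model host several adapters.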

Key takeaways

Unsloth's open-source stack lets you fine-tune, quantize and RL-train modern LLMs on free or consumer GPUs with bug-fixed open-source models.
Fine-tune only when prompt engineering plateaus, then layer SFT + DPO + LoRA/QLoRA and consider model merging to combine specialized variants cheaply.
At high volume, fine-tuned small open-source models, fed by production signals via something like OpenPipe, beat frontier APIs on cost, latency and even error rate.
Agentic reasoning isn't a separate research thread: it's the same RL-on-tool-use scaling recipe that powers o3 and DeepSeek, and it's becoming accessible outside frontier labs.
Agent RFT lets you train reasoning models end-to-end on your real tools and reward function, the next lever after prompt and task optimization.
A small GPT-2-style model can be trained from scratch on a laptop, and post-training, not architecture, is where modern frontier gains come from.

Videos (20)

[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents - Daniel Han

Unsloth's open-source stack lets you fine-tune, quantize and RL-train modern LLMs on free or consumer GPUs with bug-fixed open-source models.

113.9K views · Jul 19, 2025

Everything you need to know about Fine-tuning and Merging LLMs: Maxime Labonne

Fine-tune only when prompt engineering plateaus, then layer SFT + DPO + LoRA/QLoRA and consider model merging to combine specialized variants cheaply.

30.3K views · Sep 25, 2024
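
One of the simplest merging strategies is a weighted average of matching parameters across fine-tunes of the same base model. The sketch below illustrates only that idea. It is not necessarily the method covered in the talk, and the parameter names and values are made up. It assumes every checkpoint shares the same architecture and parameter keys.

```python
def linear_merge(checkpoints, weights):
    """Weighted average of same-shaped parameter dicts (name -> list of floats)."""
    if len(checkpoints) != len(weights):
        raise ValueError("need one weight per checkpoint")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    keys = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")
    merged = {}
    for name in keys:
        # Walk the parameter position by position across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(w * v for w, v in zip(weights, col)) / total
                        for col in columns]
    return merged

# Hypothetical fine-tunes of the same base model.
code_model = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.0]}
chat_model = {"layer0.weight": [3.0, 4.0], "layer0.bias": [1.0]}

merged = linear_merge([code_model, chat_model], weights=[0.5, 0.5])
print(merged)  # {'layer0.weight': [2.0, 3.0], 'layer0.bias': [0.5]}
```

Merging needs no gradient steps, which is why it is cheap compared with another round of training. Whether the merged model keeps both skills still has to be checked on held-out evals.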

Finetuning: 500m AI agents in production with 2 engineers - Mustafa Ali & Kyle Corbitt

At high volume, fine-tuned small open-source models, fed by production signals via something like OpenPipe, beat frontier APIs on cost, latency and even error rate.

23.2K views · Apr 12, 2025

Training Agentic Reasoners - Will Brown, Prime Intellect

Agentic reasoning isn't a separate research thread: it's the same RL-on-tool-use scaling recipe that powers o3 and DeepSeek, and it's becoming accessible outside frontier labs.

21.8K views · Jul 07, 2025

Agent Reinforcement Fine Tuning – Will Hang & Cathy Zhou, OpenAI

Agent RFT lets you train reasoning models end-to-end on your real tools and reward function, the next lever after prompt and task optimization.

21.4K views · Dec 09, 2025

Training an LLM from Scratch, Locally - Angelos Perivolaropoulos, ElevenLabs

A small GPT-2-style model can be trained from scratch on a laptop, and post-training, not architecture, is where modern frontier gains come from.

18.6K views · May 04, 2026

RFT, DPO, SFT: Fine-tuning with OpenAI - Ilan Bigio, OpenAI

Pick fine-tuning by what the model needs to learn: imitation (SFT), preference deltas (DPO), or reasoning behaviour through a grader (RFT).

16.9K views · Jun 23, 2025
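
To show what "preference deltas" means in practice, here is a minimal sketch of the standard DPO loss for a single preference pair, written with only the Python standard library. It takes summed sequence log-probabilities from the policy and from a frozen reference model. The numeric inputs and the default beta of 0.1 are illustrative assumptions, and this is not code from the talk or from any provider's fine-tuning API.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses to the same prompt."""
    # How far the policy has moved from the reference on each response.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)), computed as softplus(-x) for numerical stability.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

# Policy identical to the reference: the loss is log(2).
baseline = dpo_loss(-12.0, -15.0, -12.0, -15.0)

# Policy raises the chosen response and lowers the rejected one: the loss drops.
improved = dpo_loss(-10.0, -18.0, -12.0, -15.0)

print(round(baseline, 4), round(improved, 4))
```

The loss depends only on the difference between the two margins, so DPO rewards widening the gap between chosen and rejected responses relative to the reference. SFT, by contrast, only raises the likelihood of the demonstrated response.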

Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan

Pick adapter tuning for new domains, prefix tuning for behavior shaping, and LoRA/QLoRA when compute/memory is the binding constraint.

9.5K views · Nov 14, 2023
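
As a rough illustration of why memory becomes the binding constraint, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The 7-billion-parameter size is an assumed example. The estimate deliberately ignores activations, gradients, optimizer state, the KV cache, and the small overhead of quantization constants, so real usage is higher.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "nf4": 0.5}

def weight_memory_gb(n_params, dtype):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

n_params = 7_000_000_000  # assumed example model size

bf16_gb = weight_memory_gb(n_params, "bf16")
nf4_gb = weight_memory_gb(n_params, "nf4")
print(bf16_gb, nf4_gb)  # 14.0 3.5
```

The 4x reduction on the frozen base weights is what QLoRA relies on: the base model is stored in 4-bit precision while only the small adapter matrices are trained in higher precision.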

Let LLMs Wander: Engineering RL Environments - Stefano Fiorucci

RL environments built with libraries like Verifiers let small models discover strategies beyond what SFT examples can teach.

5.8K views · Apr 08, 2026
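
The core of an RL environment is a task, a way to sample model outputs, and a programmatic reward. The toy sketch below shows that shape in plain Python. It does not use or imitate the Verifiers API. The class, the reward values, and the canned "policy" standing in for a model are all hypothetical.

```python
import itertools
import re

class ArithmeticEnv:
    """Toy single-turn environment: pose a sum and score the model's final answer."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def prompt(self):
        return f"What is {self.a} + {self.b}? Reply with 'Answer: <number>'."

    def reward(self, completion):
        """1.0 for a correct, well-formatted answer; 0.1 for format only; else 0.0."""
        match = re.search(r"Answer:\s*(-?\d+)\s*$", completion.strip())
        if match is None:
            return 0.0
        return 1.0 if int(match.group(1)) == self.a + self.b else 0.1

def rollout(env, policy, n_samples):
    """Sample several completions for the same prompt and score each one."""
    completions = [policy(env.prompt()) for _ in range(n_samples)]
    return [(c, env.reward(c)) for c in completions]

# Stand-in for a language model: cycles through canned completions.
canned = itertools.cycle(["Let me think. Answer: 5", "Answer: 6", "I am not sure."])

def fake_policy(prompt):
    return next(canned)

env = ArithmeticEnv(2, 3)
results = rollout(env, fake_policy, n_samples=3)
rewards = [r for _, r in results]
print(rewards)  # [1.0, 0.1, 0.0]
```

Because the reward is computed from the outcome rather than from a reference transcript, the model is free to reach the answer by any route, which is how strategies absent from SFT data can emerge.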

Lessons from Trillion Token Deployments at Fortune 500s - Alessandro Cappelli, Adaptive ML

Treat reinforcement learning as the missing 'last mile' for enterprise agents: it gives you ownership, latency-friendly small models, and a synthetic-data flywheel that SFT and prompting can't.

4.3K views · May 12, 2026

OpenThoughts: Data Recipes for Reasoning Models - Ryan Marten, Bespoke Labs

Post-training reasoning data recipes invert pre-training intuition: fewer high-quality synthetic sources, multiple samples per question, and weaker-but-clearer teachers outperform.

3.5K views · Jul 19, 2025

Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

Open-weight model releases routinely ship with subtle bugs (BOS, GELU variants, sliding windows) that silently degrade fine-tunes, so verify before training.

3.3K views · Jul 31, 2024
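
One commonly reported BOS problem is the token being prepended twice, for example once by a chat template and once by the tokenizer. The sketch below shows a pre-training sanity check for that one case, in plain Python. The token ids and the BOS id are hypothetical, and the check does not cover the other bug classes mentioned above.

```python
def count_leading_bos(token_ids, bos_id):
    """Count how many BOS tokens appear at the very start of a sequence."""
    count = 0
    for tok in token_ids:
        if tok != bos_id:
            break
        count += 1
    return count

def dedupe_leading_bos(token_ids, bos_id):
    """Keep at most one BOS token at the start; leave the rest untouched."""
    n = count_leading_bos(token_ids, bos_id)
    return token_ids if n <= 1 else token_ids[n - 1:]

BOS_ID = 1                       # hypothetical; read it from your tokenizer
encoded = [1, 1, 306, 4966, 29]  # hypothetical ids with a doubled BOS

leading = count_leading_bos(encoded, BOS_ID)
cleaned = dedupe_leading_bos(encoded, BOS_ID)
print(leading, cleaned)  # 2 [1, 306, 4966, 29]
```

Running a check like this over a sample of the training set before launching a job is cheap, and it catches the issue before it silently affects every example.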

Training Albatross An Expert Finance LLM: Leo Pekelis

Building finance-expert LLMs requires both domain pretraining (with membership-inference-based data curation) and long-context extension because finance work demands both depth and document length.

1.9K views · Feb 13, 2025

The GenAI Maturity Curve or You Probably Don't Need Fine Tuning: Kyle Corbitt

Fine-tune only after prompting, RAG and evals have plateaued: premature fine-tuning bakes in bad specs and costs you flexibility.

1.6K views · Feb 09, 2025

Text-to-Speech Data Preparation and Fine-tuning Workshop - Ronan McGovern

You can fine-tune a token-based TTS model like Sesame CSM-1B to mimic a target voice with ~50 YouTube clips of 30 seconds each and a free Colab GPU.

1.5K views · Jun 03, 2025

Fine tune 20 Llama Models in 5 Minutes: Santosh Radha

With Python decorators, Covalent lets you fan out fine-tuning and deploy auto-scaling inference endpoints across heterogeneous GPUs without touching Docker or Kubernetes.

1.3K views · Feb 09, 2025

LLM Quality Optimization Bootcamp: Thierry Moreau and Pedro Torruella

Fine-tune only after prompt engineering and RAG plateau; a small Llama 3 model plus OpenPipe can deliver 47% accuracy gains and 200x cost cuts on narrow tasks like PII redaction.

1.1K views · Feb 08, 2025

Using AI to Build an Infinite Game: Jeff Schomay

Cheap fine-tuning ($1-2) can replace long prompts and produce stylistically consistent generative game content at runtime.

981 views · Feb 01, 2024

No-code fine-tuning: Mark Hennings

Fine-tuning a smaller model wins on speed, cost (~90%), and prompt-injection resistance for bounded tasks, but it needs to be no-code to reach non-dev teams.

608 views · Feb 05, 2025

Insights from Snorkel AI running Azure AI Infrastructure: Humza Iqbal and Lachlan Ainley

Enterprise fine-tuning needs SME-in-the-loop programmatic data development and domain-specific benchmarks, and PyTorch+Horovod+NFS on Azure scales it from one node to dozens.

208 views · Feb 08, 2025