🎯 Fine-Tuning
Adapting pre-trained models: full SFT, LoRA/QLoRA, DPO, preference tuning. When fine-tuning beats prompting + RAG.
The workflow
flowchart LR
A[Base model] --> B{Method}
B -->|Full SFT| C[All weights<br/>high compute]
B -->|LoRA / QLoRA| D[Adapter layers<br/>parameter-efficient]
B -->|DPO / RLHF| E[Preference pairs<br/>align behavior]
C --> F[Eval on held-out]
D --> F
E --> F
F --> G[Serve adapter<br/>or merged]
Fine-tune only after exhausting prompting + RAG. LoRA wins on cost; DPO wins on alignment.
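To make "parameter-efficient" concrete, here is a minimal sketch of why LoRA is cheaper than full SFT for a single linear layer. It assumes the standard low-rank formulation (a frozen weight matrix W plus a trainable update B·A scaled by alpha / rank). The layer size, rank, and function names are illustrative assumptions, not taken from any of the talks listed on this page.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full, lora) trainable-parameter counts for one linear layer."""
    full = d_in * d_out            # full SFT updates every entry of W
    lora = rank * (d_in + d_out)   # LoRA trains only A (rank x d_in) and B (d_out x rank)
    return full, lora


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x); W stays frozen during training."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


# Hypothetical layer shape: 4096 x 4096 with rank 8.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(full, lora, f"{lora / full:.2%}")  # 16777216 65536 0.39%
```

With B initialized to zeros, as is common, the adapted layer starts out identical to the base layer, so training begins from the pre-trained behavior.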
Key takeaways
Videos (20)
[Full Workshop] Reinforcement Learning, Kernels, Reasoning, Quantization & Agents - Daniel Han
Unsloth's open-source stack lets you fine-tune, quantize and RL-train modern LLMs on free or consumer GPUs with bug-fixed open-source models.
Everything you need to know about Fine-tuning and Merging LLMs: Maxime Labonne
Fine-tune only when prompt engineering plateaus, then layer SFT + DPO + LoRA/QLoRA and consider model merging to combine specialized variants cheaply.
Finetuning: 500m AI agents in production with 2 engineers - Mustafa Ali & Kyle Corbitt
At high volume, fine-tuned small open-source models, fed by production signals via something like OpenPipe, beat frontier APIs on cost, latency and even error rate.
Training Agentic Reasoners - Will Brown, Prime Intellect
Agentic reasoning isn't a separate research thread; it's the same RL-on-tool-use scaling recipe that powers o3 and DeepSeek, and it's becoming accessible outside frontier labs.
Agent Reinforcement Fine Tuning - Will Hang & Cathy Zhou, OpenAI
Agent RFT lets you train reasoning models end-to-end on your real tools and reward function: the next lever after prompt and task optimization.
Training an LLM from Scratch, Locally - Angelos Perivolaropoulos, ElevenLabs
A small GPT-2-style model can be trained from scratch on a laptop, and post-training, not architecture, is where modern frontier gains come from.
RFT, DPO, SFT: Fine-tuning with OpenAI - Ilan Bigio, OpenAI
Pick fine-tuning by what the model needs to learn: imitation (SFT), preference deltas (DPO), or reasoning behaviour through a grader (RFT).
Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan
Pick adapter tuning for new domains, prefix tuning for behavior shaping, and LoRA/QLoRA when compute/memory is the binding constraint.
Let LLMs Wander: Engineering RL Environments - Stefano Fiorucci
RL environments built with libraries like Verifiers let small models discover strategies beyond what SFT examples can teach.
Lessons from Trillion Token Deployments at Fortune 500s - Alessandro Cappelli, Adaptive ML
Treat reinforcement learning as the missing 'last mile' for enterprise agents: it gives you ownership, latency-friendly small models, and a synthetic-data flywheel that SFT and prompting can't.
OpenThoughts: Data Recipes for Reasoning Models - Ryan Marten, Bespoke Labs
Post-training reasoning data recipes invert pre-training intuition: fewer high-quality synthetic sources, multiple samples per question, and weaker-but-clearer teachers outperform.
Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han
Open-weight model releases routinely ship with subtle bugs (BOS, GELU variants, sliding windows) that silently degrade fine-tunes; verify before training.
Training Albatross An Expert Finance LLM: Leo Pekelis
Building finance-expert LLMs requires both domain pretraining (with membership-inference-based data curation) and long-context extension because finance work demands both depth and document length.
The GenAI Maturity Curve or You Probably Don't Need Fine Tuning: Kyle Corbitt
Fine-tune only after prompting, RAG and evals have plateaued; premature fine-tuning bakes in bad specs and loses you flexibility.
Text-to-Speech Data Preparation and Fine-tuning Workshop - Ronan McGovern
You can fine-tune a token-based TTS model like Sesame CSM-1B to mimic a target voice with ~50 30-second YouTube clips and a free Colab GPU.
Fine tune 20 Llama Models in 5 Minutes: Santosh Radha
With Python decorators, Covalent lets you fan out fine-tuning and deploy auto-scaling inference endpoints across heterogeneous GPUs without touching Docker or Kubernetes.
LLM Quality Optimization Bootcamp: Thierry Moreau and Pedro Torruella
Fine-tune only after prompt engineering and RAG plateau; small Llama 3 + OpenPipe can deliver 47% accuracy gains and 200x cost cuts on narrow tasks like PII redaction.
Using AI to Build an Infinite Game: Jeff Schomay
Cheap fine-tuning ($1-2) can replace long prompts and produce stylistically consistent generative game content at runtime.
No-code fine-tuning: Mark Hennings
Fine-tuning a smaller model wins on speed, cost (~90%), and prompt-injection resistance for bounded tasks, but it needs to be no-code to reach non-dev teams.
Insights from Snorkel AI running Azure AI Infrastructure: Humza Iqbal and Lachlan Ainley
Enterprise fine-tuning needs SME-in-the-loop programmatic data development and domain-specific benchmarks, and PyTorch+Horovod+NFS on Azure scales it from one node to dozens.
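Several of the talks above contrast SFT (imitation) with DPO (learning from preference pairs). As a rough illustration of what "preference deltas" means, the sketch below computes the standard DPO loss for one chosen/rejected pair from sequence log-probabilities. The function name, argument names, and the beta value of 0.1 are assumptions for illustration; the talks do not prescribe this code, and training libraries compute the same quantity over batches of tensors.

```python
import math


def dpo_loss(policy_chosen_logp: float,
             policy_rejected_logp: float,
             ref_chosen_logp: float,
             ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair, given summed token log-probabilities.

    beta (assumed 0.1 here) controls how far the policy may drift from the
    frozen reference model.
    """
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) written in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference, the loss is log(2).
print(dpo_loss(-1.0, -2.0, -1.0, -2.0))
# Raising the chosen response's likelihood relative to the rejected one lowers the loss.
print(dpo_loss(-0.5, -3.0, -1.0, -2.0))
```

The loss only depends on differences relative to the reference model, which is why DPO needs paired (chosen, rejected) examples rather than the single target outputs used for SFT.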