Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

83.8K views · Apr 29, 2026 · 20:13 min

Takeaway

Frontier small models for the edge require rethinking the architecture around on-device latency and the embedding budget, plus narrow, task-specific post-training, rather than simply shrinking larger models.
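
One concrete piece of that architecture-level rethinking is the gated short convolution used in Liquid AI's LFM models. The sketch below is loosely modeled on the publicly described LFM2 block: an input projection split into two gates (B, C) and values (x), a depthwise causal convolution over B * x, then an output gate C and a projection. It is not Liquid AI's code. The toy width of 4, the kernel length of 3 and the random weights are assumptions, and the helper names (gated_short_conv, matvec, rand_matrix) are hypothetical.

```python
import random


def matvec(W, v):
    """Multiply a matrix W (list of rows) by a vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gated_short_conv(seq, W_in, W_out, kernel):
    """Toy gated short convolution over a sequence of d-dim vectors.

    W_in (3d x d) projects each input into gates B, C and values x.
    kernel[k][c] is the depthwise weight for channel c at lag k (k = 0 is the
    current step), so the receptive field is len(kernel) steps.
    """
    d = len(seq[0])
    K = len(kernel)
    gated, out_gates = [], []
    for h in seq:
        proj = matvec(W_in, h)
        B, C, x = proj[:d], proj[d:2 * d], proj[2 * d:]
        gated.append([b * xi for b, xi in zip(B, x)])  # input gate: B * x
        out_gates.append(C)
    outputs = []
    for t in range(len(seq)):
        conv = [0.0] * d
        # Causal and short: only the current and previous K - 1 steps contribute.
        for k in range(min(K, t + 1)):
            for c in range(d):
                conv[c] += kernel[k][c] * gated[t - k][c]
        y = [g * v for g, v in zip(out_gates[t], conv)]  # output gate: C * conv
        outputs.append(matvec(W_out, y))
    return outputs


def rand_matrix(rows, cols, rng):
    """Random weights in [-1, 1] (illustrative, not trained)."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Toy configuration (assumed): width 4, kernel length 3, sequence length 8.
rng = random.Random(0)
d, K, T = 4, 3, 8
W_in, W_out = rand_matrix(3 * d, d, rng), rand_matrix(d, d, rng)
kernel = rand_matrix(K, d, rng)
seq = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
out = gated_short_conv(seq, W_in, W_out, kernel)
```

Because each output depends only on the last K gated inputs, a decoder using this block only needs to keep K - 1 past vectors per layer during generation. That is a fixed-size state, whereas an attention layer's KV cache grows with every generated token. The GQA layers that LFM models keep alongside these blocks still carry a KV cache; the saving applies to the convolution layers.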

Summary

Liquid AI's LFM 2.5 architecture combines gated short convolutions with grouped-query attention (GQA) and beat Gemma 3's sliding-window attention and Qwen3.5's gated DeltaNet on latency and memory in on-device profiling (AMD Ryzen AI Max+ 395, Samsung Galaxy S25 Ultra)
Small models like Gemma 3 270M spend 63% of their parameters on the embedding layer because distillation forces them to inherit their larger teachers' big vocabulary; LFM 2.5 keeps its embeddings small, leaving more parameters for the layers that do the actual reasoning
The 350M LFM 2.5 was pre-trained on 28T tokens, far past the Chinchilla-optimal budget; the test-time scaling laws of Roberts et al. support this choice, and gains keep coming even at this small scale
Doom loops and sensitivity to the cold-start SFT stage are key failure modes of small models; the talk emphasizes on-policy, length-normalized DPO and RL on narrow tasks to address them
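
The two headline numbers above can be checked with simple arithmetic. This sketch assumes Gemma 3 270M uses a 262,144-token vocabulary, a hidden size of 640 and tied input/output embeddings (values as reported in its public config; treat them as assumptions), and uses the common Chinchilla rule of thumb of roughly 20 training tokens per parameter. The 65,536-token vocabulary is a hypothetical comparison point, not LFM 2.5's actual configuration.

```python
# Back-of-envelope checks for the summary's numbers (illustrative sketch).
# Assumed config values are labeled; they are not taken from the talk.

CHINCHILLA_TOKENS_PER_PARAM = 20  # common Chinchilla rule of thumb


def embedding_params(vocab_size, hidden_size):
    """Parameters in one token-embedding matrix (tied input/output assumed)."""
    return vocab_size * hidden_size


def embedding_share(vocab_size, hidden_size, total_params):
    """Fraction of all parameters spent on the embedding matrix."""
    return embedding_params(vocab_size, hidden_size) / total_params


def tokens_per_param(tokens, params):
    """Training tokens seen per model parameter."""
    return tokens / params


# Gemma 3 270M, assuming a 262,144-token vocabulary and hidden size 640.
gemma_share = embedding_share(262_144, 640, 270_000_000)
# Hypothetical: same width and total size, 65,536-token vocabulary,
# with the freed parameters moved into the rest of the network.
small_vocab_share = embedding_share(65_536, 640, 270_000_000)

# LFM 2.5 350M trained on 28T tokens (figures from the summary).
ratio = tokens_per_param(28 * 10**12, 350 * 10**6)
over_chinchilla = ratio / CHINCHILLA_TOKENS_PER_PARAM

print(f"Gemma 3 270M embedding share: {gemma_share:.0%}")
print(f"Hypothetical 65K-vocab share: {small_vocab_share:.0%}")
print(f"LFM 2.5 350M: {ratio:,.0f} tokens/param, {over_chinchilla:,.0f}x Chinchilla")
```

Under these assumptions the embedding share comes out at about 62%, close to the 63% quoted (the exact figure depends on the total parameter count used), and a 65,536-token vocabulary at the same width would cut it to roughly 16%. The 28T tokens work out to 80,000 tokens per parameter for the 350M model, about 4,000 times the Chinchilla budget.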

small-models · architecture · edge

Original description

A new class of small models is emerging with the ability to reliably follow instructions and call tools while running on-device under 1 GB of memory. In this talk, we'll break down how to post-train frontier small models using the LFM2.5 recipe: on-policy preference alignment, agentic reinforcement learning, and curriculum training with iterative model merging. We'll cover training challenges unique to the 1B scale, such as doom loops and capability interference, and how to fix them. The goal is to give you a concrete playbook to fine-tune and deploy small models for your own use cases, from structured data extraction to multi-turn tool use.
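
Length-normalized DPO, the on-policy preference method named in the summary, can be illustrated with a minimal loss function. This sketch shows one common formulation: each response's policy-to-reference log-ratio is averaged over its tokens before the usual DPO sigmoid loss, so a response is not favored or penalized merely for having more tokens. It assumes per-token log-probabilities are already available. In an on-policy setup the chosen and rejected responses would be sampled from the model being trained, a step not shown here. The function names and beta = 0.1 are illustrative assumptions, not Liquid AI's recipe.

```python
import math


def log_sigmoid(z):
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def avg_log_ratio(policy_logps, ref_logps):
    """Per-token average of log pi_theta(y) - log pi_ref(y) for one response."""
    if not policy_logps or len(policy_logps) != len(ref_logps):
        raise ValueError("need matching, non-empty per-token log-probs")
    return (sum(policy_logps) - sum(ref_logps)) / len(policy_logps)


def length_normalized_dpo_loss(policy_chosen, ref_chosen,
                               policy_rejected, ref_rejected, beta=0.1):
    """DPO loss computed on length-normalized log-ratios (one preference pair)."""
    margin = (avg_log_ratio(policy_chosen, ref_chosen)
              - avg_log_ratio(policy_rejected, ref_rejected))
    return -log_sigmoid(beta * margin)


# Toy pair: a short chosen answer vs. a long rejected one (made-up log-probs).
loss = length_normalized_dpo_loss(
    policy_chosen=[-0.5, -0.4, -0.6], ref_chosen=[-0.7, -0.6, -0.8],
    policy_rejected=[-0.3] * 12, ref_rejected=[-0.2] * 12,
)
```

With these toy numbers the chosen response has the higher per-token log-ratio (0.2 versus -0.1), so the loss falls below log 2, which is the value it takes when the two normalized log-ratios tie.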

Speaker info:
- https://x.com/maximelabonne
- https://www.linkedin.com/in/maxime-labonne/
- https://github.com/mlabonne

Timestamps:
0:00:00 - Start
0:00:14 - Introduction to frontier small models at Liquid AI
0:01:02 - Characteristics: memory-bound, task-specific, latency-sensitive
0:02:20 - Architecture: why large embedding layers are inefficient
0:04:01 - LFM2 architecture: using gated short convolutions for speed
0:06:09 - LFM 2.5 recipe: 28T tokens and post-training stages
0:08:34 - Post-training: SFT, preference alignment, and RL best practices
0:10:43 - Identifying "doom loops" in reasoning models
0:11:34 - Solutions: mitigating loops via preference alignment and RL
0:15:29 - Future focus: using agentic tools to overcome memory limits
0:17:58 - Q&A: real-world applications for small vs. large models
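
Doom loops, where a model typically keeps repeating the same span until it hits its token limit, are easy to flag automatically when evaluating or filtering generations. The heuristic below is an assumption for illustration, not the detection method from the talk: it checks whether the tail of a token sequence is one block of `period` tokens repeated back-to-back at least `min_repeats` times. `find_loop` and its thresholds are hypothetical.

```python
def find_loop(tokens, max_period=50, min_repeats=3):
    """Return the period of a repeated block at the end of `tokens`, or None.

    A loop here means the last `period * min_repeats` tokens are one block of
    `period` tokens repeated back-to-back. Thresholds are illustrative.
    """
    n = len(tokens)
    for period in range(1, max_period + 1):
        span = period * min_repeats
        if span > n:
            break  # longer periods would need even more tokens
        tail = tokens[n - span:]
        if all(tail[i] == tail[i % period] for i in range(span)):
            return period  # smallest repeating period found
    return None


# Word-level example of a reasoning trace stuck in a loop.
trace = "so the answer is so the answer is so the answer is".split()
period = find_loop(trace)
```

One could run a check like this on the last few hundred tokens of each rollout to measure a loop rate, or use looping generations as rejected responses in preference pairs; both uses are suggestions, not details from the talk.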