🧠 Foundation Models
Frontier LLM training, architecture choices, scaling, post-training (SFT/RLHF/DPO), evaluation, and releases from OpenAI, Anthropic, Google, Meta, Mistral, and others.
The workflow
flowchart LR
A[Pre-training<br/>tokens & data mix] --> B[Architecture<br/>transformer variants]
B --> C[Scaling laws<br/>compute & params]
C --> D[Post-training<br/>SFT + RLHF + DPO]
D --> E[Capabilities<br/>reasoning, tools, code]
E --> F[Evaluations<br/>& release]
Pre-training is most of the cost; post-training is most of the differentiation.
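The cost half of that claim can be sanity-checked with the common rule of thumb that training a dense transformer takes roughly 6 FLOPs per parameter per token. The sketch below is a back-of-the-envelope estimate only: the model size and token counts are hypothetical, not taken from any lab's reported figures, and the formula ignores the extra sampling cost of RL-style post-training.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate dense-transformer training compute (~6 FLOPs per parameter per token)."""
    return 6.0 * params * tokens


# Hypothetical run sizes, chosen only for illustration.
PARAMS = 70e9             # 70B-parameter dense model
PRETRAIN_TOKENS = 15e12   # 15T pre-training tokens
POSTTRAIN_TOKENS = 50e9   # 50B tokens across SFT and preference tuning

pretrain = training_flops(PARAMS, PRETRAIN_TOKENS)
posttrain = training_flops(PARAMS, POSTTRAIN_TOKENS)
share = pretrain / (pretrain + posttrain)

print(f"pre-training:  {pretrain:.2e} FLOPs")
print(f"post-training: {posttrain:.2e} FLOPs")
print(f"pre-training share of total: {share:.1%}")
```

Under these assumed numbers, pre-training accounts for nearly all of the training compute, which is consistent with the cost half of the claim. The differentiation half is a judgment that arithmetic like this cannot settle.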
Key takeaways
Videos (30)
2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison
Frontier capability is rapidly commoditizing across labs and reaching ever-smaller models, while pricing collapses except for a few overpriced outliers.
How Google DeepMind is researching the next Frontier of AI for Gemini — Raia Hadsell, VP of Research
Frontier AI progress isn't just larger LLMs—omnimodal embeddings and probabilistic graph nets are quietly setting new SOTA in retrieval and weather forecasting.
Minimax M2: Building the #1 Open Model – Olive Song, MiniMax
MiniMax M2 shows that a small, cheap open model trained on perturbed agent scaffolds with expert-developer reward signals can rival larger closed models for coding and tool-use.
Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI
Small frontier edge models require architecture-level rethinking around on-device latency, embedding budget, and task-narrow post-training, rather than simply shrinking a larger model.
Low Level Technicals of LLMs: Daniel Han
Open-model releases routinely ship subtle architecture and tokenizer bugs — learn to read the modeling code yourself and reach for SVD when reasoning about LLM internals.
Robotics: why now? - Quan Vuong and Jost Tobias Springenberg, Physical Intelligence
Robotics' breakthrough is VLA models trained on teleoperated dexterous data at scale, mirroring VLMs with a multi-year lag.
Gemma 4 Deep Dive — Cassidy Hardin, Researcher, Google DeepMind
Gemma 4 brings frontier-tier reasoning, MoE efficiency, and 256K context to fully open Apache 2.0 weights that run on consumer hardware.
Thinking Deeper in Gemini — Jack Rae, Google DeepMind
Variable test-time compute via 'thinking' is the current bottleneck-breaker for LLM intelligence, much as attention broke the bottleneck of RNNs.
Jack Morris: Stuffing Context is not Memory, Updating Weights is
For durable, scalable knowledge injection, train it into the weights; context stuffing and RAG are fundamentally bounded by attention's quadratic cost and by context rot.
Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind
Gemma 4 makes frontier open-model intelligence runnable on phones and single consumer GPUs, with on-device agentic and multimodal use cases.
What's new from Anthropic and what's next: Alex Albert
Treat 3.5 Sonnet plus Artifacts/Projects as the first product surface designed natively for LLMs, and rebuild experiences around that rather than tacking AI onto existing UIs.
Building in the Gemini Era – Kat Kampf & Ammaar Reshi, Google DeepMind
Gemini 3 plus Nano Banana Pro inside AI Studio collapses prototype-to-shipped-app and design-to-image workflows into single prompts.
A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai
Future reasoning models need calibration, strategy, and abstraction — not just more skill on math benchmarks — to power real autonomous applications.
A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind
Gemini's last year was organizational consolidation plus omnimodal capability; the next year shifts scaffolding into the model itself and pushes proactive, agentic behavior.
Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai
There are multiple AI frontiers (intelligence, open-weights, cost, speed) and choosing the right one for your app matters more than always reaching for the smartest model.
What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha
Humanoid foundation models combine internet video + simulation + scarce teleop data with a fast/slow dual-system architecture to generalize across robot embodiments.
How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS - Ishan Anand
Web devs can fully internalize transformer mechanics by reading and debugging a single-file GPT-2 implementation in their own browser.
The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen
Qwen 3 fuses thinking and non-thinking into one model with a tunable thinking budget, multilingual coverage and MoE architecture aimed at becoming a generalist agent.
Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix
Netflix is replacing many specialized recommendation models with one transformer foundation model over rich user-event tokens, applying LLM scaling laws to recsys.
Z.ai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai
GLM 4.6 closes the gap to frontier closed models on math/coding/agents via a multi-stage curriculum culminating in 200K-context agent training, with 100M+ open-source downloads of the series.
Decoding the Decoder LLM without de code: Ishan Anand
You can fully understand a decoder LLM by stepping through GPT-2 small implemented as a 124M-cell Excel sheet — no code required.
How Transformers Finally Ate Vision – Isaac Robinson, Roboflow
Pretraining beats inductive bias: ViTs won not because n^4 attention cost is good, but because FlashAttention plus MAE/DINO pretraining made them scale.
AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind
Gemini 2.5 Flash is free via AI Studio and integrates with MCP, Google Search grounding, and the new Google GenAI SDK out of the box.
Build & deploy AI-powered apps — Paige Bailey, Google DeepMind
Gemini's multimodal-in/multimodal-out support plus tool toggles in AI Studio collapse prototype-to-deployed-app into a single workflow.
Decoding Mistral AI's Large Language Models: Devendra Chaplot
Mistral's open-weights strategy treats community deployment as a free distribution and feedback channel that fuels paid upgrades, with a deliberate pipeline of dense and MoE releases.
State Space Models for Realtime Multimodal Intelligence: Karan Goel
State-space models offer subquadratic compression-based alternatives to transformers for real-time, on-device multimodal inference.
AGI: The Path Forward – Jason Warner & Eiso Kant, Poolside
Poolside is betting on RL-augmented next-token-prediction foundation models trained from scratch for high-consequence coding and long-horizon knowledge work.
WTF do people use Open Models for??
Open models in production are mostly running creative writing, roleplay, and companionship — and enterprises pin year-old models like Mistral Nemo for stability.
Unveiling the latest Gemma model advancements: Kathleen Kenealy
Gemma 2 (9B, 27B) ships as a multi-framework, safety-tuned open model competitive with 2-3x larger LLaMA/Grok models, expanding a family that already includes code, recurrent and vision-language variants.
Building State of the Art Open Weights Tool Use: The Command R Family: Sandra Kublik
Command R+ proved open-weight models can match frontier closed models on tool use and grounded RAG, with structured citation support.
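Several takeaways above (context stuffing versus weights, ViTs and n^4, state-space models as a subquadratic alternative) turn on attention's cost growing quadratically with sequence length. The sketch below is a minimal illustration that counts only the pairwise score entries for a single attention head. The function names are illustrative, and it ignores kernel-level optimizations such as FlashAttention, which reduce memory traffic but not the number of pairwise scores.

```python
def attention_score_entries(seq_len: int) -> int:
    """Entries in one head's score matrix: every token is scored against every token."""
    return seq_len * seq_len


def image_tokens(side_patches: int) -> int:
    """Token count for an image split into a side_patches x side_patches grid."""
    return side_patches * side_patches


# Growing the context 10x grows the score matrix 100x.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} tokens -> {attention_score_entries(n):.1e} score entries")

# For images, tokens grow with the square of the side length, so score
# entries grow with its fourth power: doubling the side multiplies them by 16.
small = attention_score_entries(image_tokens(16))
large = attention_score_entries(image_tokens(32))
print(f"side 16 -> 32 patches: {large // small}x more score entries")
```

A model whose per-token cost does not depend on how many tokens came before would grow linearly in the same loop, which is the appeal of the subquadratic approaches mentioned above.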