🧠 Foundation Models

Frontier LLM training, architecture choices, scaling, post-training (SFT/RLHF/DPO), evaluation, releases from OpenAI, Anthropic, Google, Meta, Mistral, etc.

30 videos · foundation-models · gemini · open-weights · gemma · open-models · benchmarks

The workflow

flowchart LR
    A[Pre-training<br/>tokens & data mix] --> B[Architecture<br/>transformer variants]
    B --> C[Scaling laws<br/>compute & params]
    C --> D[Post-training<br/>SFT + RLHF + DPO]
    D --> E[Capabilities<br/>reasoning, tools, code]
    E --> F[Evaluations<br/>& release]

Pre-training is most of the cost; post-training is most of the differentiation.
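To make the cost claim concrete, here is a rough back-of-the-envelope sketch using the common rule of thumb that training compute is about 6 × parameters × tokens. The model size, token counts, GPU throughput, and utilization below are hypothetical values chosen for illustration, not figures from any of the talks, and applying the same rule to post-training is a simplification (RLHF-style methods also spend compute on generation).

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute via the C ~= 6 * N * D rule of thumb."""
    return 6.0 * n_params * n_tokens


def gpu_days(total_flops: float, peak_flops_per_gpu: float, utilization: float) -> float:
    """Convert a FLOP budget into GPU-days at a given sustained utilization."""
    seconds = total_flops / (peak_flops_per_gpu * utilization)
    return seconds / 86_400  # seconds per day


# Hypothetical setup: a 7B-parameter model, 2T pre-training tokens,
# 2B post-training tokens, 1e15 peak FLOP/s per GPU, 40% utilization.
N_PARAMS = 7e9
pretrain = training_flops(N_PARAMS, 2e12)
posttrain = training_flops(N_PARAMS, 2e9)

if __name__ == "__main__":
    print(f"pre-training:  {pretrain:.2e} FLOPs, "
          f"{gpu_days(pretrain, 1e15, 0.4):,.0f} GPU-days")
    print(f"post-training: {posttrain:.2e} FLOPs, "
          f"{gpu_days(posttrain, 1e15, 0.4):,.1f} GPU-days")
    print(f"ratio: {pretrain / posttrain:,.0f}x")
```

Under these assumed numbers the token count alone puts pre-training about three orders of magnitude above post-training, which is the sense in which pre-training dominates cost.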

Key takeaways

Frontier capability is rapidly commoditizing across labs and at ever-smaller model sizes, while pricing collapses except for a few overpriced outliers.
Frontier AI progress isn't just larger LLMs—omnimodal embeddings and probabilistic graph nets are quietly setting new SOTA in retrieval and weather forecasting.
MiniMax M2 shows that a small, cheap open model trained on perturbed agent scaffolds with expert-developer reward signals can rival larger closed models for coding and tool-use.
Small frontier edge models require architecture-level rethinking around on-device latency, embedding budget, and task-narrow post-training, rather than simply shrinking a larger model.
Open-model releases routinely ship subtle architecture and tokenizer bugs — learn to read the modeling code yourself and reach for SVD when reasoning about LLM internals.
Robotics' breakthrough is VLA models trained on teleoperated dexterous data at scale, mirroring VLMs with a multi-year lag.

Videos (30)

2025 in LLMs so far, illustrated by Pelicans on Bicycles — Simon Willison

Frontier capability is rapidly commoditizing across labs and at ever-smaller model sizes, while pricing collapses except for a few overpriced outliers.

158.0K views · Jul 09, 2025

How Google DeepMind is researching the next Frontier of AI for Gemini — Raia Hadsell, VP of Research

Frontier AI progress isn't just larger LLMs—omnimodal embeddings and probabilistic graph nets are quietly setting new SOTA in retrieval and weather forecasting.

104.9K views · Apr 18, 2026

Minimax M2: Building the #1 Open Model – Olive Song, MiniMax

MiniMax M2 shows that a small, cheap open model trained on perturbed agent scaffolds with expert-developer reward signals can rival larger closed models for coding and tool-use.

91.2K views · Dec 13, 2025

Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

Small frontier edge models require architecture-level rethinking around on-device latency, embedding budget, and task-narrow post-training, rather than simply shrinking a larger model.

83.8K views · Apr 29, 2026

Low Level Technicals of LLMs: Daniel Han

Open-model releases routinely ship subtle architecture and tokenizer bugs — learn to read the modeling code yourself and reach for SVD when reasoning about LLM internals.

54.2K views · Jul 31, 2024

Robotics: why now? - Quan Vuong and Jost Tobias Springenberg, Physical Intelligence

Robotics' breakthrough is VLA models trained on teleoperated dexterous data at scale, mirroring VLMs with a multi-year lag.

42.9K views · Jul 26, 2025

Gemma 4 Deep Dive — Cassidy Hardin, Researcher, Google DeepMind

Gemma 4 brings frontier-tier reasoning, MoE efficiency, and 256K context to fully open Apache 2.0 weights that run on consumer hardware.

33.3K views · Apr 27, 2026

Thinking Deeper in Gemini — Jack Rae, Google DeepMind

Variable test-time compute via 'thinking' is the current bottleneck-breaker for LLM intelligence, the same way attention broke RNNs.

30.2K views · Jul 10, 2025

Jack Morris: Stuffing Context is not Memory, Updating Weights is

For durable, scalable knowledge injection, train it into the weights; context stuffing and RAG are fundamentally bounded by attention's quadratic cost and by context rot.

29.2K views · Dec 29, 2025

Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind

Gemma 4 makes frontier open-model intelligence runnable on phones and single consumer GPUs, with on-device agentic and multimodal use cases.

24.7K views · Apr 20, 2026

What's new from Anthropic and what's next: Alex Albert

Treat 3.5 Sonnet plus Artifacts/Projects as the first product surface designed natively for LLMs, and rebuild experiences around that rather than tacking AI onto existing UIs.

20.4K views · Aug 05, 2024

Building in the Gemini Era – Kat Kampf & Ammaar Reshi, Google DeepMind

Gemini 3 plus Nano Banana Pro inside AI Studio collapses prototype-to-shipped-app and design-to-image workflows into single prompts.

16.8K views · Dec 15, 2025

A Taxonomy for Next-gen Reasoning — Nathan Lambert, Allen Institute (AI2) & Interconnects.ai

Future reasoning models need calibration, strategy, and abstraction — not just more skill on math benchmarks — to power real autonomous applications.

15.8K views · Jul 19, 2025

A year of Gemini progress + what comes next — Logan Kilpatrick, Google DeepMind

Gemini's last year was organizational consolidation plus omnimodal capability; the next year shifts scaffolding into the model itself and pushes proactive, agentic behavior.

15.5K views · Jul 10, 2025

Trends Across the AI Frontier — George Cameron, ArtificialAnalysis.ai

There are multiple AI frontiers (intelligence, open-weights, cost, speed) and choosing the right one for your app matters more than always reaching for the smartest model.

14.0K views · Jul 08, 2025

What Is a Humanoid Foundation Model? An Introduction to GR00T N1 - Annika & Aastha

Humanoid foundation models combine internet video + simulation + scarce teleop data with a fast/slow dual-system architecture to generalize across robot embodiments.

9.3K views · Jul 28, 2025

How LLMs work for Web Devs: GPT in 600 lines of Vanilla JS - Ishan Anand

Web devs can fully internalize transformer mechanics by reading and debugging a single-file GPT-2 implementation in their own browser.

8.2K views · Jul 13, 2025

The Future of Qwen: A Generalist Agent Model — Junyang Lin, Alibaba Qwen

Qwen 3 fuses thinking and non-thinking into one model with a tunable thinking budget, multilingual coverage and MoE architecture aimed at becoming a generalist agent.

8.2K views · Jun 03, 2025

Netflix's Big Bet: One model to rule recommendations: Yesu Feng, Netflix

Netflix is replacing many specialized recommendation models with one transformer foundation model over rich user-event tokens, applying LLM scaling laws to recsys.

8.0K views · Jul 16, 2025

Z.ai GLM 4.6: What We Learned From 100 Million Open Source Downloads — Yuxuan Zhang, Z.ai

GLM 4.6 closes the gap to frontier closed models on math/coding/agents via a multi-stage curriculum culminating in 200K-context agent training, with 100M+ open-source downloads of the series.

5.7K views · Nov 22, 2025

Decoding the Decoder LLM without de code: Ishan Anand

You can fully understand a decoder LLM by stepping through GPT-2 small implemented as a 124M-cell Excel sheet — no code required.

5.0K views · Aug 09, 2024

How Transformers Finally Ate Vision – Isaac Robinson, Roboflow

Pretraining beats inductive bias: ViTs won not because n^4 is good but because Flash Attention plus MAE/DINO pretraining made it scale.

4.9K views · May 08, 2026

AI Engineering with the Google Gemini 2.5 Model Family - Philipp Schmid, Google DeepMind

Gemini 2.5 Flash is free via AI Studio and integrates with MCP, Google Search grounding, and the new Google GenAI SDK out of the box.

4.8K views · Jul 11, 2025

Build & deploy AI-powered apps — Paige Bailey, Google DeepMind

Gemini's multimodal-in/multimodal-out support, plus tool toggles in AI Studio, collapses prototype-to-deployed-app into a single workflow.

4.4K views · Apr 29, 2026

Decoding Mistral AI's Large Language Models: Devendra Chaplot

Mistral's open-weights strategy treats community deployment as a free distribution and feedback channel that fuels paid upgrades, with a deliberate pipeline of dense and MoE releases.

4.2K views · Nov 21, 2024

State Space Models for Realtime Multimodal Intelligence: Karan Goel

State-space models offer subquadratic compression-based alternatives to transformers for real-time, on-device multimodal inference.

4.0K views · Oct 29, 2024

AGI: The Path Forward – Jason Warner & Eiso Kant, Poolside

Poolside is betting on RL-augmented next-token-prediction foundation models trained from scratch for high-consequence coding and long-horizon knowledge work.

3.8K views · Dec 27, 2025

WTF do people use Open Models for??

Open models in production are mostly running creative writing, roleplay, and companionship — and enterprises pin year-old models like Mistral Nemo for stability.

2.4K views · Feb 22, 2025

Unveiling the latest Gemma model advancements: Kathleen Kenealy

Gemma 2 (9B, 27B) ships as a multi-framework, safety-tuned open model competitive with 2-3x larger LLaMA/Grok models, expanding a family that already includes code, recurrent and vision-language variants.

1.9K views · Feb 09, 2025

Building State of the Art Open Weights Tool Use: The Command R Family: Sandra Kublik

Command R+ proved open-weight models can match frontier closed models on tool use and grounded RAG, with structured citation support.

1.9K views · Aug 26, 2024