Thinking Deeper in Gemini — Jack Rae, Google DeepMind

30.2K views · Jul 10, 2025 · 18:12 min

Takeaway

Variable test-time compute, delivered through 'thinking', removes the current bottleneck on LLM capability, much as attention removed the RNN's fixed-size hidden-state bottleneck.

Summary

Jack Rae (Google DeepMind, lead of Gemini Thinking) frames progress in language modeling as a series of solved bottlenecks: Shannon's 2-gram → trillion-token n-grams → RNNs → attention/transformers. Today's bottleneck is fixed test-time compute.
'Thinking' lets Gemini spend a variable amount of test-time compute on an internal chain of thought before answering, unblocking reasoning much as attention unblocked long context.
Each historical inflection, such as RNN → attention, came from identifying the precise bottleneck (in that case, the fixed-size hidden state) and engineering around it.
Discusses developer-facing implications: when thinking matters, latency/cost trade-offs, and where reasoning yields the largest gains.
Previews next steps for reasoning models in the Gemini family.
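
To make the latency/cost trade-off in the summary concrete, here is a minimal back-of-the-envelope sketch. It is not from the talk. It assumes the common rule of thumb that decoding costs roughly 2 × parameter-count FLOPs per generated token (attention and prompt-processing costs are ignored), and the 10B parameter count, token counts, and function names are all hypothetical.

```python
# Toy estimate, not a measurement. Assumption: decoding costs roughly
# 2 * param_count FLOPs per generated token (attention cost ignored).

def decode_flops(param_count: int, generated_tokens: int) -> int:
    """Approximate FLOPs to generate `generated_tokens` tokens."""
    return 2 * param_count * generated_tokens


def response_flops(param_count: int, answer_tokens: int, thinking_tokens: int = 0) -> int:
    """Approximate FLOPs for one response, including any thinking tokens.

    Thinking tokens are generated before the answer, so they add to the
    decoding cost exactly like visible answer tokens do.
    """
    return decode_flops(param_count, thinking_tokens + answer_tokens)


if __name__ == "__main__":
    PARAMS = 10_000_000_000  # hypothetical 10B-parameter model
    ANSWER = 50              # hypothetical visible answer length, in tokens
    baseline = response_flops(PARAMS, ANSWER)
    for budget in (0, 1_000, 10_000):
        flops = response_flops(PARAMS, ANSWER, budget)
        print(f"thinking={budget:>6} tokens  ~{flops:.2e} FLOPs  "
              f"({flops / baseline:.0f}x the no-thinking cost)")
```

Under these assumptions, compute per response grows linearly with the number of thinking tokens, so a fixed-compute model corresponds to a budget of zero, and the budget becomes a dial that trades latency and cost for extra reasoning.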

reasoning · gemini · test-time-compute

Original description

Progress towards general intelligence has been marked by identifying fundamental intelligence bottlenecks within existing models and developing solutions that improve the architecture or training objective. From this perspective, we discuss our work on Thinking in Gemini as a solution to a bottleneck in test-time compute. We will discuss recent progress in Thinking both from the benefit of capability and steerability, and discuss where our models are headed.

About Jack Rae
Lead of Gemini Thinking, co-lead of Gemini Pre-training

Recorded at the AI Engineer World's Fair in San Francisco.