Thinking Deeper in Gemini — Jack Rae, Google DeepMind
Takeaway
Variable test-time compute via 'thinking' is the current bottleneck-breaker for LLM intelligence, much as attention broke the fixed-size hidden-state bottleneck of RNNs.
Summary
- Jack Rae (DeepMind tech lead for Gemini thinking) frames the history of language modeling as a series of bottlenecks solved: Shannon's 2-gram → trillion-token n-grams → RNNs → attention/transformers. Today's bottleneck is fixed test-time compute.
- 'Thinking' lets Gemini spend a variable amount of test-time compute on an internal chain-of-thought before answering, unblocking reasoning the same way attention unblocked long-context modeling.
- Each historical inflection (RNN→attention) came from identifying the precise bottleneck (fixed-size hidden state) and engineering around it.
- Discusses developer-facing implications: when thinking matters, latency/cost trade-offs, and where reasoning yields the largest gains.
- Previews next steps for reasoning models in the Gemini family.
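The latency/cost trade-off mentioned above can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only and is not from the talk: it assumes thinking tokens are decoded and priced like ordinary output tokens, and the throughput and price figures are hypothetical placeholders rather than real Gemini numbers.

```python
from dataclasses import dataclass


@dataclass
class DecodeProfile:
    """Hypothetical serving numbers; replace with measured values."""
    tokens_per_second: float       # assumed decode throughput
    usd_per_million_tokens: float  # assumed price per generated token


def estimate(profile: DecodeProfile, thinking_tokens: int, answer_tokens: int):
    """Return (latency_seconds, cost_usd) for one response.

    Assumption: thinking tokens are generated before the answer and are
    decoded and billed at the same rate as answer tokens.
    """
    total = thinking_tokens + answer_tokens
    latency = total / profile.tokens_per_second
    cost = total * profile.usd_per_million_tokens / 1_000_000
    return latency, cost


# Placeholder figures chosen for easy arithmetic, not real pricing.
profile = DecodeProfile(tokens_per_second=100.0, usd_per_million_tokens=10.0)
for budget in (0, 1_000, 8_000):
    latency, cost = estimate(profile, budget, answer_tokens=500)
    print(f"thinking={budget:>5} tokens  latency={latency:6.1f}s  cost=${cost:.4f}")
```

Under these assumptions both latency and cost grow linearly with the thinking budget, so the extra compute pays off mainly on tasks where reasoning changes the answer.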
reasoning, gemini, test-time-compute
Original description
Progress towards general intelligence has been marked by identifying fundamental intelligence bottlenecks within existing models and developing solutions that improve the architecture or training objective. From this perspective, we discuss our work on Thinking in Gemini as a solution to a bottleneck in test-time compute. We will discuss recent progress in Thinking both from the benefit of capability and steerability, and discuss where our models are headed.

About Jack Rae
Lead of Gemini Thinking, co-lead of Gemini Pre-training

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter