Teaching Gemini to Speak YouTube: Adapting LLMs for Video Recommendations to 2B+ DAU - Devansh Tandon
Takeaway
Adapting a frontier LLM with semantically quantized video tokens delivered novel recommendations, including for cold-start users, at YouTube scale once serving cost had been cut by more than 95%.
Summary
- YouTube built LRM (Large Recommender Model) by adapting Gemini for recommendations across Home, Watch Next, Shorts, and personalized search
- Semantic ID tokenizes each video by applying an RQ-VAE to multimodal features (title, transcript, audio, video frames), producing a hierarchical 'language' of YouTube videos (presented at RecSys)
- Continued pre-training teaches the model to link text with Semantic IDs (SIDs) and to predict masked items in watch sequences, enabling cross-modal reasoning (e.g. inferring 'AI video interesting to tech fans')
- Generative retrieval surfaces unique recommendations, especially for cold-start users; a 95%+ cut in TPU serving cost made deployment at YouTube scale possible, along with unpersonalized recommendation tables computed offline
recommendations, gemini, semantic-id
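The Semantic ID bullet in the summary can be made concrete with a small sketch. It shows only the residual quantization step of an RQ-VAE: each level picks the nearest codebook entry, subtracts it, and passes the remainder to the next level, so the token sequence runs from coarse to fine. Everything here is an illustrative assumption rather than YouTube's implementation: the 2-D "embeddings", the hand-written `CODEBOOKS`, and the function names are hypothetical, and the learned encoder, decoder, and codebook training are omitted.

```python
from typing import List, Sequence

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Return the index of the codebook entry closest to x (squared Euclidean)."""
    def dist(c: Vector) -> float:
        return sum((a - b) ** 2 for a, b in zip(c, x))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def semantic_id(embedding: Vector, codebooks: Sequence[Sequence[Vector]]) -> List[int]:
    """Residual quantization: emit one token per level, coarse to fine."""
    residual = list(embedding)
    tokens = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        # Subtract the chosen entry; the next level quantizes what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


# Hypothetical toy codebooks. In a real RQ-VAE these are learned, not hand-written.
CODEBOOKS = [
    [(0.0, 0.0), (10.0, 10.0)],             # level 0: coarse clusters
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],   # level 1: refinement within a cluster
]

# Hypothetical video embeddings: a and b are similar, c is unrelated.
video_a = (10.9, 10.1)
video_b = (10.1, 10.8)
video_c = (0.2, 0.1)

for name, emb in [("a", video_a), ("b", video_b), ("c", video_c)]:
    print(name, semantic_id(emb, CODEBOOKS))
```

In this toy setup, videos a and b share their first token and differ only in the second, while c starts with a different token. That shared-prefix behavior is what makes the IDs hierarchical: nearby videos get IDs that agree on the coarse levels.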
Original description
YouTube recommendations drive the majority of video watch time for billions of daily users. Traditionally powered by large embedding models (LEMs), we're undertaking a fundamental shift: rebuilding our recommendation stack using foundation models like Gemini. This talk dives into our engineering journey adapting general-purpose LLMs (Gemini) for the highly specialized, dynamic, and massive-scale task of YouTube recommendations.
We'll discuss:
- SemanticID: creating a "language" for YouTube videos, from our paper last year – Better Generalization with Semantic IDs: A Case Study in Ranking for Recommendations
- Adapting Gemini checkpoints to understand SemanticID
- Generative Video Retrieval with prompts
There’s a lot of attention on the LLM-led transformation of Search (with AI Overviews, Perplexity, ChatGPT-Search etc). However, across large consumer apps, it’s the recommendation systems & feeds that drive most consumer engagement, not just search. This talk is about the LLM-led transformation of recommendations & feeds – building a recommendation engine on top of Gemini.
About Devansh Tandon
Devansh Tandon is a Product Manager at Google, leading YouTube’s discovery system and GenAI efforts. At YouTube, Devansh leads a team of research scientists and ML engineers to develop the recommendation engine, which powers the majority of YouTube watchtime for billions of daily active users. He led Google DeepMind & YouTube partnerships, and has launched GenAI products including video summaries & AI dubbing for YouTube. At DeepMind, Devansh led the development of a new generative recommendation system – adapting Gemini to power YouTube recommendations – from research to scaled consumer launch. Previously, Devansh has led AI teams in Google Search, Google News and Google Ads. He graduated Magna Cum Laude from Yale University, with a BS in Computer Science and Economics.
Recorded at the AI Engineer World's Fair in San Francisco.