Milliseconds to Magic: Real-Time Workflows using the Gemini Live API and Pipecat

2.6K views · Jun 27, 2025 · 21:43 min

Takeaway

Voice AI's hard problems migrate down the stack over time, so build orchestration code now while leaving room for API/model improvements to absorb tomorrow's work.

Summary

  • A joint talk by Google DeepMind (Gemini Live API) and Daily (Pipecat) maps the voice-AI stack: models -> realtime APIs -> orchestration frameworks -> application code.
  • The speakers estimate that every layer of the stack is at most ~50% solved. Capabilities tend to move down the stack over time: turn detection moved from app code -> Pipecat -> Gemini Live API.
  • A live demo shows a Gemini-driven voice agent for grocery and reading lists, including failure modes ('Segmentation fault added to reading list') and unexpectedly pleasant behaviors.
  • Hard problems mapped: real-time responsiveness, turn detection, interruption handling, semantic VAD, and dynamic UI generation per conversational turn.
  • Application-layer code keeps relearning that models drive the loop in non-traditional ways; pleasant surprises are part of the design.
voice · gemini-live · pipecat

Original description
The Gemini Live API GA is now powered by Gemini 2.5 Flash, Google's best cost-effective thinking model. We will do a deep dive on the capabilities that the Gemini Live API combined with Pipecat unlocks for devs, with a special focus on session management, turn detection, tool use (including async function calls), proactivity, multilinguality, and integration with telephony and other infra. We will demo some of the more innovative capabilities. We will also talk through some customer use cases, especially how customers can use Pipecat to extend these realtime multimodal capabilities to client-side applications such as customer support agents, gaming agents, and tutoring agents. In addition, we have an experimental version of the Live API, powered by Google's native audio offering, that can be tried in an experimental capacity. This experimental model can communicate with seamless, emotive, steerable, multilingual dialogue and enhances use cases where more natural voices can be a big differentiator.

About Kwindla Kramer
Kwin works on large-scale WebRTC infrastructure at Daily. He is the originator of Pipecat, the widely used, open-source, vendor-neutral voice agent framework supported by NVIDIA, Google, and AWS and used by hundreds of startups. Before co-founding Daily, Kwin built the sci-fi user interfaces in Minority Report and Iron Man.

About Shrestha Basu Mallick
Shrestha Basu Mallick is Group Product Manager and product lead for the Gemini API at Google DeepMind. Prior to this, Shrestha led product development for AI assistance across all Google coding surfaces. Shrestha’s first role at Alphabet was at X, the Moonshot Factory, as Head of Product for a materials discovery platform that has since graduated to become its own startup. Before Google, Shrestha held various roles in product and strategy at Salesforce Einstein, McKinsey, and Docusign. Shrestha holds a PhD in Applied Physics from Stanford.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter