Full Workshop: Realtime Voice AI — Mark Backman, Daily


14.2K views · Aug 03, 2025 · 69:40 min

Takeaway

Pipecat orchestrates pluggable voice agent pipelines and now wraps native speech-to-speech models like Gemini Live for low-latency conversational apps.

Summary

A hands-on workshop on Pipecat, Daily's open-source Python framework for real-time voice and multimodal agents (about 13 months old at the time of the talk).
Explains the cascaded pipeline (transport → STT → LLM → TTS → transport) and the rise of native speech-to-speech models such as Gemini Live, which collapse the separate STT, LLM, and TTS boxes into one model.
Cites roughly 800 ms end-to-end latency as the benchmark to target, against a human conversational reference of roughly 500 ms. Pipecat supports modular service swapping and parallel pipelines, so a conversation can fail over to a backup service mid-conversation.
Walks through the Gemini bot.py quickstart from the daily-co/gemini-pipecat-workshop repo.
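
The cascaded flow and the failover idea summarized above can be sketched in plain Python. This is a toy illustration only and does not use Pipecat's API: the stage functions (stt, llm, tts), the with_fallback helper, and the per-stage latency figures are all hypothetical placeholders, and the try/except fallback is a simpler stand-in for the parallel-pipeline failover mentioned in the talk.

```python
from typing import Callable, Dict, Sequence

# A stage takes a payload and returns a transformed payload.
# Strings stand in for audio and text frames (an assumption for brevity).
Stage = Callable[[str], str]


def stt(audio: str) -> str:
    """Stand-in for speech-to-text: strips a fake 'audio:' prefix."""
    prefix = "audio:"
    return audio[len(prefix):] if audio.startswith(prefix) else audio


def llm(text: str) -> str:
    """Stand-in for the language model: produces a canned reply."""
    return f"reply to '{text}'"


def tts(text: str) -> str:
    """Stand-in for text-to-speech: re-adds the fake 'audio:' prefix."""
    return f"audio:{text}"


def flaky_llm(text: str) -> str:
    """Hypothetical primary service that is currently down."""
    raise ConnectionError("primary LLM unavailable")


def run_pipeline(stages: Sequence[Stage], payload: str) -> str:
    """Pass the payload through each stage in order (the cascade)."""
    for stage in stages:
        payload = stage(payload)
    return payload


def with_fallback(primary: Stage, backup: Stage) -> Stage:
    """Wrap two interchangeable stages; use the backup if the primary raises."""
    def stage(payload: str) -> str:
        try:
            return primary(payload)
        except Exception:
            return backup(payload)
    return stage


# Target from the summary; the per-stage split below is made up for illustration.
TARGET_MS = 800
HYPOTHETICAL_BUDGET_MS: Dict[str, int] = {
    "transport_in": 50,
    "stt": 200,
    "llm": 350,
    "tts": 150,
    "transport_out": 50,
}


def within_target(budget: Dict[str, int], target_ms: int = TARGET_MS) -> bool:
    """Check whether the summed stage latencies fit the end-to-end target."""
    return sum(budget.values()) <= target_ms


if __name__ == "__main__":
    stages = [stt, with_fallback(flaky_llm, llm), tts]
    print(run_pipeline(stages, "audio:hello"))
    print(sum(HYPOTHETICAL_BUDGET_MS.values()), within_target(HYPOTHETICAL_BUDGET_MS))
```

Because every stage shares one signature, swapping a service means replacing a single entry in the list. A native speech-to-speech model would replace the three middle stages with one audio-in, audio-out stage.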

voice · pipecat · gemini-live

Original description

Voice AI agents today can conduct natural, human-like conversations and perform a wide variety of tasks: customer support, lead qualification, healthcare patient intake, market research, and more.

Today's best voice agents combine: realtime responsiveness, open-ended conversational intelligence, reliable instruction following, and flexible integration with existing back-end systems.

Learn how to build state-of-the-art voice agents using Pipecat's open-source, vendor-neutral tooling. You can deploy Pipecat agents to your own infrastructure or to Pipecat Cloud.

Pipecat is used and supported by teams at NVIDIA, AWS, Google DeepMind, OpenAI, and hundreds of other companies.


---related links---

https://x.com/mark_backman
https://www.linkedin.com/in/mark-backman/
https://daily.co