
Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)

2.2K views · Jul 31, 2025 · 16:29 min
Takeaway

For sub-second voice agents, use WebRTC for edge audio and reserve websockets for server-to-server and small structured data.

Summary

  • Sean DuBois (OpenAI Realtime API/Pion WebRTC) and Kwindla Kramer (Daily/Pipecat) argue that voice-to-voice latency above 1 s dooms voice agents: humans expect a response in about 500 ms.
  • Use WebRTC, not websockets, for edge-to-cloud audio: websockets run over TCP, which guarantees in-order delivery, so one delayed packet holds up everything behind it, causing audio glitches, disconnects, and added latency on 10-15% of real-world networks; WebRTC's jitter buffer drops late packets instead.
  • WebRTC also handles resampling, packetization, bandwidth estimation, observability stats out of the box — websockets need all that built by hand.
  • Same WebRTC tech powers Zoom, WhatsApp, Discord, Facebook Messenger, telesurgery, and teleoperated vehicles.
  • Voice is the next platform shift, analogous to 'late 2007 with the first iPhone before pull-to-refresh was invented'; OpenAI's Realtime API exposes both WebRTC and websockets.
voice · webrtc · pipecat
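
The contrast between in-order delivery and a drop-late jitter buffer in the summary above can be sketched with a toy simulation. This is a minimal illustration, not how WebRTC or any TCP stack is implemented: real jitter buffers adapt their depth, and real TCP recovers a missing segment by retransmission. The frame size, buffer depth, network delays, and all function names here are assumptions chosen for the example.

```python
from dataclasses import dataclass

FRAME_MS = 20    # assumed audio frame duration
BUFFER_MS = 60   # assumed fixed jitter-buffer depth (hypothetical value)


@dataclass
class Packet:
    seq: int         # frame sequence number
    arrival_ms: int  # time the packet reaches the receiver


def playout_deadline(seq: int) -> int:
    """Time at which frame `seq` must be played to keep a fixed delay."""
    return seq * FRAME_MS + BUFFER_MS


def jitter_buffer_playout(packets):
    """Drop-late policy: play frames on schedule, skip any that miss their deadline."""
    played, dropped = [], []
    for p in packets:
        if p.arrival_ms <= playout_deadline(p.seq):
            played.append(p.seq)
        else:
            dropped.append(p.seq)
    return sorted(played), sorted(dropped)


def in_order_release_times(packets):
    """In-order policy: frame n is released only once frames 0..n have all arrived."""
    release, latest = [], 0
    for p in sorted(packets, key=lambda pkt: pkt.seq):
        latest = max(latest, p.arrival_ms)  # a late frame blocks every frame behind it
        release.append((p.seq, latest))
    return release


def demo_packets():
    """Five frames sent 20 ms apart; frame 2 is delayed 400 ms (assumed numbers)."""
    delays_ms = [30, 30, 400, 30, 30]
    return [Packet(seq, seq * FRAME_MS + d) for seq, d in enumerate(delays_ms)]


if __name__ == "__main__":
    pkts = demo_packets()
    print("jitter buffer (played, dropped):", jitter_buffer_playout(pkts))
    print("in-order release times (seq, ms):", in_order_release_times(pkts))
```

With these assumed delays, the jitter buffer drops only frame 2 and plays frames 3 and 4 on schedule, while the in-order policy holds frames 3 and 4 until frame 2 arrives at 440 ms. That is the head-of-line blocking the speakers attribute to TCP-based transports.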
Original description
Sean DuBois of OpenAI and Pion, and Kwindla Hultman Kramer of Daily and Pipecat, will talk about why you have to design realtime AI systems from the network layer up.

Most people who build realtime AI apps and frameworks get it wrong. They build from either the model out or the app layer down. But unless you start with the network layer and build up, you'll never be able to deliver realtime audio and video streams reliably. And perhaps even worse, you'll get core primitives wrong: interruption handling, conversation state management, asynchronous function calling.

Sean and Kwin agree on most things: old-school realtime systems people against the rest of the world. But they disagree on some important things, too, and will argue about those things live on stage. Do you need to give developers "thick" client-side realtime SDKs? Can you build truly great vendor neutral APIs? (You'll be surprised which of them argues which side, on that topic.)

About Kwindla Kramer
Kwin works on large-scale WebRTC infrastructure at Daily. He is the originator of Pipecat, the widely used, open source, vendor neutral voice agent framework supported by NVIDIA, Google, AWS and used by hundreds of startups. Before co-founding Daily, Kwin built the sci-fi user interfaces in Minority Report and Iron Man.

About Sean DuBois
Sean works on WebRTC and the Realtime API at OpenAI. He built 1-800-CHATGPT. He is the founder of Pion, the most widely used open source WebRTC project. He has previously worked at AWS, LiveKit, Apple, and Etsy.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter


00:00 [Voice Keynote] Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)
01:29 Introduction to Voice AI and Latency
02:46 Latency Breakdown in a Voice AI Application
03:27 WebRTC vs. WebSockets for Real-Time Audio
06:41 Advantages of WebRTC
07:49 Applications of WebRTC
08:52 Future of Voice AI and User Interfaces
09:59 Squabbert Demo
12:44 Flexibility of WebRTC Connections
13:09 Community Showcase: Yashin's Project
15:46 Call to Action and Resources