Your realtime AI is ngmi — Sean DuBois (OpenAI), Kwindla Kramer (Daily)
Takeaway
For sub-second voice agents, use WebRTC for edge audio and reserve websockets for server-to-server and small structured data.
Summary
- Sean DuBois (OpenAI Realtime API, Pion WebRTC) and Kwindla Kramer (Daily, Pipecat) argue that voice-to-voice latency above 1 s dooms a voice agent, because humans expect a response in roughly 500 ms.
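- Use WebRTC, not websockets, for edge-to-cloud audio. Websockets run over TCP, which guarantees in-order delivery, so one delayed packet holds up every packet behind it; the result is audio glitches, disconnects, and added latency on 10-15% of real-world networks. WebRTC's jitter buffer drops late packets instead of waiting for them.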
- WebRTC also handles resampling, packetization, bandwidth estimation, and observability stats out of the box; with websockets, all of that has to be built by hand.
- Same WebRTC tech powers Zoom, WhatsApp, Discord, Facebook Messenger, telesurgery, and teleoperated vehicles.
- Voice is the next platform shift, analogous to 'late 2007 with the first iPhone before pull-to-refresh was invented'; OpenAI's Realtime API exposes both WebRTC and websocket options.
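
The in-order-delivery point in the summary can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not something shown in the talk: the frame size, network delay, retransmit penalty, and buffer depth are all made-up values, and both players are heavily simplified (real TCP stacks and real WebRTC jitter buffers adapt dynamically). The function names are hypothetical.

```python
FRAME_MS = 20           # assumed: one audio packet every 20 ms
BASE_DELAY_MS = 40      # assumed: one-way network delay
RETRANSMIT_MS = 300     # assumed: extra delay when a lost packet is resent
PLAYOUT_DELAY_MS = 100  # assumed: how far behind the sender the player runs


def in_order_latency(n_packets, lost):
    """Playback latency per packet when the transport enforces ordering.

    Models a TCP-like stream: a lost packet is resent and arrives late,
    and nothing behind it can be delivered until it shows up.
    """
    latencies = []
    delivered = 0
    play = -FRAME_MS  # so the first frame is not held back by a "previous" one
    for seq in range(n_packets):
        sent = seq * FRAME_MS
        arrived = sent + BASE_DELAY_MS + (RETRANSMIT_MS if seq in lost else 0)
        # Head-of-line blocking: a packet is never delivered before its predecessors.
        delivered = max(delivered, arrived)
        # The player cannot skip frames, so each frame also waits for the previous one.
        play = max(delivered, sent + PLAYOUT_DELAY_MS, play + FRAME_MS)
        latencies.append(play - sent)
    return latencies


def jitter_buffer_latency(n_packets, lost):
    """Playback latency per packet with a fixed-depth jitter buffer.

    Models a UDP-like stream: a lost packet is never resent. Any packet
    that misses its playout deadline is dropped (None) and the player moves on.
    """
    latencies = []
    for seq in range(n_packets):
        sent = seq * FRAME_MS
        deadline = sent + PLAYOUT_DELAY_MS
        arrived = None if seq in lost else sent + BASE_DELAY_MS
        if arrived is not None and arrived <= deadline:
            latencies.append(PLAYOUT_DELAY_MS)
        else:
            latencies.append(None)  # dropped; a real player would conceal the gap
    return latencies


if __name__ == "__main__":
    print(in_order_latency(6, lost={3}))       # [100, 100, 100, 340, 340, 340]
    print(jitter_buffer_latency(6, lost={3}))  # [100, 100, 100, None, 100, 100]
```

In this toy model a single lost packet leaves the in-order stream 240 ms further behind for the rest of the call, while the jitter-buffer stream loses one 20 ms frame and stays at its original latency.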
voice, webrtc, pipecat
Original description
Sean DuBois of OpenAI and Pion, and Kwindla Hultman Kramer of Daily and Pipecat, will talk about why you have to design realtime AI systems from the network layer up.
Most people who build realtime AI apps and frameworks get it wrong. They build from either the model out or the app layer down. But unless you start with the network layer and build up, you'll never be able to deliver realtime audio and video streams reliably. And perhaps even worse, you'll get core primitives wrong: interruption handling, conversation state management, asynchronous function calling.
Sean and Kwin agree on most things: old-school realtime systems people against the rest of the world. But they disagree on some important things, too, and will argue about those things live on stage. Do you need to give developers "thick" client-side realtime SDKs? Can you build truly great vendor neutral APIs? (You'll be surprised which of them argues which side, on that topic.)
About Kwindla Kramer
Kwin works on large-scale WebRTC infrastructure at Daily. He is the originator of Pipecat, the widely used, open source, vendor neutral voice agent framework supported by NVIDIA, Google, AWS and used by hundreds of startups. Before co-founding Daily, Kwin built the sci-fi user interfaces in Minority Report and Iron Man.
About Sean DuBois
Sean works on WebRTC and the Realtime API at OpenAI. He built 1-800-CHATGPT. He is the founder of Pion, the most widely used open source WebRTC project. He has previously worked at AWS, LiveKit, Apple, and Etsy.
Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter
00:00 [Voice Keynote] Your realtime AI is ngmi - Sean DuBois (OpenAI), Kwindla Kramer (Daily)
01:29 Introduction to Voice AI and Latency
02:46 Latency Breakdown in a Voice AI Application
03:27 WebRTC vs. WebSockets for Real-Time Audio
06:41 Advantages of WebRTC
07:49 Applications of WebRTC
08:52 Future of Voice AI and User Interfaces
09:59 Squabbert Demo
12:44 Flexibility of WebRTC Connections
13:09 Community Showcase: Yashin's Project
15:46 Call to Action and Resources