
Building and Scaling an AI Agent Swarm of low latency real time voice bots: Damien Murphy

3.2K views · Oct 08, 2024 · 67:22 min
Takeaway

For low-latency, real-time deployments, modern voice agents should use unified audio-in/audio-out models tuned for call-center acoustics.

Summary

  • Deepgram's Damien Murphy walks through the evolution from STT+LLM+TTS pipelines to audio-in/audio-out voice agents.
  • Deepgram trains 1000+ custom production models, including ones tuned for 8 kHz μ-law call-center audio, which is notoriously hard to transcribe.
  • Live build: a voice-to-voice agent with function calling, a simple backend API, and a frontend to visualize agent state.
  • Closes with patterns for scaling voice agent swarms: concurrency, low-latency inference, and call routing.
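To make the 8 kHz μ-law point concrete: call-center audio is typically G.711 μ-law, one 8-bit companded code word per sample at 8,000 samples per second, so the audio is narrowband and coarsely quantized compared with 16-bit wideband recordings. The sketch below is not Deepgram's code; it is a minimal Python version of the standard G.711 μ-law decode (constants follow the widely used reference decoder), showing how each byte expands to a 16-bit PCM sample. The helper names (`ulaw_to_linear`, `decode_ulaw`, `duration_seconds`) are illustrative.

```python
# Sketch: decoding 8 kHz G.711 mu-law telephony audio to 16-bit linear PCM.
# Constants follow the classic G.711 reference decoder.

BIAS = 0x84        # 132, bias added during mu-law encoding
QUANT_MASK = 0x0F  # low 4 bits: quantization step within a segment
SEG_MASK = 0x70    # bits 4-6: segment (exponent)
SEG_SHIFT = 4
SIGN_BIT = 0x80

SAMPLE_RATE_HZ = 8000  # narrowband telephony rate; one mu-law byte per sample


def ulaw_to_linear(u_val: int) -> int:
    """Decode one 8-bit mu-law code word to a signed 16-bit PCM sample."""
    u_val = ~u_val & 0xFF  # mu-law bytes are stored bit-inverted
    t = ((u_val & QUANT_MASK) << 3) + BIAS
    t <<= (u_val & SEG_MASK) >> SEG_SHIFT
    return (BIAS - t) if (u_val & SIGN_BIT) else (t - BIAS)


def decode_ulaw(payload: bytes) -> list[int]:
    """Decode a buffer of mu-law bytes into a list of PCM samples."""
    return [ulaw_to_linear(b) for b in payload]


def duration_seconds(payload: bytes) -> float:
    """At 8 kHz with 1 byte per sample, duration is just len / 8000."""
    return len(payload) / SAMPLE_RATE_HZ


if __name__ == "__main__":
    frame = bytes([0xFF, 0x7F, 0x00, 0x80])
    print(decode_ulaw(frame))  # [0, 0, -32124, 32124]
```

At 8 kHz with one byte per sample, a 20 ms telephony frame is 160 bytes, which is why `duration_seconds(bytes(160))` returns 0.02.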
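The function-calling piece of the live build can be pictured as a small dispatch step: the agent asks for a named function with JSON arguments, the client runs a matching handler against the backend, and the JSON result goes back into the conversation. This is a minimal sketch under assumptions: the request shape (`id`, `name`, `arguments`) and the `get_order_status` handler are hypothetical and do not reflect Deepgram's actual message format.

```python
import json
from typing import Any, Callable

# Hypothetical registry of backend functions the agent may call.
# Names and argument shapes are illustrative assumptions, not a real API.
HANDLERS: dict[str, Callable[..., Any]] = {}


def register(name: str):
    """Decorator that exposes a Python function to the agent under `name`."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        HANDLERS[name] = fn
        return fn
    return wrap


@register("get_order_status")
def get_order_status(order_id: str) -> dict:
    # Stand-in for a call to a backend API; returns canned data.
    return {"order_id": order_id, "status": "shipped"}


def dispatch(request_json: str) -> str:
    """Run one function-call request and return a JSON reply for the agent.

    Assumed (hypothetical) request shape:
    {"id": ..., "name": ..., "arguments": {...}}
    """
    req = json.loads(request_json)
    handler = HANDLERS.get(req["name"])
    if handler is None:
        result = {"error": f"unknown function {req['name']!r}"}
    else:
        try:
            result = handler(**req.get("arguments", {}))
        except TypeError as exc:  # e.g. wrong or missing arguments
            result = {"error": str(exc)}
    return json.dumps({"id": req.get("id"), "result": result})


if __name__ == "__main__":
    print(dispatch('{"id": "call_1", "name": "get_order_status", '
                   '"arguments": {"order_id": "A42"}}'))
```

Unknown function names and bad arguments come back as an `error` field rather than an exception, so one malformed request does not end a live call.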
voice · deepgram · agents
Original description
Multimodality: AI agents are becoming more powerful at a rapid pace! In this talk you will learn best practices and considerations for building your AI agent swarm, covering every stage: low-latency speech-to-text, large language model RAG and fine-tuning, and text-to-speech. I will share open-source code for building an AI agent you can talk to with sub-second end-to-end latency! You will learn how to scale your AI agents to handle thousands or even millions of concurrent conversations.
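One common building block for scaling to many concurrent conversations is to cap how many live sessions each worker holds and then add workers horizontally. The sketch below is not from the talk's repos; it assumes an asyncio-based worker and uses `asyncio.Semaphore` to enforce a hypothetical per-worker cap (`MAX_CONCURRENT_SESSIONS`). The simulated turns stand in for real streaming STT, LLM, and TTS calls.

```python
import asyncio

MAX_CONCURRENT_SESSIONS = 100  # hypothetical per-worker cap


class SessionLimiter:
    """Admit at most `limit` live conversations on this worker."""

    def __init__(self, limit: int) -> None:
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # sessions currently holding a slot
        self.peak = 0    # highest concurrency observed

    async def run(self, session_id: int, turns: int) -> str:
        async with self._sem:  # extra sessions wait here when the worker is full
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                for _ in range(turns):
                    # Stand-in for one streamed STT -> LLM -> TTS turn.
                    await asyncio.sleep(0.001)
                return f"session {session_id} done"
            finally:
                self.active -= 1


async def main(n_sessions: int, limit: int) -> tuple[int, int]:
    """Launch n_sessions at once; return (sessions completed, peak concurrency)."""
    limiter = SessionLimiter(limit)
    results = await asyncio.gather(
        *(limiter.run(i, turns=3) for i in range(n_sessions))
    )
    return len(results), limiter.peak


if __name__ == "__main__":
    done, peak = asyncio.run(main(n_sessions=1000, limit=MAX_CONCURRENT_SESSIONS))
    print(done, peak)  # peak should never exceed the per-worker cap
```

Sessions beyond the cap wait at `async with self._sem` instead of overloading the worker; a load balancer or call router would then spread calls across many such workers.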

Deepgram Account

https://console.deepgram.com/signup
Requirements

Browser
Microphone
Speaker / Headphones
Development Tools

Node.js (http-server)
JavaScript, CSS, HTML
Repos

https://github.com/DamienDeepgram/deepgram-workshop-client (Required)
https://github.com/DamienDeepgram/deepgram-workshop-server (Optional)

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule.

About Damien
Full-stack developer and solutions engineer who works with customers, focused on realizing business value from low-latency, real-time AI.