Building an AI assistant that makes phone calls [Convex Workshop]
Takeaway
Reactive databases like Convex make it straightforward to glue STT, LLM and TTS streams into real-time voice agents that hold actual phone conversations.
Summary
- Tom Redmond (head of DX at Convex) builds 'Floyd', an AI personal assistant inspired by Lloyd from Entourage that takes voice requests and actually places phone calls on the user's behalf.
- Stack: Google Cloud Speech-to-Text (streaming STT), OpenAI GPT-4o (reasoning + text-to-speech streaming), Twilio (phone), Convex (reactive DB for request state).
- Architecture relies on Convex's reactive queries: client transcribes voice → stores a request row → server subscribes to changes (no polling), enriches with user context, then orchestrates the live call loop streaming audio in/out through Twilio + Google + OpenAI.
- Each conversation turn is appended to the database so the client gets live transcript updates 'for free' via one-line useQuery.
- Demo: 'Hey Floyd, call the school, Mara is staying home sick'. Floyd phones a number (redirected, in development only) and conducts the conversation. The speaker notes that multimodal audio models such as Gemini could collapse the separate STT step in future.
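The request flow in the summary (client stores a request row, server reacts to the change, each turn is appended, client sees the transcript update) can be sketched with a small in-memory stand-in. This is only an illustration of the subscribe-on-write pattern, not Convex's API and not the workshop's code: `ReactiveRequests`, `runDemo`, the status values and the scripted dialogue are all hypothetical names and placeholders, and the real project would use Convex queries and mutations instead.

```typescript
// Hypothetical in-memory stand-in for a reactive table. This is NOT Convex's API.
type Turn = { speaker: "assistant" | "callee"; text: string };
type Status = "pending" | "in_call" | "done";
type CallRequest = { id: number; prompt: string; status: Status; transcript: Turn[] };
type Listener = (rows: readonly CallRequest[]) => void;

class ReactiveRequests {
  private rows: CallRequest[] = [];
  private listeners = new Set<Listener>();
  private nextId = 1;

  // Register a listener; it fires once immediately and again after every write.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    listener(this.snapshot());
    return () => {
      this.listeners.delete(listener);
    };
  }

  insert(prompt: string): number {
    const id = this.nextId++;
    this.rows.push({ id, prompt, status: "pending", transcript: [] });
    this.notify();
    return id;
  }

  setStatus(id: number, status: Status): void {
    this.find(id).status = status;
    this.notify();
  }

  appendTurn(id: number, turn: Turn): void {
    this.find(id).transcript.push(turn);
    this.notify();
  }

  private find(id: number): CallRequest {
    const row = this.rows.find((r) => r.id === id);
    if (!row) throw new Error(`unknown request ${id}`);
    return row;
  }

  // Hand each listener its own copy of the rows and transcript arrays.
  private snapshot(): CallRequest[] {
    return this.rows.map((r) => ({ ...r, transcript: [...r.transcript] }));
  }

  private notify(): void {
    this.listeners.forEach((listener) => listener(this.snapshot()));
  }
}

function runDemo(): string[] {
  const store = new ReactiveRequests();
  const clientView: string[] = [];

  // "Client": re-renders on every change (the role useQuery plays in the talk).
  store.subscribe((rows) => {
    const row = rows[0];
    if (row) clientView.push(`${row.status} (${row.transcript.length} turns)`);
  });

  // "Server": reacts to new pending requests instead of polling for them.
  const claimed = new Set<number>();
  store.subscribe((rows) => {
    for (const row of rows) {
      if (row.status !== "pending" || claimed.has(row.id)) continue;
      claimed.add(row.id);
      store.setStatus(row.id, "in_call");
      // Scripted placeholder turns; the real loop would stream STT, LLM and TTS.
      store.appendTurn(row.id, { speaker: "assistant", text: "Mara is staying home sick today." });
      store.appendTurn(row.id, { speaker: "callee", text: "Thanks, we'll mark her absent." });
      store.setStatus(row.id, "done");
    }
  });

  store.insert("call the school, Mara is staying home sick");
  return clientView;
}

console.log(runDemo().join("\n"));
```

The client listener never asks for the transcript. It is re-run after each write, which is the property the talk attributes to reactive queries.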
voice, convex, agents
Original description
In this workshop, we'll use a variety of technologies to build a better AI assistant. This workshop leverages a Convex vector database to establish a knowledge base, Google Cloud speech-to-text, GPT-4 API, text-to-speech, Twilio and audio streaming. By the end, we'll have built an AI assistant that can take basic requests by voice, interpret what to do, make a phone call and interact with a human on the other end, then come back with some action to do (like adding an event to a calendar, or sending you an SMS update!). This project is written entirely in TypeScript, with the Node, Express, and Next frameworks.

To get started, clone this repo: https://github.com/get-convex/ai-world-fair.git

To follow along with full functionality, you will need keys for the following services. (Don't worry if you don't have them all! You can still follow along with the presentation and substitute the keys in later, or use entirely different services altogether.) You will also need a free ngrok account if you wish to try it out locally.

See /server/.env.template for the required keys:

- OpenAI: required to manage the conversation, and for text-to-speech APIs.
  OPEN_AI_KEY=
- Google Cloud credential file (often called service_account.json): required for speech-to-text translation services. The OAuth client is only required for integrating calendar & email, and is not required for the workshop.
  GOOGLE_APPLICATION_CREDENTIALS=./service_account.json
  GOOGLE_OAUTH_CLIENT_ID=
- Twilio credentials: required to make and stream phone calls.
  TWILIO_ACCOUNT_SID=
  TWILIO_AUTH_TOKEN=

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Tom

Tom is the Head of Developer Experience at Convex, which means he's the one to blame if your support ticket takes longer than 5 minutes to get a response. He also spends a lot of time talking with developers in order to better understand how to make Convex the fastest, most reliable and easy-to-use development platform on the planet. If you have any ideas, email him! Tom got into software engineering as a kid, and it's a passion that never left. His free time is spent reading programming books (no joke), and playing covers of 90s pop songs on his guitar while singing extremely loudly and beautifully.