State Space Models for Realtime Multimodal Intelligence: Karan Goel
Takeaway
State-space models offer a subquadratic, compression-based alternative to transformers for real-time, on-device multimodal inference.
Summary
- Cartesia builds state-space models (Mamba lineage) for streaming real-time intelligence: conversational voice, on-device assistants, world generation, and robotics.
- Argues that the quadratic scaling of transformer attention burns data-center compute on long multimodal context, whereas humans process roughly 1B text, 10B audio, and 1T video tokens per year on brain-scale power.
- SSMs compress information into a fixed-size state as it arrives rather than retrieving over all stored context, which makes them better suited to streaming workloads.
- Vision: intelligence cheap and fast enough to be everywhere (phones, robots), not just behind cloud APIs.
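The contrast between compressing a stream and retrieving over stored context can be made concrete with a toy example. The sketch below is not Cartesia's or Mamba's implementation. It assumes the simplest possible case, a diagonal linear recurrence with fixed, hand-picked coefficients, and every name in it (`ssm_stream`, `pairwise_interactions`, `a`, `b`, `c`) is hypothetical. It shows only that the state stays the same size however long the stream runs, while the number of token pairs a causal full-attention model scores grows quadratically.

```python
from typing import List, Tuple


def ssm_stream(
    inputs: List[float], a: List[float], b: List[float], c: List[float]
) -> Tuple[List[float], List[float]]:
    """Run a toy diagonal linear state-space recurrence over a stream.

    Per step:  h_t = a * h_{t-1} + b * x_t   (elementwise)
               y_t = sum(c * h_t)
    The coefficients are illustrative assumptions, not learned parameters.
    """
    state = [0.0] * len(a)  # fixed-size memory, independent of stream length
    outputs = []
    for x in inputs:
        # Fold the new input into the existing state; nothing else is stored.
        state = [a_i * h_i + b_i * x for a_i, h_i, b_i in zip(a, state, b)]
        outputs.append(sum(c_i * h_i for c_i, h_i in zip(c, state)))
    return outputs, state


def pairwise_interactions(n: int) -> int:
    """Count the (query, key) pairs causal full attention scores over n tokens."""
    return n * (n + 1) // 2


if __name__ == "__main__":
    ys, final_state = ssm_stream(
        [1.0, 0.0, 0.0], a=[0.5, 0.9], b=[1.0, 1.0], c=[1.0, 1.0]
    )
    print(len(final_state))  # state size is 2 regardless of stream length
    for n in (1_000, 10_000, 100_000):
        print(n, pairwise_interactions(n))
```

Each streamed input costs a constant amount of work and memory here, which is the property the talk ties to real-time and on-device use. Real SSMs such as Mamba use learned, input-dependent parameters and much larger states, so this is an illustration of the scaling argument only.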
state-space-models, mamba, real-time
Original description
What are the big breakthroughs required to bring realtime multimodal intelligence to every device in the world? This talk describes the work we're doing at Cartesia on bringing realtime models to life on an entirely new technology stack. I'll describe new research ideas that we developed over the last few years, state space models, that are enabling us to build audio models that are cheaper, faster and higher quality than state-of-the-art approaches.
Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule and join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025
About Karan
Karan is the Founder / CEO of Cartesia.ai, where he builds multimodal models that can be run in real-time on any device. Before founding Cartesia, Karan pursued his PhD at Stanford, where he spent a few years developing the first state space models, building data systems, and researching new methods for robust machine learning. Karan is a recipient of the Siebel Scholarship, graduated from IIT-Delhi and CMU, and is passionate about machine learning, engineering and developer tools.