Breaking AI's 1-GHz Barrier: Sunny Madra (Groq)

4.0K views · Oct 10, 2024 · 20:11 min

Takeaway

Faster token throughput is more than a UX improvement: fast inference could make the LLM the kernel of new computing platforms.

Summary

Frames the moment as analogous to Intel crossing 1 GHz in 1999; Groq increased Llama-3-8B inference speed by more than 50% between April and June 2024 alone.
At the current state of the art, processing 10,000 input tokens per second lets tools like globe.engineer plan a trip in under 5 seconds, versus an afternoon of juggling browser tabs.
Argues that LLMs will become a new computing primitive, illustrated with a Karpathy-style "LLM as OS kernel" diagram in which audio, video, the file system, a browser, and a code interpreter act as peripherals.
With sub-millisecond decision latency, applications could make 10,000 complex decisions per second, enabling true personalization, a universal natural-language UI, and instantaneous responses.
Compares the shift to the Industrial Revolution: bespoke content (one hand-edited Photoshop image a day) becomes mass-produced (1,000 Midjourney images per minute).
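The quantitative claims above reduce to simple rate arithmetic. The Python sketch below works through them as a back-of-envelope check, assuming decisions run strictly one after another (no batching or concurrency); the helper names are hypothetical illustrations, not part of the talk or any Groq API.

```python
# Back-of-envelope arithmetic for the throughput figures quoted in the summary.
# Assumption (for illustration only): decisions are strictly sequential,
# with no batching or parallel streams.

def tokens_in_window(tokens_per_second: float, seconds: float) -> float:
    """Input tokens that can be processed within a fixed time window."""
    return tokens_per_second * seconds


def max_sequential_decisions(latency_seconds: float) -> float:
    """Upper bound on decisions per second when each waits for the previous one."""
    return 1.0 / latency_seconds


def latency_for_rate(decisions_per_second: float) -> float:
    """Per-decision latency needed to reach a target rate with no parallelism."""
    return 1.0 / decisions_per_second


# 10,000 input tokens/s over the <5 s trip-planning window.
budget = tokens_in_window(10_000, 5)
print(f"tokens in 5 s: {budget:,.0f}")  # prints: tokens in 5 s: 50,000

# A 1 ms decision caps one sequential stream at 1,000 decisions/s, so
# 10,000 decisions/s needs ~0.1 ms per decision or ~10 parallel streams.
print(f"decisions/s at 1 ms: {max_sequential_decisions(0.001):,.0f}")
print(f"latency for 10,000/s: {latency_for_rate(10_000) * 1000:.1f} ms")

# 1,000 images/min versus one hand-edited image per day.
images_per_day = 1_000 * 60 * 24
print(f"images/day at 1,000/min: {images_per_day:,}")  # prints: images/day at 1,000/min: 1,440,000
```

Under the sequential assumption, 10,000 decisions per second implies roughly 0.1 ms per decision, an order of magnitude below the 1 ms that "sub-millisecond" alone guarantees, unless the application runs decisions concurrently.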

groq · inference · throughput

Original description

It’s been 25 years since Intel broke the 1 GHz speed barrier for a general-purpose microprocessor. What does 1 GHz AI inference mean? What application capabilities will this enable? How will we achieve it? Groq’s mission is to build the AI computer that will power the next generation of software.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule.

About Sunny
Sunny Madra is an experienced entrepreneur with a track record of successful exits for the startups he founded, including the sales of Autonomic to Ford, Xtreme Labs to Pivotal, and, most recently, Definitive Intelligence to Groq. He currently leads GroqCloud.