Why MLX — Prince Canuma, Neywa Labs
Takeaway
MLX is reaching the point where production-quality vision, voice, and even hundred-billion-parameter chat models run fully on-device on Apple Silicon, making it possible to replace cloud subscriptions.
Summary
- Prince Canuma's MLX work (1.5M downloads, 4K+ ported models, day-zero support for new open models like Gemma 4) makes on-device AI on Apple Silicon practical from M1 onward.
- Personal motivation: his father lost his sight in 2020, so he built MLX VLM to let a phone camera plus voice help his father navigate the world without depending on the cloud.
- Demos: real-time RF-DETR object detection on a laptop with internet off; running Gemma 4 26B on iPhone via storage swapping; chatting with VLMs through Gradio fully offline.
- Audio stack: Marvis TTS (under 100 ms to first audio), speech-to-text (STT) for dictation and coding in the style of Wispr Flow, and full speech-to-speech pipelines, all available in both Python and Swift for native iOS/macOS apps.
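A back-of-the-envelope calculation shows why a 26B-parameter model needs storage swapping on a phone but fits comfortably on a larger Mac. The sketch below counts only raw weight bytes (parameters × bits per weight ÷ 8) and ignores the KV cache, activations, and quantization overhead such as per-group scales. The 8 GB phone RAM and 64 GB Mac memory figures are illustrative assumptions, not numbers from the talk.

```python
# Back-of-the-envelope weight-memory estimate (sketch; ignores KV cache,
# activations, and per-group quantization scales/biases).

def weight_bytes(num_params: float, bits_per_weight: float) -> float:
    """Bytes needed to hold the raw weights at the given precision."""
    return num_params * bits_per_weight / 8


def fits_in_ram(num_params: float, bits_per_weight: float, ram_gb: float) -> bool:
    """True if the raw weights alone fit in `ram_gb` decimal gigabytes."""
    return weight_bytes(num_params, bits_per_weight) <= ram_gb * 1e9


GEMMA_4_26B = 26e9   # parameter count from the talk summary
PHONE_RAM_GB = 8     # assumption: illustrative iPhone RAM size
MAC_RAM_GB = 64      # assumption: illustrative Mac unified-memory size

for bits in (16, 8, 4):
    gb = weight_bytes(GEMMA_4_26B, bits) / 1e9
    print(
        f"{bits:>2}-bit: {gb:5.1f} GB | "
        f"fits phone: {fits_in_ram(GEMMA_4_26B, bits, PHONE_RAM_GB)} | "
        f"fits Mac: {fits_in_ram(GEMMA_4_26B, bits, MAC_RAM_GB)}"
    )
```

Even at 4 bits the weights come to 13 GB, more than the assumed phone RAM, so part of the model has to be paged in from storage. The same formula puts a 100B-parameter model at 4 bits at about 50 GB of weights, which is why hundred-billion-parameter chat needs a Mac with substantial unified memory.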
Tags: on-device, apple-silicon, mlx
Original description
MLX is an array framework for Apple Silicon, essentially PyTorch for your Mac, and this talk is a tour of what it can run: real-time vision models that describe the world around you, sub-100ms text-to-speech, speech-to-speech pipelines, omni models that take image and audio input together, and video generation from a text prompt within 16GB of unified memory. A recent breakthrough called Turbo Quant cuts KV-cache memory by 4x and makes 1M-token context run fully on device. Community projects include a native voice app, a robot that speaks in real time with a cloned voice, and a system that chains video generations into a coherent story, all without a cloud call.

The underlying argument: the cloud assumption doesn't hold everywhere. Not for someone in Africa on an unreliable connection. Not for a local agent that needs to stay on. Not for a robot that has to hear, see, and respond without phoning home.

Speaker info:
- https://x.com/Prince_Canuma
- https://pl.linkedin.com/in/prince-canuma

Timestamps:
- 0:00 Introduction and motivation for on-device AI
- 1:13 The origin story: Accessibility and Apple Silicon
- 2:27 Introduction to the MLX framework
- 3:30 Vision capabilities: Empowering accessibility
- 4:15 Omni models: Multimodal input support
- 5:25 Audio intelligence: Controlling computers via voice
- 6:33 Speech-to-speech and modular pipelines
- 7:59 Vision demo: Real-time image analysis
- 8:56 Background blur and object detection demo
- 9:31 Large language model demo: Running Gemma 4 locally
- 11:50 Community projects: Grounded visual reasoning
- 13:06 Video generation chains on-device
- 14:33 Native voice application showcase
- 15:39 Robotics: Real-time voice cloning and interaction
- 17:14 Q&A: Neural Engine usage and Core ML
- 18:18 Q&A: Monitoring performance with Mactop
- 19:34 Q&A: Available model recommendations
- 20:15 Q&A: Limitations and performance expectations
- 20:54 Q&A: Turbo Quant breakthrough and KV cache optimization
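To see why a 4x KV-cache reduction matters for 1M-token context, the standard KV-cache size formula is useful: every token stores one key vector and one value vector per layer per KV head. The model shape below (28 layers, 4 KV heads, head dimension 128) is hypothetical, and the 4x factor is simply the reduction quoted in the talk; this sketch does not model how Turbo Quant itself compresses the cache.

```python
# KV-cache size estimate for a decoder-only transformer (sketch).
# bytes = 2 (K and V) * layers * kv_heads * head_dim * tokens * bytes_per_value

def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    num_tokens: int,
    bytes_per_value: float,
) -> float:
    """Total bytes needed to cache keys and values for `num_tokens` tokens."""
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
    return per_token * num_tokens


# Hypothetical model shape, chosen only for illustration.
LAYERS, KV_HEADS, HEAD_DIM = 28, 4, 128
CONTEXT = 2**20   # ~1M tokens
GIB = 2**30

fp16 = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT, bytes_per_value=2)
compressed = fp16 / 4   # the 4x reduction quoted in the talk

print(f"fp16 KV cache:       {fp16 / GIB:.1f} GiB")   # 56.0 GiB
print(f"4x-compressed cache: {compressed / GIB:.1f} GiB")   # 14.0 GiB
```

With this shape each token costs 56 KiB at fp16, so a 1M-token cache needs 56 GiB on its own; a 4x reduction brings it to 14 GiB, which is the difference between exceeding and fitting within the memory of a 32 GB machine for the cache alone.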