Why MLX — Prince Canuma, Neywa Labs

4.2K views · May 11, 2026 · 23:09 min

Takeaway

MLX is reaching the point where production-quality vision, voice, and even hundred-billion-parameter chat run fully on-device on Apple Silicon — replacing cloud subscriptions.

Summary

  • Prince Canuma's MLX work (1.5M downloads, 4K+ ported models, day-zero support for new open models like Gemma 4) makes on-device AI on Apple Silicon practical from M1 onward.
  • Personal motivation: his father lost his sight in 2020, so he built MLX VLM to let a phone camera and voice help his father navigate the world without cloud dependency.
  • Demos: real-time RF-DETR object detection on a laptop with internet off; running Gemma 4 26B on iPhone via storage swapping; chatting with VLMs through Gradio fully offline.
  • Audio stack: Marvis TTS (<100ms first audio), STT for dictation/coding (WhisperFlow-style), and full speech-to-speech pipelines available in both Python and Swift for native iOS/macOS apps.
on-device · apple-silicon · mlx

Original description
MLX is an array framework for Apple Silicon, essentially PyTorch for your Mac, and this is a tour of what it can run: real-time vision models that describe the world around you, sub-100ms text-to-speech, speech-to-speech pipelines, omni models that take image and audio together, and video generation from a text prompt on 16GB of VRAM. A recent breakthrough called Turbo Quant cuts KV cache by 4x and gets 1M context running fully on device. The community projects include a native voice app, a robot speaking in real time with a cloned voice, and a system that chains video generations into a coherent story — all without a cloud call.
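
To give a feel for why a 4x smaller KV cache matters at 1M context, here is a rough back-of-the-envelope sketch. The model shape (layers, KV heads, head dimension) is hypothetical and not taken from the talk, and the mapping of "4x" to going from 16-bit to 4-bit values is an assumption; the description does not say how Turbo Quant achieves its reduction. The estimate also ignores any overhead from quantization metadata such as scales.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_len, bits_per_value):
    """Approximate KV cache size in bytes for a single sequence."""
    # One key vector and one value vector per layer, per KV head, per token.
    values = 2 * num_layers * num_kv_heads * head_dim * context_len
    return values * bits_per_value / 8


# Hypothetical model shape, chosen only for illustration.
cfg = dict(num_layers=32, num_kv_heads=8, head_dim=128, context_len=1_000_000)

fp16 = kv_cache_bytes(**cfg, bits_per_value=16)   # unquantized 16-bit cache
quant = kv_cache_bytes(**cfg, bits_per_value=4)   # assumed 4-bit cache

print(f"16-bit cache: {fp16 / 2**30:.1f} GiB")
print(f"4-bit cache:  {quant / 2**30:.1f} GiB")
print(f"reduction:    {fp16 / quant:.0f}x")
```

Under these assumed dimensions the unquantized cache alone would exceed the memory of most consumer machines, which is why cache compression, rather than weight size, tends to be the limiting factor for very long contexts.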

The underlying argument: the cloud assumption doesn't hold everywhere. Not for someone in Africa on an unreliable connection. Not for a local agent that needs to stay on. Not for a robot that has to hear, see, and respond without phoning home.

Speaker info:
- https://x.com/Prince_Canuma
- https://pl.linkedin.com/in/prince-canuma

Timestamps

0:00 Introduction and motivation for on-device AI
1:13 The origin story: Accessibility and Apple Silicon
2:27 Introduction to the MLX framework
3:30 Vision capabilities: Empowering accessibility
4:15 Omni models: Multimodal input support
5:25 Audio intelligence: Controlling computers via voice
6:33 Speech-to-speech and modular pipelines
7:59 Vision demo: Real-time image analysis
8:56 Background blur and object detection demo
9:31 Large language model demo: Running Gemma 4 locally
11:50 Community projects: Grounded visual reasoning
13:06 Video generation chains on-device
14:33 Native voice application showcase
15:39 Robotics: Real-time voice cloning and interaction
17:14 Q&A: Neural Engine usage and Core ML
18:18 Q&A: Monitoring performance with Mactop
19:34 Q&A: Available model recommendations
20:15 Q&A: Limitations and performance expectations
20:54 Q&A: Turbo Quant breakthrough and KV cache optimization