Foundry Local: Cutting-Edge AI experiences on device with ONNX Runtime/Olive — Emma Ning, Microsoft

595 views · Jun 27, 2025 · 22:51 min · Watch on YouTube ↗

Takeaway

Foundry Local makes ONNX-optimized on-device LLM inference a one-CLI experience across Windows/macOS and NPU/GPU/CPU silicon for privacy-bound use cases.

Summary

Emma Ning (Microsoft) launches Foundry Local: on-device AI runtime on Windows/macOS leveraging ONNX Runtime (10M downloads/month) and Azure AI Foundry's 1900-model catalog.
Hardware-accelerated variants for CPU, CUDA, integrated GPU, NPU via partnerships with Nvidia, Intel, AMD, Qualcomm.
Demos CLI: foundry model list / cache / run with verbose mode showing ~90 tokens/sec on Qwen-2.5 1.5B; Phi-4-mini slower but higher quality.
Use cases driven by offline access, privacy (legal/patient data), cost (millions of inferences/day for games), latency; SDK enables cross-platform local apps.

on-deviceonnxmicrosoft

Original description

About Emma Ning 
Emma Ning is a Principal PM in the Microsoft AI Framework team, focusing on AI model operationalization and acceleration with ONNX Runtime/Olive for open and interoperable AI. She has more than five years of product experience in search engines taking advantage of machine learning techniques and spent more than six years exploring AI adoption among various businesses. She is passionate about bringing AI solutions to solve business problems as well as enhancing product experience.

Recorded at the AI Engineer World's Fair in San Francisco. Stay up to date on our upcoming events and content by joining our newsletter here: https://www.ai.engineer/newsletter