A Practical Guide to Efficient AI: Shelby Heinecke

2.9K views · Nov 18, 2024 · 17:45 min

Takeaway

Production AI economics are won on five efficiency axes, and starting with a small model plus quantization beats brute-forcing a frontier giant.

Summary

Salesforce AI research lead Heinecke maps five orthogonal efficiency axes: architecture choice (small models, MoE, attention), efficient pretraining (mixed precision), efficient finetuning (LoRA, QLoRA), efficient inference (post-training quantization, speculative decoding), and prompt compression.
Argues that 'small LLMs are coming back': models with <=13B parameters consume less RAM, GPU memory, and disk, have lower latency because fewer weights are used in the forward pass, and enable mobile, laptop, and edge deployment (Apple's on-device LLMs are cited).
Salesforce context: 10 years of deploying AI, 300+ patents, 227 research papers, six ethics councils; shipping CodeGen, Service GPT, and Einstein GPT at Fortune 500 scale, where cost-to-serve matters.
Practical takeaways: pick small models first, apply post-training quantization, and prefer parameter-efficient fine-tuning over full fine-tuning.
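
To make the summary points concrete, here is a rough Python sketch of the arithmetic behind them. It is not from the talk. All function names are hypothetical, and the 4096-wide layer and rank 8 are illustrative assumptions. It estimates weight-only memory for a 13B-parameter model at several precisions, shows a simple symmetric per-tensor int8 scheme as one basic form of post-training quantization, and counts the trainable parameters a LoRA adapter adds to a single weight matrix.

```python
# Illustrative sketch only: names and numbers are assumptions, not from the talk.

# Bytes needed to store one weight at each precision.
BYTES_PER_WEIGHT = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone (no activations or KV cache)."""
    return num_params * BYTES_PER_WEIGHT[dtype] / 2**30


def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: each w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the int8 range used here.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Map int8 codes back to approximate float weights."""
    return [scale * v for v in q]


def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA trains two low-rank factors: A (rank x d_in) and B (d_out x rank)."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    for dtype in BYTES_PER_WEIGHT:
        print(f"13B weights in {dtype}: {weight_memory_gib(13e9, dtype):.1f} GiB")

    w = [0.02, -0.5, 0.31, 1.27, -1.0]
    q, scale = quantize_int8(w)
    print("int8 codes:", q)
    print("round trip:", [round(x, 3) for x in dequantize(q, scale)])

    full = 4096 * 4096
    lora = lora_trainable_params(4096, 4096, rank=8)
    print(f"LoRA trains {lora} of {full} weights in one 4096x4096 matrix")
```

These figures cover weights only. Real deployments also need memory for activations, the KV cache, and runtime overhead, and production quantization tools typically use per-channel or group-wise scales rather than the single per-tensor scale shown here.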

small-models · quantization · efficient-finetuning

Original description

In the past years, we’ve witnessed a whirlwind of AI breakthroughs powered by extremely large and resource-demanding models. And now, faced with actually deploying these models at scale, AI engineers and builders are left to pick up the pieces on how to improve latency and resource consumption practically. In parallel, the on-device AI movement is heating up, imposing even more physical constraints on AI model deployment. In this talk, we will first identify key sources of inefficiency in AI models. Then, we will discuss techniques and practical tools to improve efficiency, from model architecture selection, to quantization, to prompt optimization.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Shelby
Dr. Shelby Heinecke leads an AI research team at Salesforce. Shelby’s team develops cutting-edge AI for Salesforce products and academic research. Her team's work spans AI agents, LLMs, on-device AI, entity resolution, recommendation systems, and beyond. Shelby earned her Ph.D. in Mathematics from University of Illinois at Chicago, specializing in machine learning theory. She also holds an M.S. in Mathematics from Northwestern and a B.S. in Mathematics from MIT. Website: www.shelbyh.ai