Decoding Mistral AI's Large Language Models: Devendra Chaplot

4.2K views · Nov 21, 2024 · 18:16 min

Takeaway

Mistral's open-weights strategy treats community deployment as a free distribution and feedback channel that fuels paid upgrades, with a deliberate pipeline of dense and MoE releases.

Summary

Devendra Chaplot recaps Mistral's release history: Mistral 7B (Sep 2023, the first 7B model to reach 60% on MMLU), the sparse MoE Mixtral 8x7B (Dec 2023), then in 2024 the flagship Mistral Large (Feb), Mixtral 8x22B (Apr) and Codestral 22B (June).
Frames Mistral's principles as openness (open weights for the community), portability (Azure, AWS and GCP, plus VPC and on-prem licensing), performance-to-size ratio, and customizability via the open-source mistral-finetune and mistral-inference libraries plus a fine-tuning API.
Argues open source and profit are complementary, not competing: open releases drive branding and marketing without an in-house marketing team, accelerate customer acquisition (with an upgrade path to proprietary models) and teach Mistral how its models get deployed on phones and laptops.
Walks through three LLM training stages: pretraining (next-token prediction over trillions of tokens, costing tens to hundreds of millions of dollars per run, so there is often budget for only one attempt); instruction tuning (100k+ prompt/response pairs with the prompt masked out of the loss, on roughly 100 GPUs for hours to days); and RLHF from preference pairs.
Notes hyperparameter selection (e.g. layer counts in LLaMA) is still intuition, not science.
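
The talk itself shows no code, so the following is only a minimal sketch of two ideas from the training stages above: a next-token loss where prompt tokens can be masked out (as in instruction tuning), and a pairwise loss over preference pairs. The function names, the toy vocabulary, and the Bradley-Terry-style form of the preference loss are assumptions for illustration, not Mistral's implementation.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def next_token_loss(logits_per_position, token_ids, prompt_len=0):
    """Mean cross-entropy for predicting token_ids[t + 1] from position t.

    prompt_len=0 gives the plain pretraining objective. With prompt_len > 0,
    targets that belong to the prompt are skipped, so only response tokens
    contribute to the loss (prompt masking).
    """
    total, count = 0.0, 0
    for t in range(len(token_ids) - 1):
        target_index = t + 1
        if target_index < prompt_len:
            continue  # masked: this target is part of the prompt
        log_probs = log_softmax(logits_per_position[t])
        total -= log_probs[token_ids[target_index]]
        count += 1
    return total / count if count else 0.0


def preference_loss(reward_chosen, reward_rejected):
    """Assumed pairwise loss: -log(sigmoid(reward_chosen - reward_rejected))."""
    diff = reward_chosen - reward_rejected
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


# Toy example: vocabulary of 4 tokens, 3 prompt tokens, 1 response token.
token_ids = [0, 1, 2, 3]
logits = [
    [0.0, 0.0, 0.0, 0.0],   # position 0: uniform guess for token 1
    [0.0, 0.0, 0.0, 0.0],   # position 1: uniform guess for token 2
    [0.0, 0.0, 0.0, 10.0],  # position 2: confident, correct guess for token 3
]
pretraining_loss = next_token_loss(logits, token_ids)
instruction_loss = next_token_loss(logits, token_ids, prompt_len=3)
```

In this toy case the instruction-tuning loss only reflects the single response token, while the pretraining loss averages over all three predictions, including the two poorly predicted prompt tokens.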

mistral · open-weights · foundation-models

Original description

In this talk, Devendra Singh Chaplot, Research Scientist at Mistral AI will explore the building blocks and training strategies that power Mistral AI’s large language models. It will feature Mistral AI's open-source models, Mixtral 8x7B and Mixtral 8x22B, which are based on a mixture-of-experts (MoE) architecture and released under the Apache 2.0 license. The presentation will also provide guidance on utilizing Mistral "La Plateforme" API endpoints and offer a preview of upcoming features.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule

About Devendra
Devendra Singh Chaplot is a Research Scientist at Mistral AI working on building the next generation of AI models. Earlier he was a Research Scientist at the Facebook AI Research (FAIR) Lab working at the intersection of Machine Learning, Computer Vision and Robotics. He has led the design of several AI systems which won the CVPR-2019 PointNav, CVPR-2020 ObjectNav and NeurIPS-2022 Rearrangement Habitat Challenges and the Visual-Doom AI Competition 2017. Chaplot is a recipient of the Facebook Fellowship Award and his research has received Best Paper and Best Demo awards at leading AI conferences. His research has also been featured in several popular media outlets such as MIT Technology Review, TechCrunch, Engadget, Popular Science, Kotaku, and Daily Mail. Chaplot received his Ph.D. in Machine Learning from Carnegie Mellon University and a Bachelor's degree in Computer Science from IIT Bombay.