Low Level Technicals of LLMs: Daniel Han

54.2K views · Jul 31, 2024 · 172:26 min

Takeaway

Open-model releases routinely ship subtle architecture and tokenizer bugs — learn to read the modeling code yourself and reach for SVD when reasoning about LLM internals.

Summary

Daniel Han (Unsloth) walks through bugs and oddities he and his brother caught in open model releases: Gemma's exact-vs-approximate GELU mismatch across implementations, Grok's odd 30 * tanh(x / 30) clamp, and NVIDIA Nemotron 340B's use of squared ReLU instead of SwiGLU.
Tokenizer correctness is a problem of its own: different Mistral-family models (Mistral, Mixtral) tokenize a space followed by a sun-smiley differently, and some are 'wrong because the team forgot to update the fast tokenizer'.
Promotes SVD/randomized SVD as the most foundational algorithm engineers should know for ML internals (used in PCA, LoRA, etc.).
Discusses 'LoRA Learns Less and Forgets Less': LoRA struggles to absorb new knowledge unless you train all linear layers with well-chosen hyperparameters, a condition that a surface reading of the result misses.
Workshop goal: teach the audience to read freshly released model code (architectures, activations, RoPE variants) and find bugs themselves.
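
The activation variants named in the summary are easy to compare numerically. The sketch below is a minimal scalar version using only the Python standard library; the function names are illustrative, and it does not claim which variant any particular model release should use.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the 'approximate' variant)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def soft_cap(x: float, cap: float = 30.0) -> float:
    """cap * tanh(x / cap): close to x for small |x|, saturates at +/-cap."""
    return cap * math.tanh(x / cap)


def squared_relu(x: float) -> float:
    """Squared ReLU: max(0, x) ** 2."""
    return max(0.0, x) ** 2


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def swiglu(gate: float, up: float) -> float:
    """Scalar SwiGLU: silu(gate) * up.

    In a real MLP block, gate and up are two separate linear projections
    of the same input; here they are plain numbers for illustration.
    """
    return silu(gate) * up


# The two GELU variants agree closely but not exactly.
xs = [-3.0, -1.0, -0.5, 0.5, 1.0, 3.0]
max_gelu_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on grid: {max_gelu_gap:.2e}")
print(f"soft_cap(100.0) = {soft_cap(100.0):.4f}")
```

The gap between the two GELU forms is small per element, which is why a mismatch between implementations is easy to miss when reading code quickly. In PyTorch the same choice is exposed through the `approximate` argument of `torch.nn.functional.gelu`.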
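
One way to hunt for the tokenizer discrepancies mentioned in the summary is to run the same probe strings through two tokenizers and report where they disagree. The sketch below assumes nothing about real Mistral tokenizers: `toy_a` and `toy_b` are hypothetical stand-ins that differ only in how they treat a leading space, and `find_mismatches` is an illustrative helper name.

```python
from typing import Callable, Iterable, List, Tuple

Tokenize = Callable[[str], List[str]]


def find_mismatches(
    tok_a: Tokenize, tok_b: Tokenize, samples: Iterable[str]
) -> List[Tuple[str, List[str], List[str]]]:
    """Return (text, tokens_a, tokens_b) for each sample split differently."""
    mismatches = []
    for text in samples:
        a, b = tok_a(text), tok_b(text)
        if a != b:
            mismatches.append((text, a, b))
    return mismatches


def toy_a(text: str) -> List[str]:
    """Hypothetical tokenizer: marks every space, including a leading one."""
    return text.replace(" ", " \u2581").split()


def toy_b(text: str) -> List[str]:
    """Hypothetical tokenizer: strips leading whitespace before splitting."""
    return toy_a(text.lstrip())


# Probe strings: plain text, a space followed by a smiley, a leading space.
samples = ["hello world", " \u263a", " hello"]
mismatches = find_mismatches(toy_a, toy_b, samples)
for text, a, b in mismatches:
    print(repr(text), a, b)
```

With Hugging Face transformers, the two callables could plausibly be the `tokenize` methods of tokenizers loaded via `AutoTokenizer.from_pretrained(name, use_fast=True)` and `use_fast=False`; that pairing is a suggestion, not something the talk summary specifies.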
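
For readers who have not seen randomized SVD, here is a minimal NumPy sketch of the standard random-projection approach (random range finder, a few power iterations, then an exact SVD of the small projected matrix). The function name and the `oversample`, `n_iter`, and `seed` parameters are assumptions for illustration, not anything shown in the talk.

```python
import numpy as np


def randomized_svd(a, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top-`rank` SVD of `a` via random projection."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + oversample, m, n)

    # Sample the column space of `a` with a random Gaussian test matrix.
    q, _ = np.linalg.qr(a @ rng.standard_normal((n, k)))

    # Power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(n_iter):
        q, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ q)

    # Exact SVD of the small (k x n) projected matrix.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]


# Demo on an exactly rank-5 matrix: the rank-5 reconstruction should match.
demo_rng = np.random.default_rng(1)
a_demo = demo_rng.standard_normal((60, 5)) @ demo_rng.standard_normal((5, 40))
u_demo, s_demo, vt_demo = randomized_svd(a_demo, rank=5)
rel_err = np.linalg.norm(a_demo - (u_demo * s_demo) @ vt_demo) / np.linalg.norm(a_demo)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The point of the randomized variant is cost: the expensive exact SVD runs on a k x n matrix instead of the full m x n one, which is what makes low-rank analysis of large weight matrices practical.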
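
To make "train all linear layers" concrete, the sketch below counts LoRA adapter parameters for one transformer block under two targeting choices. Everything specific here is an assumption for illustration: the Llama-style module names, the layer sizes, the rank, and the "attention only" subset. It shows how much adapter capacity each choice provides and does not by itself demonstrate the learning behavior described in the summary.

```python
# Illustrative sizes for one transformer block (assumed, not from the talk).
HIDDEN, INTERMEDIATE = 4096, 11008

# (d_in, d_out) for each linear layer, using Llama-style names (assumed).
LINEAR_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, HIDDEN),
    "v_proj": (HIDDEN, HIDDEN),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(shapes, targets, rank):
    """Count LoRA parameters: each target gets A (rank x d_in) and B (d_out x rank)."""
    return sum(
        rank * (d_in + d_out)
        for name, (d_in, d_out) in shapes.items()
        if name in targets
    )


RANK = 16  # hypothetical rank
attention_only = lora_params(LINEAR_SHAPES, {"q_proj", "v_proj"}, RANK)
all_linear = lora_params(LINEAR_SHAPES, set(LINEAR_SHAPES), RANK)

print(f"q_proj + v_proj only: {attention_only:,} adapter params per block")
print(f"all linear layers:    {all_linear:,} adapter params per block")
```

Under these assumed sizes, targeting every linear layer gives several times more trainable adapter parameters per block than a two-projection subset, and it also places adapters inside the MLP layers that the subset leaves frozen.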

llm-internals · unsloth · tokenization

Original description

This workshop will be split into 3x one hour blocks:

How to analyze & fix LLMs - how to find and fix bugs in Gemma, Phi-3, Llama & tokenizers
Finetuning with Unsloth - continued pretraining, reward modelling, QLoRA & more
Deep dive into LLM technicals - hand deriving derivatives, SOTA finetuning tricks

It's recommended you have Python with PyTorch and Unsloth installed (or use online Google Colab / Kaggle). College-level maths and programming would be helpful.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule

About Daniel
Hey I'm Daniel, the algos guy behind Unsloth. I love making LLM training go fast! We're the guys who fixed 8 of Google's Gemma bugs, a 2048 SWA Phi-3 issue, found tokenization issues and fixed untrained tokens with Llama-3, and I run Unsloth with my brother Michael!

Our open source package makes finetuning of LLMs 2x faster and uses 70% less VRAM with no accuracy degradation. I used to work at NVIDIA making GPU algos go fast and helped NASA engineers process data from a Mars rover faster!