Low Level Technicals of LLMs: Daniel Han
Takeaway
Open-model releases routinely ship subtle architecture and tokenizer bugs, so learn to read the modeling code yourself, and reach for SVD when reasoning about LLM internals.
Summary
- Daniel Han (Unsloth) walks through bugs and oddities he and his brother caught in open model releases: Gemma's exact-vs-approximate GELU mismatch across implementations, Grok's unusual 30*tanh(x/30) clamp, and NVIDIA Nemotron 340B's use of squared ReLU rather than SwiGLU.
- Tokenizer correctness is a problem in its own right: different Mistral variants (Mistral/Mixtral) tokenize a space followed by a sun-smiley differently, and some are 'wrong because the team forgot to update the fast tokenizer'.
- Promotes SVD/randomized SVD as the most foundational algorithm engineers should know for ML internals (used in PCA, LoRA, etc.).
- Discusses 'LoRA learns less and forgets less': on a surface reading LoRA struggles to learn new knowledge, but that holds mainly when you do not train all linear layers with proper parameters.
- Workshop goal: teach the audience to read freshly released model code (architectures, activations, RoPE variants) and find bugs themselves.
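
The activation variants named in the summary are easy to compare directly. The following is a minimal scalar sketch using only the Python standard library, not code from the talk or from any model release: the function names are illustrative, the 0.044715 constant is the usual tanh approximation of GELU, and the default cap of 30 follows the 30*tanh(x/30) expression quoted above.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU (the 'approximate' variant)."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def tanh_softcap(x: float, cap: float = 30.0) -> float:
    """cap * tanh(x / cap): close to x for small |x|, bounded by cap in magnitude."""
    return cap * math.tanh(x / cap)


def squared_relu(x: float) -> float:
    """Squared ReLU: max(x, 0) ** 2."""
    return max(x, 0.0) ** 2


def silu(x: float) -> float:
    """SiLU (swish): x * sigmoid(x), written with tanh for numerical stability."""
    return x * 0.5 * (1.0 + math.tanh(x / 2.0))


def swiglu(gate: float, up: float) -> float:
    """Scalar form of SwiGLU: silu(gate) * up, with gate/up from two projections."""
    return silu(gate) * up


# The two GELU variants agree closely but not exactly, which is why mixing
# them between training and inference code can shift outputs slightly.
grid = [i / 100.0 for i in range(-600, 601)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in grid)
print(f"max |exact - tanh| on [-6, 6]: {max_gap:.2e}")
print("softcap(0.5), softcap(1000):", tanh_softcap(0.5), tanh_softcap(1000.0))
```

When reading a fresh modeling file, the practical check is which GELU variant, cap value, and MLP activation the code actually calls, and whether every implementation of the same model agrees.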
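
For the SVD recommendation in the summary, here is a minimal sketch of randomized SVD, assuming NumPy is available. It follows the common random-projection recipe (sample the range of the matrix, orthonormalize, then take an exact SVD of a much smaller matrix); it is not taken from the talk, and the function name and the `oversample` and `n_iter` parameters are illustrative choices.

```python
import numpy as np


def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    """Approximate the top-`rank` SVD of A via random projection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, m, n)

    # Multiply A by a random Gaussian matrix to sample its column space.
    omega = rng.standard_normal((n, k))
    Y = A @ omega

    # A few power iterations emphasize the leading singular directions;
    # re-orthonormalizing with QR at each step keeps this numerically stable.
    for _ in range(n_iter):
        Q, _r = np.linalg.qr(Y)
        Z, _r = np.linalg.qr(A.T @ Q)
        Y = A @ Z

    # Orthonormal basis Q for the sampled range, then an exact SVD of the
    # small k x n matrix B = Q^T A.
    Q, _r = np.linalg.qr(Y)
    B = Q.T @ A
    U_small, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], S[:rank], Vt[:rank, :]


# Usage on a matrix that is exactly rank 5, so a rank-5 factorization
# should reconstruct it up to floating-point error.
gen = np.random.default_rng(1)
A = gen.standard_normal((60, 5)) @ gen.standard_normal((5, 40))
U, S, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A)
print("relative reconstruction error:", rel_err)
```

The design choice is to do the expensive work only on a thin matrix with `rank + oversample` columns, which is what makes the method attractive for large weight or data matrices where a full SVD would be too slow.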
llm-internals, unsloth, tokenization
Original description
This workshop will be split into 3x one hour blocks:
- How to analyze & fix LLMs - how to find and fix bugs in Gemma, Phi-3, Llama & tokenizers
- Finetuning with Unsloth - continued pretraining, reward modelling, QLoRA & more
- Deep dive into LLM technicals - hand deriving derivatives, SOTA finetuning tricks

It's recommended you have Python with Pytorch and Unsloth installed (or use online Google Colab / Kaggle). College level maths and programming would be helpful.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Daniel
Hey I'm Daniel, the algos guy behind Unsloth. I love making LLM training go fast! We're the guys who fixed 8 of Google's Gemma bugs, a 2048 SWA Phi-3 issue, found tokenization issues and fixed untrained tokens with Llama-3, and I run Unsloth with my brother Michael! Our open source package makes finetuning of LLMs 2x faster and uses 70% less VRAM with no accuracy degradation. I used to work at NVIDIA making GPU algos go fast and helped NASA engineers process data from a Mars rover faster!