
Low Level Technicals of LLMs: Daniel Han

54.2K views · Jul 31, 2024 · 2:52:26
Takeaway

Open-model releases routinely ship subtle architecture and tokenizer bugs — learn to read the modeling code yourself and reach for SVD when reasoning about LLM internals.

Summary

  • Daniel Han (Unsloth) walks through bugs and architectural quirks that he and his brother caught in open model releases: Gemma's exact-vs-approximate GELU mismatch across implementations, Grok's unusual 30*tanh(x/30) clamp, and NVIDIA Nemotron 340B's use of squared ReLU instead of SwiGLU.
  • Tokenizer correctness is a beast unto itself: different Mistral variants (Mistral/Mixtral) tokenize a space followed by a sun-smiley differently, and some are 'wrong because the team forgot to update the fast tokenizer'.
  • Promotes SVD/randomized SVD as the most foundational algorithm engineers should know for ML internals (used in PCA, LoRA, etc.).
  • Discusses the 'LoRA Learns Less and Forgets Less' result: LoRA struggles to learn new knowledge unless you train all linear layers with properly chosen hyperparameters, a nuance that a surface reading of the result misses.
  • Workshop goal: teach the audience to read freshly released model code (architectures, activations, RoPE variants) and find bugs themselves.
llm-internals · unsloth · tokenization
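The activation choices in the first summary bullet are easy to compare numerically. Below is a minimal NumPy sketch. The function names and toy sizes are hypothetical and do not come from any model's codebase, and the ungated squared-ReLU MLP is an assumed simplification used only to contrast its shape with SwiGLU. The sketch shows three things: exact GELU and tanh-approximated GELU differ by a small but nonzero amount, cap * tanh(x / cap) with cap = 30 behaves as a smooth clamp, and SwiGLU needs a third (gate) projection.

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)  # elementwise erf without needing SciPy


def gelu_exact(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + _erf(x / math.sqrt(2.0)))


def gelu_tanh(x):
    # Tanh approximation of GELU.
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + np.tanh(c * (x + 0.044715 * x**3)))


def softcap(x, cap=30.0):
    # Smooth clamp: roughly the identity for |x| << cap, never exceeds +/- cap.
    return cap * np.tanh(x / cap)


def silu(x):
    return x / (1.0 + np.exp(-x))


def swiglu_mlp(x, w_gate, w_up, w_down):
    # Gated MLP with three projections: (SiLU(x W_gate) * (x W_up)) W_down.
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down


def squared_relu_mlp(x, w_up, w_down):
    # Ungated MLP with squared ReLU and only two projections (assumed simplification).
    return (np.maximum(x @ w_up, 0.0) ** 2) @ w_down


xs = np.linspace(-6.0, 6.0, 1001)
print("max |GELU exact - tanh approx|:", np.max(np.abs(gelu_exact(xs) - gelu_tanh(xs))))
print("softcap(1000):", softcap(np.array([1000.0]))[0], "softcap(0.1):", softcap(np.array([0.1]))[0])

rng = np.random.default_rng(0)
d_model, d_ff = 8, 16  # hypothetical toy sizes
h = rng.standard_normal((4, d_model))
w_gate = rng.standard_normal((d_model, d_ff))
w_up = rng.standard_normal((d_model, d_ff))
w_down = rng.standard_normal((d_ff, d_model))
print(swiglu_mlp(h, w_gate, w_up, w_down).shape, squared_relu_mlp(h, w_up, w_down).shape)
```

The per-element GELU gap is small. A mismatch between implementations therefore tends to show up only as slightly different logits or losses, not as an obvious failure, which is why comparing code paths directly is useful.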
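The SVD bullet can be made concrete with randomized SVD. The sketch below follows the standard Halko, Martinsson and Tropp recipe in three steps: a random Gaussian sketch of the column space, a few power iterations, then an exact SVD of a small projected matrix. The function name and default parameters are illustrative assumptions. This is not Unsloth's code.

```python
import numpy as np


def randomized_svd(a: np.ndarray, rank: int, n_oversamples: int = 10,
                   n_iter: int = 2, seed: int = 0):
    """Approximate top-`rank` SVD of `a` (illustrative sketch, not a library API)."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    k = min(rank + n_oversamples, m, n)  # sketch width, including oversampling
    # 1. Random Gaussian test matrix: A @ Omega samples A's column space.
    omega = rng.standard_normal((n, k))
    q, _ = np.linalg.qr(a @ omega)
    # 2. Power iterations sharpen the singular-value decay;
    #    re-orthonormalize after each multiply for numerical stability.
    for _ in range(n_iter):
        z, _ = np.linalg.qr(a.T @ q)
        q, _ = np.linalg.qr(a @ z)
    # 3. Project A onto the k-dimensional basis; exact SVD of the small (k, n) matrix.
    b = q.T @ a
    u_small, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_small  # map the small left singular vectors back to R^m
    return u[:, :rank], s[:rank], vt[:rank, :]


rng = np.random.default_rng(1)
a = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 100))  # exactly rank 5
u, s, vt = randomized_svd(a, rank=5)
s_exact = np.linalg.svd(a, compute_uv=False)[:5]
print("singular values match:", np.allclose(s, s_exact, rtol=1e-6))
print("relative reconstruction error:", np.linalg.norm(a - (u * s) @ vt) / np.linalg.norm(a))
```

Library versions exist, for example `torch.svd_lowrank` and scikit-learn's `sklearn.utils.extmath.randomized_svd`. The from-scratch version only makes the three steps visible. The only expensive operations are multiplications by `a` and QR or SVD of thin matrices.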
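For the LoRA bullet, here is a minimal sketch of the standard LoRA parameterization W + (alpha / r) * B @ A. B is zero-initialized, so finetuning starts exactly from the pretrained model. The sketch shows two mechanics. First, the learned update can never exceed rank r. Second, adapting every linear layer instead of only two attention projections multiplies the trainable parameter budget. The class, helper, and layer sizes are hypothetical, and the sketch does not reproduce any of the experiments discussed in the talk.

```python
import numpy as np


class LoRALinear:
    """Hypothetical LoRA wrapper: frozen weight W plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, w: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = w.shape
        self.w = w                                      # frozen pretrained weight, (d_out, d_in)
        self.a = rng.standard_normal((r, d_in)) * 0.01  # trainable, small random init
        self.b = np.zeros((d_out, r))                   # trainable, zero init so the update starts at 0
        self.scale = alpha / r

    def delta_w(self) -> np.ndarray:
        # Effective weight change; its rank is at most r by construction.
        return self.scale * (self.b @ self.a)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in). Only a and b would be updated during finetuning.
        return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T


def lora_param_count(shapes, r: int) -> int:
    # Each adapted (d_out, d_in) matrix adds r * (d_in + d_out) trainable parameters.
    return sum(r * (d_in + d_out) for d_out, d_in in shapes)


rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
x = rng.standard_normal((5, 32))
layer = LoRALinear(w, r=4, alpha=8.0)
print(np.allclose(layer(x), x @ w.T))          # True: B starts at zero, so the adapter is a no-op
layer.b = rng.standard_normal(layer.b.shape)   # stand-in for trained values
print(np.linalg.matrix_rank(layer.delta_w()))  # at most r = 4

# Hypothetical toy transformer-block sizes; shapes are (d_out, d_in).
d_model, d_ff = 512, 1408
qv_only = [(d_model, d_model)] * 2  # e.g. q_proj and v_proj
all_linear = [(d_model, d_model)] * 4 + [(d_ff, d_model), (d_ff, d_model), (d_model, d_ff)]
print("two attention projections:", lora_param_count(qv_only, r=8))
print("all linear layers:", lora_param_count(all_linear, r=8))
```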
Original description
This workshop will be split into three one-hour blocks:

How to analyze & fix LLMs - how to find and fix bugs in Gemma, Phi-3, Llama & tokenizers
Finetuning with Unsloth - continued pretraining, reward modelling, QLoRA & more
Deep dive into LLM technicals - hand-deriving derivatives, SOTA finetuning tricks
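The third workshop block covers hand-deriving derivatives. A standard way to check a hand derivation is to compare it against a central finite difference. The sketch below does this for the 30*tanh(x/30) softcap and for SiLU (the gate in SwiGLU). Both functions were chosen here for illustration and are not necessarily the workshop's own examples.

```python
import numpy as np


def softcap(x, cap=30.0):
    return cap * np.tanh(x / cap)


def softcap_grad(x, cap=30.0):
    # Chain rule: d/dx [cap * tanh(x / cap)] = cap * (1 - tanh^2(x / cap)) * (1 / cap)
    #                                        = 1 - tanh^2(x / cap)
    t = np.tanh(x / cap)
    return 1.0 - t * t


def silu(x):
    return x / (1.0 + np.exp(-x))


def silu_grad(x):
    # silu(x) = x * sigmoid(x); product rule with sigmoid' = sigmoid * (1 - sigmoid)
    s = 1.0 / (1.0 + np.exp(-x))
    return s + x * s * (1.0 - s)  # equivalently s * (1 + x * (1 - s))


def finite_diff(f, x, h=1e-5):
    # Central difference: truncation error is O(h^2) for smooth f.
    return (f(x + h) - f(x - h)) / (2.0 * h)


xs = np.linspace(-50.0, 50.0, 201)
for name, f, g in [("softcap", softcap, softcap_grad), ("silu", silu, silu_grad)]:
    err = np.max(np.abs(finite_diff(f, xs) - g(xs)))
    print(f"{name}: max |numeric - analytic| = {err:.2e}")
```

The step `h` trades truncation error against floating-point cancellation. With float64 and values of order 10 to 50, `h = 1e-5` keeps both well below the tolerance a wrong derivation would violate.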
It's recommended that you have Python with PyTorch and Unsloth installed (or use an online Google Colab / Kaggle notebook). College-level maths and programming would be helpful.

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule.

About Daniel
Hey, I'm Daniel, the algos guy behind Unsloth. I love making LLM training go fast! We're the guys who fixed 8 of Google's Gemma bugs and a Phi-3 sliding window attention (SWA) issue involving the 2048 window size, found tokenization issues, and fixed untrained tokens in Llama-3. I run Unsloth with my brother Michael!

Our open-source package makes LLM finetuning 2x faster and uses 70% less VRAM with no accuracy degradation. I used to work at NVIDIA making GPU algos go fast and helped NASA engineers process data from a Mars rover faster!