Fixing bugs in Gemma, Llama, & Phi 3: Daniel Han

3.3K views · Jul 31, 2024 · 17:41 min · Watch on YouTube ↗

Takeaway

Open-weight model releases routinely ship with subtle bugs (BOS, GELU variants, sliding windows) that silently degrade fine-tunes — verify before training.

Summary

Unsloth's Daniel Han documents 8 bugs in Llama 3 alone, including double-BOS tokens that silently degrade fine-tune accuracy and untrained tokens in the Llama 3 base model that break instruct-template fine-tuning.
Gemma bug fixes: activation must be approximate GELU not exact GELU; tokenizer issues.
Phi-3 fixes: sliding window should be 2048 not 2047; QKV matrices must be unfused for proper LoRA fine-tuning.
Unsloth auto-detects and corrects double-BOS and other template issues; provides Colab notebooks for all fixes.

fine-tuningllamaunsloth

Original description

The story behind our 8 bug fixes for Gemma, multiple tokenization fixes for Llama 3, a sliding window bug fix and Mistral-fying Phi-3, and learn about how we analyse and find and fix bugs in open source models. Learn also how we make finetuning 2x faster for all these models

Recorded live in San Francisco at the AI Engineer World's Fair. See the full schedule of talks at https://www.ai.engineer/worldsfair/2024/schedule & join us at the AI Engineer World's Fair in 2025! Get your tickets today at https://ai.engineer/2025

About Daniel
Hey I'm Daniel, the algos guy behind Unsloth. I love making LLM training go fast! We're the guys who fixed 8 of Google's Gemma bugs, a 2048 SWA Phi-3 issue, found tokenization issues and fixed untrained tokens with Llama-3, and I run Unsloth with my brother Michael!

Our open source package makes finetuning of LLMs 2x faster and uses 70% less VRAM with no accuracy degradation. I used to work at NVIDIA making GPU algos go fast and helped NASA engineers process data from a Mars rover faster!