Domain adaptation and fine-tuning for domain-specific LLMs: Abi Aryan
Takeaway
Pick adapter tuning for new domains, prefix tuning for behavior shaping, and LoRA/QLoRA when compute/memory is the binding constraint.
Summary
- Abi Aryan surveys the literature on domain adaptation for LLMs, covering why it is needed (under-represented domains, compliance-restricted data, cross-domain transfer via embeddings) and when fine-tuning beats RAG or prompting.
- Distinguishes full-weight fine-tuning (expensive, comparable to the teacher-student distillation work of 2016-2018) from partial-weight methods: adapter tuning, prefix tuning, parameter-efficient methods (LoRA, QLoRA), and instruction tuning.
- Adapter tuning inserts small trainable modules into an otherwise frozen transformer; as cited in the talk, the original paper matched full fine-tuning with only 0.15% extra parameters. Best for learning a wholly new domain such as biochemical engineering.
- Prefix tuning prepends trainable prefix vectors to the inputs of the attention layers, steering the model's behavior without changing the model weights (analogy: water tank vs. tap shape).
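- LoRA/QLoRA exploit the fact that many rows and columns of a weight update are linearly dependent, so the update can be stored as the product of two small low-rank matrices; QLoRA additionally quantizes the frozen base weights. Ideal for running large models on laptops, edge devices, or AR hardware.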
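To make the LoRA bullet concrete, here is a minimal sketch of a low-rank update on a single linear layer, written in plain Python so it runs without dependencies. It is an illustration under stated assumptions, not the speaker's code or any library's API: the class name `LoRALinear`, the helper functions, the layer sizes, and the `rank` and `alpha` values are all hypothetical, and the frozen weight is a random stand-in for a pretrained matrix. QLoRA's quantization of the frozen weights is not shown.

```python
import random


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply matrices a (n x k) and b (k x m), both lists of rows."""
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def add_scaled(a, b, s=1.0):
    """Return a + s * b, elementwise."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def random_matrix(rows, cols, rng, std=0.02):
    """Gaussian-initialised matrix (hypothetical init scale)."""
    return [[rng.gauss(0.0, std) for _ in range(cols)] for _ in range(rows)]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A."""

    def __init__(self, d_in, d_out, rank, alpha, rng):
        self.W = random_matrix(d_out, d_in, rng)       # frozen; stand-in for pretrained weights
        self.A = random_matrix(rank, d_in, rng)        # trainable, small random init
        self.B = [[0.0] * rank for _ in range(d_out)]  # trainable, zero init: update starts at zero
        self.scaling = alpha / rank

    def forward(self, x):
        """x is (batch x d_in); returns x W^T + scaling * (x A^T) B^T."""
        base = matmul(x, transpose(self.W))
        low_rank = matmul(matmul(x, transpose(self.A)), transpose(self.B))
        return add_scaled(base, low_rank, self.scaling)

    def merged_weight(self):
        """Fold the update into one matrix: W + scaling * B @ A."""
        return add_scaled(self.W, matmul(self.B, self.A), self.scaling)

    def trainable_params(self):
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])

    def frozen_params(self):
        return len(self.W) * len(self.W[0])


rng = random.Random(0)
layer = LoRALinear(d_in=64, d_out=64, rank=4, alpha=8, rng=rng)
x = random_matrix(2, 64, rng, std=1.0)  # a batch of two input vectors
print(layer.trainable_params(), "trainable vs", layer.frozen_params(), "frozen")
```

With these toy sizes, the update trains 512 values against 4,096 frozen ones. Because `B` starts at zero, the layer initially reproduces the frozen model exactly, and `merged_weight` shows how the update can be folded back into a single matrix after training.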
fine-tuning, lora, peft
Original description
In this talk, we will cover the different model adaptation methods, from prompt engineering to RAG to fine-tuning, depending on the dataset and problem. We will also go into detail on some operational best practices for fine-tuning and how to evaluate fine-tuned models for specific business use cases. We will conclude with a comparative framework and a cost-benefit analysis of the tradeoffs of fine-tuning versus knowledge bases for improving the performance of large language models on a specific task.

Recorded live in San Francisco at the AI Engineer Summit 2023. See the full schedule of talks at https://ai.engineer/summit/schedule & join us at the AI Engineer World's Fair in 2024! Get your tickets today at https://ai.engineer/worlds-fair

About Abi Aryan

Hi, my name is Abi. I am a computer scientist working extensively in machine learning to make software systems smarter. Over the past seven years, my focus has been building machine learning systems for various applications, including recommender systems, automated data labelling pipelines for both audio and video, audio-speech synthesis, forecasting, and time-series analysis. In the past, I attended Insight as a Data Science Fellow and was a Visiting Research Scholar at UCLA under Dr. Judea Pearl, where I worked on AutoML, MultiAgent Systems, and Emotion Recognition. I am currently authoring the book LLMOps: Managing Large Language Models in Production for O'Reilly and the course MLOps: Deploying ML models in production, which teaches data scientists the fundamentals of data engineering and how to deploy machine learning models in production.