LoRA Fine-Tuning with Hugging Face PEFT
Configure LoRA rank, alpha, and target modules, run supervised fine-tuning with the TRL SFTTrainer, monitor training loss, and save merged and adapter-only checkpoints.
Why LoRA Instead of Full Fine-Tuning?
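Full fine-tuning updates every parameter in a model. For a 7-billion-parameter model at float32 precision, the weights alone take ~28GB of GPU memory (7B parameters x 4 bytes). Gradients, optimizer states, and activations push the total to roughly 80-120GB or more. LoRA (Low-Rank Adaptation) instead freezes the original weights and trains a small number of extra parameters (typically 0.1-1% of the total), added as pairs of low-rank matrices on selected layers. Because the frozen weights need no gradients or optimizer state, memory use drops several-fold, and by roughly an order of magnitude when the frozen base model is also quantized to 4 bits (QLoRA), while the results are often comparable to full fine-tuning.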
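The memory figures in the paragraph above come from simple bytes-per-parameter arithmetic. The sketch below is an approximation under stated assumptions: full fine-tuning with Adam costs about 16 bytes per parameter (4 for weights, 4 for gradients, 8 for Adam's two moment estimates), the frozen LoRA base is stored at 2 bytes (16-bit) or 0.5 bytes (4-bit) per parameter, and an illustrative 0.5% of parameters are trainable. The helper names `full_finetune_gb` and `lora_gb` are hypothetical, and activation memory is ignored, so real usage will be higher.

```python
# Back-of-the-envelope GPU memory estimates, in decimal gigabytes.
# Assumptions (illustrative only):
#   - full fine-tuning: 4 B weights + 4 B gradients + 8 B Adam moments = 16 B/param
#   - LoRA: frozen base weights carry no gradients or optimizer state; only the
#     trainable adapter parameters pay the 16 B/param training cost
#   - activations and quantization overhead are ignored


def full_finetune_gb(n_params: float) -> float:
    """Weights + gradients + Adam state for every parameter."""
    bytes_per_param = 4 + 4 + 8
    return n_params * bytes_per_param / 1e9


def lora_gb(n_params: float, base_bytes_per_param: float,
            trainable_fraction: float) -> float:
    """Frozen base weights plus fully trained adapter parameters."""
    base = n_params * base_bytes_per_param
    adapters = n_params * trainable_fraction * 16
    return (base + adapters) / 1e9


n = 7e9  # 7B-parameter model
print(f"weights only, float32:   {n * 4 / 1e9:.1f} GB")
print(f"full fine-tuning (Adam): {full_finetune_gb(n):.1f} GB")
print(f"LoRA, 16-bit base:       {lora_gb(n, 2, 0.005):.1f} GB")
print(f"LoRA, 4-bit base:        {lora_gb(n, 0.5, 0.005):.1f} GB")
```

The comparison shows where the savings come from: LoRA removes gradients and optimizer state for the frozen weights, and quantizing the base shrinks the largest remaining term.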
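# LoRA math intuition:
# Full fine-tuning: update every entry of W (e.g., 4096 x 4096 = 16.7M parameters)
# LoRA: freeze W_0 and train only a low-rank update, W = W_0 + (alpha / r) * A @ B, where:
#   A has shape (4096, r) -> r * 4096 params
#   B has shape (r, 4096) -> r * 4096 params
#   r (rank) is typically 4, 8, or 16, much smaller than 4096
d, r = 4096, 8
full_params = d * d        # 16,777,216
lora_params = 2 * d * r    # 65,536 for r=8
print(f'LoRA trains {lora_params:,} of {full_params:,} params ({100 * lora_params / full_params:.2f}%) in this layer')
# Approximate memory for a 7B model (activations add more on top):
# Full fine-tuning: 80GB+, often over 100GB with Adam
# LoRA (rank=8), 16-bit base: ~14GB for the frozen weights alone, so a 24GB RTX 3090 or an A100 is needed
# LoRA (rank=8), 4-bit base (QLoRA): roughly 6-12GB, depending on batch size and sequence length
print('LoRA makes fine-tuning accessible without massive GPU clusters')
Installing the Required Libraries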
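LoRA fine-tuning with Hugging Face relies on three core libraries: transformers (model loading and tokenization), peft (Parameter-Efficient Fine-Tuning, which implements LoRA), and trl (Transformer Reinforcement Learning, which provides the SFTTrainer for supervised fine-tuning). The install command also adds accelerate (device placement and multi-GPU training), bitsandbytes (8-bit and 4-bit quantization, configured through BitsAndBytesConfig), and datasets (loading and preparing training data). Together they cover the workflow from loading a base model to saving the trained adapters.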
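Before or after running the pip command, it can help to confirm which of these packages are present and at what version, since APIs such as SFTConfig only exist in newer trl releases. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_versions` is hypothetical, and it assumes the pip distribution names match the names in the install command.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip install command.
PACKAGES = ["transformers", "peft", "trl", "accelerate", "bitsandbytes", "datasets"]


def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(PACKAGES).items():
    print(f"{name:<14} {ver or 'not installed'}")
```

Reading installed metadata rather than importing each package keeps the check fast and avoids import-time side effects such as CUDA initialization.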
# pip install transformers peft trl accelerate bitsandbytes datasets
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, SFTConfig
from datasets import Dataset
import torch
print('Libraries imported successfully')
print(f'GPU available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')