AI Engineering Academy · Lesson

LoRA Fine-Tuning with Hugging Face PEFT

Configure LoRA rank, alpha, and target modules, run supervised fine-tuning with the TRL SFTTrainer, monitor training loss, and save merged and adapter-only checkpoints.

Why LoRA Instead of Full Fine-Tuning?

Full fine-tuning updates every parameter in a model. For a 7-billion-parameter model in float32, the weights alone take about 28GB of GPU memory; gradients, optimizer states, and activations push the total to roughly 80-120GB. LoRA (Low-Rank Adaptation) instead freezes the original weights and trains a small number of added parameters (typically 0.1-1% of the total), stored as pairs of low-rank matrices attached to selected layers. This cuts the GPU memory requirement several-fold, with the exact saving depending on precision, quantization, and batch size, while usually reaching results comparable to full fine-tuning.
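The arithmetic behind these figures can be sketched in a few lines. This is a rough estimate, not a measurement: the function name `full_finetune_memory_gb` is hypothetical, and the assumption of an Adam-style optimizer that keeps two float32 states per parameter is ours, since the lesson does not name an optimizer. Activations are left out because they depend on batch size and sequence length.

```python
def full_finetune_memory_gb(n_params, bytes_per_value=4, optimizer_states=2):
    """Estimate memory for weights, gradients, and optimizer states in GB (1e9 bytes).

    Assumptions (not from the lesson): every value is stored at the same
    precision, and the optimizer keeps `optimizer_states` extra values per
    parameter (2 for Adam-style optimizers). Activations are excluded.
    """
    weights = n_params * bytes_per_value / 1e9
    gradients = n_params * bytes_per_value / 1e9
    optimizer = n_params * bytes_per_value * optimizer_states / 1e9
    return {
        "weights": weights,
        "gradients": gradients,
        "optimizer": optimizer,
        "total": weights + gradients + optimizer,
    }


estimate = full_finetune_memory_gb(7_000_000_000)
for part, gigabytes in estimate.items():
    print(f"{part:<10} {gigabytes:6.1f} GB")
```

Under these assumptions a 7B model needs 28GB for weights and 112GB before activations are counted, which sits inside the 80-120GB range quoted above.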

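# LoRA math intuition:
# Full fine-tuning: update W directly (large matrix, e.g., 4096 x 4096 = 16.7M parameters)
# LoRA: freeze W_0 and train only A and B, where W = W_0 + A @ B:
#   A has shape (4096, r) - only r*4096 params
#   B has shape (r, 4096) - only r*4096 params
#   r (rank) is typically 4, 8, or 16 - much smaller than 4096
#   With r = 8: 2 * 8 * 4096 = 65,536 trainable params (~0.4% of 16.7M)
#   (PEFT additionally scales the update by alpha / r)

# Memory comparison for a 7B model (rough estimates; actual usage depends on
# precision, sequence length, and batch size):
# Full fine-tuning: ~80-120GB GPU RAM
# LoRA (rank=8): ~12GB GPU RAM when the base model is loaded in 8-bit or 4-bit
#   (in 16-bit the weights alone are ~14GB) - fits on a single A100 or 24GB RTX 3090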
print('LoRA makes fine-tuning accessible without massive GPU clusters')
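To make the parameter savings concrete, the counts from the comments above can be computed directly. This is a minimal sketch for a single weight matrix; the helper name `lora_param_counts` is hypothetical, and the fraction for a whole model will differ depending on which layers are targeted.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora, fraction) parameter counts for one weight matrix.

    full: parameters updated by full fine-tuning (d_in * d_out)
    lora: parameters in the two low-rank factors, A (d_in x rank)
          and B (rank x d_out)
    fraction: lora / full
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora, lora / full


# Compare the typical ranks mentioned above for a 4096 x 4096 matrix.
for rank in (4, 8, 16):
    full, lora, fraction = lora_param_counts(4096, 4096, rank)
    print(f"r={rank:>2}: {lora:>7,} trainable vs {full:,} full ({fraction:.2%})")
```

For r = 8 this gives 65,536 trainable parameters against 16,777,216, or about 0.39% of the matrix. Doubling the rank doubles the trainable count, so rank is a direct knob for trading capacity against memory.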

Installing the Required Libraries

LoRA fine-tuning with Hugging Face relies on three core libraries: transformers (model loading and tokenization), peft (Parameter-Efficient Fine-Tuning, which implements LoRA), and trl (Transformer Reinforcement Learning, which provides the SFTTrainer for supervised fine-tuning). The install command below also pulls in accelerate (device placement and distributed training), bitsandbytes (8-bit and 4-bit quantization), and datasets (data loading). Together these provide a high-level fine-tuning workflow.

# pip install transformers peft trl accelerate bitsandbytes datasets

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, TaskType
from trl import SFTTrainer, SFTConfig
from datasets import Dataset
import torch

print('Libraries imported successfully')
print(f'GPU available: {torch.cuda.is_available()}')
if torch.cuda.is_available():
    print(f'GPU: {torch.cuda.get_device_name(0)}')
    print(f'GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB')
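Before moving on, it can help to confirm which of these packages are actually installed and at what version, since their APIs change between releases. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical, and the package list is copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Package names taken from the pip install command in this lesson.
REQUIRED = ["transformers", "peft", "trl", "accelerate", "bitsandbytes", "datasets"]


def installed_versions(packages):
    """Map each package name to its installed version string, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name:<14} {ver or 'NOT INSTALLED'}")
```

Recording these versions alongside a training run makes it easier to reproduce the run later.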
