LoRA and QLoRA for Cost-Efficient Tuning

Train only low-rank adapters on top of a frozen, quantized base model. With a 4-bit base, a 70B fine-tune can fit on a single high-memory GPU.

Why PEFT?

Full fine-tuning of Llama 70B updates all 70 billion parameters and typically needs 8 or more H100s. PEFT (Parameter-Efficient Fine-Tuning) updates fewer than 1% of the parameters and often reaches similar quality, which cuts memory and compute costs substantially.
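A rough back-of-envelope estimate shows where the GPU count comes from. This is a sketch under stated assumptions, not a measurement: it assumes mixed-precision Adam at roughly 16 bytes per parameter, 80 GB per H100, and a 4-bit (0.5 bytes per parameter) frozen base for the quantized case. It ignores activations and framework overhead, so real requirements will be higher.

```python
import math

# All byte counts below are rule-of-thumb assumptions, not measured values.
PARAMS = 70e9        # approximate parameter count of Llama 70B
GPU_MEM_GB = 80      # assumed memory of one H100


def gigabytes(num_bytes):
    """Convert bytes to decimal gigabytes."""
    return num_bytes / 1e9


# Full fine-tuning with mixed-precision Adam, assumed ~16 bytes per parameter:
# 2 (bf16 weights) + 2 (bf16 gradients)
# + 12 (fp32 master weights, Adam momentum, Adam variance)
full_ft_gb = gigabytes(PARAMS * 16)
gpus_needed = math.ceil(full_ft_gb / GPU_MEM_GB)

# Frozen base stored in 4 bits (0.5 bytes per parameter); the adapters and
# their optimizer state add comparatively little on top of this.
quantized_base_gb = gigabytes(PARAMS * 0.5)

print(f"Full fine-tune state: {full_ft_gb:.0f} GB (~{gpus_needed} GPUs)")
print(f"4-bit frozen base:    {quantized_base_gb:.0f} GB")
```

Under these assumptions the weights, gradients, and optimizer state alone exceed what 8 GPUs hold, while the 4-bit base fits within a single 80 GB device with room left for adapters and activations.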

LoRA: Low-Rank Adaptation

LoRA (Hu et al. 2021) trains tiny "adapter" matrices alongside frozen base weights:

Add two matrices B and A, where B is d x r and A is r x d, and the rank r is small (8, 16, 64)
Forward pass: y = Wx + (BA)x
Only A and B are trained, which is orders of magnitude fewer parameters than the d x d matrix W
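The forward pass and the parameter savings can be sketched in plain Python. This is a minimal illustration, not a training-ready implementation: `matvec`, `lora_forward`, and `lora_param_fraction` are hypothetical helper names, the toy sizes are arbitrary, and the shapes assume a square d x d weight W with B of shape d x r and A of shape r x d so that BA matches W. Initializing B to zero and A to small random values follows the LoRA paper, so the adapter starts out as a no-op.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]


def lora_forward(W, A, B, x):
    """Compute y = Wx + B(Ax).

    Applying A first keeps the intermediate vector at size r, so the
    d x d product BA is never materialized.
    """
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + u for b, u in zip(base, update)]


def lora_param_fraction(d, r):
    """Trainable adapter parameters (2*d*r) as a fraction of the d*d weights in W."""
    return (2 * d * r) / (d * d)


# Toy sizes for illustration only.
d, r = 8, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d, trainable
B = [[0.0] * r for _ in range(d)]                               # d x r, trainable, zero init
x = [rng.gauss(0, 1) for _ in range(d)]

y = lora_forward(W, A, B, x)

# Because B starts at zero, the adapted layer initially matches the base layer.
assert y == matvec(W, x)

# Illustrative (assumed) layer size: d = 8192 with rank r = 16.
print(f"{lora_param_fraction(8192, 16):.4%} of the layer's weights are trainable")
```

For that illustrative layer the adapter holds 2 x 8192 x 16 = 262,144 parameters against 8192 x 8192 = 67,108,864 in W, which is 1/256 of the layer.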
