LoRA and QLoRA for Cost-Efficient Tuning

Train only low-rank adapters on top of a frozen, quantized base model. With QLoRA, this can bring a 70B fine-tune within reach of a single high-memory GPU.

Why PEFT?

Full fine-tuning of Llama 70B updates all 70 billion parameters and typically needs 8 or more H100s. PEFT (Parameter-Efficient Fine-Tuning) methods update less than 1% of the parameters, often with comparable quality, which cuts training cost substantially.
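To see roughly where the multi-GPU requirement comes from, here is a back-of-the-envelope sketch. It assumes about 16 bytes of training state per parameter (a common rule of thumb for mixed-precision Adam: half-precision weights and gradients plus full-precision master weights and two optimizer moments) and 80 GB per H100. Neither figure comes from this lesson, and activations are ignored, so treat the result as a loose lower bound rather than a sizing guide.

```python
import math

# Assumed, not from the lesson: ~16 bytes of weight, gradient, and optimizer
# state per parameter (mixed-precision Adam), and 80 GB of memory per H100.
BYTES_PER_PARAM = 16
H100_MEMORY_GB = 80


def full_finetune_memory_gb(n_params: int, bytes_per_param: int = BYTES_PER_PARAM) -> float:
    """Estimate training-state memory in GB, excluding activations."""
    return n_params * bytes_per_param / 1e9


n_params = 70_000_000_000  # Llama 70B
mem_gb = full_finetune_memory_gb(n_params)
gpus = math.ceil(mem_gb / H100_MEMORY_GB)

print(f"Estimated training state: {mem_gb:.0f} GB")
print(f"H100s needed just to hold it: {gpus}")
```

Under these assumptions the weights, gradients, and optimizer state alone exceed a terabyte, which is consistent with the "8 or more H100s" figure above.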

LoRA: Low-Rank Adaptation

LoRA (Hu et al. 2021) trains tiny "adapter" matrices alongside frozen base weights:

  • Add two matrices B and A next to a frozen d×d weight W, where B is d×r, A is r×d, and the rank r is small (8, 16, 64)
  • Forward pass: y = Wx + (BA)x
  • Only A and B are trained, which is orders of magnitude fewer parameters than W
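The forward pass above can be sketched in a few lines of plain Python. This is a minimal illustration, not a training implementation: it takes W as d×d, B as d×r, and A as r×d so that BA has the same shape as W. The helper names (`matvec`, `lora_forward`, `lora_param_counts`) are hypothetical, and the zero initialization of B follows the LoRA paper rather than anything stated in this lesson.

```python
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def lora_forward(W, A, B, x):
    """Compute y = W x + B (A x) without ever forming the d x d product BA."""
    base = matvec(W, x)               # frozen path
    update = matvec(B, matvec(A, x))  # low-rank path: d -> r -> d
    return [b + u for b, u in zip(base, update)]


def lora_param_counts(d, r):
    """Return (parameters in W, trainable parameters in A and B)."""
    return d * d, r * d + d * r


d, r = 6, 2
rng = random.Random(0)
W = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(d)]     # frozen base weight
A = [[rng.gauss(0, 0.01) for _ in range(d)] for _ in range(r)]  # r x d, trainable
B = [[0.0] * r for _ in range(d)]                               # d x r, zero init
x = [rng.gauss(0, 1) for _ in range(d)]

y = lora_forward(W, A, B, x)
# With B initialized to zero, the adapter starts as a no-op.
assert y == matvec(W, x)

full_params, lora_params = lora_param_counts(8192, 16)
print(f"W: {full_params:,} params, LoRA: {lora_params:,} params "
      f"({100 * lora_params / full_params:.2f}% of W)")
```

For an illustrative 8192×8192 weight with r = 16, the adapter holds 262,144 parameters against 67,108,864 in W, about 0.39%, which is where the "less than 1%" figure comes from. Computing B(Ax) instead of (BA)x also keeps the extra work proportional to d·r rather than d².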
