AI Agents · Lesson

Evaluating Tuned Models vs Base

A/B against the base model on your eval set — sometimes fine-tuning hurts more than helps.

Don't Trust Training Loss

Low training loss does not mean better in production. The model may overfit, lose general capability, or hurt unseen tasks.

Always evaluate the tuned model on a held-out eval set.

Three Must-Have Eval Splits

Train — used in fine-tuning
Validation — used to pick best checkpoint
Test — never seen during training; ONLY for final eval

All lessons in this course

When Fine-Tuning Beats Prompting
Data Collection: Trajectories and Trace Replay
LoRA and QLoRA for Cost-Efficient Tuning
Evaluating Tuned Models vs Base

← Back to AI Agents