Model Routing (Cheap -> Expensive)
Try a small model first, and escalate to a frontier model only when the small one fails or returns a low-confidence answer.
Don't Use Big Models For Easy Tasks
Calling GPT-4o to answer "is this email spam?" wastes money. Route easy tasks to cheap models and reserve expensive models for hard tasks.
The Routing Pattern
- Try small/cheap model first
- If output is low-confidence or fails validation, escalate to big model
- Log which path was taken
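The three steps above can be sketched as a small router. This is a minimal illustration, not a definitive implementation: `ModelResult`, `RoutedResult`, `route`, the `min_confidence` threshold of 0.8, and the stub models are all hypothetical names and values chosen for the example. In practice the two callables would wrap real API clients, and the confidence score would come from whatever signal your setup exposes (for example token log-probabilities or a self-reported score).

```python
import logging
from dataclasses import dataclass
from typing import Callable

logger = logging.getLogger("router")


@dataclass
class ModelResult:
    text: str
    confidence: float  # assumed to be a 0..1 score; how it is obtained is up to you


@dataclass
class RoutedResult:
    text: str
    path: str  # "cheap" or "expensive"


# Hypothetical type for a model call: prompt in, result out.
Model = Callable[[str], ModelResult]


def route(
    prompt: str,
    cheap: Model,
    expensive: Model,
    validate: Callable[[str], bool],
    min_confidence: float = 0.8,  # illustrative threshold, tune on your own data
) -> RoutedResult:
    """Try the cheap model first; escalate on error, low confidence, or invalid output."""
    try:
        result = cheap(prompt)
    except Exception as exc:
        # The cheap model failed outright, so escalate.
        logger.info("path=expensive reason=cheap_error error=%r", exc)
        return RoutedResult(expensive(prompt).text, "expensive")

    if result.confidence >= min_confidence and validate(result.text):
        logger.info("path=cheap confidence=%.2f", result.confidence)
        return RoutedResult(result.text, "cheap")

    # Low confidence or failed validation: escalate and log the path taken.
    logger.info(
        "path=expensive reason=low_confidence_or_invalid confidence=%.2f",
        result.confidence,
    )
    return RoutedResult(expensive(prompt).text, "expensive")


# Stub models standing in for real API calls (assumptions for the demo only).
def cheap_stub(prompt: str) -> ModelResult:
    confidence = 0.95 if "free money" in prompt.lower() else 0.4
    return ModelResult("spam", confidence)


def expensive_stub(prompt: str) -> ModelResult:
    return ModelResult("not spam", 0.99)


def is_valid_label(text: str) -> bool:
    return text in {"spam", "not spam"}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(route("FREE MONEY now", cheap_stub, expensive_stub, is_valid_label))
    print(route("Meeting at 3pm", cheap_stub, expensive_stub, is_valid_label))
```

Logging the path on every call lets you measure how often requests escalate, which is the number you need to judge whether the routing is actually saving money.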