Trade-offs: Latency, Cost, Capability
Open weights save money but cost engineer hours; benchmark before committing.
Three Axes, Pick Two
Every model choice trades off:
- Latency — time per response
- Cost — per million tokens
- Capability — quality on hard tasks
No model is best on all three.
Capability Landscape
| Top tier | Mid tier | Small tier | |
|---|---|---|---|
| Closed | GPT-4o, Claude Sonnet 4.5 | GPT-4o-mini, Haiku | — |
| Open | Llama 3.1 405B | Llama 3.1 70B, Qwen 2.5 72B | Llama 3.1 8B, Phi-3 |
All lessons in this course
- Llama, Mistral and Qwen Overview
- Running Local Models with Ollama and llama.cpp
- Function-Calling Open Models (Hermes, Functionary)
- Trade-offs: Latency, Cost, Capability