Why Long Contexts Don't Scale
Cost grows linearly with prompt length, quality degrades, and the 'lost-in-the-middle' effect makes the model overlook content buried in the middle of long prompts.
Bigger Is Not Always Better
Modern models have huge context windows — 200k, 1M, even 2M tokens. It is tempting to "just dump everything in" instead of building real memory.
That approach fails in production for three reasons.
Reason 1: Cost
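Every input token is billed on every call. A 100k-token system prompt resent on each of 1,000 turns adds up to 100M input tokens, which costs tens to hundreds of dollars per session depending on the model's per-token price.
Prompt caching lowers the price of the repeated prefix, but cached tokens are still billed, so it reduces the cost rather than eliminating it.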
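The arithmetic above can be sketched in a few lines of Python. This is a rough estimate, not a billing formula: the function name `session_input_cost`, the price of $1.00 per million input tokens, and the 90% cache discount are all illustrative assumptions, so substitute your provider's published rates. It also counts only the fixed prompt resent each turn and ignores growing conversation history and output tokens.

```python
def session_input_cost(prompt_tokens, turns, usd_per_million,
                       cached_fraction=0.0, cache_discount=0.0):
    """Estimate input tokens and cost for resending a fixed prompt every turn.

    cached_fraction: share of the prompt served from the cache (0.0 to 1.0).
    cache_discount:  price reduction applied to cached tokens (0.0 to 1.0).
    Both values are hypothetical knobs, not any provider's actual terms.
    """
    total_tokens = prompt_tokens * turns
    # Cached tokens are billed at (1 - cache_discount) of the normal rate.
    effective_rate = usd_per_million * (1 - cached_fraction * cache_discount)
    cost_usd = total_tokens / 1_000_000 * effective_rate
    return total_tokens, cost_usd


# 100k-token prompt, 1,000 turns, assumed price of $1.00 per million tokens.
tokens, cost = session_input_cost(100_000, 1_000, usd_per_million=1.0)
print(f"{tokens:,} input tokens, about ${cost:.2f}")

# Same session with the whole prompt cached at an assumed 90% discount.
_, cached_cost = session_input_cost(100_000, 1_000, usd_per_million=1.0,
                                    cached_fraction=1.0, cache_discount=0.9)
print(f"with caching: about ${cached_cost:.2f}")
```

Even under these generous caching assumptions the session still has a non-zero input bill, and the bill scales directly with both prompt size and turn count.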