AI Agents · Lesson

Why Long Contexts Don't Scale

Cost grows linearly, quality degrades, and 'lost-in-the-middle' makes the model forget content buried in long prompts.

Bigger Is Not Always Better

Modern models have huge context windows — 200k, 1M, even 2M tokens. It is tempting to "just dump everything in" instead of building real memory.

That approach fails in production for three reasons.

Every token in every call costs money. A 100k-token system prompt resent on each of 1,000 turns adds up to 100M input tokens. Depending on the model's per-token price, that is tens to hundreds of dollars per session.
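To make the arithmetic concrete, here is a minimal sketch. The function name and both prices are hypothetical, chosen only for illustration; substitute your provider's actual input rate.

```python
def session_input_cost(prompt_tokens: int, turns: int, usd_per_million: float) -> float:
    """Input cost of a session in which the same prompt is resent on every turn."""
    total_tokens = prompt_tokens * turns  # 100k tokens x 1,000 turns = 100M tokens
    return total_tokens / 1_000_000 * usd_per_million


# Hypothetical prices per million input tokens (assumptions, not real quotes).
for price in (0.25, 3.00):
    cost = session_input_cost(100_000, 1_000, price)
    print(f"${price:.2f}/M tokens -> ${cost:,.2f} per session")
```

The cost scales with both prompt size and turn count, so a long-running agent multiplies whatever the per-call price happens to be.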

Prompt caching helps but does not eliminate the cost: cached tokens are typically still billed, just at a reduced rate.
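A rough sketch of why caching reduces the bill without removing it. This is a simplified model, not any provider's billing formula: it assumes the first call pays full price, every later call pays a fixed fraction of it, and there is no cache-write surcharge or cache expiry. The function name, the price, and the 10% fraction are all hypothetical.

```python
def cached_session_cost(
    prompt_tokens: int,
    turns: int,
    usd_per_million: float,
    cache_read_fraction: float,
) -> float:
    """Input cost when turns after the first are billed at a reduced cached rate.

    Simplifying assumptions: no cache-write surcharge, and the cache never expires.
    """
    full_call = prompt_tokens / 1_000_000 * usd_per_million
    cached_calls = max(turns - 1, 0)
    return full_call * (1 + cached_calls * cache_read_fraction)


# Hypothetical: $3.00 per million tokens, cached reads billed at 10% of that.
cost = cached_session_cost(100_000, 1_000, 3.00, 0.10)
print(f"With caching: about ${cost:,.0f} per session")
```

Under these assumptions the bill falls by roughly an order of magnitude, but it still grows with every additional turn.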