Caching Prompts and Results (Anthropic, Vertex)
Anthropic prompt caching and Vertex caching cut input cost by 10x on long, repeated system prompts.
Why Cache?
Many agent calls are nearly identical: same system prompt, same few-shot examples, different user input. Without caching, you re-process the static parts every call.
Caching can cut input token cost by 10x.
Two Kinds of Caching
- Prompt caching (server-side) — provider caches the model's KV state for repeated prefixes
- Result caching (client-side) — your code caches full responses for identical inputs
All lessons in this course
- Token Budgets Per Step
- Model Routing (Cheap -> Expensive)
- Caching Prompts and Results (Anthropic, Vertex)
- Quantisation and Speculative Decoding