AI Agents · Lesson

Caching Prompts and Results (Anthropic, Vertex)

Anthropic prompt caching and Vertex caching cut input cost by 10x on long, repeated system prompts.

Why Cache?

Many agent calls are nearly identical: same system prompt, same few-shot examples, different user input. Without caching, you re-process the static parts every call.

Caching can cut input token cost by 10x.

Two Kinds of Caching

Prompt caching (server-side) — provider caches the model's KV state for repeated prefixes
Result caching (client-side) — your code caches full responses for identical inputs

All lessons in this course

Token Budgets Per Step
Model Routing (Cheap -> Expensive)
Caching Prompts and Results (Anthropic, Vertex)
Quantisation and Speculative Decoding

← Back to AI Agents