0PricingLogin
AI Agents · Lesson

Caching Prompts and Results (Anthropic, Vertex)

Anthropic prompt caching and Vertex caching cut input cost by 10x on long, repeated system prompts.

Why Cache?

Many agent calls are nearly identical: same system prompt, same few-shot examples, different user input. Without caching, you re-process the static parts every call.

Caching can cut input token cost by 10x.

Two Kinds of Caching

  1. Prompt caching (server-side) — provider caches the model's KV state for repeated prefixes
  2. Result caching (client-side) — your code caches full responses for identical inputs

All lessons in this course

  1. Token Budgets Per Step
  2. Model Routing (Cheap -> Expensive)
  3. Caching Prompts and Results (Anthropic, Vertex)
  4. Quantisation and Speculative Decoding
← Back to AI Agents