0Pricing
AI Agents · Lesson

Token Budgets Per Step

Cap input and output tokens per node so a runaway loop can't bankrupt you.

Tokens Are Money

Every token you send or receive costs money and time. Production agents track token usage at every step and enforce caps.

Cap max_tokens Everywhere

Always set max_tokens on every call. Without it, a buggy prompt can produce 10,000 token responses:

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    max_tokens=1024,   # cap per response
)

All lessons in this course

  1. Token Budgets Per Step
  2. Model Routing (Cheap -> Expensive)
  3. Caching Prompts and Results (Anthropic, Vertex)
  4. Quantisation and Speculative Decoding
← Back to AI Agents