Token Budgets Per Step
Cap input and output tokens per node so a runaway loop can't bankrupt you.
Tokens Are Money
Every token you send or receive costs money and time. Production agents track token usage at every step and enforce caps.
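One way to track usage and enforce caps at each step is a small budget object that every node reports to. The sketch below is illustrative only: `StepBudget`, `BudgetExceeded`, and the default limits are hypothetical names and values, not part of any library, and the token counts are assumed to come from whatever usage data your provider returns.

```python
from dataclasses import dataclass, field


class BudgetExceeded(RuntimeError):
    """Raised when a step crosses its token cap."""


@dataclass
class StepBudget:
    # Hypothetical per-step caps; tune them for your own workload.
    max_input_tokens: int = 4000
    max_output_tokens: int = 1024
    # Maps step name -> (input tokens used, output tokens used).
    used: dict = field(default_factory=dict)

    def record(self, step: str, input_tokens: int, output_tokens: int) -> None:
        """Add usage for a step and raise once either cap is exceeded."""
        prev_in, prev_out = self.used.get(step, (0, 0))
        total_in = prev_in + input_tokens
        total_out = prev_out + output_tokens
        self.used[step] = (total_in, total_out)
        if total_in > self.max_input_tokens:
            raise BudgetExceeded(
                f"{step}: input tokens {total_in} > {self.max_input_tokens}"
            )
        if total_out > self.max_output_tokens:
            raise BudgetExceeded(
                f"{step}: output tokens {total_out} > {self.max_output_tokens}"
            )
```

Calling `record` after each model call means a looping node fails fast with an exception instead of silently accumulating cost.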
Cap max_tokens Everywhere
Set max_tokens on every call. Without it, a buggy prompt can produce responses of 10,000 tokens or more:
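from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=messages,
    max_tokens=1024,  # hard cap on tokens generated for this response
)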
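A cap cuts the response off mid-output when it is reached, so it helps to detect that case. In the Chat Completions API, a choice whose `finish_reason` is `"length"` stopped because it hit the token limit. The helper below is a minimal sketch: `was_truncated` is a hypothetical name, and the demo uses a stand-in object instead of a real API response so it runs offline.

```python
from types import SimpleNamespace


def was_truncated(response) -> bool:
    """Return True if the first choice stopped because it hit the token cap."""
    return response.choices[0].finish_reason == "length"


# Stand-in objects shaped like a chat completion, for illustration only.
capped = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
complete = SimpleNamespace(choices=[SimpleNamespace(finish_reason="stop")])

print(was_truncated(capped))
print(was_truncated(complete))
```

A truncated response usually signals that the prompt needs fixing or the step needs a deliberately larger cap, not that the call should be retried blindly.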