Context Length and Relevance
Balancing comprehensive context with token limits and relevance.
The Context Window Budget
Every model has a maximum context window: the total number of tokens it can process in one API call. This includes both the input (your prompt plus any conversation history) and the output (the model's reply).
Understanding this budget is critical. Exceeding it means you must truncate your prompt, or the reply gets cut off before it finishes. Spending it on irrelevant context leaves less room for the reply and for the material the model actually needs to reason about.
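To make the budget concrete, here is a minimal sketch of keeping a chat history inside the window by dropping the oldest turns first. It assumes a crude rule of thumb of roughly four characters per token for English text. `rough_token_count` and `trim_history` are hypothetical helpers, not a real tokenizer or SDK call; use a real tokenizer such as `tiktoken` when you need accurate counts.

```python
def rough_token_count(text: str) -> int:
    # Hypothetical heuristic: ~4 characters per token for English text.
    # Replace with a real tokenizer for accurate budgeting.
    return max(1, len(text) // 4)


def trim_history(system_prompt: str, history: list[str],
                 context_limit: int, reserved_for_output: int) -> list[str]:
    """Keep the most recent messages that fit alongside the system prompt
    and the tokens reserved for the model's reply."""
    budget = context_limit - reserved_for_output - rough_token_count(system_prompt)
    if budget <= 0:
        raise ValueError("System prompt and output reservation exceed the window")

    kept: list[str] = []
    used = 0
    # Walk backwards so the newest turns are kept first.
    for message in reversed(history):
        cost = rough_token_count(message)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 400, "b" * 400, "c" * 400]  # ~100 tokens each
trimmed = trim_history("You are helpful.", history,
                       context_limit=300, reserved_for_output=80)
print([m[0] for m in trimmed])  # prints ['b', 'c']
```

Dropping whole messages from the oldest end is only one policy; summarizing older turns instead is a common alternative when early context still matters.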
Context Window Sizes
Different models have different context limits. As of 2025:
- GPT-4o: 128,000 tokens
- Claude Opus 4.5: 200,000 tokens
- Gemini 1.5 Pro: 1,000,000 tokens
- GPT-3.5 Turbo: 16,385 tokens
Larger windows let you include more context, but every token you send is billed, so filling a large window makes each call more expensive and slower. For many tasks, 8,000 to 16,000 tokens is enough. Bigger is not always better if it means including irrelevant content.
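One way to act on this is to score candidate context snippets by relevance to the task and pack only the best ones into a fixed token budget. The sketch below uses plain keyword overlap as the relevance score and the same rough four-characters-per-token estimate; both are simplifying assumptions (real systems often use embeddings), and `select_context` is a hypothetical name.

```python
import re


def rough_token_count(text: str) -> int:
    # Assumption: ~4 characters per token; replace with a real tokenizer.
    return max(1, len(text) // 4)


def keywords(text: str) -> set[str]:
    # Lowercased words of 3+ letters: a deliberately crude relevance signal.
    return set(re.findall(r"[a-z]{3,}", text.lower()))


def select_context(query: str, snippets: list[str], budget: int) -> list[str]:
    """Greedily keep the most query-relevant snippets that fit in `budget` tokens."""
    query_words = keywords(query)
    scored = [(len(query_words & keywords(s)), s) for s in snippets]
    # Highest overlap first (stable sort keeps original order on ties);
    # snippets with zero overlap are dropped entirely.
    ranked = [s for score, s in sorted(scored, key=lambda p: p[0], reverse=True)
              if score > 0]

    chosen, used = [], 0
    for snippet in ranked:
        cost = rough_token_count(snippet)
        if used + cost <= budget:  # skip snippets that don't fit, keep trying smaller ones
            chosen.append(snippet)
            used += cost
    return chosen


query = "Explain technical debt to a CEO"
snippets = [
    "Technical debt is the cost of choosing a quick fix now over a better design.",
    "Our office moved to a new building in March.",
    "Paying down technical debt usually means refactoring code.",
]
print(select_context(query, snippets, budget=40))
```

The irrelevant office-move snippet never reaches the prompt, even though there would be room for it: the budget is a ceiling, not a target.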
```python
import tiktoken

def estimate_tokens(text, model='gpt-4o'):
    # tiktoken implements OpenAI tokenizers; counts for other vendors'
    # models (e.g. Claude) are only approximations.
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Quick token budget calculator
models = {
    'GPT-3.5 Turbo': 16385,
    'GPT-4o': 128000,
    'Claude Opus 4.5': 200000,
}

prompt = 'Explain the concept of technical debt in 500 words for a non-technical CEO.'
prompt_tokens = estimate_tokens(prompt)
reserved_for_output = 1024  # tokens held back for the model's reply

for model_name, limit in models.items():
    # Whatever remains can be spent on background context.
    available = limit - prompt_tokens - reserved_for_output
    print(f'{model_name}: limit={limit:,} | prompt={prompt_tokens} | '
          f'context budget={available:,} tokens')
```