AI Prompt Engineering · Lesson

Context Length and Relevance

Balancing comprehensive context with token limits and relevance.

The Context Window Budget

Every model has a maximum context window — the total number of tokens it can process in one API call. This includes both input (your prompt + history) and output (the model's reply).

Understanding this budget is critical. If a request exceeds it, the API typically rejects the call, or you have to truncate the prompt, or the reply is cut off before it finishes. Wasting the budget on irrelevant context leaves the model less room to reason about what matters.
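
To make the budget concrete, here is a minimal sketch of trimming conversation history so that the system prompt, the history, and the space reserved for the reply all fit in one window. The helper names (`rough_token_count`, `trim_history`) are hypothetical, and the 4-characters-per-token ratio is only a rough rule of thumb for English text; a real tokenizer gives exact counts.

```python
def rough_token_count(text):
    """Rough estimate only: assumes about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def trim_history(system_prompt, history, context_window, max_output_tokens):
    """Drop the oldest turns until prompt + history + reserved output fit the window."""
    input_budget = context_window - max_output_tokens - rough_token_count(system_prompt)
    if input_budget < 0:
        raise ValueError('system prompt and reserved output alone exceed the window')

    kept = []
    used = 0
    # Walk from newest to oldest so the most recent turns survive.
    for message in reversed(history):
        cost = rough_token_count(message)
        if used + cost > input_budget:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept, used


# Three turns of roughly 100 tokens each, in a deliberately tiny 400-token window.
history = ['a' * 400, 'b' * 400, 'c' * 400]
kept, used = trim_history('x' * 40, history, context_window=400, max_output_tokens=150)
print(len(kept), used)  # prints: 2 200
```

Dropping whole turns from the oldest end is only one possible policy; summarizing older turns is a common alternative when early context still matters.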

Context Window Sizes

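Different models have different context limits. Published figures as of 2025 (limits change between model versions, so confirm them in the provider's documentation):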

GPT-4o: 128,000 tokens
Claude Opus 4.5: 200,000 tokens
Gemini 1.5 Pro: 1,000,000 tokens
GPT-3.5 Turbo: 16,385 tokens

Larger windows let you include more context, but input tokens are billed, so filling a large window costs more per call and typically takes longer to process. For most tasks 8,000-16,000 tokens is sufficient. A bigger window is not an improvement if it is filled with irrelevant content.

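import tiktoken

def estimate_tokens(text, model='gpt-4o'):
    """Count tokens in text with the tokenizer tiktoken maps to an OpenAI model."""
    # Needs a tiktoken release recent enough to recognize the model name
    encoding = tiktoken.encoding_for_model(model)
    return len(encoding.encode(text))

# Quick token budget calculator
# Context window limits, in tokens
models = {
    'GPT-3.5 Turbo':  16385,
    'GPT-4o':        128000,
    'Claude Opus 4.5': 200000,
}

# Tokens set aside for the model's reply
reserved_for_output = 1024

prompt = 'Explain the concept of technical debt in 500 words for a non-technical CEO.'
# tiktoken only ships OpenAI tokenizers, so this count is an approximation for Claude
prompt_tokens = estimate_tokens(prompt)

for model_name, limit in models.items():
    # Whatever remains after the prompt and the reserved reply is free for extra context
    available = limit - prompt_tokens - reserved_for_output
    print(f'{model_name}: limit={limit:,} | prompt={prompt_tokens} | '
          f'context budget={available:,} tokens')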
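
Knowing how much budget is left is only half the problem; the other half is spending it on relevant material. The sketch below is one simple way to do that: score candidate chunks by word overlap with the query, skip chunks with no overlap, and pack the rest greedily until the budget is used. The function names, the overlap score, and the 4-characters-per-token estimate are all illustrative assumptions; production systems more often rank with embeddings and count with a real tokenizer.

```python
import re


def rough_token_count(text):
    """Rough estimate only: assumes about 4 characters per token (an assumption)."""
    return max(1, len(text) // 4)


def words(text):
    """Lowercased set of alphanumeric words in text."""
    return set(re.findall(r'[a-z0-9]+', text.lower()))


def select_context(query, chunks, token_budget):
    """Greedily pack the chunks that share the most words with the query."""
    query_words = words(query)
    scored = [(len(query_words & words(chunk)), index, chunk)
              for index, chunk in enumerate(chunks)]
    # Highest overlap first; ties keep their original order.
    scored.sort(key=lambda item: (-item[0], item[1]))

    selected, used = [], 0
    for score, index, chunk in scored:
        cost = rough_token_count(chunk)
        if score == 0 or used + cost > token_budget:
            continue  # skip irrelevant chunks and chunks that do not fit
        selected.append((index, chunk))
        used += cost
    # Restore document order so the assembled prompt reads naturally.
    return [chunk for _, chunk in sorted(selected)], used


chunks = [
    'Technical debt is the future cost of shortcuts taken in code today.',
    'Our office cafeteria serves lunch from noon until two.',
    'Paying down technical debt early keeps the cost of change low.',
]
query = 'What is technical debt and what does it cost?'
selected, used = select_context(query, chunks, token_budget=1000)
for chunk in selected:
    print(chunk)
```

With a generous budget, the cafeteria sentence is still left out because it shares no words with the query, which is the point: relevance, not window size, decides what goes in.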
