AI Prompt Engineering · Lesson

Maintaining Context Across Chunks

Overlap, rolling context, and metadata injection for continuity.

The Cross-Chunk Context Problem

When a document is split into chunks, correctly interpreting one chunk often depends on information that appeared in an earlier chunk. For example, a term defined in chunk 3 may be used again in chunk 7. Without some form of context bridging, the model processing chunk 7 never sees that definition.

Four strategies address this: overlap, rolling summary injection, metadata tags, and page number references.
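The three strategies other than overlap all work by adding context to the prompt sent with each chunk. The sketch below shows one way to combine them. It assumes each chunk already carries its section heading and page range. The `Chunk` fields, `build_chunk_prompt`, `build_all_prompts`, and the `summarize` callable are all hypothetical names. In a real pipeline, `summarize` would usually be a model call, and here it is replaced by a toy function.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    text: str
    index: int              # 0-based position of this chunk in the document
    total: int              # total number of chunks
    section: str            # hypothetical metadata: heading the chunk falls under
    pages: Tuple[int, int]  # hypothetical metadata: first and last source page


def build_chunk_prompt(chunk: Chunk, rolling_summary: str) -> str:
    """Prefix a chunk with metadata tags, page references, and the rolling summary."""
    tags = (
        f"[chunk {chunk.index + 1} of {chunk.total}] "
        f"[section: {chunk.section}] "
        f"[pages {chunk.pages[0]}-{chunk.pages[1]}]"
    )
    summary = rolling_summary.strip() or "(none: this is the first chunk)"
    return f"{tags}\nSummary of earlier chunks:\n{summary}\n\nCurrent chunk:\n{chunk.text}"


def build_all_prompts(chunks: List[Chunk], summarize: Callable[[str, str], str]) -> List[str]:
    """Build one prompt per chunk, carrying a rolling summary forward.

    `summarize(previous_summary, chunk_text)` is a hypothetical callable that
    folds the new chunk into the running summary (typically a model call).
    """
    summary = ""
    prompts = []
    for chunk in chunks:
        prompts.append(build_chunk_prompt(chunk, summary))
        # Update only after building the prompt, so chunk i sees chunks 0..i-1.
        summary = summarize(summary, chunk.text)
    return prompts


def first_sentence_summary(previous: str, text: str) -> str:
    """Toy stand-in for a model call: append the chunk's first sentence."""
    return (previous + " " + text.split(".")[0] + ".").strip()


chunks = [
    Chunk("A widget means any part under 5 cm. Widgets are cheap.", 0, 2, "Definitions", (1, 2)),
    Chunk("Widget failures rose in 2023.", 1, 2, "Results", (3, 3)),
]
prompts = build_all_prompts(chunks, first_sentence_summary)
print(prompts[1])
```

The second prompt carries the definition from the first chunk in its summary, which is the bridging the definition-in-chunk-3 example calls for. A real summarizer should also be asked to keep its output short, or the summary will grow with every chunk.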

Strategy 1: Token Overlap

Overlap repeats the last k tokens of each chunk at the start of the next chunk. As a result, a sentence or argument that crosses a chunk boundary appears in full in at least one chunk, provided it is no longer than the overlap.

Typical overlap: 100–200 tokens (roughly 75–150 words of English). Too much overlap increases redundancy and cost, because the overlapping tokens are processed twice. Too little overlap leaves gaps at the boundaries.
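The cost of overlap is easy to estimate before making any API calls. The sketch below computes this cost. It assumes a sliding window that advances by `max_tokens - overlap` and stops as soon as a window reaches the end of the document. The function name `overlap_cost` is hypothetical.

```python
import math
from typing import Tuple


def overlap_cost(n_tokens: int, max_tokens: int = 1000, overlap: int = 200) -> Tuple[int, int]:
    """Return (number of chunks, total tokens sent to the model).

    Assumes a window that advances by max_tokens - overlap and stops as soon
    as a window reaches the end of the document.
    """
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    if n_tokens <= max_tokens:
        return (1 if n_tokens > 0 else 0), n_tokens
    step = max_tokens - overlap
    n_chunks = 1 + math.ceil((n_tokens - max_tokens) / step)
    # Every boundary between consecutive chunks repeats exactly `overlap` tokens.
    return n_chunks, n_tokens + (n_chunks - 1) * overlap


for k in (0, 100, 200):
    n_chunks, sent = overlap_cost(10_000, max_tokens=1000, overlap=k)
    print(f"overlap={k}: {n_chunks} chunks, {sent} tokens sent ({sent / 10_000 - 1:.0%} extra)")
```

Consider a 10,000-token document split into 1,000-token chunks. With no overlap, it needs 10 chunks. With 100 tokens of overlap, it needs 11 chunks and sends 10% extra tokens. With 200 tokens of overlap, it needs 13 chunks and sends 24% extra.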

import tiktoken

enc = tiktoken.get_encoding('cl100k_base')

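def chunk_with_overlap(text, max_tokens=1000, overlap=200):
    """Split text into windows of max_tokens tokens; neighbours share `overlap` tokens."""
    if not 0 <= overlap < max_tokens:
        raise ValueError('overlap must be >= 0 and smaller than max_tokens')
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap  # how far each window advances
    i = 0
    while i < len(tokens):
        chunk = tokens[i:i + max_tokens]
        # A window edge can split a multi-byte character; decode() replaces
        # the broken bytes with U+FFFD by default.
        chunks.append(enc.decode(chunk))
        if i + max_tokens >= len(tokens):
            break  # this window reaches the end; another would only repeat the overlap
        i += step
    return chunks

with open('document.txt', encoding='utf-8') as f:  # hypothetical input file
    document = f.read()
chunks = chunk_with_overlap(document, max_tokens=1000, overlap=200)
print(f'{len(chunks)} chunks with 200-token overlap')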
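The boundary guarantee can be checked without tiktoken by writing the same windowing loop over any Python sequence. The sketch below uses integers as stand-in token IDs. `sliding_windows` and `appears_whole` are hypothetical helpers and are not part of tiktoken. With this loop, every span of up to `overlap + 1` tokens lies entirely inside some window. A longer span can straddle a boundary.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(seq: Sequence[T], max_len: int, overlap: int) -> List[Sequence[T]]:
    """Split any sequence into windows of max_len items; neighbours share `overlap` items."""
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must be >= 0 and smaller than max_len")
    step = max_len - overlap
    result = []
    i = 0
    while i < len(seq):
        result.append(seq[i:i + max_len])
        if i + max_len >= len(seq):
            break  # the last window already reaches the end
        i += step
    return result


def appears_whole(chunks: List[Sequence[T]], span: Sequence[T]) -> bool:
    """True if some chunk contains `span` as one contiguous run."""
    n = len(span)
    return any(
        list(c[j:j + n]) == list(span)
        for c in chunks
        for j in range(len(c) - n + 1)
    )


tokens = list(range(50))  # stand-in for token IDs
windows = sliding_windows(tokens, max_len=10, overlap=3)

# Every span of up to overlap + 1 = 4 tokens lies inside some window...
print(all(appears_whole(windows, tokens[a:a + 4]) for a in range(len(tokens) - 3)))
# ...but a 5-token span can straddle a boundary, e.g. tokens 6 through 10.
print(appears_whole(windows, tokens[6:11]))
```

The first check prints `True` and the second prints `False`. If a sentence can exceed the overlap, chunk boundaries need to be aligned with sentence boundaries, or one of the other strategies is needed.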