Maintaining Context Across Chunks
Overlap, rolling context, and metadata injection for continuity.
The Cross-Chunk Context Problem
When a document is split into chunks, information from an earlier chunk may be needed to correctly interpret a later one. For example, a term defined in chunk 3 may be used in chunk 7. Without context bridging, the model processing chunk 7 never sees that definition.
Four strategies address this: overlap, rolling summary injection, metadata tags, and page number references.
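As a rough sketch of how the injection-based strategies might look in practice, the snippet below prefixes a chunk with metadata tags, a page reference, and a rolling summary of earlier chunks. The function name `build_chunk_prompt`, the bracketed tag format, and the sample text are illustrative assumptions, not a fixed convention.

```python
def build_chunk_prompt(chunk_text, index, total, rolling_summary="", section="", page=None):
    """Prefix a chunk with metadata tags and a summary of the preceding chunks."""
    # Metadata tags: position in the document, plus optional section and page.
    header = [f"[chunk {index + 1} of {total}]"]
    if section:
        header.append(f"[section: {section}]")
    if page is not None:
        header.append(f"[page: {page}]")
    parts = [" ".join(header)]
    # Rolling summary: carries definitions and facts from earlier chunks forward.
    if rolling_summary:
        parts.append(f"Summary of earlier chunks: {rolling_summary}")
    parts.append(chunk_text)
    return "\n\n".join(parts)


# Made-up example: chunk 7 uses a term that was defined in chunk 3.
prompt = build_chunk_prompt(
    "The SLA then applies to every tenant.",
    index=6,
    total=12,
    rolling_summary="SLA was defined earlier as a 99.9% monthly uptime commitment.",
    section="Service Terms",
    page=14,
)
print(prompt)
```

With this prefix, the model processing chunk 7 sees the earlier definition even though chunk 3 is not in its input.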
Strategy 1: Token Overlap
Overlap repeats the last k tokens of each chunk at the start of the next one. As a result, a sentence or argument that spans a boundary appears in full in at least one chunk, provided it is no longer than the overlap.
Typical overlap: 100–200 tokens (roughly 75–150 words). Too much overlap increases redundancy and cost; too little leaves boundary gaps.
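To make the cost side of this trade-off concrete, here is a small estimate of how many chunks a document produces and how many tokens are processed in total. It is a sketch that assumes a sliding window which stops as soon as a chunk reaches the end of the text; `overlap_cost` is a hypothetical helper name.

```python
import math


def overlap_cost(total_tokens, max_tokens=1000, overlap=200):
    """Return (number of chunks, total tokens processed) for a sliding window."""
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be >= 0 and smaller than max_tokens")
    if total_tokens <= 0:
        return 0, 0
    if total_tokens <= max_tokens:
        return 1, total_tokens
    step = max_tokens - overlap
    # One full first chunk, then one more chunk per `step` of remaining tokens.
    n_chunks = 1 + math.ceil((total_tokens - max_tokens) / step)
    # Every chunk after the first re-reads `overlap` tokens of its predecessor.
    processed = total_tokens + (n_chunks - 1) * overlap
    return n_chunks, processed


for ov in (100, 200):
    n, processed = overlap_cost(10_000, max_tokens=1000, overlap=ov)
    print(f"overlap={ov}: {n} chunks, {processed} tokens processed")
```

For a 10,000-token document with 1,000-token chunks, a 100-token overlap gives 11 chunks and 11,000 processed tokens (10% overhead), while a 200-token overlap gives 13 chunks and 12,400 processed tokens (24% overhead).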
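import tiktoken

enc = tiktoken.get_encoding('cl100k_base')

def chunk_with_overlap(text, max_tokens=1000, overlap=200):
    # The step must be positive, otherwise the loop below never advances.
    if not 0 <= overlap < max_tokens:
        raise ValueError('overlap must be >= 0 and smaller than max_tokens')
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap
    i = 0
    while i < len(tokens):
        chunk = tokens[i:i + max_tokens]
        chunks.append(enc.decode(chunk))
        # Stop once a chunk reaches the end of the text; another pass would
        # only emit tokens that this chunk already contains.
        if i + max_tokens >= len(tokens):
            break
        i += step
    return chunks

# `document` is assumed to hold the full input text as a string.
chunks = chunk_with_overlap(document, max_tokens=1000, overlap=200)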
print(f'{len(chunks)} chunks with 200-token overlap')