AI Prompt Engineering · Lesson

Context Compression

Trimming context to what matters.

Why Compress Context

Retrieved chunks are noisy: a relevant chunk may be mostly boilerplate with one load-bearing sentence. Context compression trims retrieved text down to what actually answers the query before it reaches the generator.

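Benefits compound: lower token cost, reduced latency, and fewer distractors. A shorter context also eases the lost-in-the-middle effect, where models tend to use information placed in the middle of a long context less reliably than information near the start or end.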

def compress(query, chunks):
    # Goal: keep only the spans that bear on the query,
    # dropping boilerplate, navigation, and off-topic sentences.
    # extract_relevant is a placeholder for any per-chunk compressor.
    return [extract_relevant(query, c) for c in chunks]
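
To make the compress function above concrete, here is a minimal runnable sketch. It rests on several assumptions that are not part of the lesson: chunks are plain strings, split_sentences is a naive regex splitter, relevance is a toy lexical-overlap score (a real system would more likely use an embedding model or a cross-encoder), and the tau threshold of 0.3 is arbitrary. Unlike the version above, this sketch also drops chunks that compress to nothing.

```python
import re

# Hypothetical stand-ins for the helpers the lesson leaves undefined.
def split_sentences(text):
    # Naive splitter: break after ., !, or ? followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

def tokens(text):
    # Lowercased alphanumeric tokens as a set.
    return set(re.findall(r'[a-z0-9]+', text.lower()))

def relevance(query, sentence):
    # Toy score: fraction of query terms that also appear in the sentence.
    q = tokens(query)
    return len(q & tokens(sentence)) / len(q) if q else 0.0

def extract_relevant(query, chunk, tau=0.3):
    # Keep only sentences whose score reaches the (assumed) threshold tau.
    sents = split_sentences(chunk)
    return ' '.join(s for s in sents if relevance(query, s) >= tau)

def compress(query, chunks):
    # Compress each chunk, then drop chunks with nothing relevant left.
    compressed = [extract_relevant(query, c) for c in chunks]
    return [c for c in compressed if c]

def word_count(texts):
    # Whitespace word count, a rough proxy for token cost.
    return sum(len(t.split()) for t in texts)

chunks = [
    "Home | Docs | Pricing. The refund window is 30 days from delivery. "
    "Follow us on social media!",
    "Our team loves hiking. Subscribe to the newsletter.",
]
query = "What is the refund window?"

compressed = compress(query, chunks)
ratio = word_count(compressed) / word_count(chunks)
print(compressed)
print(f"kept {ratio:.0%} of the original words")
```

Only the one sentence that answers the query survives. The navigation bar, the social prompt, and the entire off-topic second chunk are dropped, which is where the cost and distractor savings come from. Because the toy score counts stopwords such as "the", the threshold needs tuning on real data.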

Extractive vs Abstractive

Extractive compression selects verbatim spans (sentences, passages) relevant to the query, preserving exact wording and provenance. Abstractive compression paraphrases or summarizes, achieving higher compression but risking information loss and introduced errors.

For factual RAG with citations, prefer extractive to keep claims traceable to source; use abstractive only when faithfulness can be verified.

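def extractive(query, chunk):
    # split_sentences, relevance, and TAU come from the surrounding pipeline.
    sents = split_sentences(chunk.text)
    if not sents:
        return chunk.text   # nothing to split; pass the chunk through
    scored = [(s, relevance(query, s)) for s in sents]
    kept = [s for s, r in scored if r > TAU]
    # If no sentence clears TAU, fall back to the single best one: never return empty.
    return ' '.join(kept) or max(scored, key=lambda pair: pair[1])[0]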
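
The rule "use abstractive only when faithfulness can be verified" can be sketched as a gate that accepts a summary only if it passes a check and otherwise falls back to the extractive span. Everything below is an illustrative assumption: is_supported, compress_with_fallback, and the 0.8 overlap threshold are hypothetical, and lexical overlap is only a crude proxy for faithfulness. A production check would more plausibly use an entailment model or a human review step.

```python
import re

def content_tokens(text):
    # Lowercased words and numbers, used as a crude proxy for stated facts.
    return set(re.findall(r'[a-z0-9]+', text.lower()))

def is_supported(summary, source, min_overlap=0.8):
    # Hypothetical faithfulness check (lexical proxy, not true entailment):
    # every number in the summary must appear in the source, and at least
    # min_overlap of the summary's tokens must appear in the source.
    summary_tok = content_tokens(summary)
    source_tok = content_tokens(source)
    if not summary_tok:
        return False
    numbers = {t for t in summary_tok if t.isdigit()}
    if not numbers <= source_tok:
        return False
    return len(summary_tok & source_tok) / len(summary_tok) >= min_overlap

def compress_with_fallback(source, summary, extractive_span):
    # Use the abstractive summary only when it passes the check;
    # otherwise keep the verbatim span so claims stay traceable.
    return summary if is_supported(summary, source) else extractive_span

source = "The refund window is 30 days from delivery. Items must be unused."
span = "The refund window is 30 days from delivery."
good = "Refund window is 30 days from delivery."
bad = "Refund window is 60 days from delivery."

print(compress_with_fallback(source, good, span))
print(compress_with_fallback(source, bad, span))
```

The first summary is accepted because every token, including the number 30, appears in the source. The second introduces "60", which the source never states, so the gate returns the verbatim span instead.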
