AI Engineering Academy · Lesson

Summary Memory and Token-Aware Truncation

Use ConversationSummaryMemory to automatically summarize older turns, keeping the conversation condensed while preserving key facts the user mentioned earlier.

The Token Cost of Full History

ConversationBufferMemory keeps every message ever exchanged, so the history grows with each turn and steadily consumes the context window. At roughly 200 tokens per turn, for example, a 100-turn conversation uses about 20,000 tokens for history alone, leaving less room for new input and the response. Summary memory addresses this by replacing old turns with a compressed summary.
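The arithmetic behind that figure can be sketched in a few lines of Python. This is a rough illustration only: the 200-tokens-per-turn average is an assumption chosen to match the 20,000-token example, and both function names are hypothetical.

```python
def history_tokens(turns: int, tokens_per_turn: int) -> int:
    """Tokens needed to hold the full history after `turns` turns."""
    return turns * tokens_per_turn


def cumulative_prompt_tokens(turns: int, tokens_per_turn: int) -> int:
    """Total history tokens sent across the whole conversation,
    assuming the full history so far is resent on every turn."""
    return sum(history_tokens(t, tokens_per_turn) for t in range(1, turns + 1))


# Assumed average of 200 tokens per turn (illustrative, not measured).
print(history_tokens(100, 200))            # 20000
print(cumulative_prompt_tokens(100, 200))  # 1010000
```

The history held in the prompt grows linearly with the number of turns, but because that history is resent on every request, the total tokens processed over the conversation grow roughly quadratically.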

How Summary Memory Works

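In LangChain, ConversationSummaryMemory folds each new exchange into a running summary by asking an LLM to update the existing summary with the latest turns. The threshold-based behavior belongs to its sibling, ConversationSummaryBufferMemory: recent turns are kept verbatim, and when the buffer exceeds a token limit (max_token_limit), the oldest turns are fed to an LLM, which is asked to summarize the key points. The full turns are discarded and replaced by this compact summary. Future turns accumulate on top of the summary.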

Old turns: replaced by summary
Recent turns: kept verbatim
Net result: meaningful compression, with the key facts preserved only as well as the summarizer captures them
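The mechanism described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LangChain's implementation: SummaryBufferSketch, count_tokens, and stub_summarize are hypothetical names, word counting stands in for a real tokenizer, and the stub summarizer stands in for the LLM call.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (role, text)


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def stub_summarize(previous_summary: str, old_turns: List[Turn]) -> str:
    """Placeholder for an LLM call: keeps the first 8 words of each evicted turn."""
    parts = [previous_summary] if previous_summary else []
    for role, text in old_turns:
        parts.append(f"{role}: " + " ".join(text.split()[:8]))
    return " | ".join(parts)


class SummaryBufferSketch:
    """Keeps recent turns verbatim and folds older turns into a running summary."""

    def __init__(self, max_tokens: int,
                 summarize: Callable[[str, List[Turn]], str] = stub_summarize):
        self.max_tokens = max_tokens
        self.summarize = summarize
        self.summary = ""
        self.recent: List[Turn] = []

    def buffer_tokens(self) -> int:
        return sum(count_tokens(text) for _, text in self.recent)

    def add_turn(self, role: str, text: str) -> None:
        self.recent.append((role, text))
        evicted: List[Turn] = []
        # Evict the oldest turns until the buffer fits the budget,
        # always keeping at least the newest turn verbatim.
        while self.buffer_tokens() > self.max_tokens and len(self.recent) > 1:
            evicted.append(self.recent.pop(0))
        if evicted:
            # Merge the evicted turns into the existing summary.
            self.summary = self.summarize(self.summary, evicted)

    def build_context(self) -> str:
        lines = []
        if self.summary:
            lines.append(f"Summary of earlier conversation: {self.summary}")
        lines.extend(f"{role}: {text}" for role, text in self.recent)
        return "\n".join(lines)


memory = SummaryBufferSketch(max_tokens=10)
memory.add_turn("user", "I am Dana, allergic to peanuts")
memory.add_turn("assistant", "Noted, I will avoid peanut recipes")
memory.add_turn("user", "Suggest a quick dinner")
print(memory.build_context())
```

In this run the first user turn is pushed out of the verbatim buffer when the second turn arrives, so the allergy survives only through the summary. With a real LLM summarizer, whether such a fact survives depends on the summarization prompt and the model, which is why the token limit and the prompt are worth tuning together.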
