Summary Memory and Token-Aware Truncation
Use ConversationSummaryMemory to automatically summarize older turns, keeping the conversation condensed while preserving key facts the user mentioned earlier.
The Token Cost of Full History
Conversation Buffer Memory keeps every message ever exchanged and resends the whole history with each request, which quickly consumes your context window. A 100-turn conversation might use 20,000 tokens for history alone, leaving less room for new input and the model's response, and raising the cost of every call. Summary Memory mitigates this by replacing old turns with a compressed summary.
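To make the arithmetic concrete, here is a rough sketch of how history size grows when every turn is kept. The 4-characters-per-token heuristic, the 400-character message length, and the helper names are all illustrative assumptions; a real tokenizer will give different counts.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token. Use a real tokenizer for accuracy.
    return max(1, len(text) // 4)


def history_tokens(messages: list[str]) -> int:
    # Total estimated tokens for the stored history.
    return sum(estimate_tokens(m) for m in messages)


# Hypothetical conversation: 100 turns, each a 400-character user message
# and a 400-character reply (about 100 tokens per message under the heuristic).
history: list[str] = []
resent_total = 0  # history tokens sent across all calls so far
for _ in range(100):
    resent_total += history_tokens(history)  # the full history goes out with each new call
    history += ["u" * 400, "a" * 400]

final_history = history_tokens(history)
print(final_history)  # 20000
print(resent_total)   # 990000
```

Under these assumptions the stored history reaches 20,000 tokens, and because it is resent on every call, the history tokens sent over the whole conversation add up to far more than that.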
How Summary Memory Works
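ConversationSummaryMemory on its own folds every new turn into a running summary with an LLM call and keeps no turns verbatim. The threshold-based behavior described here, where recent turns stay verbatim, is what LangChain's ConversationSummaryBufferMemory provides: once the buffered turns exceed a token threshold (max_token_limit), the oldest turns are passed to an LLM, which merges their key points into the running summary. Those turns are then dropped from the buffer, and future turns accumulate on top of the summary.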
- Old turns: replaced by summary
- Recent turns: kept verbatim
- Net result: substantial compression, at the cost of any detail the summarizer leaves out
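The mechanism above can be sketched in plain Python without any library. This is a minimal illustration, not LangChain's implementation: `SummaryBufferSketch`, `estimate_tokens`, and `fake_summarize` are hypothetical names, the token count is a 4-characters-per-token guess, and the summarizer is a deterministic stand-in for what would normally be an LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


@dataclass
class SummaryBufferSketch:
    # summarize(old_summary, evicted_turns) -> new_summary; normally an LLM call.
    summarize: Callable[[str, list[str]], str]
    max_tokens: int = 50          # budget for the verbatim buffer only
    summary: str = ""             # running summary of evicted turns
    recent: list[str] = field(default_factory=list)  # turns kept verbatim

    def add_turn(self, user: str, ai: str) -> None:
        self.recent.append(f"Human: {user}\nAI: {ai}")
        self._prune()

    def _prune(self) -> None:
        evicted: list[str] = []
        # Evict the oldest turns until the buffer fits, always keeping the newest turn.
        while len(self.recent) > 1 and sum(map(estimate_tokens, self.recent)) > self.max_tokens:
            evicted.append(self.recent.pop(0))
        if evicted:
            self.summary = self.summarize(self.summary, evicted)

    def context(self) -> str:
        # What would be placed in the prompt: summary first, then verbatim turns.
        parts = ([f"Summary: {self.summary}"] if self.summary else []) + self.recent
        return "\n".join(parts)


def fake_summarize(old: str, turns: list[str]) -> str:
    # Stand-in for an LLM: keep only the human line of each evicted turn.
    notes = [t.splitlines()[0] for t in turns]
    return " | ".join(([old] if old else []) + notes)


memory = SummaryBufferSketch(summarize=fake_summarize, max_tokens=25)
memory.add_turn("My name is Ada.", "Nice to meet you, Ada.")
memory.add_turn("I work on compilers.", "That sounds interesting.")
memory.add_turn("What should I read next?", "Maybe a book on type systems.")
print(memory.context())
```

With this small budget, the two earlier turns are folded into the summary and only the latest turn stays verbatim. In this sketch the budget applies to the verbatim buffer alone, so the summary itself can still grow; a production setup would likely want to cap or re-compress it as well.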