AI Engineering Academy · Lesson

Buffer and Window Memory

Implement ConversationBufferMemory to keep the full history and ConversationBufferWindowMemory to keep only the last N turns in context, and measure how window size affects coherence and cost.

Buffer Memory: Keep Everything

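Buffer memory is the simplest strategy: store every message from every turn in a list and include the complete history in every API call. Nothing is dropped, so the model can reference anything said at any point, as long as the history still fits in the model's context window. The downside is unbounded growth: the prompt gets longer by one turn on every call, so per-call cost rises linearly and the total tokens sent over a session rise roughly quadratically with the number of turns. For short sessions, such as completing a single task, buffer memory is appropriate and the easiest to implement.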

# Legacy memory API: recent LangChain releases mark this class as deprecated
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(
    return_messages=True,  # return message objects instead of one formatted string
    memory_key='history'   # key under which the history is exposed to the prompt template
)

# Add messages manually
memory.chat_memory.add_user_message('What is a neural network?')
memory.chat_memory.add_ai_message('A neural network is a system of layers...')

# Inspect what will be injected: a dict whose 'history' key holds the message list
print(memory.load_memory_variables({}))
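
To make the growth of buffer memory concrete, the following is a minimal sketch in plain Python, without LangChain. It assumes a crude word count as a stand-in for a real tokenizer and a fixed message length; `SimpleBuffer`, `approx_tokens`, and `simulate` are hypothetical names used only for illustration.

```python
# Minimal sketch of buffer memory without LangChain.
# Assumption: a whitespace word count stands in for a real tokenizer.

def approx_tokens(text: str) -> int:
    """Hypothetical token estimate: one token per whitespace-separated word."""
    return len(text.split())


class SimpleBuffer:
    """Stores every message and replays the full list on each call."""

    def __init__(self) -> None:
        self.messages: list[tuple[str, str]] = []  # (role, content) pairs

    def add_turn(self, user_text: str, ai_text: str) -> None:
        self.messages.append(("human", user_text))
        self.messages.append(("ai", ai_text))

    def prompt_tokens(self) -> int:
        """Size of the history that would be sent with the next call."""
        return sum(approx_tokens(content) for _, content in self.messages)


def simulate(turns: int, words_per_message: int = 10) -> tuple[list[int], int]:
    """Return the history size sent on each call and the total across calls."""
    buffer = SimpleBuffer()
    filler = " ".join(["word"] * words_per_message)
    per_call = []
    for _ in range(turns):
        per_call.append(buffer.prompt_tokens())  # history sent with this call
        buffer.add_turn(filler, filler)          # then the new turn is stored
    return per_call, sum(per_call)


per_call, total = simulate(turns=5)
print(per_call)  # [0, 20, 40, 60, 80]
print(total)     # 200
```

Each call carries 20 more history tokens than the previous one, so the per-call size is linear in the turn number while the running total grows with the square of the number of turns.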

Integrating Buffer Memory with a Chain

To use buffer memory with LCEL, wrap the chain in RunnableWithMessageHistory; ConversationChain is the legacy alternative. A chat history object, looked up per session ID, holds the messages, and a MessagesPlaceholder in the prompt template marks where they are injected. After each invocation, the wrapper appends the new human and AI messages to that history automatically.

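from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.output_parsers import StrOutputParser
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

# One in-memory history per session ID; contents are lost when the process exits
store = {}

def get_history(session_id: str) -> InMemoryChatMessageHistory:
    # Create the history on first use, then reuse it for later calls
    if session_id not in store:
        store[session_id] = InMemoryChatMessageHistory()
    return store[session_id]

chain = (
    ChatPromptTemplate.from_messages([
        ('system', 'You are helpful.'),
        MessagesPlaceholder('history'),  # prior turns are injected here
        ('human', '{input}')
    ])
    | ChatOpenAI(model='gpt-4o-mini')  # requires OPENAI_API_KEY in the environment
    | StrOutputParser()
)

with_memory = RunnableWithMessageHistory(
    chain, get_history,
    input_messages_key='input',      # which input field is the new user message
    history_messages_key='history'   # must match the MessagesPlaceholder name
)

# Every call must name a session so the wrapper knows which history to load;
# the session ID 'demo' is only an illustrative value
config = {'configurable': {'session_id': 'demo'}}
print(with_memory.invoke({'input': 'What is a neural network?'}, config=config))
print(with_memory.invoke({'input': 'Summarize that in one sentence.'}, config=config))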
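
The lesson's other goal, keeping only the last N turns, can be sketched in plain Python as a slice over the same kind of message list. This is an illustration rather than LangChain's implementation: it assumes one turn is a human/AI message pair, reuses a word count as a stand-in tokenizer, and `last_k_turns` and `approx_tokens` are hypothetical helper names. The parameter name `k` mirrors the `k` argument of ConversationBufferWindowMemory.

```python
# Minimal sketch of window memory in plain Python.
# Assumption: one turn is a (human, ai) message pair, and k counts turns.

def approx_tokens(text: str) -> int:
    """Hypothetical token estimate: one token per whitespace-separated word."""
    return len(text.split())


def last_k_turns(messages: list[tuple[str, str]], k: int) -> list[tuple[str, str]]:
    """Keep only the messages from the most recent k turns (2 messages each)."""
    if k <= 0:
        return []  # guard: messages[-0:] would return the whole list
    return messages[-2 * k:]


# Build a six-turn history of short, fixed-length messages
history: list[tuple[str, str]] = []
for i in range(1, 7):
    history.append(("human", f"question number {i}"))
    history.append(("ai", f"answer number {i}"))

# Compare how much history each window size would send with the next call
for k in (1, 3, 6):
    window = last_k_turns(history, k)
    size = sum(approx_tokens(content) for _, content in window)
    print(k, len(window), size)
# 1 2 6
# 3 6 18
# 6 12 36
```

With a window, the history sent per call is capped at roughly k turns regardless of session length, at the price of the model losing anything said before the window. Measuring coherence against k requires real conversations and is not captured by this token count.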

