Short-Term Memory in the Context Window
Store conversation history in the messages array — the simplest memory there is, and why it has hard limits.
Memory Is Just a List
The "memory" of a chat-style LLM is the messages list you send on every call. The model is stateless — it knows nothing between requests.
"Short-term memory" = whatever is in that list right now.
Why It Works
You append each user message and each assistant response. By the time you ask the third question, the model sees:
[
{'role': 'system', 'content': 'You are helpful.'},
{'role': 'user', 'content': 'My name is Alice.'},
{'role': 'assistant', 'content': 'Nice to meet you, Alice.'},
{'role': 'user', 'content': 'What is my name?'}
]
# Model replies: 'Alice'All lessons in this course
- Short-Term Memory in the Context Window
- Why Long Contexts Don't Scale
- Summarisation as Compression
- Simple Memory Stores (Key-Value)