AI Engineering Academy · Lesson

Why Stateless LLMs Need External Memory

Understand why each API call starts from scratch, how naively stuffing the full conversation history into every request eventually exceeds the model's token limit, and the design space of memory strategies from simple to complex.

LLMs Have No Memory by Default

Every API call to an LLM is completely independent. The model receives the messages you send in that single request and processes them — nothing more. When you make the next call, the model has no recollection of the previous conversation. It is as if you are speaking to someone with no short-term memory. This statelessness is by design: it makes models easier to scale and deploy, but it shifts the memory burden onto your application.

The Amnesia Problem in Practice

Without memory, a chatbot will fail basic multi-turn tasks. If a user says 'My name is Alice' in turn 1, then asks 'What is my name?' in turn 3, the model will say it does not know. Every turn appears to be a fresh conversation. Users find this deeply frustrating. The solution is for your application to maintain conversation history and include it in every API call.

# Naive stateless approach — model forgets everything
from openai import OpenAI

client = OpenAI()

def chat_stateless(user_message: str) -> str:
    response = client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[  # Only the new message — no history!
            {'role': 'user', 'content': user_message}
        ]
    )
    return response.choices[0].message.content

chat_stateless('My name is Alice.')   # Model replies with something like: 'Hello Alice!'
chat_stateless('What is my name?')    # Model replies with something like: 'I do not know your name.'
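
For contrast, here is a minimal sketch of the fix described above: the application keeps the conversation history and resends it on every call. The names ConversationMemory and make_openai_complete are hypothetical helpers introduced for illustration, not part of the OpenAI SDK. The model call is passed in as a plain function so the memory logic can be tried without network access.

```python
# Sketch only: ConversationMemory and make_openai_complete are hypothetical
# helper names for this lesson, not part of any library.
from typing import Callable, Dict, List

Message = Dict[str, str]


class ConversationMemory:
    """Keeps the message history in the application and replays it each turn."""

    def __init__(self, complete: Callable[[List[Message]], str]) -> None:
        self._complete = complete           # function that calls the model
        self.messages: List[Message] = []   # full history, oldest first

    def chat(self, user_message: str) -> str:
        # Record the new user turn, then send the whole history, not just this turn.
        self.messages.append({'role': 'user', 'content': user_message})
        reply = self._complete(list(self.messages))
        # Store the assistant reply so the next call can see it too.
        self.messages.append({'role': 'assistant', 'content': reply})
        return reply


def make_openai_complete(model: str = 'gpt-4o-mini') -> Callable[[List[Message]], str]:
    """Build a completion function backed by the OpenAI chat completions API."""
    from openai import OpenAI  # imported here so the class above works without the SDK

    client = OpenAI()

    def complete(messages: List[Message]) -> str:
        response = client.chat.completions.create(model=model, messages=messages)
        return response.choices[0].message.content or ''

    return complete


# Usage (requires the openai package and an OPENAI_API_KEY in the environment):
# memory = ConversationMemory(make_openai_complete())
# memory.chat('My name is Alice.')
# memory.chat('What is my name?')   # the request now includes the earlier turns
```

Because the full history is resent on every turn, each request is larger than the last. That growth is what eventually runs into the token limit and motivates more selective memory strategies.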

