Strategies for Staying Within Context
Implement sliding window memory, message summarization, and selective context trimming to handle long conversations without exceeding token limits.
The Growing Conversation Problem
Every message exchanged in a chat conversation grows the token count of the next API call. In a long customer support session or a multi-hour coding session, the accumulated conversation history can easily exceed 20,000-50,000 tokens — all of which you must send to the API every single turn, paying for previously processed tokens repeatedly.
Without a context management strategy, your application will either hit the context limit and crash, or you will end up silently truncating messages and the model will lose track of important earlier context. Every production LLM application needs an explicit strategy for managing context growth.
Strategy 1: Sliding Window (Last N Messages)
The simplest strategy is to keep only the last N messages in the context, discarding older ones. This is called a sliding window. It is easy to implement, predictable in cost, and sufficient for many use cases where recent context matters more than older exchanges.
import openai
client = openai.OpenAI()
class SlidingWindowConversation:
def __init__(self, system_prompt, window_size=10):
self.system = system_prompt
self.window_size = window_size
self.history = []
def chat(self, user_message):
self.history.append({'role': 'user', 'content': user_message})
# Keep only the last N messages
windowed = self.history[-self.window_size:]
messages = [{'role': 'system', 'content': self.system}] + windowed
response = client.chat.completions.create(
model='gpt-4o-mini',
messages=messages
)
reply = response.choices[0].message.content
self.history.append({'role': 'assistant', 'content': reply})
print(f'Context: {len(windowed)} messages ({len(self.history)} total)')
return reply
conv = SlidingWindowConversation('You are a helpful assistant.', window_size=6)
print(conv.chat('Hello! My name is Alice.'))
print(conv.chat('What is the capital of France?'))
print(conv.chat('What is my name?')) # Might forget if window is smallAll lessons in this course
- What Is a Token?
- Context Windows: Size and Implications
- Calculating and Predicting API Costs
- Strategies for Staying Within Context