AI Engineering Academy · Lesson

Strategies for Staying Within Context

Implement sliding window memory, message summarization, and selective context trimming to handle long conversations without exceeding token limits.

The Growing Conversation Problem

Each message added to a chat conversation increases the token count of the next API call. In a long customer support session or a multi-hour coding session, the accumulated conversation history can easily reach 20,000 to 50,000 tokens. All of it must be resent to the API on every turn, so you pay repeatedly for tokens that were already processed.
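To make the repeated cost concrete, here is a minimal sketch that totals the input tokens sent over a whole conversation when the full history is resent each turn. The function name `cumulative_input_tokens` and the assumption that every message has the same length are illustrative simplifications, not part of any API.

```python
def cumulative_input_tokens(turns, tokens_per_message, system_tokens=0):
    """Total input tokens sent across `turns` turns with full history resent.

    Assumes (for illustration only) that every user and assistant
    message is exactly `tokens_per_message` tokens long.
    """
    total = 0
    history_tokens = 0
    for _ in range(turns):
        history_tokens += tokens_per_message      # new user message joins the history
        total += system_tokens + history_tokens   # the whole history is sent as input
        history_tokens += tokens_per_message      # assistant reply joins the history
    return total


if __name__ == '__main__':
    # 50 turns of 200-token messages
    print(cumulative_input_tokens(turns=50, tokens_per_message=200))
```

Under these assumptions the final call alone carries 19,800 input tokens, but the conversation as a whole has sent 500,000. With no system prompt, the total grows with the square of the number of turns, which is why unmanaged history becomes expensive quickly.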

Without a context management strategy, your application will eventually do one of two things: hit the model's context limit and fail with an API error, or silently truncate messages so that the model loses track of important earlier context. Every production LLM application needs an explicit strategy for managing context growth.
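One way to make the failure explicit rather than silent is to check an estimated token count before each call. The sketch below is an assumption-laden illustration: `estimate_tokens` uses a rough heuristic of about 4 characters per token for English text, and `ContextBudgetExceeded` and `ensure_within_budget` are hypothetical names. A real application would likely count tokens with the tokenizer that matches its model.

```python
class ContextBudgetExceeded(Exception):
    """Raised when the estimated prompt size exceeds the configured budget."""


def estimate_tokens(text):
    # Rough heuristic (assumption): about 4 characters per token for English.
    return max(1, len(text) // 4)


def estimate_message_tokens(messages):
    # Sums content only; per-message formatting overhead is ignored here.
    return sum(estimate_tokens(m['content']) for m in messages)


def ensure_within_budget(messages, budget):
    """Return the estimated token count, or raise if it exceeds `budget`."""
    used = estimate_message_tokens(messages)
    if used > budget:
        raise ContextBudgetExceeded(
            f'Estimated {used} tokens exceeds budget of {budget}'
        )
    return used


if __name__ == '__main__':
    msgs = [{'role': 'user', 'content': 'Hello! My name is Alice.'}]
    print(ensure_within_budget(msgs, budget=100))
```

Raising an error at this point forces the caller to choose a strategy (trim, summarize, or stop) instead of discovering the problem from an API failure or from degraded answers.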

Strategy 1: Sliding Window (Last N Messages)

The simplest strategy is to keep only the last N messages in the context, discarding older ones. This is called a sliding window. It is easy to implement, keeps cost roughly bounded (N caps the number of messages, not their token length), and is sufficient for many use cases where recent context matters more than older exchanges.

import openai

client = openai.OpenAI()

class SlidingWindowConversation:
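    def __init__(self, system_prompt, window_size=10):
        # window_size counts messages (user and assistant), not tokens.
        # Guard against 0: history[-0:] would return the entire history.
        if window_size < 1:
            raise ValueError('window_size must be at least 1')
        self.system = system_prompt
        self.window_size = window_size
        self.history = []  # full transcript; only the tail is sent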

    def chat(self, user_message):
        self.history.append({'role': 'user', 'content': user_message})

        # Keep only the last N messages
        windowed = self.history[-self.window_size:]
        messages = [{'role': 'system', 'content': self.system}] + windowed

        response = client.chat.completions.create(
            model='gpt-4o-mini',
            messages=messages
        )
        reply = response.choices[0].message.content
        self.history.append({'role': 'assistant', 'content': reply})
        print(f'Context: {len(windowed)} messages ({len(self.history)} total)')
        return reply

conv = SlidingWindowConversation('You are a helpful assistant.', window_size=6)
print(conv.chat('Hello! My name is Alice.'))
print(conv.chat('What is the capital of France?'))
print(conv.chat('What is my name?'))  # Remembered here: window_size=6 covers all 5 messages; a window of 2 would drop the name
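The effect of the window size can be inspected without calling the API. The following sketch isolates the windowing step in a standalone helper, `apply_window` (a hypothetical name), and runs it on a hand-written transcript. The assistant replies are placeholders written for illustration, not real model output.

```python
def apply_window(system_prompt, history, window_size):
    """Build the message list for the next call: system prompt + last N messages."""
    if window_size < 1:
        raise ValueError('window_size must be at least 1')
    windowed = history[-window_size:]
    return [{'role': 'system', 'content': system_prompt}] + windowed


# Placeholder transcript (assistant replies are invented for the example)
history = [
    {'role': 'user', 'content': 'Hello! My name is Alice.'},
    {'role': 'assistant', 'content': 'Nice to meet you, Alice.'},
    {'role': 'user', 'content': 'What is the capital of France?'},
    {'role': 'assistant', 'content': 'Paris.'},
    {'role': 'user', 'content': 'What is my name?'},
]

small = apply_window('You are a helpful assistant.', history, window_size=2)
large = apply_window('You are a helpful assistant.', history, window_size=6)

if __name__ == '__main__':
    for label, msgs in (('window_size=2', small), ('window_size=6', large)):
        knows_name = any('Alice' in m['content'] for m in msgs)
        print(f'{label}: {len(msgs)} messages sent, name in context: {knows_name}')
```

With a window of 2, only the reply about Paris and the final question are sent, so the name is no longer in context. With a window of 6, all five messages fit. The system prompt is always prepended and does not count toward the window.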
