AI Engineering Academy · Lesson

State Management Across Execution Steps

Persist variables, data frames, and imported libraries across multiple code execution steps so the agent can build on previous results without re-running earlier computations.

The Statelessness Problem

Each Docker container execution starts with a fresh Python interpreter, so variables defined in iteration 1 do not exist in iteration 2. Unless you implement an explicit strategy to persist and restore state across iterations, the agent must re-load and re-compute everything from scratch on every execution step, which makes multi-step analyses slow and fragile.

# Iteration 1 - works fine
import pandas as pd

df = pd.read_csv('data.csv')  # df is in memory
df_cleaned = df.dropna()
print('Rows after cleaning:', len(df_cleaned))

# Iteration 2 - NEW container, df_cleaned (and the pandas import) are GONE
result = df_cleaned.groupby('category').sum()  # NameError: name 'df_cleaned' is not defined
print(result)  # never reached, the line above already raised
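
The same failure can be reproduced without Docker. The sketch below is only an illustration: it uses separate interpreter processes as a stand-in for separate containers, and `run_step` is a hypothetical helper name, not part of any agent framework.

```python
import subprocess
import sys


def run_step(code: str) -> subprocess.CompletedProcess:
    """Run one code step in a brand-new Python interpreter process."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
    )


# Step 1 defines x; step 2 runs in a different process and cannot see it.
step1 = run_step("x = 41 + 1\nprint(x)")
step2 = run_step("print(x)")

print(step1.returncode, step1.stdout.strip())       # 0 42
print(step2.returncode, "NameError" in step2.stderr)  # 1 True
```

Nothing in memory is shared between the two processes, which is the situation the agent faces between container executions.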

File-Based State Persistence

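The simplest and most portable approach is to save state to files. At the end of each code block, the agent saves DataFrames, dictionaries, or other objects to files in the workspace directory, and the next iteration loads them back. This works only if the workspace directory outlives the container, for example as a volume mounted into every execution. Parquet is ideal for DataFrames (pandas needs the pyarrow or fastparquet package installed to write it), JSON for dictionaries, and pickle for arbitrary Python objects. Be careful with pickle: unpickling a file from an untrusted source can execute arbitrary code, so load only pickle files you trust.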

import pandas as pd
import json
from pathlib import Path

WORKSPACE = Path('/workspace')

# Iteration 1: process and SAVE
df = pd.read_csv(WORKSPACE / 'raw_data.csv')
df_cleaned = df.dropna().reset_index(drop=True)
df_cleaned.to_parquet(WORKSPACE / 'cleaned.parquet')  # save for next iteration

stats = {'rows': len(df_cleaned), 'columns': list(df_cleaned.columns)}
with open(WORKSPACE / 'stats.json', 'w') as f:
    json.dump(stats, f)

print('Saved cleaned data:', len(df_cleaned), 'rows')

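# Iteration 2: LOAD and continue (fresh interpreter, so imports and constants are repeated)
import pandas as pd
import json
from pathlib import Path

WORKSPACE = Path('/workspace')

df_cleaned = pd.read_parquet(WORKSPACE / 'cleaned.parquet')  # restore state
with open(WORKSPACE / 'stats.json') as f:
    stats = json.load(f)
result = df_cleaned.groupby('category')['value'].sum()
print(result)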
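
The save and load steps above repeat in every iteration, so they could be wrapped in a pair of small helpers. This is a minimal sketch rather than a definitive implementation: `save_state` and `load_state` are hypothetical names, the format is chosen by a simple duck-typed check, and the usage at the bottom uses a temporary directory and made-up values in place of the real `/workspace`.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def save_state(workspace: Path, name: str, value: Any) -> Path:
    """Persist one named value: DataFrames go to Parquet, everything else to JSON."""
    if hasattr(value, "to_parquet"):  # duck-typed check for a pandas DataFrame
        path = workspace / f"{name}.parquet"
        value.to_parquet(path)
    else:
        path = workspace / f"{name}.json"
        # json.dumps raises TypeError before anything is written
        # if the value is not JSON-serialisable.
        path.write_text(json.dumps(value))
    return path


def load_state(workspace: Path, name: str) -> Any:
    """Restore a value saved by save_state, whichever format it was written in."""
    parquet_path = workspace / f"{name}.parquet"
    if parquet_path.exists():
        import pandas as pd  # lazy import, so JSON-only steps do not need pandas
        return pd.read_parquet(parquet_path)
    json_path = workspace / f"{name}.json"
    return json.loads(json_path.read_text())


# Usage: a throwaway directory stands in for the mounted workspace.
workspace = Path(tempfile.mkdtemp())
save_state(workspace, "stats", {"rows": 120, "columns": ["category", "value"]})
restored = load_state(workspace, "stats")
print(restored["rows"])
```

One design note: if a name is saved first as a DataFrame and later as a dictionary, both files exist and `load_state` returns the Parquet one, so a real implementation would probably delete the stale file or keep a manifest of what was saved.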
