The Code Execution Loop
Design the write-execute-observe loop where the agent generates code, a sandboxed executor runs it, stdout and stderr are captured and fed back as observations, and the agent fixes errors.
Code Agents: Writing and Running Code
A code execution agent is an AI agent that solves problems by writing code, running it, observing the output, and iterating until the task is complete. Unlike agents limited to predefined tools, a code agent can write whatever computation a step requires. This makes it very flexible: any task that can be expressed as a program can be attempted by a code agent.
The Write-Execute-Observe Loop
The core of a code execution agent is a loop with three steps:
Write: the LLM generates Python code to solve the current step of the task.
Execute: the code runs in a sandboxed environment, and stdout and stderr are captured.
Observe: the captured output is fed back to the LLM as a new observation, which it uses to decide what to write next.
The loop continues until the task is complete or an iteration limit is reached.
def code_execution_loop(task: str, max_iterations: int = 10) -> str:
    # Assumed helpers, defined elsewhere: llm.complete(messages) returns the reply text,
    # extract_code_block(text) returns the code or None, execute_safely(code) returns (stdout, stderr).
    messages = [
        {'role': 'system', 'content': 'You are a Python coding agent. Write code to solve tasks step by step.'},
        {'role': 'user', 'content': task},
    ]
    for _ in range(max_iterations):
        # WRITE: LLM generates code
        response = llm.complete(messages)
        code = extract_code_block(response)
        if not code:
            return response  # LLM gave a final answer without code
        # EXECUTE: run the code, capturing stdout and stderr
        output, error = execute_safely(code)
        # OBSERVE: feed the result back; keep stdout even when stderr is non-empty
        observation = f'Output:\n{output}'
        if error:
            observation += f'\nError:\n{error}'
        messages.append({'role': 'assistant', 'content': response})
        messages.append({'role': 'user', 'content': observation})
    return 'Max iterations reached'
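The loop calls two helpers, extract_code_block and execute_safely, that the lesson does not define. The following is a minimal sketch of what they might look like, not a definitive implementation. It assumes the LLM wraps its code in a triple-backtick fence, and it runs the code in a separate Python process with a timeout and a throwaway working directory. The timeout default and the FENCE and _CODE_RE names are illustrative choices. Note that a subprocess is only process isolation, not a real sandbox: untrusted code should run inside a container, VM, or similar boundary.

```python
import re
import subprocess
import sys
import tempfile
from typing import Optional, Tuple

# Build the fence marker programmatically so this listing can itself be fenced.
FENCE = "`" * 3

# Assumption: the model wraps code in a fence, optionally tagged "python" or "py".
_CODE_RE = re.compile(FENCE + r"(?:python|py)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code_block(text: str) -> Optional[str]:
    """Return the first fenced code block in text, or None if there is none."""
    match = _CODE_RE.search(text)
    if match is None:
        return None
    code = match.group(1).strip()
    return code or None


def execute_safely(code: str, timeout: float = 5.0) -> Tuple[str, str]:
    """Run code in a child Python process and return (stdout, stderr).

    This limits run time and keeps the code out of the agent's own process,
    but it does NOT restrict file, network, or system access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=workdir,  # scratch directory, deleted afterwards
            )
        except subprocess.TimeoutExpired:
            # Report the timeout as an error so the agent can react to it.
            return "", f"Timed out after {timeout} seconds"
    return proc.stdout, proc.stderr
```

With these in place, a traceback from failing code arrives in the second element of the returned tuple, which the loop passes back to the LLM as the next observation so that it can attempt a fix.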