Batch Processing and Async Execution
OpenAI Batch API, async Python, and concurrent prompt execution.
Why Batch and Async?
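Processing thousands of LLM requests one at a time is slow, and paying the standard synchronous price for each of them is expensive. Batch processing groups requests into a single background job, which with the OpenAI Batch API costs 50% less. Async execution sends many requests concurrently to maximize throughput within rate limits. Batching reduces cost for work that can wait, and async execution reduces wall-clock time for work that cannot.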
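To make the async half concrete, here is a minimal sketch of bounded concurrency with `asyncio`. The function `fake_llm_call` is a hypothetical stand-in for a real asynchronous API call, and `max_concurrency=5` is an illustrative value, not a recommended limit; in practice it would be tuned to the provider's rate limits.

```python
import asyncio


async def fake_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a real async API request.
    await asyncio.sleep(0.01)
    return f'response to: {prompt}'


async def run_concurrently(prompts, call=fake_llm_call, max_concurrency=5):
    # The semaphore caps how many calls are in flight at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt):
        async with semaphore:
            return await call(prompt)

    # gather() returns results in the same order as the input prompts.
    return await asyncio.gather(*(bounded(p) for p in prompts))


results = asyncio.run(run_concurrently([f'prompt {i}' for i in range(20)]))
print(len(results), results[0])
```

Without the semaphore, all 20 coroutines would start at once, which tends to trigger rate-limit errors once the workload grows.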
OpenAI Batch API: 50% Cost Reduction
The OpenAI Batch API processes requests asynchronously in the background, with a completion window of up to 24 hours, at 50% of the normal API price. It is ideal for evaluation runs, dataset processing, and other workloads that do not need a real-time response.
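As a rough illustration of what the 50% discount means for a job, the sketch below compares standard and batch cost. The token count and the per-million-token price are hypothetical placeholders, not real pricing, and the calculation ignores the fact that input and output tokens are usually priced differently.

```python
def estimate_cost(total_tokens, price_per_million_tokens, batch_discount=0.5):
    # Cost at the standard (synchronous) price.
    standard = total_tokens / 1_000_000 * price_per_million_tokens
    # Cost after applying the batch discount.
    batch = standard * (1 - batch_discount)
    return standard, batch


# Hypothetical workload: 10M tokens at an assumed $1.00 per million tokens.
standard_cost, batch_cost = estimate_cost(
    total_tokens=10_000_000,
    price_per_million_tokens=1.00,
)
print(f'standard: ${standard_cost:.2f}, batch: ${batch_cost:.2f}')
```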
import json

import openai

client = openai.OpenAI(api_key='YOUR_API_KEY')

# Step 1: Create batch input file (JSONL format)
# Each request needs a unique custom_id so results can be matched back later.
batch_requests = [
    {
        'custom_id': f'request-{i}',
        'method': 'POST',
        'url': '/v1/chat/completions',
        'body': {
            'model': 'gpt-4o-mini',
            'messages': [
                {'role': 'user', 'content': f'Summarize this document: {doc}'}
            ],
            'max_tokens': 200
        }
    }
    for i, doc in enumerate(['Doc A text...', 'Doc B text...', 'Doc C text...'])
]

# Write to JSONL file (one JSON object per line)
with open('batch_input.jsonl', 'w') as f:
    for req in batch_requests:
        f.write(json.dumps(req) + '\n')

# Step 2: Upload the file
# The with-block closes the file handle once the upload finishes.
with open('batch_input.jsonl', 'rb') as f:
    batch_file = client.files.create(
        file=f,
        purpose='batch'
    )
print(f'Batch file uploaded: {batch_file.id}')
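The upload is only the first half of the workflow: the batch still has to be created, polled, and its output parsed. The sketch below shows one way the remaining steps might look, assuming the `client.batches.create`, `client.batches.retrieve`, and `client.files.content` calls of the official `openai` Python SDK. The helper names (`submit_batch`, `wait_for_batch`, `parse_batch_output`) and the 60-second polling interval are illustrative choices, not part of the API.

```python
import json
import time


def submit_batch(client, input_file_id):
    # Step 3: create the batch job from the uploaded JSONL file.
    return client.batches.create(
        input_file_id=input_file_id,
        endpoint='/v1/chat/completions',
        completion_window='24h',
    )


def wait_for_batch(client, batch_id, poll_seconds=60):
    # Step 4: poll until the batch reaches a terminal status.
    terminal = {'completed', 'failed', 'expired', 'cancelled'}
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in terminal:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text):
    # Step 5: map each custom_id to its completion text.
    # Records without a response (failed requests) are skipped here.
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        response = record.get('response')
        if response is None:
            continue
        message = response['body']['choices'][0]['message']
        results[record['custom_id']] = message['content']
    return results


# Usage with a real client (requires network access and an API key):
#   batch = submit_batch(client, batch_file.id)
#   batch = wait_for_batch(client, batch.id)
#   if batch.status == 'completed':
#       text = client.files.content(batch.output_file_id).text
#       summaries = parse_batch_output(text)
```

Results in the output file are not guaranteed to come back in input order, which is why the parser keys everything by `custom_id` rather than by position.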