AI Prompt Engineering · Lesson

Batch Processing and Async Execution

OpenAI Batch API, async Python, and concurrent prompt execution.

Why Batch and Async?

Sending thousands of LLM requests one at a time is slow, and paying the standard synchronous price for work that does not need an immediate answer is wasteful. Batch processing submits requests as a group for background handling at a 50% discount. Async execution runs many requests concurrently to maximize throughput within rate limits. Used together, they cut both cost and wall-clock time.
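
To make the async half concrete, here is a minimal sketch of bounded concurrency with asyncio. The function fake_llm_call is a hypothetical stand-in for a real async API call, and the max_concurrency value is an assumption you would tune to your own rate limits.

```python
import asyncio
import random


async def fake_llm_call(prompt: str) -> str:
    """Hypothetical stand-in for a real async LLM request."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate network latency
    return f"response to: {prompt}"


async def run_concurrently(prompts, max_concurrency=5):
    """Run all prompts concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_call(prompt):
        # The semaphore caps how many calls run at the same time.
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the input prompts.
    return await asyncio.gather(*(bounded_call(p) for p in prompts))


if __name__ == "__main__":
    prompts = [f"prompt {i}" for i in range(20)]
    results = asyncio.run(run_concurrently(prompts))
    print(len(results), "responses received")
```

The semaphore is the design choice that matters here: without it, gather would start every request at once, which tends to trigger rate-limit errors on a real API.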

OpenAI Batch API: 50% Cost Reduction

The OpenAI Batch API processes requests asynchronously in the background, returning results within a 24-hour completion window, at 50% of the normal API price. It is well suited to evaluation runs, dataset processing, and other workloads that do not need real-time responses.

import openai
import json

client = openai.OpenAI(api_key='YOUR_API_KEY')

# Step 1: Create batch input file (JSONL format)
batch_requests = [
    {
        'custom_id': f'request-{i}',
        'method': 'POST',
        'url': '/v1/chat/completions',
        'body': {
            'model': 'gpt-4o-mini',
            'messages': [
                {'role': 'user', 'content': f'Summarize this document: {doc}'}
            ],
            'max_tokens': 200
        }
    }
    for i, doc in enumerate(['Doc A text...', 'Doc B text...', 'Doc C text...'])
]

# Write to JSONL file
with open('batch_input.jsonl', 'w') as f:
    for req in batch_requests:
        f.write(json.dumps(req) + '\n')

# Step 2: Upload the file (the context manager closes the handle after upload)
with open('batch_input.jsonl', 'rb') as f:
    batch_file = client.files.create(
        file=f,
        purpose='batch'
    )
print(f'Batch file uploaded: {batch_file.id}')
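
The lesson's code stops after the upload. As a hedged sketch of the likely next steps, the helpers below create the batch job, poll until it reaches a terminal status, and parse the output JSONL. The helper names (submit_batch, wait_for_batch, parse_batch_output) and the 60-second polling interval are assumptions for illustration. The client argument is expected to be the openai.OpenAI client created earlier.

```python
import json
import time

# Statuses after which a batch will not change any further.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


def submit_batch(client, input_file_id):
    """Step 3: create the batch job from the uploaded JSONL file."""
    return client.batches.create(
        input_file_id=input_file_id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )


def wait_for_batch(client, batch_id, poll_seconds=60.0):
    """Step 4: poll until the batch reaches a terminal status."""
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        time.sleep(poll_seconds)


def parse_batch_output(jsonl_text):
    """Step 5: map each custom_id to its completion text (None on error)."""
    results = {}
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        body = (record.get("response") or {}).get("body") or {}
        choices = body.get("choices") or []
        results[record["custom_id"]] = (
            choices[0]["message"]["content"] if choices else None
        )
    return results
```

Usage would look roughly like this: call submit_batch(client, batch_file.id), pass the returned batch's id to wait_for_batch, and, if the final status is completed, download the results with client.files.content(batch.output_file_id).text and hand that text to parse_batch_output. Output lines are not guaranteed to be in input order, which is why the results are keyed by custom_id.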
