AI Engineering Academy · Lesson

Batch Processing with Async and Queues

Build an async extraction pipeline using asyncio and a job queue to process thousands of documents in parallel while respecting rate limits and tracking progress.

Why Batch Processing Matters

Processing thousands of documents one at a time is too slow for production. A synchronous loop that calls the OpenAI API sequentially might process about 1 document per second, so 10,000 documents take nearly 3 hours (10,000 seconds is about 2.8 hours). Async batch processing keeps many requests in flight at once, which can cut total wall-clock time by an order of magnitude or more, up to whatever your API rate limits allow.
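To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The helper name estimated_wall_clock_seconds and the concurrency values are illustrative assumptions, not part of the lesson, and the model is idealized: it assumes every call takes the same time and ignores rate limits and retries, so real speedups will be smaller.

```python
import math

def estimated_wall_clock_seconds(num_docs: int, seconds_per_call: float, concurrency: int) -> float:
    """Idealized estimate: documents are processed in waves of `concurrency` calls."""
    waves = math.ceil(num_docs / concurrency)
    return waves * seconds_per_call

# One request at a time, as in the synchronous loop described above
sequential = estimated_wall_clock_seconds(10_000, 1.0, 1)
# Ten requests in flight at once (hypothetical concurrency level)
ten_in_flight = estimated_wall_clock_seconds(10_000, 1.0, 10)

print(f"sequential:    {sequential / 3600:.2f} h")      # 2.78 h
print(f"10 in flight:  {ten_in_flight / 60:.2f} min")   # 16.67 min
```

Under these assumptions, ten concurrent requests already give the order-of-magnitude reduction; higher concurrency helps further only until the provider's rate limit becomes the bottleneck.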

The asyncio Foundation

Python's asyncio event loop lets you run many I/O-bound tasks concurrently on a single thread. While one API call is waiting for a network response, the event loop switches to another task. You write code with the async def and await keywords, and the runtime handles the scheduling. This is ideal for LLM calls, which spend most of their time waiting for the server.
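The switching behavior can be seen without any API key. The sketch below is an illustration only: fake_api_call and run_concurrently are hypothetical names, and asyncio.sleep stands in for the network wait of a real request.

```python
import asyncio
import time

async def fake_api_call(doc_id: int, latency: float = 0.1) -> int:
    # asyncio.sleep stands in for waiting on a network response;
    # while this task sleeps, the event loop runs other tasks.
    await asyncio.sleep(latency)
    return doc_id

async def run_concurrently(n: int) -> tuple[list[int], float]:
    start = time.perf_counter()
    # gather schedules all coroutines at once and returns results in input order
    results = await asyncio.gather(*(fake_api_call(i) for i in range(n)))
    return results, time.perf_counter() - start

results, elapsed = asyncio.run(run_concurrently(20))
print(f"{len(results)} calls finished in {elapsed:.2f}s")
```

Run one after another, twenty 0.1-second waits would take about 2 seconds; run concurrently, they overlap and finish in roughly the time of a single wait.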

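import asyncio
import instructor
from openai import AsyncOpenAI
from pydantic import BaseModel

# Assumed schema: PersonExtract is not defined in this lesson,
# so these fields are placeholders for illustration.
class PersonExtract(BaseModel):
    name: str
    age: int

# Wrap the async OpenAI client so create() accepts response_model
async_client = instructor.from_openai(AsyncOpenAI())

async def extract_one(text: str) -> PersonExtract:
    # Awaiting yields control to the event loop while the request is in flight
    return await async_client.chat.completions.create(
        model='gpt-4o-mini',
        response_model=PersonExtract,
        messages=[{'role': 'user', 'content': text}],
    )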
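One possible way to combine a coroutine like extract_one with a job queue is a small worker pool built on asyncio.Queue. The following is a minimal sketch under stated assumptions, not a definitive pipeline: extract_stub is a hypothetical stand-in that sleeps instead of calling the API, and worker, process_batch, and num_workers are illustrative names. The number of workers caps how many requests are in flight, which is only a coarse form of rate limiting; it does not enforce a requests-per-minute quota.

```python
import asyncio

async def extract_stub(text: str) -> dict:
    # Hypothetical stand-in for extract_one: a short sleep replaces the API call
    await asyncio.sleep(0.01)
    return {"length": len(text)}

async def worker(queue: asyncio.Queue, results: dict, progress: dict, total: int) -> None:
    while True:
        index, text = await queue.get()
        try:
            results[index] = await extract_stub(text)
        except Exception as exc:
            # Store the failure so one bad document does not stop the batch
            results[index] = exc
        finally:
            progress["done"] += 1
            if progress["done"] % 10 == 0:
                print(f"progress: {progress['done']}/{total}")
            queue.task_done()

async def process_batch(texts: list[str], num_workers: int = 5) -> list:
    queue: asyncio.Queue = asyncio.Queue()
    for job in enumerate(texts):
        queue.put_nowait(job)  # each job is an (index, text) pair

    results: dict = {}
    progress = {"done": 0}
    workers = [
        asyncio.create_task(worker(queue, results, progress, len(texts)))
        for _ in range(num_workers)
    ]

    await queue.join()  # returns once every job has been marked done
    for w in workers:
        w.cancel()      # workers loop forever, so stop them explicitly
    await asyncio.gather(*workers, return_exceptions=True)

    # Return results in the same order as the input documents
    return [results[i] for i in range(len(texts))]

docs = [f"document {i}" for i in range(50)]
output = asyncio.run(process_batch(docs))
print(f"processed {len(output)} documents")
```

Swapping extract_stub for a real extraction coroutine keeps the structure unchanged. A queue also makes it straightforward to re-enqueue failed jobs for a retry, though that is left out of this sketch.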
