AI Engineering Academy · Lesson

Consuming Streams with the Python SDK

Use the OpenAI async client with async for to consume streaming completions, accumulate the full response, and handle mid-stream errors without losing partial output.

Sync vs Async Streaming Clients

The OpenAI Python SDK provides both a synchronous OpenAI client and an asynchronous AsyncOpenAI client. For command-line scripts and simple applications, the synchronous client is easier to use. For web servers, APIs, and other applications that handle many concurrent requests, the async client is the better fit: it does not block the event loop while waiting for tokens, so other requests can be served in the meantime.

# Synchronous client (simple scripts)
from openai import OpenAI
client = OpenAI()

# Asynchronous client (web servers, concurrent workloads)
from openai import AsyncOpenAI
async_client = AsyncOpenAI()

# The async client mirrors the sync client's API surface,
# but its request methods are coroutines that must be awaited
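
To make the concurrency claim concrete, here is a minimal sketch that uses only asyncio and makes no API calls. The names fake_token_stream, consume, and run_concurrently are hypothetical stand-ins invented for this illustration, and the asyncio.sleep call only simulates waiting on the network. It shows two streams being consumed on one event loop, with their tokens arriving interleaved rather than one stream finishing before the other starts.

```python
import asyncio

async def fake_token_stream(tokens, delay=0.01):
    """Hypothetical stand-in for an LLM stream: yields tokens with a pause."""
    for token in tokens:
        await asyncio.sleep(delay)  # simulates waiting on the network
        yield token

async def consume(name, tokens, events):
    """Collect one stream, recording the order in which tokens arrive."""
    parts = []
    async for token in fake_token_stream(tokens):
        events.append((name, token))
        parts.append(token)
    return ''.join(parts)

async def run_concurrently():
    events = []
    # gather() runs both consumers on the same event loop at once
    results = await asyncio.gather(
        consume('A', ['Hel', 'lo', '!'], events),
        consume('B', ['Bon', 'jour', '!'], events),
    )
    return results, events

if __name__ == '__main__':
    results, events = asyncio.run(run_concurrently())
    print(results)
    print(events)
```

Each await hands control back to the event loop, which is what lets the second consumer make progress while the first is still waiting. A synchronous loop would hold the thread for the full duration of the first stream.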

Async Streaming with AsyncOpenAI

With the AsyncOpenAI client, the create call is a coroutine: awaiting it with stream=True returns an asynchronous stream of chunks, which you iterate with async for instead of a regular for loop. Between chunk arrivals the event loop is free to run other coroutines, so your server can handle other requests while waiting for the next token from the LLM. This is the key advantage over synchronous streaming in a web context.

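import asyncio
from openai import AsyncOpenAI

# Reads OPENAI_API_KEY from the environment by default
async_client = AsyncOpenAI()

async def async_stream_completion(prompt: str) -> str:
    # With stream=True, the awaited call returns an async iterator of chunks
    stream = await async_client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    )

    parts = []
    async for chunk in stream:
        # Some chunks can arrive with no choices (for example, a usage-only chunk)
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:  # content is None on role-only and final chunks
            print(delta, end='', flush=True)
            parts.append(delta)
    print()
    return ''.join(parts)

# Run the coroutine
if __name__ == '__main__':
    asyncio.run(async_stream_completion('Explain what async/await does in Python'))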
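
The lesson summary also mentions handling mid-stream errors without losing partial output. One way to do that, shown here as a sketch rather than a definitive implementation, is to wrap the async for loop in try/except and return whatever text was accumulated along with the error. The names collect_stream, make_chunk, and flaky_stream are hypothetical. The fake stream stands in for a real response so the example runs without network access, and the broad except Exception is an assumption that you would likely narrow to the SDK's own error types (such as openai.APIError) in real code.

```python
import asyncio
from types import SimpleNamespace

async def collect_stream(stream):
    """Accumulate streamed text; return (text, error) instead of raising."""
    parts = []
    error = None
    try:
        async for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
    except Exception as exc:  # assumption: narrow to openai.APIError in real code
        error = exc
    # Text received before the failure is still returned to the caller
    return ''.join(parts), error

def make_chunk(text):
    """Hypothetical helper: builds an object shaped like a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

async def flaky_stream():
    """Hypothetical stream that fails after two chunks, for demonstration."""
    yield make_chunk('Partial ')
    yield make_chunk('answer')
    raise ConnectionError('connection dropped mid-stream')

if __name__ == '__main__':
    text, error = asyncio.run(collect_stream(flaky_stream()))
    print(repr(text), '| error:', error)
```

With a real client you would pass in the stream returned by awaiting async_client.chat.completions.create(..., stream=True). Returning the error instead of re-raising lets the caller decide whether to show the partial text, retry, or both. Because the handler catches Exception rather than BaseException, task cancellation (asyncio.CancelledError) still propagates normally.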
