Consuming Streams with the Python SDK
Use the OpenAI async client with async for to consume streaming completions, accumulate the full response, and handle mid-stream errors without losing partial output.
Sync vs Async Streaming Clients
The OpenAI Python SDK provides both a synchronous OpenAI client and an asynchronous AsyncOpenAI client. For command-line scripts and simple applications, the synchronous client is easier to use. For web servers, APIs, and applications that handle multiple concurrent requests, the async client is the better fit: waiting for the next token yields control to the event loop instead of blocking it, so other requests can be served in the meantime.
# Synchronous client (simple scripts)
from openai import OpenAI
client = OpenAI()
# Asynchronous client (web servers, concurrent workloads)
from openai import AsyncOpenAI
async_client = AsyncOpenAI()
# The async client has the same API surface as the sync client
# but its request methods are coroutines that must be awaited
Async Streaming with AsyncOpenAI
With the AsyncOpenAI client, the streaming call becomes a coroutine: you await the create call to obtain the stream, then use async for to iterate over chunks instead of a regular for loop. Each time the loop waits for the next chunk, the event loop is free to run other coroutines, so your server can handle other requests while waiting for the next token from the LLM. This is the key advantage over synchronous streaming in a web context.
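To see that interleaving without calling the API, here is a minimal sketch that replaces the SDK stream with a simulated one. The names fake_token_stream and consume are hypothetical helpers invented for this illustration, and asyncio.sleep stands in for network latency. Two consumers run under asyncio.gather, and the log records which one handled each token.

```python
import asyncio


async def fake_token_stream(tokens, delay):
    """Hypothetical stand-in for an SDK stream: yields tokens with a pause between them."""
    for token in tokens:
        # Simulates waiting on the network; control returns to the event loop here.
        await asyncio.sleep(delay)
        yield token


async def consume(name, tokens, delay, log):
    """Iterate one simulated stream with async for and accumulate its text."""
    text = ''
    async for token in fake_token_stream(tokens, delay):
        log.append(name)  # record which consumer ran at this point
        text += token
    return text


async def main():
    log = []
    # Both consumers share one event loop and make progress in turns.
    results = await asyncio.gather(
        consume('A', ['a1 ', 'a2 ', 'a3'], 0.02, log),
        consume('B', ['b1 ', 'b2 ', 'b3'], 0.02, log),
    )
    return results, log


results, log = asyncio.run(main())
print(results)  # each consumer still gets its own complete text
print(log)      # entries from A and B are mixed rather than grouped
```

Each consumer ends up with its own complete text, while the log shows the two loops taking turns. The same loop shape, using the real client, follows.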
import asyncio
from openai import AsyncOpenAI
async_client = AsyncOpenAI()
async def async_stream_completion(prompt: str) -> str:
    stream = await async_client.chat.completions.create(
        model='gpt-4o-mini',
        messages=[{'role': 'user', 'content': prompt}],
        stream=True,
    )
    full_text = ''
    async for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end='', flush=True)
            full_text += delta
    print()
    return full_text
# Run the coroutine
asyncio.run(async_stream_completion('Explain what async/await does in Python'))
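The streaming loop above loses its accumulated text if the connection fails partway through, because the exception propagates before the return statement. One way to keep partial output is to wrap the loop in try/except and return whatever arrived along with the error. The sketch below is an assumption-laden illustration, not the SDK's own mechanism: StreamResult, collect_stream, fake_chunk, and flaky_stream are hypothetical names, and the fake stream only mimics the chunk.choices[0].delta.content shape so the example runs without network access.

```python
import asyncio
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class StreamResult:
    text: str                    # everything received before the stream ended
    error: Optional[Exception]   # None if the stream finished cleanly


async def collect_stream(stream) -> StreamResult:
    """Accumulate delta text from an async chunk stream, keeping partial output on failure."""
    parts = []
    try:
        async for chunk in stream:
            if not chunk.choices:  # defensive: a chunk may arrive with no choices
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)
    except Exception as exc:
        # Assumption: with the real SDK you would likely narrow this to
        # openai.APIError (and its subclasses) instead of catching everything.
        return StreamResult(text=''.join(parts), error=exc)
    return StreamResult(text=''.join(parts), error=None)


def fake_chunk(text):
    """Hypothetical helper: builds an object shaped like a streamed chunk."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def flaky_stream():
    """Simulated stream that fails after delivering two chunks."""
    yield fake_chunk('Hello, ')
    yield fake_chunk('wor')
    raise ConnectionError('connection dropped mid-stream')


result = asyncio.run(collect_stream(flaky_stream()))
print(repr(result.text))
print(repr(result.error))
```

Returning the error instead of re-raising lets the caller decide whether to show the partial text, retry, or both. With the real client you would pass the awaited result of async_client.chat.completions.create(..., stream=True) as the stream argument.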