AI Agents · Lesson

Streaming Output in CLI Agents

Printing streamed tokens character-by-character in terminal interfaces.

Why Streaming Matters for CLI Agents

Without streaming, your CLI agent prints nothing until the full LLM response is ready. That can take several seconds, or tens of seconds for long answers. Meanwhile, users stare at a blank terminal and wonder whether the program has crashed.

With streaming, tokens appear as they are generated, providing immediate feedback and a much better experience.

Enabling Streaming in the OpenAI SDK

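Pass stream=True to client.chat.completions.create(). Instead of a complete ChatCompletion object, the call returns an iterable Stream of chunks. Iterate over it to process each chunk as it arrives. The new text is in chunk.choices[0].delta.content, which can be empty or None for chunks that carry no text, such as the final one.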

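import openai

# OpenAI() reads the OPENAI_API_KEY environment variable by default
client = openai.OpenAI()

stream = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Explain Python generators in 3 sentences.'}],
    stream=True  # <-- enable streaming
)

# Each chunk arrives as it is generated
for chunk in stream:
    if not chunk.choices:  # e.g. the usage-only chunk sent with stream_options={'include_usage': True}
        continue
    delta = chunk.choices[0].delta
    if delta.content:  # skip chunks with no text (empty string or None)
        print(delta.content, end='', flush=True)  # flush so text appears immediately

print()  # newline after the response is complete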
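A CLI agent usually needs to do two things with a stream: show the text immediately and keep the full reply so it can be added to the conversation history. The sketch below wraps the loop above in a reusable helper. It assumes only that each chunk has the attribute shape chunk.choices[0].delta.content used in the lesson. The names print_stream and fake_chunk, and the optional char_delay "typewriter" effect, are hypothetical illustrations, not part of the OpenAI SDK. The fake chunks let you try the rendering offline without an API key.

```python
import sys
import time
from types import SimpleNamespace
from typing import Iterable, TextIO


def print_stream(chunks: Iterable, out: TextIO = sys.stdout, char_delay: float = 0.0) -> str:
    """Print streamed text as it arrives and return the full reply.

    Assumes each chunk looks like an OpenAI ChatCompletionChunk:
    chunk.choices[0].delta.content is a str (possibly empty) or None.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # e.g. a usage-only chunk with no choices
            continue
        text = chunk.choices[0].delta.content
        if not text:  # skip chunks that carry no text
            continue
        parts.append(text)
        if char_delay > 0:
            # Optional typewriter effect: emit one character at a time
            for ch in text:
                out.write(ch)
                out.flush()
                time.sleep(char_delay)
        else:
            out.write(text)
            out.flush()  # push the text to the terminal immediately
    out.write('\n')  # finish the line once the stream ends
    out.flush()
    return ''.join(parts)


def fake_chunk(content):
    """Hypothetical stand-in with the same attribute shape as a real chunk."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=content))])


if __name__ == '__main__':
    demo = [fake_chunk(''), fake_chunk('Hello'), fake_chunk(', world'), fake_chunk(None)]
    print_stream(demo, char_delay=0.02)
```

With the real SDK, you would pass the stream object from the example above, for instance reply = print_stream(stream), and then append {'role': 'assistant', 'content': reply} to your messages list so the next turn of the agent sees what it just said.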
