Streaming Output in CLI Agents
Printing streamed tokens chunk by chunk, as they arrive, in terminal interfaces.
Why Streaming Matters for CLI Agents
Without streaming, your CLI agent prints nothing until the full LLM response is ready, which can take 5-30 seconds. Users stare at a blank terminal wondering if the program crashed.
With streaming, tokens appear as they are generated, providing immediate feedback and a much better experience.
Enabling Streaming in the OpenAI SDK
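Pass stream=True to client.chat.completions.create(). Instead of a complete response object, the call returns an iterable stream of chunks. Iterate over it to process each chunk as it arrives; the new text in a chunk is in chunk.choices[0].delta.content, which can be None.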
import openai

client = openai.OpenAI(api_key='YOUR_API_KEY')

stream = client.chat.completions.create(
    model='gpt-4o-mini',
    messages=[{'role': 'user', 'content': 'Explain Python generators in 3 sentences.'}],
    stream=True,  # <-- enable streaming
)

# Each chunk arrives as it is generated
for chunk in stream:
    if not chunk.choices:  # some chunks may carry no choices; skip them
        continue
    delta = chunk.choices[0].delta
    if delta.content:  # content can be None, e.g. on the final chunk
        print(delta.content, end='', flush=True)  # flush so text shows immediately
print()  # newline after the response is complete
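A CLI agent usually needs the complete reply as well as the live output, for example to append it to the conversation history. The sketch below is one possible way to do both, not part of the SDK: print_stream is a hypothetical helper that writes each piece of text as it arrives and returns the joined result. It assumes only that each chunk exposes choices[0].delta.content, as in the loop above. fake_chunk is an illustrative stand-in so the demo runs without an API key or network call.

```python
import io
import sys
from types import SimpleNamespace
from typing import Iterable, TextIO


def print_stream(chunks: Iterable, out: TextIO = sys.stdout) -> str:
    """Write streamed text to `out` as it arrives and return the full reply.

    Hypothetical helper: `chunks` is any iterable whose items expose
    choices[0].delta.content, like the stream in the lesson's example.
    """
    parts = []
    for chunk in chunks:
        if not chunk.choices:  # skip chunks that carry no choices
            continue
        text = chunk.choices[0].delta.content
        if text:  # content can be None, e.g. on the final chunk
            out.write(text)
            out.flush()  # push partial output to the terminal right away
            parts.append(text)
    out.write("\n")  # newline after the response is complete
    return "".join(parts)


def fake_chunk(text):
    """Illustrative stand-in for an SDK chunk (assumption, offline demo only)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Offline demo: stream four fake chunks into an in-memory buffer.
demo = [fake_chunk("Hello"), fake_chunk(", "), fake_chunk(None), fake_chunk("world")]
buffer = io.StringIO()
reply = print_stream(demo, out=buffer)
```

In a real agent you would pass the stream returned by client.chat.completions.create(..., stream=True) in place of demo, leave out at its default of sys.stdout, and store the returned string as the assistant message.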