AI Engineering Academy · Lesson

Streaming Output in LangChain

Implement token streaming through LCEL chains so your application displays each token as it arrives rather than waiting for the full response, reducing perceived latency.

Why Streaming Matters

Without streaming, users stare at a blank screen until the LLM finishes generating, which can take 5–30 seconds for long responses. With streaming, tokens appear as they are generated, so the user gets feedback almost immediately. Total generation time stays roughly the same, but the time to first visible output drops, which makes the application feel far more responsive. LangChain's LCEL propagates streaming through the entire chain automatically when you call .stream(), provided each step in the chain can process its input incrementally.
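To build intuition for how streaming can propagate through a chain, here is a minimal conceptual sketch in plain Python. It is not LangChain's implementation: fake_model, fake_parser, and the events log are hypothetical stand-ins introduced only for illustration. Each stage is a generator that forwards a chunk as soon as it receives one, so the application can display the first token before the "model" has produced the last.

```python
from typing import Iterator, List

# Log of what happened and in which order (illustrative only).
events: List[str] = []

def fake_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: yields its reply one token at a time."""
    for token in ["Entangled ", "particles ", "share ", "state."]:
        events.append(f"model produced {token!r}")
        yield token

def fake_parser(tokens: Iterator[str]) -> Iterator[str]:
    """Hypothetical stand-in for an output parser: forwards each token immediately."""
    for token in tokens:
        yield token

def stream(prompt: str) -> Iterator[str]:
    """Compose the stages lazily; nothing runs until the caller iterates."""
    return fake_parser(fake_model(prompt))

for chunk in stream("Explain quantum entanglement."):
    events.append(f"app displayed {chunk!r}")
    print(chunk, end="", flush=True)
print()
```

Inspecting events afterwards shows "model produced" and "app displayed" entries alternating. That interleaving is what streaming through a chain looks like; a stage that had to collect all of its input before yielding anything would break it.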

Basic Streaming with .stream()

Every LCEL chain exposes a .stream() method that returns an iterator of chunks. For a chain ending in StrOutputParser, each chunk is a string fragment, usually a token or a short run of text. You iterate over the chunks and print or yield them as they arrive. Under the hood, the OpenAI API sends the response incrementally over HTTP, and each piece is forwarded through the parser as soon as it arrives.

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

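# Compose prompt -> model -> parser. ChatOpenAI reads the API key from the
# OPENAI_API_KEY environment variable.
chain = (
    ChatPromptTemplate.from_template('Explain {topic} in detail.')
    | ChatOpenAI(model='gpt-4o-mini')
    | StrOutputParser()
)

# Print each chunk as it arrives; flush=True bypasses stdout buffering
# so fragments show up immediately.
for chunk in chain.stream({'topic': 'quantum entanglement'}):
    print(chunk, end='', flush=True)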
print()  # final newline
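In practice you often need the complete response once streaming finishes, for example to log it or store it in chat history. The sketch below shows one way to display chunks and accumulate them at the same time. The helper name stream_and_collect and the demo_chunks list are assumptions made for illustration. The helper accepts any iterable of strings, so it runs here without an API key.

```python
from typing import Callable, Iterable, List

def stream_and_collect(chunks: Iterable[str], write: Callable[[str], None]) -> str:
    """Pass each chunk to `write` as it arrives and return the joined text."""
    parts: List[str] = []
    for chunk in chunks:
        write(chunk)          # show the fragment immediately
        parts.append(chunk)   # keep it for the final result
    return "".join(parts)

# Stand-in for the chunks a chain would yield (hypothetical data).
demo_chunks = iter(["Quantum ", "entanglement ", "links ", "particles."])

full_text = stream_and_collect(
    demo_chunks,
    lambda s: print(s, end="", flush=True),
)
print()  # final newline
```

With a real chain, the iterator returned by chain.stream({'topic': 'quantum entanglement'}) could be passed as the chunks argument in place of demo_chunks, since a chain ending in StrOutputParser yields string fragments.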
