Streaming Output in LangChain
Implement token streaming through LCEL chains so your application displays text as it is generated rather than waiting for the full response, improving perceived latency.
Why Streaming Matters
Without streaming, users stare at a blank screen until the LLM finishes generating, which can take 5–30 seconds for long responses. With streaming, tokens appear as they are generated, so users get feedback almost immediately. Total generation time is unchanged, but the wait for the first visible output becomes much shorter, and that is what makes the application feel responsive. LangChain's LCEL propagates streaming through the entire chain automatically when you call .stream().
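To make the idea of propagation concrete, here is a minimal conceptual sketch in plain Python generators. It is not how LCEL is implemented; fake_model, passthrough_parser, and stream_chain are hypothetical stand-ins invented for this illustration. The point it tries to show is that when every stage yields lazily, the consumer sees the first chunk before the later ones have been produced.

```python
from typing import Iterable, Iterator, List

events: List[str] = []  # records the order in which things happen


def fake_model(prompt: str) -> Iterator[str]:
    """Hypothetical stand-in for an LLM: echoes the prompt one word at a time."""
    for word in prompt.split():
        events.append(f"produced {word}")
        yield word + " "


def passthrough_parser(chunks: Iterable[str]) -> Iterator[str]:
    """Hypothetical stand-in for a string parser: forwards each chunk immediately."""
    for chunk in chunks:
        yield chunk


def stream_chain(topic: str) -> Iterator[str]:
    """Compose the stages; nothing runs until the caller starts iterating."""
    prompt = f"Explain {topic} in detail."
    yield from passthrough_parser(fake_model(prompt))


for chunk in stream_chain("streaming"):
    events.append(f"displayed {chunk.strip()}")
    print(chunk, end="", flush=True)
print()
```

In this sketch the events list alternates between "produced" and "displayed" entries, which is the behavior streaming is meant to give: each piece is shown as soon as it exists instead of after the whole response is ready.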
Basic Streaming with .stream()
Every LCEL chain exposes a .stream() method that returns an iterator of chunks. For a chain ending in StrOutputParser, each chunk is a string fragment, typically a token or a few tokens. You iterate over the chunks and print or yield them as they arrive. Streaming starts at the HTTP level: the OpenAI API sends the response incrementally, and each piece is forwarded through the parser as soon as it arrives.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

chain = (
    ChatPromptTemplate.from_template('Explain {topic} in detail.')
    | ChatOpenAI(model='gpt-4o-mini')
    | StrOutputParser()
)

# Stream tokens to stdout
for chunk in chain.stream({'topic': 'quantum entanglement'}):
    print(chunk, end='', flush=True)
print()  # final newline
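The loop in the example prints each chunk and then discards it. An application often needs to forward chunks to the user and also keep the complete response, for example to log it or store it in chat history. The sketch below shows one way to do that with a small generator. The helper name relay_and_collect and the demo_chunks list are hypothetical; demo_chunks stands in for the iterator that chain.stream(...) would return, so the snippet runs without an API key.

```python
from typing import Iterable, Iterator, List


def relay_and_collect(chunks: Iterable[str], sink: List[str]) -> Iterator[str]:
    """Yield each chunk as soon as it arrives while also saving it in sink."""
    for chunk in chunks:
        sink.append(chunk)
        yield chunk


# Stand-in for chain.stream({'topic': 'quantum entanglement'})
demo_chunks = ["Quantum ", "entanglement ", "links ", "particles."]

collected: List[str] = []
for piece in relay_and_collect(demo_chunks, collected):
    print(piece, end="", flush=True)  # display immediately
print()

# After the stream ends, the full response is available as one string.
full_text = "".join(collected)
```

Because relay_and_collect accepts any iterable of strings, the same helper should work unchanged if you pass it the result of chain.stream(...) from a chain that ends in StrOutputParser.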