Handling Tool Calls in Streamed Responses
Parse streaming responses that contain function call arguments arriving token by token, buffer the JSON fragments, and trigger tool execution only when the call is complete.
Tool Calls Arrive Differently in Streams
When an LLM decides to call a function, the response structure changes: instead of a content string, the delta contains a tool_calls array. In a streamed response, the function call arguments arrive token by token as a partial JSON string, so no single chunk can be relied on to hold the complete JSON object. You must buffer these fragments and reassemble the full string before you can parse the arguments and execute the tool call.
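The buffering step described above can be sketched without any API call. This is a minimal illustration, assuming the arguments arrive as the five hypothetical fragments listed in `fragments`; `try_parse` is a helper name invented for this example. It shows that every intermediate buffer fails to parse and only the fully concatenated string yields a usable object.

```python
import json

# Hypothetical fragments, standing in for the pieces a stream might deliver.
fragments = ['{', '"city"', ': "', 'Paris', '"}']


def try_parse(buffer: str):
    """Return the parsed arguments, or None while the JSON is still incomplete."""
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return None


buffer = ""
partial_results = []
for fragment in fragments:
    buffer += fragment                      # append the new piece to the buffer
    partial_results.append(try_parse(buffer))

# Only the complete buffer parses; every earlier attempt returned None.
arguments = json.loads(buffer)
print(arguments["city"])
```

Parsing after every fragment is shown here only to make the failure visible. In practice it is simpler to keep appending to the buffer and parse once, when the stream signals that the call is complete.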
# In a non-streaming response, tool call is complete:
# choice.message.tool_calls[0].function.arguments = '{"city": "Paris"}'
# In a streaming response, arguments arrive in pieces:
# chunk 1: delta.tool_calls[0].function.arguments = '{'
# chunk 2: delta.tool_calls[0].function.arguments = '"city"'
# chunk 3: delta.tool_calls[0].function.arguments = ': "'
# chunk 4: delta.tool_calls[0].function.arguments = 'Paris'
# chunk 5: delta.tool_calls[0].function.arguments = '"}'
# You must concatenate these before json.loads can parse them
Detecting a Tool Call in the Stream
Check each chunk's finish_reason to learn how the response ended. It stays None until the end of the stream. When finish_reason is 'tool_calls', the model has decided to call a function and the stream is ending. When finish_reason is 'stop', the model produced a normal text response. While the stream is still running, check whether chunk.choices[0].delta.tool_calls is not None to identify tool call argument fragments.
from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def detect_stream_type(messages, tools):
    """Return 'tool_call' if the stream carried tool call fragments, else 'text'."""
    stream = await async_client.chat.completions.create(
        model='gpt-4o-mini',
        messages=messages,
        tools=tools,
        stream=True,
    )
    response_type = 'text'
    async for chunk in stream:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.tool_calls:  # tool call fragment arriving
            response_type = 'tool_call'
        if choice.finish_reason == 'tool_calls':
            print('Model wants to call a function')
        elif choice.finish_reason == 'stop':
            print('Normal text response')
    return response_type
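Putting the pieces together, the sketch below buffers fragments per tool call and triggers execution only once finish_reason reports 'tool_calls'. It is an illustration under stated assumptions, not a definitive implementation: `fake_chunk` and `fragment` build stand-in objects with the same attribute shape as streamed chunks so the logic runs without a network call, and `get_weather` and `TOOLS` are hypothetical. Fragments are grouped by each entry's `index`, since a response may contain more than one tool call.

```python
import json
from types import SimpleNamespace as NS


def fake_chunk(tool_calls=None, finish_reason=None):
    """Stand-in with the same attribute shape as a streamed chunk (assumption)."""
    delta = NS(content=None, tool_calls=tool_calls)
    return NS(choices=[NS(delta=delta, finish_reason=finish_reason)])


def fragment(index, arguments, call_id=None, name=None):
    """Stand-in for one entry of delta.tool_calls."""
    return NS(index=index, id=call_id, function=NS(name=name, arguments=arguments))


def collect_tool_calls(chunks):
    """Buffer argument fragments per call index; parse only when the stream is done."""
    pending = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        for tc in choice.delta.tool_calls or []:
            slot = pending.setdefault(
                tc.index, {"id": None, "name": None, "arguments": ""}
            )
            if tc.id:
                slot["id"] = tc.id
            if tc.function.name:
                slot["name"] = tc.function.name
            if tc.function.arguments:
                slot["arguments"] += tc.function.arguments  # append the fragment
        if choice.finish_reason == "tool_calls":
            # The calls are complete, so the buffered JSON can now be parsed.
            return [
                {"id": c["id"], "name": c["name"],
                 "arguments": json.loads(c["arguments"] or "{}")}
                for _, c in sorted(pending.items())
            ]
    return []  # the stream ended without requesting a tool call


def get_weather(city):
    """Hypothetical tool used only for this example."""
    return f"Sunny in {city}"


TOOLS = {"get_weather": get_weather}

chunks = [
    fake_chunk([fragment(0, "", call_id="call_1", name="get_weather")]),
    fake_chunk([fragment(0, '{"city"')]),
    fake_chunk([fragment(0, ': "Paris"}')]),
    fake_chunk(finish_reason="tool_calls"),
]

calls = collect_tool_calls(chunks)
results = [TOOLS[c["name"]](**c["arguments"]) for c in calls]
print(results)
```

With a real stream, the same loop body would sit inside `async for chunk in stream:`. The dictionary keyed by index keeps the fragments of parallel tool calls from being mixed into one buffer.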