AI Engineering Academy · Lesson

Handling Tool Calls in Streamed Responses

Parse streaming responses that contain function call arguments arriving token by token, buffer the JSON fragments, and trigger tool execution only when the call is complete.

Tool Calls Arrive Differently in Streams

When an LLM decides to call a function, the response structure changes. Instead of a content string, the message (or, when streaming, each delta) contains a tool_calls array. In a streamed response, the function call arguments arrive token by token as a partial JSON string, so no single chunk holds a complete JSON object. You must buffer these fragments and reassemble the complete JSON string before you can parse it and execute the tool call.

# In a non-streaming response, tool call is complete:
# choice.message.tool_calls[0].function.arguments = '{"city": "Paris"}'

# In a streaming response, arguments arrive in pieces:
# chunk 1: delta.tool_calls[0].function.arguments = '{'
# chunk 2: delta.tool_calls[0].function.arguments = '"city"'
# chunk 3: delta.tool_calls[0].function.arguments = ': "'
# chunk 4: delta.tool_calls[0].function.arguments = 'Paris'
# chunk 5: delta.tool_calls[0].function.arguments = '"}'
# You must concatenate these before json.loads can parse them
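To make the buffering step concrete, here is a minimal sketch that replays the five fragments from the example above. It needs no API call: the fragments list is hard-coded, and is_complete_json is a hypothetical helper written only for this illustration. It shows that every intermediate buffer fails to parse and only the fully concatenated string yields a usable object.

```python
import json

# Fragments copied from the streaming example above (hard-coded for illustration).
fragments = ['{', '"city"', ': "', 'Paris', '"}']

def is_complete_json(text):
    """Hypothetical helper: True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

buffer = ""
parse_attempts = []
for fragment in fragments:
    buffer += fragment  # append each piece in arrival order
    parse_attempts.append(is_complete_json(buffer))

# Only the final, fully assembled buffer is valid JSON.
arguments = json.loads(buffer)
print(parse_attempts)
print(arguments)
```

In real code you would not probe the buffer after every fragment. The sketch does so only to show why parsing early fails; the stream's finish_reason is the signal that the arguments are complete.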

Detecting a Tool Call in the Stream

Two signals identify a tool call in a stream. While streaming, a non-empty chunk.choices[0].delta.tool_calls marks a chunk that carries tool call fragments. The finish_reason field stays None on intermediate chunks and is set only on the final one: 'tool_calls' means the model has decided to call a function and its arguments are now complete, while 'stop' means the model produced a normal text response.

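from openai import AsyncOpenAI

async_client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment

async def detect_stream_type(messages, tools):
    """Return 'tool_call' if the stream carries tool call fragments, else 'text'."""
    stream = await async_client.chat.completions.create(
        model='gpt-4o-mini',
        messages=messages,
        tools=tools,
        stream=True,
    )
    response_type = 'text'
    async for chunk in stream:
        if not chunk.choices:  # some chunks (e.g. usage-only) carry no choices
            continue
        choice = chunk.choices[0]
        if choice.delta.tool_calls:  # tool call fragment arriving
            response_type = 'tool_call'
        # finish_reason is None until the final chunk
        if choice.finish_reason == 'tool_calls':
            print('Model wants to call a function')
        elif choice.finish_reason == 'stop':
            print('Normal text response')
    return response_type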
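Detection alone does not assemble the arguments. The sketch below shows one way to buffer fragments per tool call and parse them only once finish_reason reports 'tool_calls'. To stay runnable without an API key, it uses stand-in chunks built with SimpleNamespace. The helpers fake_chunk, fake_tool_delta, and collect_tool_calls are hypothetical names for this illustration, and the chunk shape (an index, an id, and function.name on the first fragment, then function.arguments pieces) is assumed to mirror the Chat Completions streaming format.

```python
import json
from types import SimpleNamespace as NS

def fake_chunk(tool_calls=None, finish_reason=None):
    """Hypothetical stand-in for one streamed chunk."""
    delta = NS(content=None, tool_calls=tool_calls)
    return NS(choices=[NS(delta=delta, finish_reason=finish_reason)])

def fake_tool_delta(index, arguments, call_id=None, name=None):
    """Hypothetical stand-in for one entry of delta.tool_calls."""
    return NS(index=index, id=call_id, function=NS(name=name, arguments=arguments))

def collect_tool_calls(chunks):
    """Buffer argument fragments per index; parse when finish_reason is 'tool_calls'."""
    pending = {}  # index -> {"id", "name", "arguments"}
    for chunk in chunks:
        if not chunk.choices:
            continue
        choice = chunk.choices[0]
        for tc in choice.delta.tool_calls or []:
            entry = pending.setdefault(
                tc.index, {"id": None, "name": None, "arguments": ""}
            )
            if tc.id:
                entry["id"] = tc.id
            if tc.function and tc.function.name:
                entry["name"] = tc.function.name
            if tc.function and tc.function.arguments:
                entry["arguments"] += tc.function.arguments
        if choice.finish_reason == "tool_calls":
            # The call is complete: only now is it safe to parse the buffers.
            return [
                {
                    "id": e["id"],
                    "name": e["name"],
                    "arguments": json.loads(e["arguments"] or "{}"),
                }
                for _, e in sorted(pending.items())
            ]
    return []  # stream ended without a tool call

# Simulated stream: name first, then argument fragments, then the finish signal.
chunks = [
    fake_chunk([fake_tool_delta(0, "", call_id="call_1", name="get_weather")]),
    fake_chunk([fake_tool_delta(0, '{"city"')]),
    fake_chunk([fake_tool_delta(0, ': "Paris"}')]),
    fake_chunk(finish_reason="tool_calls"),
]
calls = collect_tool_calls(chunks)
print(calls)
```

Keying the buffers by index is a design choice that keeps fragments separate if the model requests more than one function in the same response. With a real stream, the same loop body would sit inside the async for loop shown above.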
