ollama - 💡(How to fix) Fix qwen3-coder streaming: multiple tool_calls all emitted with index=0; finish_reason=stop

When streaming a chat completion from qwen3-coder:30b with many tools present (~20+), Ollama's qwen3coder parser emits every tool_call delta with index: 0 instead of incrementing 0, 1, 2, …, and closes the response with finish_reason: "stop" instead of "tool_calls". Per the OpenAI streaming spec, distinct tool calls must use distinct index values so the client can route delta chunks. Clients (including GitHub Copilot CLI and VS Code Copilot Chat) cannot reassemble the calls; they treat the response as a prose-only assistant message and stop.

The same request body sent with stream: false returns a clean, correct response containing the same tool calls as distinct entries in choices[0].message.tool_calls, confirming the bug is in the streaming emission code in model/parsers/qwen3coder/, not in the model output itself.

Summary

Environment

Ollama 0.23.2 (server, Linux, CUDA 13.0, RTX 4090)
Model: qwen3-coder:30b (qwen3moe, 30.5B, Q4_K_M; PARAMETER num_ctx 98304, num_gpu 49, KV q8_0)
Reproduced through a LiteLLM proxy (ollama_chat); also reproducible directly against the Ollama /api/chat endpoint, as long as the request shape matches.

Reproducer

Any chat completion request with a tools: [...] array of ~20+ tools and stream: true exhibits it. Minimal repro (Python; requires pip install openai):

import os
from openai import OpenAI

# Endpoint and key are read from the environment.
client = OpenAI(
    base_url=os.environ["OPENAI_BASE_URL"],
    api_key=os.environ["OPENAI_API_KEY"],
)


def t(name, desc, props=None):
    """Build one function-tool definition with a minimal JSON-schema body."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": desc,
            "parameters": {
                "type": "object",
                "properties": props or {"path": {"type": "string"}},
            },
        },
    }


# ~20 tools; it doesn't matter what they do, only the count + presence
tools = [t(f"tool_{i}", f"placeholder {i}") for i in range(20)]

stream = client.chat.completions.create(
    model="qwen3-coder:30b",
    messages=[
        {"role": "system", "content": "You are a coding agent."},
        {"role": "user", "content": "List files using two different tools then finish."},
    ],
    tools=tools,
    stream=True,
)

# Record (index, function name) for every tool_call delta in the stream.
seen = []
for chunk in stream:
    for ch in chunk.choices:
        d = ch.delta
        for tc in (d.tool_calls or []):
            seen.append((tc.index, getattr(tc.function, "name", None)))
        if ch.finish_reason:
            print("finish_reason:", ch.finish_reason)
print("tool_call indexes seen:", seen)

Observed against qwen3-coder:30b with ~20+ tools (the real case in our environment was triggered by Copilot CLI sending 26 tools):

finish_reason: stop
tool_call indexes seen: [(0, 'report_intent'), (0, 'glob'), (0, 'glob'), (0, 'view')]

Same request body with stream=False:

finish_reason: tool_calls
tool_calls: [4 distinct entries, each with its own id and arguments]

Expected behaviour

The streaming response should match the OpenAI streaming spec: each tool call gets a distinct index (0, 1, 2, 3, …), and finish_reason is "tool_calls" on the closing chunk.
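
To illustrate why distinct index values matter, here is a minimal sketch of how a streaming client commonly reassembles tool calls: it keys partial calls by index and concatenates the argument fragments. The helper name assemble_tool_calls, the plain-dict delta shape, and the example ids and arguments are assumptions for illustration; this is not code from Ollama or from any particular client.

```python
import json


def assemble_tool_calls(deltas):
    """Merge streamed tool_call deltas into complete calls, keyed by index.

    Each delta is a plain dict shaped like an OpenAI streaming tool_call delta:
    {"index": int, "id": str, "function": {"name": str, "arguments": str}}
    where id, name and arguments may be absent on continuation chunks.
    """
    calls = {}
    for delta in deltas:
        slot = calls.setdefault(
            delta["index"], {"id": None, "name": None, "arguments": ""}
        )
        fn = delta.get("function") or {}
        if delta.get("id"):
            slot["id"] = delta["id"]
        if fn.get("name"):
            slot["name"] = fn["name"]
        # Argument text arrives as string fragments that are concatenated.
        slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


# Spec-conformant stream: two calls on two indexes (hypothetical payloads).
good = [
    {"index": 0, "id": "call_a", "function": {"name": "glob", "arguments": '{"pattern": "*.py"}'}},
    {"index": 1, "id": "call_b", "function": {"name": "view", "arguments": '{"path": "a.py"}'}},
]
# Same deltas with every index forced to 0, as in the observed output.
bad = [dict(d, index=0) for d in good]

good_calls = assemble_tool_calls(good)
bad_calls = assemble_tool_calls(bad)

# With colliding indexes the two calls merge into one slot whose
# concatenated arguments are no longer a single JSON document.
try:
    json.loads(bad_calls[0]["arguments"])
    bad_parses = True
except json.JSONDecodeError:
    bad_parses = False

print(len(good_calls), len(bad_calls), bad_parses)
```

Under these assumptions the conformant stream yields two separate calls, while the index-0 stream collapses into a single call whose arguments cannot be parsed. A client written this way has nothing valid to execute.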

Ollama log signal

These parser log lines accompany the failure:

level=WARN source=qwen3coder.go:64
msg="qwen tool call parsing failed"
error="XML syntax error on line N: unexpected EOF"

I believe the streaming index-0 emission is a separate symptom in the same parser. The XML EOF error may be the parser failing to mark the boundary between successive tool calls during streaming.

Adjacent issues

#14834 — same parser, "unexpected EOF" symptom; OPEN
#14570 — parser returns 500 on truncated model output; OPEN
#12557 — broader "Ollama Tool Calling + Streaming" tracking; OPEN
#13093 — Goose users hit related parser fragility; OPEN
#15705 (PR) — partial fix ("prevent tag regex from matching across newlines"); OPEN

Local workaround

Force stream: false at the proxy layer when tools are present, then re-emit the non-stream JSON response as a small SSE transcript on the way back to the client (assigning distinct index values per tool call). Works end-to-end with VS Code Copilot Chat and Copilot CLI. Implementation reference (open-source): https://github.com/scope-lab-vu/LLATCH/blob/bench-harness/llatch-stream-wrap/server.py
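
The re-emission step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation in the linked server.py: the function name completion_to_sse is hypothetical, the input is assumed to be an OpenAI-style non-stream completion already parsed into a dict, and the example payload is made up.

```python
import json


def completion_to_sse(completion):
    """Re-emit a non-stream chat completion dict as OpenAI-style SSE events.

    Each entry in choices[0].message.tool_calls becomes its own chunk with a
    distinct, incrementing index.
    """
    choice = completion["choices"][0]
    message = choice["message"]
    base = {
        "id": completion.get("id", ""),
        "object": "chat.completion.chunk",
        "created": completion.get("created", 0),
        "model": completion.get("model", ""),
    }

    def chunk(delta, finish_reason=None):
        body = dict(
            base,
            choices=[{"index": 0, "delta": delta, "finish_reason": finish_reason}],
        )
        return "data: " + json.dumps(body) + "\n\n"

    # Opening chunk carries the role and any prose content.
    events = [chunk({"role": "assistant", "content": message.get("content") or ""})]

    tool_calls = message.get("tool_calls") or []
    for i, call in enumerate(tool_calls):
        events.append(chunk({"tool_calls": [{
            "index": i,  # distinct index per tool call
            "id": call.get("id"),
            "type": "function",
            "function": call["function"],
        }]}))

    # Closing chunk: report tool_calls whenever any tool call was emitted.
    finish = "tool_calls" if tool_calls else (choice.get("finish_reason") or "stop")
    events.append(chunk({}, finish_reason=finish))
    events.append("data: [DONE]\n\n")
    return events


# Hypothetical non-stream response with two tool calls.
example = {
    "id": "chatcmpl-example",
    "created": 0,
    "model": "qwen3-coder:30b",
    "choices": [{
        "index": 0,
        "finish_reason": "tool_calls",
        "message": {
            "role": "assistant",
            "content": "",
            "tool_calls": [
                {"id": "call_a", "type": "function",
                 "function": {"name": "glob", "arguments": "{}"}},
                {"id": "call_b", "type": "function",
                 "function": {"name": "view", "arguments": "{}"}},
            ],
        },
    }],
}
events = completion_to_sse(example)
print("".join(events))
```

Sending each tool call whole in a single chunk keeps the wrapper simple; the client still sees a valid stream because fragmenting the arguments across chunks is optional.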

Why this matters

Any client that uses stream: true with multi-tool requests (Copilot, Aider, Goose, OpenHands, etc.) currently can't drive qwen3-coder through Ollama for agentic tasks unless they implement their own non-stream fallback.
