ollama - Fix qwen3-coder streaming: multiple tool_calls all emitted with index=0; finish_reason=stop

Summary

When streaming a chat completion from qwen3-coder:30b with many tools present (~20+), Ollama's qwen3coder parser emits every tool_call delta with index: 0 instead of incrementing 0, 1, 2, …, and closes the response with finish_reason: "stop" instead of "tool_calls". Per the OpenAI streaming spec, distinct tool calls must use distinct index values so the client can route delta chunks. Clients (including GitHub Copilot CLI and VS Code Copilot Chat) cannot reassemble the calls; they treat the response as a prose-only assistant message and stop.

The same request body sent with stream: false returns a clean, correct response containing the same tool calls as distinct entries in choices[0].message.tool_calls, confirming the bug is in the streaming emission code in model/parsers/qwen3coder/, not in the model output itself.

Environment

  • Ollama 0.23.2 (server, Linux, CUDA 13.0, RTX 4090)
  • Model: qwen3-coder:30b (qwen3moe, 30.5B, Q4_K_M; PARAMETER num_ctx 98304, num_gpu 49, KV q8_0)
  • Reproduced via a LiteLLM proxy (ollama_chat); also reproducible directly against the Ollama /api/chat endpoint, as long as the request shape matches.

Reproducer

Any chat completion with a tools array of ~20 or more entries and stream: true exhibits it. Minimal repro (Python, requires pip install openai):

import os
from openai import OpenAI

# Point OPENAI_BASE_URL at the OpenAI-compatible endpoint under test
client = OpenAI(base_url=os.environ["OPENAI_BASE_URL"], api_key=os.environ["OPENAI_API_KEY"])

def t(name, desc, props=None):
    """Build a function-tool definition; defaults to a single string 'path' parameter."""
    return {"type":"function","function":{"name":name,"description":desc,
        "parameters":{"type":"object","properties":props or {"path":{"type":"string"}}}}}

# ~20 tools; what they do is irrelevant, only the count and presence matter
tools = [t(f"tool_{i}", f"placeholder {i}") for i in range(20)]

stream = client.chat.completions.create(
    model="qwen3-coder:30b",
    messages=[{"role":"system","content":"You are a coding agent."},
              {"role":"user","content":"List files using two different tools then finish."}],
    tools=tools, stream=True,
)

# Record (index, function name) for every tool_call delta in the stream
seen = []
for chunk in stream:
    for ch in chunk.choices:
        d = ch.delta
        for tc in (d.tool_calls or []):
            seen.append((tc.index, getattr(tc.function, "name", None)))
        if ch.finish_reason:
            print("finish_reason:", ch.finish_reason)
print("tool_call indexes seen:", seen)

Observed against qwen3-coder:30b with ~20+ tools (the real case in our environment was triggered by Copilot CLI sending 26 tools):

finish_reason: stop
tool_call indexes seen: [(0, 'report_intent'), (0, 'glob'), (0, 'glob'), (0, 'view')]

Same request body with stream=False:

finish_reason: tool_calls
tool_calls: [4 distinct entries, each with its own id and arguments]

Expected behaviour

The streaming response should match the OpenAI streaming spec: each tool call gets a distinct index (0, 1, 2, 3, …), and finish_reason is "tool_calls" on the closing chunk.
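To illustrate why a shared index breaks clients, here is a minimal sketch of the kind of index-keyed accumulator a streaming client might use. The function name assemble_tool_calls and the dict-shaped deltas are illustrative assumptions, not code taken from Copilot or any other client; the sample ids and arguments are made up for the demonstration.

```python
import json
from collections import defaultdict

def assemble_tool_calls(deltas):
    """Merge streamed tool_call deltas into complete calls, keyed by index.

    Each delta is assumed to be a dict shaped like one entry of
    choices[0].delta.tool_calls in an OpenAI-style streaming chunk.
    """
    calls = defaultdict(lambda: {"id": None, "name": "", "arguments": ""})
    for d in deltas:
        slot = calls[d["index"]]          # the index is the only routing key
        if d.get("id"):
            slot["id"] = d["id"]
        fn = d.get("function") or {}
        if fn.get("name"):
            slot["name"] = fn["name"]
        slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]

# Hypothetical deltas for two calls with distinct indexes (expected behaviour)
good = [
    {"index": 0, "id": "call_a", "function": {"name": "glob", "arguments": '{"pattern": '}},
    {"index": 0, "function": {"arguments": '"*.py"}'}},
    {"index": 1, "id": "call_b", "function": {"name": "view", "arguments": '{"path": "a.py"}'}},
]
# The same deltas with every index forced to 0 (the reported behaviour)
bad = [dict(d, index=0) for d in good]

good_calls = assemble_tool_calls(good)   # two separate calls
bad_calls = assemble_tool_calls(bad)     # one merged call

def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(len(good_calls), all(is_valid_json(c["arguments"]) for c in good_calls))
print(len(bad_calls), is_valid_json(bad_calls[0]["arguments"]))
```

With distinct indexes the accumulator yields two calls with parseable arguments. With every index set to 0, the deltas land in one slot, the argument strings are concatenated, and the result no longer parses as JSON, which is consistent with clients being unable to reassemble the calls.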

Ollama log signal

The following parser warning accompanies the failure:

level=WARN source=qwen3coder.go:64
msg="qwen tool call parsing failed"
error="XML syntax error on line N: unexpected EOF"

I believe the streaming index-0 emission is a separate symptom in the same parser. The XML EOF error may be the parser failing to mark the boundary between successive tool calls during streaming.

Adjacent issues

  • #14834 — same parser, "unexpected EOF" symptom; OPEN
  • #14570 — parser returns 500 on truncated model output; OPEN
  • #12557 — broader "Ollama Tool Calling + Streaming" tracking; OPEN
  • #13093 — Goose users hit related parser fragility; OPEN
  • #15705 (PR) — partial fix ("prevent tag regex from matching across newlines"); OPEN

Local workaround

Force stream: false at the proxy layer when tools are present, then re-emit the non-stream JSON response as a small SSE transcript on the way back to the client, assigning a distinct index value to each tool call. This works end-to-end with VS Code Copilot Chat and Copilot CLI. Implementation reference (open-source): https://github.com/scope-lab-vu/LLATCH/blob/bench-harness/llatch-stream-wrap/server.py
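The re-emission step can be sketched roughly as follows. This is not the code from the linked server.py; it is a minimal illustration that assumes the upstream non-stream response is an OpenAI-compatible chat completion already parsed into a dict. The function name to_sse_chunks, the fallback id "chatcmpl-replay", and the fallback call ids are hypothetical.

```python
import json

def to_sse_chunks(response):
    """Yield SSE 'data:' frames that replay a non-stream chat completion.

    Assumes `response` is an OpenAI-compatible chat completion dict.
    Each tool call is emitted in its own chunk with a distinct index.
    """
    choice = response["choices"][0]
    message = choice["message"]
    base = {
        "id": response.get("id", "chatcmpl-replay"),  # hypothetical fallback id
        "object": "chat.completion.chunk",
        "created": response.get("created", 0),
        "model": response.get("model", ""),
    }

    def frame(delta, finish_reason=None):
        body = dict(base, choices=[
            {"index": 0, "delta": delta, "finish_reason": finish_reason}
        ])
        return "data: " + json.dumps(body) + "\n\n"

    # Opening chunk: role plus any prose content
    yield frame({"role": "assistant", "content": message.get("content") or ""})

    tool_calls = message.get("tool_calls") or []
    for i, call in enumerate(tool_calls):
        yield frame({"tool_calls": [{
            "index": i,                               # distinct per tool call
            "id": call.get("id") or f"call_{i}",      # hypothetical fallback id
            "type": "function",
            "function": {
                "name": call["function"]["name"],
                "arguments": call["function"]["arguments"],
            },
        }]})

    # Closing chunk: report "tool_calls" whenever any tool call was emitted
    finish = "tool_calls" if tool_calls else (choice.get("finish_reason") or "stop")
    yield frame({}, finish_reason=finish)
    yield "data: [DONE]\n\n"
```

A proxy would write these frames to the client with Content-Type: text/event-stream. The client loses token-by-token streaming, which is the trade-off of this workaround, but it receives tool calls it can route.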

Why this matters

Any client that uses stream: true with multi-tool requests (Copilot, Aider, Goose, OpenHands, etc.) currently cannot drive qwen3-coder through Ollama for agentic tasks unless it implements its own non-stream fallback.
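A client-side fallback needs some way to decide that a stream is affected. The check below is one possible heuristic, not an established API: it takes the (index, name) pairs and finish_reason collected by the reproducer and flags the two symptoms described in this issue. The function name looks_collapsed is hypothetical, and the rule that a function name should appear only once per index is an assumption about how well-formed streams behave.

```python
from collections import Counter

def looks_collapsed(seen, finish_reason):
    """Heuristically detect the symptoms described in this issue.

    seen: list of (index, function_name_or_None) pairs, one per tool_call delta.
    finish_reason: the finish_reason reported on the closing chunk.
    """
    # Assumption: in a well-formed stream, each index carries a name only once.
    named_per_index = Counter(index for index, name in seen if name)
    reused_index = any(count > 1 for count in named_per_index.values())
    # Tool calls were streamed, yet the response did not end with "tool_calls".
    wrong_finish = bool(seen) and finish_reason != "tool_calls"
    return reused_index or wrong_finish

# Values reported in the observed output of this issue
observed = [(0, "report_intent"), (0, "glob"), (0, "glob"), (0, "view")]
print(looks_collapsed(observed, "stop"))
```

When the check returns True, a client could discard the streamed result and resend the same request with stream set to false.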
