vllm - ✅ (Solved) vLLM 0.19 may lose tool calls for Qwen/Qwen3.5-35B-A3B-FP8 when an XML tool_call is emitted inside <think> [1 pull request, 9 comments, 6 participants]

GitHub stats
vllm-project/vllm#39056 (fetched 2026-04-08 02:52:46)
Comments: 9 · Participants: 6 · Timeline events: 14 · Reactions: 2
Timeline (top): commented ×9, subscribed ×2, cross-referenced ×1, mentioned ×1

Fix Action

Fix / Workaround

The patch in PR #39055 promotes embedded XML tool-call blocks out of reasoning into content in qwen3_reasoning_parser, so the existing qwen3_coder tool parser can still parse them.

PR fix notes

PR #39055: Fix Qwen3 reasoning tool calls embedded inside think

Description (problem / solution / changelog)

Summary

This PR fixes a Qwen3/Qwen3.5 non-streaming compatibility issue when using:

  • --reasoning-parser qwen3
  • --tool-call-parser qwen3_coder

Qwen models can emit XML tool calls inside <think> ... </think>. The current non-streaming pipeline extracts reasoning first and only parses tool calls from content, so valid XML tool calls embedded in reasoning are lost.

This patch updates qwen3_reasoning_parser to promote valid XML tool-call blocks out of reasoning into content, allowing the existing qwen3_coder tool parser to recover them without changing the generic serving stack.
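
The promotion step described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the function name `promote_tool_calls`, the regular expression, and the choice to place promoted blocks ahead of any existing content are all assumptions made for this sketch.

```python
import re

# Assumed shape of a complete XML tool call in the qwen3_coder style.
# The pattern is illustrative and not taken from vLLM.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=[^>]+>.*?</function>\s*</tool_call>", re.DOTALL
)


def promote_tool_calls(reasoning: str, content: str) -> tuple[str, str]:
    """Move complete <tool_call> blocks from reasoning into content (hypothetical helper)."""
    blocks = TOOL_CALL_RE.findall(reasoning)
    if not blocks:
        # Nothing to promote: normal reasoning extraction stays unchanged.
        return reasoning, content
    cleaned = TOOL_CALL_RE.sub("", reasoning).strip()
    promoted = "\n".join(blocks)
    # Keep any post-</think> content; here it follows the promoted blocks.
    new_content = f"{promoted}\n{content}" if content else promoted
    return cleaned, new_content


reasoning = (
    "some reasoning\n"
    "<tool_call>\n<function=Finish>\n<parameter=answer>\n204\n"
    "</parameter>\n</function>\n</tool_call>"
)
new_reasoning, new_content = promote_tool_calls(reasoning, "")
```

After the call, `new_content` carries the XML block, so a tool parser that only inspects content has something to work with.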

Why this scope

This PR fixes parser recovery, not model generation behavior. It does not try to prevent Qwen3.5 from emitting tool calls inside <think>; it makes vLLM robust when that output pattern appears.

Tests

Added tests cover:

  • unchanged behavior for normal reasoning extraction
  • embedded tool call promotion from reasoning to content
  • successful parsing by qwen3_coder
  • truncated reasoning recovery without </think>
  • preservation of post-</think> content

Limitation

This change fixes the non-streaming path. Streaming recovery would require additional serving-layer changes and is intentionally left out of this minimal patch.

Changed files

  • docs/design/qwen3_reasoning_tool_call_recovery.md (added, +88/-0)
  • tests/reasoning/test_qwen3_reasoning_parser.py (modified, +114/-0)
  • vllm/reasoning/qwen3_reasoning_parser.py (modified, +49/-2)

Code Example

<think>
...
<tool_call>
<function=Finish>
<parameter=answer>
204
</parameter>
</function>
</tool_call>
</think>

Problem

With vLLM 0.19, when serving the model:

  • Qwen/Qwen3.5-35B-A3B-FP8

using:

  • --reasoning-parser qwen3
  • --tool-call-parser qwen3_coder

non-streaming tool-call parsing can fail if the model emits XML tool-call markup inside the reasoning region, for example:

<think>
...
<tool_call>
<function=Finish>
<parameter=answer>
204
</parameter>
</function>
</tool_call>
</think>

In this case, the model may have produced a valid tool call, but the OpenAI-compatible response ends up with populated reasoning and empty tool_calls.
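
A client can check for this symptom on the returned assistant message. The sketch below is a hypothetical helper, not part of vLLM; the field names `reasoning` and `reasoning_content` are assumptions about the response shape and may differ between versions.

```python
def has_lost_tool_call(message: dict) -> bool:
    """Return True when reasoning holds <tool_call> markup but tool_calls is empty.

    `message` is the assistant message dict from a non-streaming chat
    completion. The reasoning field names are assumed, not confirmed.
    """
    reasoning = message.get("reasoning") or message.get("reasoning_content") or ""
    tool_calls = message.get("tool_calls") or []
    return "<tool_call>" in reasoning and not tool_calls


# Illustrative message mirroring the symptom described above.
message = {
    "content": None,
    "reasoning": "...\n<tool_call>\n<function=Finish>\n...",
    "tool_calls": [],
}
lost = has_lost_tool_call(message)
```

A check like this can flag affected responses in a benchmark or agent loop before they are treated as plain answers without tool calls.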

Observed model

The issue was observed and reproduced with:

  • Qwen/Qwen3.5-35B-A3B-FP8

It may also affect other Qwen3 / Qwen3.5 models that use the same parser combination, but the confirmed reproduction here is specifically on Qwen/Qwen3.5-35B-A3B-FP8.

Why this happens

The issue appears to come from the interaction between the reasoning parser and the tool parser:

  1. qwen3_reasoning_parser extracts everything before </think> into reasoning.
  2. downstream tool parsing only inspects content.
  3. if <tool_call>...</tool_call> remains inside reasoning, it never reaches qwen3_coder.

So the bug is not that vLLM makes Qwen3.5 generate tool calls inside <think>. The bug is that vLLM currently does not recover those tool calls when that output pattern occurs.
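
The three steps can be illustrated with a minimal stand-in for the pipeline. Both functions below are hypothetical simplifications written for this sketch and do not reproduce vLLM's parsers.

```python
import re

THINK_END = "</think>"
TOOL_CALL_RE = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def split_reasoning(model_output: str) -> tuple[str, str]:
    """Stand-in for step 1: everything before </think> becomes reasoning."""
    # If </think> is absent, partition leaves all text in `reasoning`.
    # That is a property of this sketch, not a claim about vLLM.
    reasoning, _, content = model_output.partition(THINK_END)
    return reasoning.strip().removeprefix("<think>").strip(), content.strip()


def find_tool_calls(content: str) -> list[str]:
    """Stand-in for step 2: only content is inspected for tool calls."""
    return TOOL_CALL_RE.findall(content)


raw = (
    "<think>\nplan\n<tool_call>\n<function=Finish>\n<parameter=answer>\n204\n"
    "</parameter>\n</function>\n</tool_call>\n</think>\n"
)
reasoning, content = split_reasoning(raw)
calls = find_tool_calls(content)  # step 3: the block stayed in reasoning
```

Here `calls` is empty even though `reasoning` still contains the `<tool_call>` block, which is the loss described above.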

Scope

This report is specifically about the non-streaming path, which can break benchmark / agent flows that rely on structured tool_calls being returned.

Proposed fix

I have opened a PR with a minimal parser-side fix here: https://github.com/vllm-project/vllm/pull/39055

That patch promotes embedded XML tool-call blocks out of reasoning into content in qwen3_reasoning_parser, so the existing qwen3_coder tool parser can still parse them.

Request

Please consider reviewing and merging the PR if the approach looks acceptable.

extent analysis

TL;DR

The most likely fix is the parser-side change proposed in the PR, which promotes embedded XML tool-call blocks out of reasoning into content in qwen3_reasoning_parser.

Guidance

  • Review the PR (https://github.com/vllm-project/vllm/pull/39055) to ensure the proposed fix addresses the issue without introducing new problems.
  • Verify that the fix works by testing the non-streaming tool-call parsing with the Qwen/Qwen3.5-35B-A3B-FP8 model and the --reasoning-parser qwen3 and --tool-call-parser qwen3_coder options.
  • Consider testing other Qwen3 / Qwen3.5 models that use the same parser combination to ensure the fix does not introduce issues with other models.
  • If the PR is merged, re-test the benchmark / agent flows that rely on structured tool_calls being returned to ensure they are working as expected.

Example

No code snippet is provided as the issue does not require a code example to understand the fix.

Notes

The proposed fix only addresses the non-streaming path, so it does not resolve streaming tool-call parsing. The confirmed reproduction is only on Qwen/Qwen3.5-35B-A3B-FP8, so behavior on other Qwen3 / Qwen3.5 models should be verified by testing.

Recommendation

Merge the proposed PR (https://github.com/vllm-project/vllm/pull/39055): it is a targeted parser-side fix for an issue that was reproduced on Qwen/Qwen3.5-35B-A3B-FP8.
