vllm - 💡(How to fix) Fix [Bug]: Qwen3.5 with reasoning-parser returns non-standard tool call format when enable_thinking=true

### Your current environment

## 🔍 Environment

- vLLM version: 0.18.0
- Model: Qwen3.5-27B
- Deployment: vLLM serve with tensor parallelism
- Python version: 3.11.14
- OS: Linux


### 🐛 Describe the bug


## 🐛 Bug Description

When using Qwen3.5-27B with `--reasoning-parser qwen3` and `--tool-call-parser qwen3_coder` (or `hermes`), enabling thinking mode (`enable_thinking=true`) causes tool calls to be returned in non-standard XML format within the `content` field, instead of the standard OpenAI-compatible `tool_calls` array.

When `enable_thinking=false`, tool calls are correctly parsed and returned in the `tool_calls` field.
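
The two outcomes described above can be told apart on the client with a small check. This is a minimal sketch, assuming the message shape shown in this report (`content`, `tool_calls`); the function name `classify_message` is hypothetical and not part of vLLM or the OpenAI client.

```python
import re

# Matches a tool call that was left as raw XML in `content`.
TOOL_CALL_TAG = re.compile(r"<tool_call>.*?</tool_call>", re.DOTALL)


def classify_message(message: dict) -> str:
    """Return 'parsed', 'unparsed_xml', or 'no_tool_call' for an assistant message."""
    if message.get("tool_calls"):
        return "parsed"
    content = message.get("content") or ""
    if TOOL_CALL_TAG.search(content):
        return "unparsed_xml"
    return "no_tool_call"
```

A result of `unparsed_xml` corresponds to the behavior reported here with `enable_thinking=true`.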



## 🚀 Launch Command

```bash
export VLLM_LOGGING_CONFIG_PATH=/home/up_docker/src/logging_config.json
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export OMP_NUM_THREADS=1
export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD
export TASK_QUEUE_ENABLE=1

vllm serve /data/Qwen3.5-27B \
  --served-model-name Qwen3.5-27B \
  --host 0.0.0.0 \
  --port 8101 \
  --data-parallel-size 1 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
  --async-scheduling \
  --additional-config '{"enable_cpu_binding":true}' \
  --speculative_config '{"method": "qwen3_5_mtp", "num_speculative_tokens": 3, "enforce_eager": true}' \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --max-num-batched-tokens 16384
```

## 📤 Request (enable_thinking=true)

```json
{
    "model": "Qwen3.5-27B",
    "temperature": 0.1,
    "top_p": 0.1,
    "stream": false,
    "chat_template_kwargs": {"enable_thinking": true},
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "navigator_output",
            "strict": true,
            "schema": {
                "$schema": "https://json-schema.org/draft/2020-12/schema",
                "type": "object",
                "properties": {
                    "current_state": {
                        "type": "object",
                        "properties": {
                            "evaluation_previous_goal": {"type": "string"},
                            "memory": {"type": "string"},
                            "next_goal": {"type": "string"}
                        },
                        "required": ["evaluation_previous_goal", "memory", "next_goal"]
                    },
                    "action": {
                        "type": "array",
                        "items": {"type": "object"}
                    }
                },
                "required": ["current_state", "action"]
            }
        }
    },
    "messages": [
        {"role": "user", "content": "请帮我填写日志"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "AgentOutput",
                "description": "Agent output structure",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "current_state": {
                            "type": "object",
                            "properties": {
                                "evaluation_previous_goal": {"type": "string"},
                                "memory": {"type": "string"},
                                "next_goal": {"type": "string"}
                            }
                        },
                        "action": {
                            "type": "array",
                            "items": {"type": "object"}
                        }
                    },
                    "required": ["current_state", "action"]
                }
            }
        }
    ]
}
```

The user message `请帮我填写日志` translates to "Please help me fill in the log".

## 📥 Actual Response (Incorrect)

```json
{
    "id": "chatcmpl-8e0d6f4d5cbaf6ab",
    "object": "chat.completion",
    "created": 1778048205,
    "model": "Qwen3.5-27B",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n<tool_call>\n<function=AgentOutput>\n<parameter=current_state>\n{\"evaluation_previous_goal\": \"Success - 浏览器已启动...\", \"memory\": \"用户需要填写三个时间段...\", \"next_goal\": \"点击'+ 新增'按钮...\"}\n</parameter>\n<parameter=action>\n[{\"click_element\": {\"intent\": \"点击新增按钮添加日志\", \"index\": 4}}]\n</parameter>\n</function>\n</tool_call>",
                "tool_calls": [],
                "reasoning": "当前页面已经显示了我的日志页面..."
            },
            "finish_reason": "stop"
        }
    ]
}
```

The Chinese strings are model output, shown verbatim; the `reasoning` value begins "The current page already shows my log page...".

**Problem**: The tool call is returned as XML in the `content` field, and the `tool_calls` array is empty.
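
Until the server returns a populated `tool_calls` array, a client could recover the call from `content` itself. The following is a minimal client-side sketch, not vLLM's parser: it assumes the `<tool_call>` / `<function=...>` / `<parameter=...>` layout seen in the response above, and the names `extract_tool_calls` and `_decode` are hypothetical.

```python
import json
import re
import uuid

FUNCTION_RE = re.compile(r"<function=([^>\s]+)>(.*?)</function>", re.DOTALL)
PARAMETER_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def _decode(value: str):
    """Decode a parameter body as JSON when possible, else keep the raw string."""
    value = value.strip()
    try:
        return json.loads(value)
    except json.JSONDecodeError:
        return value


def extract_tool_calls(content: str) -> list[dict]:
    """Convert <tool_call> XML blocks found in `content` into OpenAI-style tool calls."""
    calls = []
    for block in re.findall(r"<tool_call>(.*?)</tool_call>", content, re.DOTALL):
        for name, body in FUNCTION_RE.findall(block):
            arguments = {key: _decode(val) for key, val in PARAMETER_RE.findall(body)}
            calls.append({
                # Locally generated id; the server would normally assign one.
                "id": f"call_{uuid.uuid4().hex[:24]}",
                "type": "function",
                "function": {
                    "name": name,
                    # OpenAI-style responses carry arguments as a JSON string.
                    "arguments": json.dumps(arguments, ensure_ascii=False),
                },
            })
    return calls
```

Regex-based extraction breaks if a parameter value itself contains a literal `</parameter>` or `</function>`, so treat this as a stopgap rather than a replacement for server-side parsing.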

## ✅ Expected Response

Tool calls should be parsed and returned in the standard `tool_calls` array:

```json
{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_xxx",
                        "type": "function",
                        "function": {
                            "name": "AgentOutput",
                            "arguments": "{\"current_state\": {...}, \"action\": [...]}"
                        }
                    }
                ],
                "reasoning": "当前页面已经显示了我的日志页面..."
            }
        }
    ]
}
```

## 🔧 Steps to Reproduce

1. Start vLLM with Qwen3.5-27B using the launch command above.
2. Send a request with `enable_thinking=true` and `tools` defined.
3. Observe that tool calls are returned as XML in `content` instead of the `tool_calls` array.
4. Change to `enable_thinking=false` and observe that tool calls are parsed correctly.
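
The steps above can be scripted so that both requests differ only in the thinking flag. This is a sketch under stated assumptions: the server is reachable at `localhost:8101` (the port comes from the launch command, the host is assumed), the payload is a simplified version of the request in this report (it omits `response_format` and the nested `current_state` properties, so whether it reproduces the bug identically is untested), and `build_payload`, `send`, and `main` are illustrative names.

```python
import json
import urllib.request

# Port follows the launch command; the host is an assumption.
BASE_URL = "http://localhost:8101/v1/chat/completions"

AGENT_OUTPUT_TOOL = {
    "type": "function",
    "function": {
        "name": "AgentOutput",
        "description": "Agent output structure",
        "parameters": {
            "type": "object",
            "properties": {
                "current_state": {"type": "object"},
                "action": {"type": "array", "items": {"type": "object"}},
            },
            "required": ["current_state", "action"],
        },
    },
}


def build_payload(enable_thinking: bool) -> dict:
    """Build a chat completion request that differs only in the thinking flag."""
    return {
        "model": "Qwen3.5-27B",
        "temperature": 0.1,
        "top_p": 0.1,
        "stream": False,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
        # User message from the report: "Please help me fill in the log".
        "messages": [{"role": "user", "content": "请帮我填写日志"}],
        "tools": [AGENT_OUTPUT_TOOL],
    }


def send(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload and return the first choice's message."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return json.load(response)["choices"][0]["message"]


def main() -> None:
    """Send both variants and report how many structured tool calls came back."""
    for flag in (True, False):
        message = send(build_payload(flag))
        count = len(message.get("tool_calls") or [])
        print(f"enable_thinking={flag}: tool_calls={count}")
```

Calling `main()` against a running server prints one line per variant; per this report, the `enable_thinking=True` run is the one expected to come back with an empty `tool_calls` array.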

## 🧪 Additional Tests

- Tested with `--tool-call-parser hermes`: the same issue persists.
- Tested with `enable_thinking=false`: tool calls work correctly and are returned in the `tool_calls` array.

## 🤔 Analysis

The issue appears to be a conflict between:

1. `--reasoning-parser qwen3`, which parses thinking/reasoning content
2. `--tool-call-parser qwen3_coder` or `hermes`, which parses tool calls

When thinking is enabled, the model still emits the tool call in XML format, but the tool call parser fails to extract it while the reasoning parser is also active, so the XML is left in `content`.

## 📋 Possible Related Issues

- (Please search if there are similar issues with reasoning + tool call parsers)

## 💡 Suggested Fix

The tool call parser should handle XML-formatted tool calls when reasoning/thinking mode is enabled, or the reasoning parser and tool call parser should be coordinated so that tool calls are properly extracted from the model's output.
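
One way to picture the coordination suggested above is a two-stage pass: split off the reasoning first, then hand only the remainder to the tool call parser. This is an illustrative sketch, not vLLM's implementation. It assumes Qwen-style `<think>...</think>` delimiters (the opening tag may be absent when the chat template pre-fills it), and `split_reasoning`, `build_message`, and the `parse_tools` callable are hypothetical names.

```python
from typing import Callable, Optional

THINK_END = "</think>"


def split_reasoning(raw: str) -> tuple[Optional[str], str]:
    """Split raw model output into (reasoning, remainder) at the closing think tag."""
    if THINK_END not in raw:
        return None, raw
    reasoning, _, remainder = raw.partition(THINK_END)
    return reasoning.replace("<think>", "", 1).strip(), remainder


def build_message(raw: str, parse_tools: Callable[[str], list]) -> dict:
    """Run reasoning extraction first, then tool parsing on what is left."""
    reasoning, remainder = split_reasoning(raw)
    tool_calls = parse_tools(remainder)
    return {
        "role": "assistant",
        "reasoning": reasoning,
        # Content is dropped once the remainder has been consumed as tool calls.
        "content": None if tool_calls else (remainder.strip() or None),
        "tool_calls": tool_calls,
    }
```

The point of the ordering is that the tool call parser always sees the text after the reasoning block, whether or not thinking is enabled.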


Labels: bug, tool-calling, reasoning, qwen
