vllm - [Bug]: Qwen3.5 with reasoning-parser returns non-standard tool call format when enable_thinking=true

### Your current environment

## 🔍 Environment

- vLLM version: 0.18.0
- Model: Qwen3.5-27B
- Deployment: `vllm serve` with tensor parallelism
- Python version: 3.11.14
- OS: Linux

### 🐛 Describe the bug


## 🐛 Bug Description

When using Qwen3.5-27B with `--reasoning-parser qwen3` and `--tool-call-parser qwen3_coder` (or `hermes`), enabling thinking mode (`enable_thinking=true`) causes tool calls to be returned in non-standard XML format within the `content` field, instead of the standard OpenAI-compatible `tool_calls` array.

When `enable_thinking=false`, tool calls are correctly parsed and returned in the `tool_calls` field.



## 🚀 Launch Command

```bash
export VLLM_LOGGING_CONFIG_PATH=/home/up_docker/src/logging_config.json
export PYTORCH_NPU_ALLOC_CONF="expandable_segments:True"
export OMP_NUM_THREADS=1
export LD_PRELOAD=/usr/lib64/libjemalloc.so.2:$LD_PRELOAD
export TASK_QUEUE_ENABLE=1

vllm serve /data/Qwen3.5-27B \
  --served-model-name Qwen3.5-27B \
  --host 0.0.0.0 \
  --port 8101 \
  --data-parallel-size 1 \
  --tensor-parallel-size 4 \
  --gpu-memory-utilization 0.9 \
  --compilation-config '{"cudagraph_mode":"FULL_DECODE_ONLY"}' \
  --async-scheduling \
  --additional-config '{"enable_cpu_binding":true}' \
  --speculative_config '{"method": "qwen3_5_mtp", "num_speculative_tokens": 3, "enforce_eager": true}' \
  --enable-auto-tool-choice \
  --tool-call-parser qwen3_coder \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --reasoning-parser qwen3 \
  --max-num-batched-tokens 16384
```

---

## 📤 Request (enable_thinking=true)

```json
{
    "model": "Qwen3.5-27B",
    "temperature": 0.1,
    "top_p": 0.1,
    "stream": false,
    "chat_template_kwargs": {"enable_thinking": true},
    "response_format": {
        "type": "json_schema",
        "json_schema": {
            "name": "navigator_output",
            "strict": true,
            "schema": {
                "$schema": "https://json-schema.org/draft/2020-12/schema",
                "type": "object",
                "properties": {
                    "current_state": {
                        "type": "object",
                        "properties": {
                            "evaluation_previous_goal": {"type": "string"},
                            "memory": {"type": "string"},
                            "next_goal": {"type": "string"}
                        },
                        "required": ["evaluation_previous_goal", "memory", "next_goal"]
                    },
                    "action": {
                        "type": "array",
                        "items": {"type": "object"}
                    }
                },
                "required": ["current_state", "action"]
            }
        }
    },
    "messages": [
        {"role": "user", "content": "请帮我填写日志"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "AgentOutput",
                "description": "Agent output structure",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "current_state": {
                            "type": "object",
                            "properties": {
                                "evaluation_previous_goal": {"type": "string"},
                                "memory": {"type": "string"},
                                "next_goal": {"type": "string"}
                            }
                        },
                        "action": {
                            "type": "array",
                            "items": {"type": "object"}
                        }
                    },
                    "required": ["current_state", "action"]
                }
            }
        }
    ]
}
```

---

## 📥 Actual Response (Incorrect)

```json
{
    "id": "chatcmpl-8e0d6f4d5cbaf6ab",
    "object": "chat.completion",
    "created": 1778048205,
    "model": "Qwen3.5-27B",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n<tool_call>\n<function=AgentOutput>\n<parameter=current_state>\n{\"evaluation_previous_goal\": \"Success - 浏览器已启动...\", \"memory\": \"用户需要填写三个时间段...\", \"next_goal\": \"点击'+ 新增'按钮...\"}\n</parameter>\n<parameter=action>\n[{\"click_element\": {\"intent\": \"点击新增按钮添加日志\", \"index\": 4}}]\n</parameter>\n</function>\n</tool_call>",
                "tool_calls": [],
                "reasoning": "当前页面已经显示了我的日志页面..."
            },
            "finish_reason": "stop"
        }
    ]
}
```

---

**Problem**: The tool call is returned as XML in the `content` field, and the `tool_calls` array is empty.

## ✅ Expected Response

Tool calls should be parsed and returned in the standard `tool_calls` array:

```json
{
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": null,
                "tool_calls": [
                    {
                        "id": "call_xxx",
                        "type": "function",
                        "function": {
                            "name": "AgentOutput",
                            "arguments": "{\"current_state\": {...}, \"action\": [...]}"
                        }
                    }
                ],
                "reasoning": "当前页面已经显示了我的日志页面..."
            }
        }
    ]
}
```

## 🔧 Steps to Reproduce

1. Start vLLM with Qwen3.5-27B using the launch command above.
2. Send a request with `enable_thinking=true` and `tools` defined.
3. Observe that tool calls are returned as XML in `content` instead of in the `tool_calls` array.
4. Change to `enable_thinking=false` and observe that tool calls work correctly.
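
The steps above can be scripted roughly as follows. This is a minimal sketch, not part of the original report: the URL assumes the server is reachable on `localhost` at the port from the launch command, the tool schema is a trimmed version of `AgentOutput`, the `response_format` block from the original request is left out for brevity (add it back to match the report exactly), and the helper names `build_payload`, `post_chat`, and `summarize` are hypothetical.

```python
import json
import urllib.request

# Assumption: the server from the launch command is reachable locally.
BASE_URL = "http://localhost:8101/v1/chat/completions"

# Trimmed version of the AgentOutput tool from the request in this report.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "AgentOutput",
        "description": "Agent output structure",
        "parameters": {
            "type": "object",
            "properties": {
                "current_state": {"type": "object"},
                "action": {"type": "array", "items": {"type": "object"}},
            },
            "required": ["current_state", "action"],
        },
    },
}]


def build_payload(enable_thinking: bool) -> dict:
    """Build a chat request that differs only in the thinking flag."""
    return {
        "model": "Qwen3.5-27B",
        "temperature": 0.1,
        "top_p": 0.1,
        "stream": False,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
        # Original prompt from the report ("Please help me fill in the log").
        "messages": [{"role": "user", "content": "请帮我填写日志"}],
        "tools": TOOLS,
    }


def post_chat(payload: dict, url: str = BASE_URL) -> dict:
    """POST the payload to the OpenAI-compatible endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=300) as resp:
        return json.load(resp)


def summarize(response: dict) -> dict:
    """Report whether the tool call was parsed or left as XML in `content`."""
    msg = response["choices"][0]["message"]
    return {
        "n_tool_calls": len(msg.get("tool_calls") or []),
        "xml_in_content": "<tool_call>" in (msg.get("content") or ""),
    }

# Usage against a running server:
#   for flag in (True, False):
#       print(flag, summarize(post_chat(build_payload(flag))))
```

If the behavior matches this report, the `enable_thinking=True` run would show zero parsed tool calls with XML left in `content`, while the `False` run would show at least one parsed tool call.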

## 🧪 Additional Tests

- Tested with `--tool-call-parser hermes`: the same issue persists.
- Tested with `enable_thinking=false`: tool calls work correctly and are returned in the `tool_calls` array.

## 🤔 Analysis

The issue appears to be a conflict between:

- `--reasoning-parser qwen3`, which parses thinking/reasoning content
- `--tool-call-parser qwen3_coder` (or `hermes`), which parses tool calls

When thinking is enabled, the model still emits the tool call in XML format (in the response above it ends up in `content`, separate from `reasoning`), but the tool call parser fails to extract it when the reasoning parser is also active.

## 📋 Possible Related Issues

(Please search if there are similar issues with reasoning + tool call parsers)

## 💡 Suggested Fix

The tool call parser should handle XML-formatted tool calls when reasoning/thinking mode is enabled. Alternatively, the reasoning parser and the tool call parser should coordinate so that tool calls are properly extracted from the model's output.
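
Until the parsers are fixed server-side, a client could recover the tool call from `content` itself. The sketch below is a hypothetical client-side fallback, not vLLM's parser: it assumes the XML layout seen in the actual response above (`<tool_call>`, `<function=NAME>`, `<parameter=KEY>`), and the function name `recover_tool_calls` and the generated `call_` IDs are made up for illustration.

```python
import json
import re
import uuid

# Patterns for the XML layout observed in the actual response in this report.
TOOL_CALL_RE = re.compile(
    r"<tool_call>\s*<function=([^>\s]+)>(.*?)</function>\s*</tool_call>",
    re.DOTALL,
)
PARAM_RE = re.compile(r"<parameter=([^>\s]+)>(.*?)</parameter>", re.DOTALL)


def _decode(value: str):
    """Parse a parameter body as JSON when possible, else keep the raw string.

    Note: this is schema-unaware, so a string parameter such as "123"
    would be decoded as the number 123.
    """
    text = value.strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text


def recover_tool_calls(message: dict) -> dict:
    """Return a message with XML tool calls moved from `content` to `tool_calls`.

    Messages that already have tool calls, or contain no XML, are returned unchanged.
    """
    content = message.get("content") or ""
    if message.get("tool_calls") or "<tool_call>" not in content:
        return message
    calls = []
    for name, body in TOOL_CALL_RE.findall(content):
        args = {key: _decode(raw) for key, raw in PARAM_RE.findall(body)}
        calls.append({
            "id": f"call_{uuid.uuid4().hex[:24]}",  # hypothetical ID format
            "type": "function",
            "function": {
                "name": name,
                "arguments": json.dumps(args, ensure_ascii=False),
            },
        })
    if not calls:
        return message
    remaining = TOOL_CALL_RE.sub("", content).strip()
    return {**message, "content": remaining or None, "tool_calls": calls}
```

Applied to the `message` object of the actual response, this would yield one `AgentOutput` call whose `arguments` string contains `current_state` and `action`, with `reasoning` left untouched. It is only a stopgap: parameter values that themselves contain `</parameter>` or streamed responses are not handled.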

Labels: bug, tool-calling, reasoning, qwen
