vllm - [Bug]: Streaming reasoning tokens truncated when `</think>` and `<tool_call>` appear in the same delta

Your current environment

OS: any
vLLM: main

🐛 Describe the bug

Problem Description: When using Qwen3.5 models with streaming inference, Multi-Token Prediction (MTP), thinking mode enabled, and tool calling, the last few tokens of the thinking section are occasionally truncated. Non-streaming inference works correctly.

Reproduction Steps:

  1. Enable streaming inference with a Qwen3.5 model
  2. Enable thinking mode (--reasoning-parser qwen3)
  3. Enable tool calling (--tool-call-parser qwen3_coder)
  4. Use MTP (default behavior in Qwen3.5)
  5. Trigger responses where the model output contains reasoning followed by tool calls

Expected Behavior: The complete reasoning content should be streamed to the client before transitioning to tool call parsing.

Actual Behavior: When MTP generates multiple tokens in a single inference step that include both the reasoning end token (</think>) and the tool call start token (<tool_call>), the reasoning tokens immediately preceding </think> are lost.

Example: With num_speculative_tokens=3 and a model output such as <think> I will use the tool Write.</think><tool_call>, a single delta_text can be "Write.</think><tool_call>".

  • MTP output tokens: ["Write", ".", "</think>", "<tool_call>"]
  • Expected streaming output: reasoning="Write.", then tool call
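  • Actual streaming output: the reasoning is empty or partial and only the tool call is received. The client sees something like the following, where the trailing "Write." is missing from the thinking section: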
Thinking:
I will use the tool 
Tool:
xxxx
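
To make the example concrete, here is a minimal sketch of how that single delta would ideally be split while the stream is still in the reasoning phase. `split_delta` is a hypothetical helper written for illustration and is not a vLLM function; the marker strings are taken from the example above.

```python
THINK_END = "</think>"


def split_delta(delta_text: str) -> tuple[str, str]:
    """Split one streamed delta into (reasoning_text, remaining_text).

    Hypothetical helper for illustration only; assumes the stream is
    still inside the reasoning section when the delta arrives.
    """
    head, sep, tail = delta_text.partition(THINK_END)
    if not sep:
        # No end-of-reasoning marker: the whole delta is still reasoning.
        return delta_text, ""
    # Text before the marker is reasoning; text after it goes to the tool parser.
    return head, tail


reasoning, rest = split_delta("Write.</think><tool_call>")
print(repr(reasoning))  # 'Write.'
print(repr(rest))       # '<tool_call>'
```

Both parts have to reach the client: "Write." as reasoning and "<tool_call>" as the start of the tool call.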

Root Cause Analysis: In vllm/parser/abstract_parser.py, the DelegatingParser.parse_delta method processes reasoning extraction and tool call extraction sequentially. When both the reasoning end token and tool call token appear in the same delta:

  1. The reasoning parser correctly extracts the reasoning content
  2. However, when the tool parser runs in the same iteration, its return value directly overwrites the delta_message variable, losing the previously extracted reasoning content
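
The overwrite pattern described in the two steps above can be sketched as follows. This is a simplified model and not the actual code in vllm/parser/abstract_parser.py: the `DeltaMessage` dataclass, its field names, and `parse_delta_overwriting` are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    # Simplified stand-in for a streaming delta; not vLLM's real class.
    reasoning: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def parse_delta_overwriting(delta_text: str) -> Optional[DeltaMessage]:
    """Hypothetical illustration of the overwrite pattern."""
    delta_message = None
    head, sep, tail = delta_text.partition("</think>")

    # Phase 1: reasoning extraction succeeds.
    if head:
        delta_message = DeltaMessage(reasoning=head)

    # Phase 2: tool parsing in the same call reassigns the variable,
    # discarding the reasoning extracted in phase 1.
    if sep and "<tool_call>" in tail:
        delta_message = DeltaMessage(tool_calls=[{"raw": tail}])

    return delta_message


msg = parse_delta_overwriting("Write.</think><tool_call>")
print(msg.reasoning)   # None: "Write." was lost
print(msg.tool_calls)  # [{'raw': '<tool_call>'}]
```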

Suggested Fix: Preserve the reasoning delta message and merge results from both parsers instead of overwriting. The fix ensures that when both phases run in the same delta, the reasoning content is retained while adding tool call information.
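
A minimal sketch of the merging idea, assuming a simplified `DeltaMessage` with hypothetical `reasoning`, `content`, and `tool_calls` fields. `merge_deltas` is an illustrative helper; the real change in DelegatingParser.parse_delta() may be structured differently.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DeltaMessage:
    # Simplified stand-in for a streaming delta; not vLLM's real class.
    reasoning: Optional[str] = None
    content: Optional[str] = None
    tool_calls: list = field(default_factory=list)


def merge_deltas(
    reasoning_delta: Optional[DeltaMessage],
    tool_delta: Optional[DeltaMessage],
) -> Optional[DeltaMessage]:
    """Combine the results of both phases instead of overwriting."""
    if reasoning_delta is None:
        return tool_delta
    if tool_delta is None:
        return reasoning_delta
    # Both phases produced output in the same delta: keep the reasoning
    # text and attach the tool call information to it.
    return DeltaMessage(
        reasoning=reasoning_delta.reasoning,
        content=tool_delta.content,
        tool_calls=list(tool_delta.tool_calls),
    )


merged = merge_deltas(
    DeltaMessage(reasoning="Write."),
    DeltaMessage(tool_calls=[{"name": "Write"}]),
)
print(merged.reasoning)   # Write.
print(merged.tool_calls)  # [{'name': 'Write'}]
```

When only one phase produces output, the helper returns that result unchanged, so deltas that do not straddle the boundary behave as before.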

Files Modified:

  • vllm/parser/abstract_parser.py - Fixed the delta message merging logic in DelegatingParser.parse_delta()

Additional Context: This issue only affects streaming inference because non-streaming mode processes the complete output in separate phases without the overwrite issue. The fix maintains backward compatibility and only affects the edge case where reasoning ends and tool calls begin in the same inference step.
