vllm - 💡(How to fix) Fix [Bug]: Qwen parsers broken all around with MTP and/or stream-interval > 1



Your current environment

vllm-0.21.1rc1.dev180+gc68c55d43-cp38-abi3-manylinux_2_28_x86_64.whl

🐛 Describe the bug

Parsers have received many fixes recently, including ongoing work on the Gemma parser. The Qwen parsers have related problems: in streaming mode they are currently broken in several ways with MTP (which implies multiple tokens at a time), with stream-interval greater than 1, and in some cases even without either.

  1. The last tokens of a reasoning block are lost. This happens even without MTP or a stream-interval greater than 1, and it has been present for months.

  2. In the worst case of item 1, </think> itself is lost: the reasoning and content parts, concatenated, do not contain a complete </think> at all. This is new with MTP, but it appears to be a subset of item 1.

  3. The reasoning -> content transition can be completely unpredictable: a) </think> starts in a reasoning chunk and ends in the first content chunk; b) the whole </think> lands in a content chunk; c) the end of the reasoning text (!) together with </think> lands in the first content chunk; d) any of the above combined with a missing or broken </think>, see items 1 and 2.

  4. A tool call is missing an argument value (it is parsed as an empty argument). This is exactly the same as the Gemma parser issue nearby.
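
Items 1 to 3 all describe an end marker that straddles a chunk boundary. As a minimal sketch of one way to make the split independent of chunking (this is not vLLM's parser; `ReasoningSplitter` and `split_stream` are hypothetical names, and the sketch assumes text-level deltas and a stream that starts inside the reasoning block), a splitter can hold back any suffix that might be the beginning of </think> until the next delta resolves it:

```python
from dataclasses import dataclass

END_TAG = "</think>"


@dataclass
class ReasoningSplitter:
    """Hypothetical helper (not a vLLM class): splits streamed text into
    reasoning and content, tolerating an end tag split across deltas."""

    end_tag: str = END_TAG
    in_reasoning: bool = True  # assumption: the stream starts in reasoning
    pending: str = ""          # held-back text that may be the start of end_tag

    def feed(self, delta: str) -> tuple[str, str]:
        """Consume one delta and return (reasoning_part, content_part)."""
        if not self.in_reasoning:
            return "", delta
        text = self.pending + delta
        self.pending = ""
        idx = text.find(self.end_tag)
        if idx != -1:
            # Full tag seen: everything before it is reasoning, the rest content.
            self.in_reasoning = False
            return text[:idx], text[idx + len(self.end_tag):]
        # Hold back the longest suffix that is a proper prefix of end_tag.
        for k in range(min(len(self.end_tag) - 1, len(text)), 0, -1):
            if text.endswith(self.end_tag[:k]):
                self.pending = text[-k:]
                return text[:-k], ""
        return text, ""

    def flush(self) -> tuple[str, str]:
        """Call once at end of stream: release held-back text as reasoning."""
        out, self.pending = self.pending, ""
        return out, ""


def split_stream(deltas):
    """Run all deltas through a fresh splitter and join the two outputs."""
    splitter = ReasoningSplitter()
    reasoning, content = [], []
    for delta in deltas:
        r, c = splitter.feed(delta)
        reasoning.append(r)
        content.append(c)
    r, c = splitter.flush()
    reasoning.append(r)
    content.append(c)
    return "".join(reasoning), "".join(content)
```

With this approach the joined reasoning and content are the same however the text is cut into deltas, which is the property the cases above violate. The `flush` step is what guards against the lost trailing tokens from item 1 when the stream ends while text is still held back.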

I understand this is a lot for a single ticket, but it makes sense to consolidate everything into one issue, because it all seems to stem from the same cause: with multiple tokens per chunk the current state is completely FUBAR, and a single thorough refactoring should fix all of it, rather than patching the cases one by one.

