vllm - ✅(Solved) Fix [Bug]: MiniMax-M2.5 reasoning missing in chat completions stream [1 pull requests, 2 comments, 2 participants]

GitHub stats
vllm-project/vllm#36632 (fetched 2026-04-08 00:35:46)
Comments: 2
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): commented ×2, closed ×1, cross-referenced ×1, labeled ×1

Fix Action

Fixed

PR fix notes

PR #34779: [Bugfix] Fix Qwen3/Qwen3.5 Reasoning Parser

Description (problem / solution / changelog)


Purpose

  • Fix the qwen3 reasoning parser to work with Qwen3.5 models (e.g., Qwen/Qwen3.5-397B-A17B) where the chat template places <think> in the prompt rather than having the model generate it. Previously the parser required both <think> and </think> in the generated output, causing it to fail for Qwen3.5 models entirely.
  • Fix the "only reasoning" streaming code path in serving.py to check prompt_is_reasoning_end_arr before calling the parser, matching the behavior already present in the tool_choice=auto and tool_choice=required paths. Without this, enable_thinking=False at inference time would misroute content as reasoning.
  • Fix the tool_choice_function_name streaming path to check prompt_is_reasoning_end_arr before calling the parser (was checked after), preventing a spurious reasoning delta on the first chunk when thinking is disabled.
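The parsing behavior described in the bullets above can be illustrated with a minimal sketch. This is not vLLM's parser: `split_reasoning` and its `prompt_reasoning_ended` argument are hypothetical names chosen for illustration, and treating output with no `</think>` as unfinished reasoning is an assumption of this sketch.

```python
THINK_START = "<think>"
THINK_END = "</think>"


def split_reasoning(generated, prompt_reasoning_ended=False):
    """Split a complete generation into (reasoning, content).

    Hypothetical helper for illustration only; not the vLLM implementation.
    """
    if prompt_reasoning_ended:
        # Thinking is disabled: the prompt already closed the reasoning
        # block, so everything the model generates is regular content.
        return None, generated

    text = generated
    # The start tag may come from the chat template (in the prompt) rather
    # than from the model, so it is optional in the generated text.
    if text.startswith(THINK_START):
        text = text[len(THINK_START):]

    reasoning, end_tag, content = text.partition(THINK_END)
    if not end_tag:
        # Assumption: with no end tag the model is still reasoning.
        return reasoning, None
    return reasoning, content or None


print(split_reasoning("plan</think>answer"))
print(split_reasoning("answer", prompt_reasoning_ended=True))
```

Checking the flag before touching the text mirrors the ordering the PR describes: when reasoning has already ended in the prompt, the parser is skipped and no reasoning delta is produced.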

Properly fixes #34684

Test Plan

All tests pass in tests/reasoning/test_qwen3_reasoning_parser.py

Test Result



Changed files

  • tests/reasoning/test_qwen3_reasoning_parser.py (modified, +121/-12)
  • vllm/entrypoints/openai/chat_completion/serving.py (modified, +31/-16)
  • vllm/reasoning/qwen3_reasoning_parser.py (modified, +81/-19)


Your current environment

<details> <summary>The output of <code>python collect_env.py</code></summary>
Your output of `python collect_env.py` here
</details>

🐛 Describe the bug

When serving lukealonso/MiniMax-M2.5-NVFP4 with vllm/vllm-openai:v0.17.0 and --reasoning-parser minimax_m2, reasoning is missing from /v1/chat/completions responses when stream=True. It works correctly for non-streaming requests and for /v1/responses.

Below is a script that reproduces the issue:

from openai import OpenAI

# Placeholder values (not from the original report); adjust for your deployment.
api_key = "EMPTY"
base_url = "http://localhost:8000/v1"
model = "lukealonso/MiniMax-M2.5-NVFP4"

client = OpenAI(api_key=api_key, base_url=base_url)

print("=== Non-Streaming ===")
response = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "hello"}],
)

message = response.choices[0].message
print(f"Reasoning: {getattr(message, 'reasoning_content', None)}")
print(f"Content: {message.content}")

print("\n=== Streaming ===")
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    content = getattr(delta, "content", None)
    if reasoning:
        print(f"Reasoning: {reasoning}")
    if content:
        print(f"Content: {content}")

When running the same configuration on vllm/vllm-openai:v0.15.1, reasoning is streamed properly.


extent analysis

Fix Plan

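The regression is in the server-side streaming path of vllm/vllm-openai:v0.17.0, not in the client script. The reproduction script already reads reasoning_content from each delta, and the same request returns reasoning when it is not streamed. The fix therefore has to come from the server: either a build that contains the PR listed under "PR fix notes", or a version in which streaming still works.

Code Changes

Here are the steps to address the issue:

  • Upgrade to a vLLM build that includes PR #34779, which this page lists as the fix. It changes the streaming paths in vllm/entrypoints/openai/chat_completion/serving.py to check prompt_is_reasoning_end_arr before calling the reasoning parser.
  • Alternatively, downgrade to vllm/vllm-openai:v0.15.1, which the issue reports as streaming reasoning correctly with the same configuration.
  • Client-side changes cannot restore reasoning that the server does not send, but the streaming loop can be made more defensive by skipping chunks without choices and fields that are None or empty:
print("\n=== Streaming ===")
stream = client.chat.completions.create(
    model=model,
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)
    content = getattr(delta, "content", None)
    if reasoning:
        print(f"Reasoning: {reasoning}")
    if content:
        print(f"Content: {content}")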

Verification

To verify the fix, rerun the script and check that the Streaming section prints Reasoning: lines, as the Non-Streaming section does.
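One way to make that check less dependent on reading console output is to accumulate the streamed deltas into two strings. The sketch below is a hypothetical helper, not part of vLLM or the openai client. It runs on stand-in chunk objects built with `SimpleNamespace` so it can be tried without a server, and it assumes real chunks expose the same `choices[0].delta` shape used in the reproduction script.

```python
from types import SimpleNamespace


def collect_stream(chunks):
    """Concatenate reasoning and content deltas from an iterable of chunks."""
    reasoning_parts, content_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks may carry no choices
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        content = getattr(delta, "content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
    return "".join(reasoning_parts), "".join(content_parts)


def fake_chunk(**fields):
    """Build a stand-in chunk with a single choice (for local testing only)."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_chunks = [
    fake_chunk(reasoning_content="The user "),
    fake_chunk(reasoning_content="said hello."),
    fake_chunk(content="Hi!"),
    SimpleNamespace(choices=[]),
]
streamed_reasoning, streamed_content = collect_stream(demo_chunks)
print(repr(streamed_reasoning), repr(streamed_content))
```

Against a real server, pass the `stream` object from the script to `collect_stream`. An empty reasoning string alongside a non-empty reasoning_content in the non-streaming response would reproduce the reported behavior.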

Extra Tips

  • Check the release notes and API documentation for changes to streaming behavior between versions.
  • If the issue persists, print the raw streamed chunks to see whether the server sends reasoning_content in any delta, which separates a server-side regression from a client-side parsing problem.
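A rough way to inspect what the server actually sends is to count which delta fields are non-empty across the stream. The helper below is a sketch with hypothetical names. The `reasoning` entry is included only as an assumption that some server versions may name the field differently, so check your server's documentation before relying on it.

```python
from collections import Counter
from types import SimpleNamespace

# Field names to look for; "reasoning" is an assumption, not confirmed by the issue.
CANDIDATE_FIELDS = ("reasoning_content", "reasoning", "content")


def count_delta_fields(chunks, fields=CANDIDATE_FIELDS):
    """Count how many deltas carry a non-empty value for each field name."""
    counts = Counter()
    for chunk in chunks:
        for choice in chunk.choices:
            for name in fields:
                if getattr(choice.delta, name, None):
                    counts[name] += 1
    return counts


def fake_chunk(**fields):
    """Build a stand-in chunk with a single choice (for local testing only)."""
    delta = SimpleNamespace(**fields)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


# Stand-in stream that mimics the reported symptom: content but no reasoning.
demo_chunks = [fake_chunk(content="Hi"), fake_chunk(content="!")]
field_counts = count_delta_fields(demo_chunks)
print(dict(field_counts))
```

A count of zero for every reasoning field while content is present points at the server rather than at the client loop.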
