vllm - ✅ (Solved) Fix MiniMaxM2ReasoningParser broken for M2.5: extract_reasoning_streaming assumes no <think> start tag [1 pull request, 5 comments, 5 participants]

GitHub stats: vllm-project/vllm#38212, fetched 2026-04-08 01:31:36
Comments: 5 · Participants: 5 · Timeline events: 7 · Reactions: 0
Timeline (top): commented ×5, cross-referenced ×1, referenced ×1

MiniMaxM2ReasoningParser overrides extract_reasoning_streaming with logic that assumes the model only generates </think> (no opening <think> tag). This was true for the original M2, but M2.5 generates both <think> and </think>.

The result: reasoning content (including raw <think>...</think> tags) leaks into the content field, and reasoning / reasoning_content is always null — even with include_reasoning: true.

The base class BaseThinkingReasoningParser already handles both tags correctly. The override in MiniMaxM2ReasoningParser is the sole cause of the bug.

Root Cause

MiniMaxM2ReasoningParser.extract_reasoning_streaming overrides the base class with M2-specific logic:

  • It treats all content as reasoning until </think>
  • It never checks for or strips the <think> start tag
  • This causes the <think> tag itself to be included in the reasoning text, and downstream the entire block ends up in content

The base class BaseThinkingReasoningParser.extract_reasoning_streaming already handles both <think> and </think> correctly (checks for start_token_id in previous/delta tokens, strips both tags).
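To see why the M2-only assumption breaks on M2.5 output, here is a minimal text-level sketch (plain strings, non-streaming, no token IDs). Both helpers are hypothetical and are not vLLM code: `split_m2_style` mimics "everything before </think> is reasoning", while `split_tag_aware` also strips an optional leading <think>. It only shows the first symptom, the start tag leaking into the reasoning text; the downstream effect on the content field is not modeled. Returning the whole text as reasoning when no end tag is present is a choice made for this sketch.

```python
from typing import Optional, Tuple

START, END = "<think>", "</think>"


def split_m2_style(text: str) -> Tuple[Optional[str], Optional[str]]:
    """M2 assumption: there is no start tag; everything before END is reasoning."""
    if END not in text:
        return text, None
    reasoning, _, content = text.partition(END)
    return reasoning, content or None


def split_tag_aware(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Strip an optional leading START, then split on END."""
    if text.startswith(START):
        text = text[len(START):]
    if END not in text:
        return text, None
    reasoning, _, content = text.partition(END)
    return reasoning, content or None


m25_output = "<think>thinking</think>\n\nanswer"  # M2.5 emits both tags
m2_output = "thinking</think>\n\nanswer"          # M2 emits only the end tag

print(split_m2_style(m25_output))   # ('<think>thinking', '\n\nanswer')
print(split_tag_aware(m25_output))  # ('thinking', '\n\nanswer')
print(split_tag_aware(m2_output))   # ('thinking', '\n\nanswer')
```

The tag-aware version gives the same result for both models, which matches the reasoning='thinking', content='\n\nanswer' expectation from the PR's test plan.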

Fix Action

Workaround

Use --reasoning-parser deepseek_r1 instead, which handles both tags correctly.

PR fix notes

PR #38213: fix(reasoning): MiniMaxM2ReasoningParser broken for M2.5

Description (problem / solution / changelog)

Summary

  • Remove broken extract_reasoning_streaming override from MiniMaxM2ReasoningParser
  • M2.5 generates both <think> and </think>, but the override assumed only </think> (M2 behavior)
  • The base class BaseThinkingReasoningParser already handles both tags correctly

Problem

With --reasoning-parser minimax_m2, reasoning content (including raw <think>...</think> tags) leaks into the content field. The reasoning field is always null, even with include_reasoning: true.

Root cause: MiniMaxM2ReasoningParser.extract_reasoning_streaming treats everything as reasoning until </think>, never checking for or stripping <think>. The <think> tag gets included in the reasoning text, and downstream the entire block ends up in content.

Fix

Delete the override. BaseThinkingReasoningParser.extract_reasoning_streaming checks for start_token_id in previous/delta tokens, handles both tags, and correctly splits reasoning from content. This also maintains backward compatibility with M2 (which only generates </think>) since the base class handles that case too.
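The token-ID check described above can be illustrated with a small, self-contained state machine. This is a simplified sketch, not vLLM's BaseThinkingReasoningParser: the token IDs 1 and 2 and the helper names `split_delta` and `run_stream` are hypothetical. For simplicity it treats any text before </think> as reasoning whether or not <think> was seen, which covers both the M2 and M2.5 patterns; how the real base class handles a missing start tag is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical vocabulary: token IDs for the two tags (a real parser looks
# these up in the tokenizer).
START_ID, END_ID = 1, 2
START, END = "<think>", "</think>"


def split_delta(
    previous_ids: Sequence[int], delta_ids: Sequence[int], delta_text: str
) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, content) for one streamed delta; None means empty."""
    # A delta consisting of just a tag token carries no visible text.
    if len(delta_ids) == 1 and delta_ids[0] in (START_ID, END_ID):
        return None, None
    # The reasoning block was already closed: everything is content now.
    if END_ID in previous_ids:
        return None, delta_text or None
    # Strip the start tag if it arrives inside a larger delta.
    if START_ID in delta_ids:
        delta_text = delta_text[delta_text.find(START) + len(START):]
    # The end tag is in this delta: split around it.
    if END_ID in delta_ids:
        reasoning, _, content = delta_text.partition(END)
        return reasoning or None, content or None
    # Still inside the reasoning block.
    return delta_text or None, None


def run_stream(chunks: List[Tuple[str, List[int]]]) -> Tuple[str, str]:
    """Feed (delta_text, delta_ids) chunks through split_delta and join results."""
    previous: List[int] = []
    reasoning_parts: List[str] = []
    content_parts: List[str] = []
    for text, ids in chunks:
        reasoning, content = split_delta(previous, ids, text)
        if reasoning:
            reasoning_parts.append(reasoning)
        if content:
            content_parts.append(content)
        previous.extend(ids)
    return "".join(reasoning_parts), "".join(content_parts)


# M2.5-style stream (both tags) and M2-style stream (end tag only).
m25_chunks = [("<think>", [1]), ("think", [10]), ("ing", [11]),
              ("</think>", [2]), ("\n\nanswer", [12])]
m2_chunks = [("think", [10]), ("ing</think>", [11, 2]), ("\n\nanswer", [12])]

print(run_stream(m25_chunks))  # ('thinking', '\n\nanswer')
print(run_stream(m2_chunks))   # ('thinking', '\n\nanswer')
```

Checking token IDs rather than searching the accumulated text keeps each step cheap and avoids false matches when a tag string is split across two deltas.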

Test plan

  • Verified extract_reasoning correctly returns reasoning='thinking', content='\n\nanswer' for input <think>thinking</think>\n\nanswer
  • Verified end-to-end with vLLM v0.17.1rc1.dev150 serving MiniMax-M2.5-REAP-172B: reasoning field populated, content clean
  • Existing tests continue to pass

Fixes #38212

🤖 Generated with Claude Code

Changed files

  • vllm/reasoning/minimax_m2_reasoning_parser.py (modified, +4/-43)

Code Example

vllm serve MiniMaxAI/MiniMax-M2.5 \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2

---

from openai import OpenAI

# Assumption: the server started above is reachable on vLLM's default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# reasoning is None, <think> tags leak into content
response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.5",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    extra_body={"include_reasoning": True},
)
print(response.choices[0].message.reasoning)  # None (should have content)
print(response.choices[0].message.content)    # "<think>...</think>\n\n4"

Bug: MiniMaxM2ReasoningParser doesn't handle M2.5's <think> start tag

Proposed fix

Remove the extract_reasoning_streaming override from MiniMaxM2ReasoningParser so it inherits the correct implementation from BaseThinkingReasoningParser.

Related

  • #34625: users report that minimax_m2_append_think doesn't separate reasoning for M2.5; the workaround is deepseek_r1
  • The minimax_m2_append_think parser has a separate but related issue: its extract_reasoning intentionally returns (None, "<think>" + model_output), making reasoning separation impossible by design
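The by-design limitation of minimax_m2_append_think follows from its return shape alone. The function below is a hypothetical stand-in that only reproduces the quoted `(None, "<think>" + model_output)` shape; it is not the vLLM implementation. Because the reasoning slot is always None, the only way to recover reasoning from that parser is to split the content yourself, for example on the client; `split_content` is likewise a hypothetical helper, and it assumes the content starts with a complete <think>...</think> block.

```python
from typing import Optional, Tuple

START, END = "<think>", "</think>"


def append_think_extract(model_output: str) -> Tuple[Optional[str], str]:
    # Stand-in reproducing the quoted return shape: the reasoning slot is
    # always None and the re-added start tag ends up in content.
    return None, START + model_output


def split_content(content: str) -> Tuple[Optional[str], str]:
    """Client-side fallback: pull a leading <think>...</think> block out of content."""
    if content.startswith(START) and END in content:
        reasoning, _, answer = content[len(START):].partition(END)
        return reasoning, answer
    return None, content


reasoning, content = append_think_extract("thinking</think>\n\nanswer")
print(reasoning)               # None
print(split_content(content))  # ('thinking', '\n\nanswer')
```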

Environment

  • vLLM: v0.17.1rc1.dev150
  • Model: MiniMax-M2.5 (also reproduced with REAP-172B quantized variant)

Extended analysis

Fix Plan

To resolve the issue, we need to remove the extract_reasoning_streaming override from MiniMaxM2ReasoningParser. This will allow it to inherit the correct implementation from BaseThinkingReasoningParser, which handles both <think> and </think> tags correctly.

Steps:

  • Remove the extract_reasoning_streaming method from MiniMaxM2ReasoningParser.
  • Ensure that MiniMaxM2ReasoningParser inherits from BaseThinkingReasoningParser and does not override the extract_reasoning_streaming method.
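The second step can be pinned down with a small check that the subclass no longer defines the method itself. The sketch uses stand-in classes (`BaseParser`, `OverridingParser`, `InheritingParser`, all hypothetical) so it runs without vLLM; against the real code you would pass MiniMaxM2ReasoningParser instead. Note that an explicit `super()` delegation also counts as an override under this check.

```python
class BaseParser:
    def extract_reasoning_streaming(self, delta_text: str) -> str:
        return "base"


class OverridingParser(BaseParser):
    def extract_reasoning_streaming(self, delta_text: str) -> str:
        return "override"


class InheritingParser(BaseParser):
    pass


def overrides(cls: type, name: str) -> bool:
    """True if cls itself (not one of its base classes) defines `name`."""
    return name in vars(cls)


print(overrides(OverridingParser, "extract_reasoning_streaming"))  # True
print(overrides(InheritingParser, "extract_reasoning_streaming"))  # False
# Equivalent identity check: the subclass attribute is the base function object.
print(InheritingParser.extract_reasoning_streaming
      is BaseParser.extract_reasoning_streaming)  # True
```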

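Example Code (sketch: the extract_reasoning_streaming override is simply deleted; the start_token / end_token properties stay, and other members of the real file are omitted):

class MiniMaxM2ReasoningParser(BaseThinkingReasoningParser):
    @property
    def start_token(self) -> str:
        return "<think>"

    @property
    def end_token(self) -> str:
        return "</think>"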

Alternatively, if you want to keep the method for future customization, you can call the parent class's method:

class MiniMaxM2ReasoningParser(BaseThinkingReasoningParser):
    def extract_reasoning_streaming(self, *args, **kwargs):
        return super().extract_reasoning_streaming(*args, **kwargs)

Verification

To verify that the fix worked, you can run the following test:

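from openai import OpenAI

# Assumption: the fixed server is reachable on vLLM's default port 8000.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.5",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    extra_body={"include_reasoning": True},
)
print(response.choices[0].message.reasoning)  # Should have content
print(response.choices[0].message.content)    # Should not have <think> tags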
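To turn the two printed checks into a pass/fail result (for example in a CI smoke test), a helper like the hypothetical `reasoning_separated` below can be applied to `message.reasoning` and `message.content`. The first sample mirrors the broken output from the report; the reasoning string in the second call is made up for illustration.

```python
from typing import Optional


def reasoning_separated(reasoning: Optional[str], content: Optional[str]) -> bool:
    """True when reasoning is populated and no think tags leak into content."""
    if not reasoning:
        return False
    text = content or ""
    return "<think>" not in text and "</think>" not in text


print(reasoning_separated(None, "<think>...</think>\n\n4"))  # False (the bug)
print(reasoning_separated("2+2 is 4.", "4"))                 # True (expected)
```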

Extra Tips

  • Test the fix against both tag patterns: M2-style output (only </think>) and M2.5-style output (<think> and </think>), in both streaming and non-streaming requests.
  • Add unit tests covering both patterns so that a future parser override cannot silently reintroduce the bug.
  • If the deepseek_r1 parser causes problems, minimax_m2_append_think is not a substitute for reasoning separation: its extract_reasoning returns (None, "<think>" + model_output) by design, so the reasoning field stays empty (see #34625).
