hermes - ✅(Solved) Fix [Bug] MiniMax reasoning content leaks to CLI output even with reasoning_effort: none [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#17924Fetched 2026-05-01 05:55:06
View on GitHub
Comments
0
Participants
1
Timeline
5
Reactions
0
Author
Participants
Timeline (top)
labeled ×4cross-referenced ×1

Root Cause

The _strip_think_blocks function in run_agent.py only runs on final_response after the streaming completes. During streaming, _stream_delta in cli.py calls _clean_for_display which does NOT filter <think> blocks — so thinking content leaks through token by token.

Issue #9685 fixed this for gateway/streaming platforms, but the CLI side _stream_delta may not have received the equivalent fix.

Fix Action

Fixed

PR fix notes

PR #17943: fix(cli): suppress unterminated streamed reasoning blocks

Description (problem / solution / changelog)

Summary

  • Suppress unterminated CLI-streamed reasoning blocks that start with a real block-boundary reasoning tag.
  • Preserve the existing legacy/manual false-positive recovery path for states that were not opened by _stream_delta seeing a reasoning tag.

Root cause

MiniMax-style providers can stream scratchpad text through normal content and omit the closing </think> tag. The CLI already suppresses closed <think>...</think> blocks during streaming, but _flush_stream() recovered any still-buffered reasoning-block content as regular response text at end-of-stream, leaking the tail of hidden reasoning.

Fix

Track when _stream_delta() entered a reasoning block because it actually saw a block-boundary opening tag. On final flush, discard that unfinished block when show_reasoning is disabled instead of emitting it as response content.

Regression coverage

Added a CLI streaming regression test for <think> content that never receives a closing tag before _flush_stream().

Testing

  • scripts/run_tests.sh tests/cli/test_stream_delta_think_tag.py -q
  • scripts/run_tests.sh tests/cli/test_stream_delta_think_tag.py tests/run_agent/test_run_agent.py -q

Closes #17924

Changed files

  • cli.py (modified, +15/-4)
  • tests/cli/test_stream_delta_think_tag.py (modified, +20/-0)
RAW_BUFFERClick to expand / collapse

Bug Description

MiniMax-M2.7 (via minimax-cn provider) streams thinking/reasoning content directly into delta.content instead of delta.reasoning_content. Setting reasoning_effort: none or show_reasoning: false does NOT suppress the thinking output in the terminal.

This is different from how Claude handles thinking (arrives via delta.thinking_delta to reasoning_callback and never reaches display). MiniMax/DeepSeek/Qwen3 write their scratchpad directly into the plain content stream.

Expected Behavior

When reasoning_effort: none is set, no thinking content should appear in the terminal output.

Actual Behavior

Thinking tags like <think>...</think> still appear in the CLI output, making it impossible to get clean responses from MiniMax models.

Root Cause

The _strip_think_blocks function in run_agent.py only runs on final_response after the streaming completes. During streaming, _stream_delta in cli.py calls _clean_for_display which does NOT filter <think> blocks — so thinking content leaks through token by token.

Issue #9685 fixed this for gateway/streaming platforms, but the CLI side _stream_delta may not have received the equivalent fix.

Configuration

agent: reasoning_effort: none

display: show_reasoning: false tool_progress: off

Model: MiniMax-M2.7 (minimax-cn provider)

Suggested Fix

Apply the same think-block stripping logic from GatewayStreamConsumer._clean_for_display to cli.py _stream_delta:

  1. Strip complete blocks: re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]>.?</\1>', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
  2. Strip in-progress blocks (closing tag not arrived yet): re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]>.', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
  3. Strip orphaned closing tags: re.sub(r'</(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>\s*', '', cleaned, flags=re.IGNORECASE)

Fast-path check: if '<think' in text.lower() to avoid regex overhead on non-reasoning responses.

Environment

  • Hermes Agent: latest (v0.9.x)
  • Provider: minimax-cn (MiniMax-M2.7)
  • Platform: macOS CLI

extent analysis

TL;DR

Apply the think-block stripping logic to the _stream_delta function in cli.py to prevent thinking content from leaking into the terminal output.

Guidance

  • Verify that the _strip_think_blocks function in run_agent.py is correctly implemented and only runs on the final response.
  • Apply the suggested fix by modifying the _stream_delta function in cli.py to strip think blocks using regular expressions, as described in the issue.
  • Test the fix with the provided configuration and MiniMax-M2.7 model to ensure that thinking content is no longer displayed in the terminal output.
  • Consider adding a fast-path check to avoid regex overhead on non-reasoning responses.

Example

import re

def _stream_delta(delta):
    # ...
    cleaned = re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*?</\1>', '', delta.content, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
    cleaned = re.sub(r'</(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>\s*', '', cleaned, flags=re.IGNORECASE)
    # ...

Notes

The fix may not be applicable to other models or providers, and additional testing may be required to ensure compatibility.

Recommendation

Apply the workaround by modifying the _stream_delta function in cli.py to strip think blocks, as this is a targeted fix for the specific issue described.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Bug] MiniMax reasoning content leaks to CLI output even with reasoning_effort: none [1 pull requests, 1 participants]