hermes - ✅(Solved) Fix [Bug] MiniMax reasoning content leaks to CLI output even with reasoning_effort: none [1 pull request, 1 participant]

hermes · 2026-04-30 11:38:58


NousResearch/hermes-agent#17924 · Fetched 2026-05-01 05:55:06


Author

maxxqf-ai

Participants

maxxqf-ai

Timeline (top)

labeled ×4, cross-referenced ×1

Root Cause

The _strip_think_blocks function in run_agent.py only runs on final_response after the streaming completes. During streaming, _stream_delta in cli.py calls _clean_for_display which does NOT filter <think> blocks — so thinking content leaks through token by token.
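A minimal sketch of why cleaning each delta on its own cannot work: the opening and closing tags of a reasoning block usually arrive in different deltas, so a per-chunk regex never sees a complete block. The helper clean_chunk and the chunk list are hypothetical illustrations, not Hermes code; the pattern is the complete-block regex from the issue's suggested fix.

```python
import re

# Complete-block pattern from the issue's suggested fix.
BLOCK_RE = re.compile(
    r"<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*?</\1>",
    re.DOTALL | re.IGNORECASE,
)


def clean_chunk(chunk: str) -> str:
    """Hypothetical stateless cleaner: removes only blocks that fit inside one chunk."""
    return BLOCK_RE.sub("", chunk)


# Simulated deltas: the reasoning block is spread over several chunks.
chunks = ["<think>", "The user wants ", "a greeting.", "</think>", "Hello!"]

per_chunk = "".join(clean_chunk(c) for c in chunks)  # what a per-delta cleaner prints
whole = clean_chunk("".join(chunks))                 # same regex on the full text

print(repr(per_chunk))  # '<think>The user wants a greeting.</think>Hello!'
print(repr(whole))      # 'Hello!'
```

The same regex removes the block once it sees the whole text, which matches the reported symptom: cleanup on final_response works while the streamed tokens leak.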

Issue #9685 fixed this for gateway/streaming platforms, but the CLI-side _stream_delta may not have received the equivalent fix.

Fix Action

Fixed

Fixed by PR: fix(cli): suppress unterminated streamed reasoning blocks (https://github.com/NousResearch/hermes-agent/pull/17943)

PR fix notes

PR #17943: fix(cli): suppress unterminated streamed reasoning blocks

Repository: NousResearch/hermes-agent
Author: liuhao1024
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17943

Description (problem / solution / changelog)

Summary

Suppress unterminated CLI-streamed reasoning blocks that start with a real block-boundary reasoning tag.
Preserve the existing legacy/manual false-positive recovery path for states that were not opened by _stream_delta seeing a reasoning tag.

Root cause

MiniMax-style providers can stream scratchpad text through normal content and omit the closing </think> tag. The CLI already suppresses closed <think>...</think> blocks during streaming, but _flush_stream() recovered any still-buffered reasoning-block content as regular response text at end-of-stream, leaking the tail of hidden reasoning.

Fix

Track when _stream_delta() entered a reasoning block because it actually saw a block-boundary opening tag. On final flush, discard that unfinished block when show_reasoning is disabled instead of emitting it as response content.
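The fix can be sketched as a small stateful filter. This is a hypothetical illustration, not the code from PR #17943: the class name ThinkStreamFilter, its methods, and the rule that a tag counts as a block boundary only at the start of the output or of a line are assumptions. It models only the show_reasoning: false case: text inside a reasoning block is buffered, dropped when the closing tag arrives, and dropped on flush if the closing tag never arrives.

```python
import re

# Tag names from the issue's suggested regexes.
_TAGS = r"(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)"
_OPEN_RE = re.compile(rf"<{_TAGS}\b[^>]*>", re.IGNORECASE)


class ThinkStreamFilter:
    """Hypothetical streaming filter that hides reasoning blocks (show_reasoning off)."""

    def __init__(self) -> None:
        self._buf = ""           # text received but not yet emitted
        self._close = None      # closing tag awaited, e.g. "</think>"; None outside a block
        self._line_start = True  # nothing emitted yet, or last emitted char was "\n"

    def _emit(self, text: str, out: list[str]) -> None:
        if text:
            out.append(text)
            self._line_start = text.endswith("\n")

    def feed(self, delta: str) -> str:
        """Consume one streamed delta; return the text that is safe to display now."""
        self._buf += delta
        out: list[str] = []
        while True:
            if self._close is not None:  # inside a reasoning block
                end = re.search(re.escape(self._close), self._buf, re.IGNORECASE)
                if end is None:
                    break  # keep buffering until the closing tag shows up
                # Drop the block and any whitespace right after the closing tag.
                self._buf = self._buf[end.end():].lstrip()
                self._close = None
                continue
            m = _OPEN_RE.search(self._buf)
            if m is None:
                break
            before = self._buf[: m.start()]
            at_boundary = (self._line_start and not before.strip()) or before.rstrip(" \t").endswith("\n")
            if not at_boundary:
                # A tag mentioned mid-sentence is ordinary text; emit it and keep scanning.
                self._emit(self._buf[: m.end()], out)
                self._buf = self._buf[m.end():]
                continue
            self._emit(before, out)
            self._close = f"</{m.group(1).lower()}>"
            self._buf = self._buf[m.end():]
        if self._close is None:
            # Hold back a trailing "<..." that may be the first half of a split tag.
            i = self._buf.rfind("<")
            keep = i if i != -1 and ">" not in self._buf[i:] else len(self._buf)
            self._emit(self._buf[:keep], out)
            self._buf = self._buf[keep:]
        return "".join(out)

    def flush(self) -> str:
        """End of stream: emit held-back text, but drop an unterminated reasoning block."""
        rest, self._buf = self._buf, ""
        if self._close is not None:
            self._close = None
            return ""  # pre-fix behaviour returned `rest` here, leaking the reasoning tail
        return rest


f = ThinkStreamFilter()
deltas = ["<thi", "nk>\nplanning the ", "answer..."]  # closing tag never arrives
shown = "".join(f.feed(d) for d in deltas) + f.flush()
print(repr(shown))  # ''
```

A trailing "<" with no ">" after it is held back until more text arrives, so a tag split across deltas (as in the usage above) is still recognized; the cost is that an ordinary "<" at the end of a chunk is printed one delta late, or at flush.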

Regression coverage

Added a CLI streaming regression test for <think> content that never receives a closing tag before _flush_stream().

Testing

scripts/run_tests.sh tests/cli/test_stream_delta_think_tag.py -q
scripts/run_tests.sh tests/cli/test_stream_delta_think_tag.py tests/run_agent/test_run_agent.py -q

Closes #17924

Changed files

cli.py (modified, +15/-4)
tests/cli/test_stream_delta_think_tag.py (modified, +20/-0)


Bug Description

MiniMax-M2.7 (via the minimax-cn provider) streams thinking/reasoning content directly into delta.content instead of delta.reasoning_content. Setting reasoning_effort: none or show_reasoning: false does NOT suppress the thinking output in the terminal.

This differs from Claude, whose thinking arrives via delta.thinking_delta, is passed to reasoning_callback, and never reaches the display. MiniMax, DeepSeek, and Qwen3 write their scratchpad directly into the plain content stream.
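A minimal sketch of the difference, using plain dicts shaped like OpenAI-compatible streaming deltas (the field names content and reasoning_content come from the issue; route_delta and the sink lists are hypothetical). When reasoning arrives in its own field it can be routed away from the terminal; when it arrives inside content, routing by field alone cannot separate it.

```python
from typing import Callable


def route_delta(delta: dict, on_reasoning: Callable[[str], None], on_content: Callable[[str], None]) -> None:
    """Hypothetical router: choose the sink by the field the text arrives in."""
    if delta.get("reasoning_content"):
        on_reasoning(delta["reasoning_content"])  # can be hidden when show_reasoning is false
    if delta.get("content"):
        on_content(delta["content"])  # printed to the terminal


reasoning: list[str] = []
shown: list[str] = []

# Provider that separates reasoning: the scratchpad never reaches `shown`.
for d in [{"reasoning_content": "plan..."}, {"content": "Hi!"}]:
    route_delta(d, reasoning.append, shown.append)

# MiniMax-style provider: the scratchpad arrives inside `content`.
for d in [{"content": "<think>plan..."}, {"content": "</think>Hi!"}]:
    route_delta(d, reasoning.append, shown.append)

print(reasoning)  # ['plan...']
print(shown)      # ['Hi!', '<think>plan...', '</think>Hi!']
```

This is why the suppression has to happen on the text itself, by recognizing the tags, rather than by the field the delta arrives in.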

Expected Behavior

When reasoning_effort: none is set, no thinking content should appear in the terminal output.

Actual Behavior

Thinking tags like <think>...</think> still appear in the CLI output, making it impossible to get clean responses from MiniMax models.

Root Cause

Issue #9685 fixed this for gateway/streaming platforms, but the CLI-side _stream_delta may not have received the equivalent fix.

Configuration

agent:
  reasoning_effort: none
display:
  show_reasoning: false
  tool_progress: off

Model: MiniMax-M2.7 (minimax-cn provider)

Suggested Fix

Apply the same think-block stripping logic from GatewayStreamConsumer._clean_for_display to cli.py _stream_delta:

Strip complete blocks: re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*?</\1>', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
Strip in-progress blocks (closing tag has not arrived yet): re.sub(r'<(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)\b[^>]*>.*', '', cleaned, flags=re.DOTALL | re.IGNORECASE)
Strip orphaned closing tags: re.sub(r'</(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)>\s*', '', cleaned, flags=re.IGNORECASE)

Fast path: check '<think' in text.lower() before running the regexes, to avoid regex overhead on non-reasoning responses.
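A quick check of that fast path: '<think' in text.lower() also matches <thinking, but it does not match the other tag names in the regexes, nor an orphaned closing tag such as </think>, because the "/" sits between "<" and the tag name. The sketch below compares it with a hypothetical variant that tests one substring per tag name, opening and closing; both function names are illustrative, not Hermes functions.

```python
TAG_NAMES = ("think", "thinking", "reasoning", "thought", "REASONING_SCRATCHPAD")


def fast_path_as_suggested(text: str) -> bool:
    """The check from the issue: only looks for '<think'."""
    return "<think" in text.lower()


# Hypothetical variant: one cheap substring test per opening and closing tag.
_NEEDLES = tuple(prefix + name.lower() for name in TAG_NAMES for prefix in ("<", "</"))


def fast_path_all_tags(text: str) -> bool:
    lowered = text.lower()
    return any(needle in lowered for needle in _NEEDLES)


samples = ["plain answer", "<think>plan</think>hi", "<reasoning>plan</reasoning>hi", "</think>tail"]
for s in samples:
    print(f"{s!r:36} suggested={fast_path_as_suggested(s)} all_tags={fast_path_all_tags(s)}")
```

Both checks still return False for plain answers, so the regexes are skipped in the common case.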

Environment

Hermes Agent: latest (v0.9.x)
Provider: minimax-cn (MiniMax-M2.7)
Platform: macOS CLI

extent analysis

TL;DR

Apply the think-block stripping logic to the _stream_delta function in cli.py to prevent thinking content from leaking into the terminal output.

Guidance

Verify that the _strip_think_blocks function in run_agent.py is correctly implemented and only runs on the final response.
Apply the suggested fix by modifying the _stream_delta function in cli.py to strip think blocks using regular expressions, as described in the issue.
Test the fix with the provided configuration and MiniMax-M2.7 model to ensure that thinking content is no longer displayed in the terminal output.
Consider adding a fast-path check to avoid regex overhead on non-reasoning responses.

Example

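import re

# Tag names from the issue; _strip_reasoning_text is a hypothetical helper name.
_TAGS = r"(think|thinking|reasoning|thought|REASONING_SCRATCHPAD)"

def _strip_reasoning_text(text):
    # Call on the accumulated response text inside _stream_delta, not on a single
    # delta: a block opened in one delta and closed in a later one is never matched.
    if "<" not in text:  # fast path: no tags at all, skip the regexes
        return text
    # 1. Complete blocks, e.g. <think>...</think>
    text = re.sub(rf"<{_TAGS}\b[^>]*>.*?</\1>", "", text, flags=re.DOTALL | re.IGNORECASE)
    # 2. Unterminated block: drop everything from the opening tag to the end
    text = re.sub(rf"<{_TAGS}\b[^>]*>.*", "", text, flags=re.DOTALL | re.IGNORECASE)
    # 3. Orphaned closing tags and the whitespace after them
    return re.sub(rf"</{_TAGS}>\s*", "", text, flags=re.IGNORECASE)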

Notes

The fix targets providers that, like MiniMax-M2.7, stream reasoning inside delta.content wrapped in one of the listed tags; other models or providers may use different tags or channels and need separate testing.

Recommendation

Apply the workaround by modifying the _stream_delta function in cli.py to strip think blocks, as this is a targeted fix for the specific issue described.
