hermes - 💡(How to fix) Fix feat: make reasoning_content echo-back detection dynamic (protocol-level, not provider-name-based) [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

  • Qwen-TP users (Alibaba Cloud's model-as-a-service at maas.aliyuncs.com) get thinking-mode support without any code changes to Hermes Agent.
  • Custom gateway users behind OpenAI-compatible reverse proxies automatically get the right behavior.
  • Future providers don't require a pull request to add a new _needs_*_tool_reasoning() method.
  • Self-hosted / enterprise deployments with custom thinking backends "just work."

Fix Action

Fixed

Code Example

# After extracting raw_reasoning_content...
if raw_reasoning_content is not None:
    msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
    # DYNAMIC ECHO DETECTION: if we haven't set this flag yet,
    # the API just told us it uses reasoning_content echo protocol
    if not hasattr(self, '_requires_reasoning_echo') or not self._requires_reasoning_echo:
        self._requires_reasoning_echo = True
        logger.debug("Auto-detected thinking-mode echo-back protocol from API response")

---

def _needs_thinking_reasoning_pad(self) -> bool:
    # Primary: dynamic detection from actual API behavior
    if getattr(self, '_requires_reasoning_echo', False):
        return True
    # Fallback: static provider-name checks for history replay / edge cases
    return (
        self._needs_deepseek_tool_reasoning()
        or self._needs_kimi_tool_reasoning()
        or self._needs_mimo_tool_reasoning()
    )
RAW_BUFFERClick to expand / collapse

Feature Request: Make reasoning_content Echo-Back Detection Dynamic (Protocol-Level, Not Provider-Name-Based)

The Problem

Currently, Hermes Agent detects which providers require reasoning_content echo-back (re-including reasoning_content on every assistant tool-call message in the API replay history) via hardcoded provider-name checks in three methods:

  • _needs_deepseek_tool_reasoning() — matches provider == "deepseek", "deepseek" in model, or base_url_host_matches(self.base_url, "api.deepseek.com")
  • _needs_kimi_tool_reasoning() — matches provider in a set, or base_url_host_matches against kimi/moonshot domains
  • _needs_mimo_tool_reasoning() — matches provider == "xiaomi", "mimo" in model, or base_url_host_matches against xiaomimimo.com domains

These are aggregated by _needs_thinking_reasoning_pad() which returns True if any of the three match (see run_agent.py lines ~10421–10476).

This approach has two major problems:

  1. Every new thinking-mode provider requires a code change. Qwen-TP (Alibaba Cloud's maas.aliyuncs.com), GLM, MiniMax, and any future provider that enforces the same protocol constraint all need a new hardcoded check.

  2. It fails for users behind custom API gateways / proxies. Many users in China (and elsewhere) route through reverse proxies, model-as-a-service gateways (e.g. Qwen-TP on maas.aliyuncs.com), or custom OpenAI-compatible backends. The code matches by provider name or known domain, not by actual API behavior. A user running qwen3.6-plus through maas.aliyuncs.com with thinking mode enabled gets HTTP 400 errors because reasoning_content is not echoed back — but the code has no rule for aliyuncs.com.

The Root Cause

The reasoning_content echo-back requirement is not a per-provider quirk. It is a protocol-level constraint inherent to any thinking-mode API backend that returns reasoning_content in the first streaming chunk / response. The semantics are:

If the API response includes reasoning_content in the first assistant message, all subsequent assistant tool-call messages in that session MUST also include reasoning_content.

This is how DeepSeek V4 thinking mode, Kimi/Moonshot thinking, Xiaomi MiMo thinking, and Qwen thinking mode all behave. It's the natural constraint of any stateful reasoning API that tracks reasoning state across turns.

Proposed Solution

Replace the current provider-name-based detection with a dynamic, response-pattern-based detection that works for any provider automatically:

  1. Detect at first API response: When the first assistant message in a conversation turn has a non-null reasoning_content field (either via SDK attribute or model_extra), set a session-level flag like self._requires_reasoning_echo = True.

  2. Apply globally for the session: Once set, all subsequent assistant tool-call messages in that conversation automatically include reasoning_content padding (the existing space-padding logic from _copy_reasoning_content_for_api).

  3. Keep provider-name checks as fallback: The existing hardcoded checks (_needs_deepseek_tool_reasoning, etc.) remain as a safety net for edge cases where:

    • The first response's reasoning_content might not be present due to streaming quirks
    • The user is replaying old history on a new provider that needs echo-back
    • The current _copy_reasoning_content_for_api logic in the replay path can still use the static checks
  4. Simplify to a single combined method: _needs_thinking_reasoning_pad() would then check (self._requires_reasoning_echo or any(safety_fallback_checks)).

Detailed Implementation Sketch

In _build_assistant_message() (around line 10294), when processing the first response from the API:

# After extracting raw_reasoning_content...
if raw_reasoning_content is not None:
    msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
    # DYNAMIC ECHO DETECTION: if we haven't set this flag yet,
    # the API just told us it uses reasoning_content echo protocol
    if not hasattr(self, '_requires_reasoning_echo') or not self._requires_reasoning_echo:
        self._requires_reasoning_echo = True
        logger.debug("Auto-detected thinking-mode echo-back protocol from API response")

Then update _needs_thinking_reasoning_pad():

def _needs_thinking_reasoning_pad(self) -> bool:
    # Primary: dynamic detection from actual API behavior
    if getattr(self, '_requires_reasoning_echo', False):
        return True
    # Fallback: static provider-name checks for history replay / edge cases
    return (
        self._needs_deepseek_tool_reasoning()
        or self._needs_kimi_tool_reasoning()
        or self._needs_mimo_tool_reasoning()
    )

Why This Matters

  • Qwen-TP users (Alibaba Cloud's model-as-a-service at maas.aliyuncs.com) get thinking-mode support without any code changes to Hermes Agent.
  • Custom gateway users behind OpenAI-compatible reverse proxies automatically get the right behavior.
  • Future providers don't require a pull request to add a new _needs_*_tool_reasoning() method.
  • Self-hosted / enterprise deployments with custom thinking backends "just work."

Related Code Locations

  • run_agent.py lines 10280–10350: _build_assistant_message() — where reasoning_content is written to messages
  • run_agent.py lines 10421–10476: Current _needs_* methods
  • run_agent.py lines 10479–10530: _copy_reasoning_content_for_api() — the replay / echo path
  • Issue refs mentioned in code: #15250 (DeepSeek), #17400 (Kimi/Moonshot), #17341 (empty string fix), #15748 (cross-provider poisoning), #16844/#16884 (streaming fallback)

Affected Users

  • Anyone using Qwen thinking models via Alibaba Cloud DashScope / Qwen-TP (maas.aliyuncs.com)
  • Users behind custom API gateways that proxy thinking-mode models (common in China corporate environments)
  • Future users of any new thinking-mode provider that enforces the same protocol

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix feat: make reasoning_content echo-back detection dynamic (protocol-level, not provider-name-based) [1 pull requests]