hermes - 💡(How to fix) Fix feat: make reasoning_content echo-back detection dynamic (protocol-level, not provider-name-based) [1 pull requests]

Root Cause

Qwen-TP users (Alibaba Cloud's model-as-a-service at maas.aliyuncs.com) get thinking-mode support without any code changes to Hermes Agent.
Custom gateway users behind OpenAI-compatible reverse proxies automatically get the right behavior.
Future providers don't require a pull request to add a new _needs_*_tool_reasoning() method.
Self-hosted / enterprise deployments with custom thinking backends "just work."

Code Example

# After extracting raw_reasoning_content...
if raw_reasoning_content is not None:
    msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
    # DYNAMIC ECHO DETECTION: if we haven't set this flag yet,
    # the API just told us it uses reasoning_content echo protocol
    if not hasattr(self, '_requires_reasoning_echo') or not self._requires_reasoning_echo:
        self._requires_reasoning_echo = True
        logger.debug("Auto-detected thinking-mode echo-back protocol from API response")

---

def _needs_thinking_reasoning_pad(self) -> bool:
    # Primary: dynamic detection from actual API behavior
    if getattr(self, '_requires_reasoning_echo', False):
        return True
    # Fallback: static provider-name checks for history replay / edge cases
    return (
        self._needs_deepseek_tool_reasoning()
        or self._needs_kimi_tool_reasoning()
        or self._needs_mimo_tool_reasoning()
    )

Feature Request: Make `reasoning_content` Echo-Back Detection Dynamic (Protocol-Level, Not Provider-Name-Based)

The Problem

Currently, Hermes Agent detects which providers require reasoning_content echo-back (re-including reasoning_content on every assistant tool-call message in the API replay history) via hardcoded provider-name checks in three methods:

_needs_deepseek_tool_reasoning() — matches provider == "deepseek", "deepseek" in model, or base_url_host_matches(self.base_url, "api.deepseek.com")
_needs_kimi_tool_reasoning() — matches provider in a set, or base_url_host_matches against kimi/moonshot domains
_needs_mimo_tool_reasoning() — matches provider == "xiaomi", "mimo" in model, or base_url_host_matches against xiaomimimo.com domains

These are aggregated by _needs_thinking_reasoning_pad() which returns True if any of the three match (see run_agent.py lines ~10421–10476).

This approach has two major problems:

Every new thinking-mode provider requires a code change. Qwen-TP (Alibaba Cloud's maas.aliyuncs.com), GLM, MiniMax, and any future provider that enforces the same protocol constraint all need a new hardcoded check.
It fails for users behind custom API gateways / proxies. Many users in China (and elsewhere) route through reverse proxies, model-as-a-service gateways (e.g. Qwen-TP on maas.aliyuncs.com), or custom OpenAI-compatible backends. The code matches by provider name or known domain, not by actual API behavior. A user running qwen3.6-plus through maas.aliyuncs.com with thinking mode enabled gets HTTP 400 errors because reasoning_content is not echoed back — but the code has no rule for aliyuncs.com.

The Root Cause

The reasoning_content echo-back requirement is not a per-provider quirk. It is a protocol-level constraint inherent to any thinking-mode API backend that returns reasoning_content in the first streaming chunk / response. The semantics are:

If the API response includes reasoning_content in the first assistant message, all subsequent assistant tool-call messages in that session MUST also include reasoning_content.

This is how DeepSeek V4 thinking mode, Kimi/Moonshot thinking, Xiaomi MiMo thinking, and Qwen thinking mode all behave. It's the natural constraint of any stateful reasoning API that tracks reasoning state across turns.

Proposed Solution

Replace the current provider-name-based detection with a dynamic, response-pattern-based detection that works for any provider automatically:

Detect at first API response: When the first assistant message in a conversation turn has a non-null reasoning_content field (either via SDK attribute or model_extra), set a session-level flag like self._requires_reasoning_echo = True.
Apply globally for the session: Once set, all subsequent assistant tool-call messages in that conversation automatically include reasoning_content padding (the existing space-padding logic from _copy_reasoning_content_for_api).
Keep provider-name checks as fallback: The existing hardcoded checks (_needs_deepseek_tool_reasoning, etc.) remain as a safety net for edge cases where:
- The first response's reasoning_content might not be present due to streaming quirks
- The user is replaying old history on a new provider that needs echo-back
- The current _copy_reasoning_content_for_api logic in the replay path can still use the static checks
Simplify to a single combined method: _needs_thinking_reasoning_pad() would then check (self._requires_reasoning_echo or any(safety_fallback_checks)).

Detailed Implementation Sketch

In _build_assistant_message() (around line 10294), when processing the first response from the API:

# After extracting raw_reasoning_content...
if raw_reasoning_content is not None:
    msg["reasoning_content"] = _sanitize_surrogates(raw_reasoning_content)
    # DYNAMIC ECHO DETECTION: if we haven't set this flag yet,
    # the API just told us it uses reasoning_content echo protocol
    if not hasattr(self, '_requires_reasoning_echo') or not self._requires_reasoning_echo:
        self._requires_reasoning_echo = True
        logger.debug("Auto-detected thinking-mode echo-back protocol from API response")

Then update _needs_thinking_reasoning_pad():

def _needs_thinking_reasoning_pad(self) -> bool:
    # Primary: dynamic detection from actual API behavior
    if getattr(self, '_requires_reasoning_echo', False):
        return True
    # Fallback: static provider-name checks for history replay / edge cases
    return (
        self._needs_deepseek_tool_reasoning()
        or self._needs_kimi_tool_reasoning()
        or self._needs_mimo_tool_reasoning()
    )

Why This Matters

Qwen-TP users (Alibaba Cloud's model-as-a-service at maas.aliyuncs.com) get thinking-mode support without any code changes to Hermes Agent.
Custom gateway users behind OpenAI-compatible reverse proxies automatically get the right behavior.
Future providers don't require a pull request to add a new _needs_*_tool_reasoning() method.
Self-hosted / enterprise deployments with custom thinking backends "just work."

Related Code Locations

run_agent.py lines 10280–10350: _build_assistant_message() — where reasoning_content is written to messages
run_agent.py lines 10421–10476: Current _needs_* methods
run_agent.py lines 10479–10530: _copy_reasoning_content_for_api() — the replay / echo path
Issue refs mentioned in code: #15250 (DeepSeek), #17400 (Kimi/Moonshot), #17341 (empty string fix), #15748 (cross-provider poisoning), #16844/#16884 (streaming fallback)

Affected Users

Anyone using Qwen thinking models via Alibaba Cloud DashScope / Qwen-TP (maas.aliyuncs.com)
Users behind custom API gateways that proxy thinking-mode models (common in China corporate environments)
Future users of any new thinking-mode provider that enforces the same protocol

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix feat: make reasoning_content echo-back detection dynamic (protocol-level, not provider-name-based) [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Code Example

Feature Request: Make `reasoning_content` Echo-Back Detection Dynamic (Protocol-Level, Not Provider-Name-Based)

The Problem

The Root Cause

Proposed Solution

Detailed Implementation Sketch

Why This Matters

Related Code Locations

Affected Users

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix feat: make reasoning_content echo-back detection dynamic (protocol-level, not provider-name-based) [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

Code Example

Feature Request: Make reasoning_content Echo-Back Detection Dynamic (Protocol-Level, Not Provider-Name-Based)

The Problem

The Root Cause

Proposed Solution

Detailed Implementation Sketch

Why This Matters

Related Code Locations

Affected Users

Still need to ship something?

RELATED_DISCOVERY

TRENDING

Feature Request: Make `reasoning_content` Echo-Back Detection Dynamic (Protocol-Level, Not Provider-Name-Based)