hermes - 💡(How to fix) Fix [Bug]: Fallback provider activation reuses stale api_messages — reasoning_content not injected for DeepSeek/Kimi thinking mode

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

📝 Error: HTTP 400: The reasoning_content in the thinking mode must be passed back to the API. ❌ Non-retryable client error (HTTP 400). Aborting.

Additional Logs / Traceback (optional)

Root Cause

The root cause is that api_messages are built before the inner retry loop, and the fallback activation's continue statement only returns to the top of that inner loop — it never re-enters the outer code path where _copy_reasoning_content_for_api() runs. The fallback provider receives messages constructed for the primary provider, missing the reasoning_content field that DeepSeek/Kimi/MiMo require.

Code Example

--- hermes dump ---
  version:          0.14.0 (2026.5.16) [(unknown)]
  os:               Linux 6.18.18-trim x86_64
  python:           3.13.5
  openai_sdk:       2.24.0
  profile:          default
  hermes_home:      /opt/data
  model:            gemini-3.5-flash
  provider:         self
  terminal:         ssh
  features:
    toolsets:           hermes-cli, browser
    mcp_servers:        0
    memory_provider:    built-in
    gateway:            running (docker (foreground), pid 6)
    platforms:          telegram, weixin
    cron_jobs:          2 active / 2 total
    skills:             115
  config_overrides:
    compression.threshold: 0.75
    display.streaming: True
    display.show_reasoning: True
    toolsets: ['hermes-cli', 'browser']
    fallback_providers: [{'provider': 'deepseek', 'model': 'deepseek-v4-flash'}]
  --- end dump ---

---
RAW_BUFFERClick to expand / collapse

Bug Description

When the primary model hits a rate limit (HTTP 429) and _try_activate_fallback() switches to a thinking-mode provider (e.g. deepseek-v4-flash), the fallback request immediately fails with HTTP 400:

The reasoning_content in the thinking mode must be passed back to the API.

The root cause is that api_messages are built before the inner retry loop, and the fallback activation's continue statement only returns to the top of that inner loop — it never re-enters the outer code path where _copy_reasoning_content_for_api() runs. The fallback provider receives messages constructed for the primary provider, missing the reasoning_content field that DeepSeek/Kimi/MiMo require.

Steps to Reproduce

  1. Configure a primary model behind a rate-limited provider (e.g. Nous Portal with 20 req/5min limit)
  2. Configure fallback_model or fallback_providers with a DeepSeek thinking-mode model (e.g. deepseek-v4-flash via api.deepseek.com)
  3. Run a multi-turn agent session with tool calls until the primary provider returns HTTP 429
  4. Observe: fallback activates, immediately fails with HTTP 400

Expected Behavior

After fallback activation, api_messages should be rebuilt from the internal messages list so that _copy_reasoning_content_for_api() runs with the new provider context. DeepSeek should receive reasoning_content: " " on all assistant tool-call messages (the existing padding logic in step 4 of _copy_reasoning_content_for_api).

Actual Behavior

⚠️ Rate limited — switching to fallback provider... 🔄 Primary model failed — switching to fallback: deepseek-v4-flash via deepseek ⚠️ API call failed (attempt 1/3): BadRequestError [HTTP 400] 🔌 Provider: deepseek Model: deepseek-v4-flash 📝 Error: HTTP 400: The reasoning_content in the thinking mode must be passed back to the API. ❌ Non-retryable client error (HTTP 400). Aborting.

Request dump confirms: 24 assistant messages with tool_calls, all missing reasoning_content.

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

--- hermes dump ---
  version:          0.14.0 (2026.5.16) [(unknown)]
  os:               Linux 6.18.18-trim x86_64
  python:           3.13.5
  openai_sdk:       2.24.0
  profile:          default
  hermes_home:      /opt/data
  model:            gemini-3.5-flash
  provider:         self
  terminal:         ssh
  features:
    toolsets:           hermes-cli, browser
    mcp_servers:        0
    memory_provider:    built-in
    gateway:            running (docker (foreground), pid 6)
    platforms:          telegram, weixin
    cron_jobs:          2 active / 2 total
    skills:             115
  config_overrides:
    compression.threshold: 0.75
    display.streaming: True
    display.show_reasoning: True
    toolsets: ['hermes-cli', 'browser']
    fallback_providers: [{'provider': 'deepseek', 'model': 'deepseek-v4-flash'}]
  --- end dump ---

Operating System

Debian GNU/Linux 12 (bookworm), kernel 6.18.18-trim x86_64 (hermes-agent running in Docker container on Synology NAS)

Python Version

3.13.5

Hermes Version

0.14.0 (2026.5.16)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

The issue is in run_conversation() (run_agent.py):

Line ~12685: api_messages built HERE (outer scope)

api_messages = [] for idx, msg in enumerate(messages): api_msg = msg.copy() self._copy_reasoning_content_for_api(msg, api_msg) # ← uses self.provider at build time ... api_messages.append(api_msg)

Line ~12879: inner retry loop starts

while retry_count < max_retries: ... # Line ~14303: rate limit triggers fallback if self._try_activate_fallback(reason=classified.reason): retry_count = 0 continue # ← BUG: returns to inner loop top, reuses stale api_messages

When _try_activate_fallback() sets self.provider = "deepseek" and self.model = "deepseek-v4-flash", the continue goes back to line 12879 (inner retry loop top). The stale api_messages — built when self.provider was still the primary — are reused without reasoning_content.

Compare with restart_with_compressed_messages which correctly uses break to exit the inner loop, then continues the outer loop to rebuild api_messages.

All 6 call sites of _try_activate_fallback() inside the inner retry loop have this same bug (lines ~12906, 13142, 13212, 14307, 14640, 14711).

Proposed Fix (optional)

Replace continue with break + a flag at all 6 fallback activation sites, and handle the flag outside the inner loop (same pattern as restart_with_compressed_messages):

Inside inner retry loop (6 sites):

if self._try_activate_fallback(reason=...): retry_count = 0 compression_attempts = 0 primary_recovery_attempted = False restart_with_fallback_provider = True break # exit inner loop

Outside inner loop (after restart_with_compressed_messages handler):

if restart_with_fallback_provider: api_call_count -= 1 self.iteration_budget.refund() restart_with_fallback_provider = False continue # outer loop rebuilds api_messages with new provider context

This ensures _copy_reasoning_content_for_api() re-evaluates _needs_thinking_reasoning_pad() with the updated self.provider/self.model, injecting the required reasoning_content: " " padding for DeepSeek/Kimi/MiMo.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix [Bug]: Fallback provider activation reuses stale api_messages — reasoning_content not injected for DeepSeek/Kimi thinking mode