hermes - ✅(Solved) Fix [Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) [4 pull requests, 6 comments, 6 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#15213Fetched 2026-04-25 06:23:47
View on GitHub
Comments
6
Participants
6
Timeline
21
Reactions
0
Timeline (top)
commented ×6cross-referenced ×6labeled ×5subscribed ×2

Error Message

2026-04-24 16:33:46 INFO cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…) 2026-04-24 16:33:47 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:33:47 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:41:22 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:41:38 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:48:44 INFO agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com 2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error: Error code: 400 - {'error': {'message': 'The reasoning_content in the thinking mode must be passed back to the API.', 'type': 'invalid_request_error', 'code': 'invalid_request_error'}}

Note the 6-minute gap between the last auxiliary call (title_generation at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model deepseek-v4-pro, thinking enabled) with history user → assistant(+reasoning_content) → user returns HTTP 200. So _copy_reasoning_content_for_api itself works — the issue is elsewhere in the request-assembly chain.

  1. agent/auxiliary_client.py — auxiliary tasks (title_generation, vision_analyze, session_search) assemble their own minimal messages payload and may not carry reasoning_content even when the selected model requires it. Likely related: #9571 (GLM 5.1 title_generation produces empty content because reasoning eats the max_tokens: 30 budget).
  2. Context compressor (agent/context_engine.py) — when rebuilding assistant messages from summaries, tool_calls may survive while the matching reasoning_content is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be thinking."
  • #9571 — title_generation breaks on reasoning model (GLM 5.1), same auxiliary path
  • #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
  • #13927 — HTTP 400 with OpenRouter when tools are enabled

Root Cause

  1. agent/auxiliary_client.py — auxiliary tasks (title_generation, vision_analyze, session_search) assemble their own minimal messages payload and may not carry reasoning_content even when the selected model requires it. Likely related: #9571 (GLM 5.1 title_generation produces empty content because reasoning eats the max_tokens: 30 budget).
  2. Context compressor (agent/context_engine.py) — when rebuilding assistant messages from summaries, tool_calls may survive while the matching reasoning_content is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be thinking."

Fix Action

Fixed

PR fix notes

PR #15228: fix: support DeepSeek v4-flash/v4-pro thinking mode toggle and fix reasoning_content 400 error

Description (problem / solution / changelog)

Summary

Two fixes for DeepSeek direct API (api.deepseek.com) users who use deepseek-v4-flash / deepseek-v4-pro with thinking mode.

Changes

1. DeepSeek thinking mode toggle (is_deepseek flag + thinking extra_body)

DeepSeek v4-flash/v4-pro uses a different API format than OpenRouter for controlling thinking mode. While OpenRouter uses extra_body["reasoning"] = {"enabled": true, "effort": "medium"}, DeepSeek's native API expects:

{"thinking": {"type": "enabled"}}
{"thinking": {"type": "disabled"}}

This PR adds an is_deepseek flag threaded through run_agent.py -> chat_completions.py, which injects the correct thinking parameter. The thinking mode is controlled by the existing reasoning_effort config:

reasoning_effortResult
'' (empty) / 'medium'thinking: {"type": "enabled"}
'none'thinking: {"type": "disabled"}

2. Fix reasoning_content HTTP 400 error on multi-turn conversations

When thinking mode is enabled, DeepSeek requires reasoning_content on every assistant message, including plain-text responses. The existing _copy_reasoning_content_for_api() only patched tool-call messages (source_msg.get("tool_calls")), causing HTTP 400 errors on multi-turn conversations where the final assistant response has no tool calls.

Fix: Remove the and source_msg.get("tool_calls") condition for the DeepSeek branch, so reasoning_content is set (defaulting to empty string) on all assistant messages.

Files changed

  • run_agent.py_is_deepseek flag, pass is_deepseek to transport, fix _copy_reasoning_content_for_api for all assistant messages
  • agent/transports/chat_completions.py — accept is_deepseek param, inject thinking extra_body

Testing

  • Verified DeepSeek API accepts thinking: {"type": "enabled"} → returns reasoning_content
  • Verified DeepSeek API accepts thinking: {"type": "disabled"} → no reasoning_content
  • Verified multi-turn conversations no longer get 400 errors ✅

Changed files

  • agent/transports/chat_completions.py (modified, +16/-2)
  • run_agent.py (modified, +21/-4)

PR #15325: fix(deepseek): inject empty reasoning_content on replay for OpenRouter DeepSeek

Description (problem / solution / changelog)

DeepSeek v4 in thinking mode 400s on multi-turn replay when any assistant message in history lacks reasoning_content:

HTTP 400: The reasoning_content in the thinking mode must be passed back to the API.

Hits three paths:

  1. Compressed histories (compressor synthesizes assistant messages without the field).
  2. Sessions persisted before thinking-mode support.
  3. The _handle_max_iterations summary turn, which builds its own api_messages and was missing the _copy_reasoning_content_for_api call.

This PR covers OpenRouter-routed DeepSeek (deepseek/*) — #15228 handles direct api.deepseek.com; they co-exist.

Fix

  • run_agent.py::_copy_reasoning_content_for_api — add a DeepSeek branch that injects reasoning_content="" as a placeholder. Detection mirrors the Kimi branch's shape: host is openrouter.ai with a deepseek/ model slug, OR host is api.deepseek.com. Deliberately scoped — a bare "deepseek" in model substring would wrongly fire on Bedrock, NIM, and third-party hosts we haven't validated.
  • run_agent.py::_handle_max_iterations — missing _copy_reasoning_content_for_api call in the summary-turn api_messages builder.
  • isinstance(dict) guard on reasoning_config mirrors line 7342.

Tests

New file tests/run_agent/test_deepseek_reasoning_content.py, 12 cases:

  • Happy path (enabled injects, disabled skips, non-deepseek skips)
  • Real reasoning wins over the default
  • reasoning_config variants (None, missing enabled, non-dict)
  • tool_calls turn (key divergence from Kimi branch — #15228)
  • Direct api.deepseek.com path
  • Regression: Bedrock deepseek.v3.2 + NIM deepseek-ai/deepseek-v3.2 must NOT be injected

Full suite: 994 passed.

Refs

  • Covers OpenRouter side of #15213
  • Complements #15228 (direct API)
  • Supersedes #14973 on modern SDKs (openai 2.x + pydantic 2.x with extra='allow' already exposes model_extra via __getattr__)

Tested on

macOS 15 · Python 3.11.15 · openai 2.30.0 · pydantic 2.12.5 · live OpenRouter DeepSeek V4 Flash traffic, no 400s after patch.


Three commits on the branch for reviewability — squash at merge is fine.

Changed files

  • run_agent.py (modified, +37/-0)
  • tests/run_agent/test_deepseek_reasoning_content.py (added, +231/-0)

PR #15354: fix: add DeepSeek reasoning_content echo for tool-call messages (fixes #15353)

Description (problem / solution / changelog)

Summary

DeepSeek V4 thinking mode requires reasoning_content on every assistant message that includes tool_calls. When missing, replay causes HTTP 400.

Closes #15353 Related: #14938 #14933 #15213

Changes

1. Merge DeepSeek into needs_tool_reasoning_echo check

In _copy_reasoning_content_for_api(), replaced the Kimi-only detection with a combined check covering:

  • provider == "deepseek"
  • "deepseek" in model (case-insensitive)
  • api.deepseek.com base URL (custom provider)

This handles already-poisoned persisted sessions by injecting empty reasoning_content on replay.

2. Store reasoning_content on new tool-call messages

Added _needs_deepseek_tool_reasoning() helper method, wired into _build_assistant_message(). When a DeepSeek tool-call message is created without reasoning text (common for streaming tool-only turns), stores reasoning_content="" instead of omitting the field. Prevents future session poisoning at the source.

3. Fix _handle_max_iterations path

Added missing call to _copy_reasoning_content_for_api() in the max-iterations flush path. Previously only the main loop and flush_memories() had this call.

Test Plan

  • Verified in state.db: new tool-call messages store reasoning_content
  • Previously poisoned messages handled by replay fix
  • Tested on Rocky Linux 9.7 with deepseek-v4-pro via custom provider

Diff

1 file changed: run_agent.py (+30, -3)

Changed files

  • run_agent.py (modified, +30/-3)

PR #15478: fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages

Description (problem / solution / changelog)

Problem

DeepSeek V4 thinking mode requires reasoning_content on every assistant message, not just tool-call turns. The existing fix (#15250) only covered the tool-call path.

When an assistant message is a plain text reply (no tool_calls) and reasoning is empty, _copy_reasoning_content_for_api skips padding entirely, causing DeepSeek to reject the next request with:

The reasoning_content in the thinking mode must be passed back to the API.

Fix

Remove the source_msg.get("tool_calls") and guard in _copy_reasoning_content_for_api so all DeepSeek/Kimi assistant messages get reasoning_content="" when needed.

Changes

  • run_agent.py: broaden condition from tool_calls + provider to just provider
  • test_deepseek_reasoning_content_echo.py: update test to expect padding on plain assistant turns

Verification

pytest tests/run_agent/test_deepseek_reasoning_content_echo.py -v — 21/21 passed.

Fixes #15213

Changed files

  • run_agent.py (modified, +1/-1)
  • tests/run_agent/test_deepseek_reasoning_content_echo.py (modified, +3/-3)

Code Example

2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571`title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927HTTP 400 with OpenRouter when tools are enabled

---
RAW_BUFFERClick to expand / collapse

Bug Description

Hermes correctly preserves reasoning_content in the main loop (run_agent.py::_copy_reasoning_content_for_api). I verified this with a two-turn round-trip test against deepseek-v4-pro — it passes.

In production, however, a long-running cron job consistently fails with HTTP 400 reasoning_content must be passed back to the API after several auxiliary calls (title_generation, vision_analyze, auxiliary auto-detect). The error happens ~6 minutes after the last title_generation call, with no user intervention between.

This looks like the same passthrough is missing in auxiliary_client and/or the context compressor path, not in the main loop.

Steps to Reproduce

  1. Set the main model to deepseek-v4-pro via custom provider (https://api.deepseek.com, api: openai-completions). Thinking is enabled by default on v4-pro.
  2. Register a cron job that runs a non-trivial multi-step task (with at least one tool call + at least one vision/title/session-search auxiliary trigger).
  3. Let the job run for 15–30 minutes.
  4. Observe Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API. in the log.

Expected Behavior

Either:

  • reasoning_content is preserved along the auxiliary / compression / cron paths the same way it is in the main loop, or
  • auxiliary calls default to thinking: disabled (they don't need CoT for title generation / vision descriptions / session search anyway).

Actual Behavior

HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.

Affected Component

CLI (interactive chat)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled

Operating System

Ubuntu: 24.04.2

Python Version

3.11.8

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The issue can be fixed by ensuring that reasoning_content is properly passed through auxiliary tasks and the context compressor path.

Guidance

  • Review agent/auxiliary_client.py to ensure that auxiliary tasks (title_generation, vision_analyze, session_search) include reasoning_content in their payload when the selected model requires it.
  • Investigate the context compressor (agent/context_engine.py) to prevent loss of reasoning_content when rebuilding assistant messages from summaries.
  • Verify that the reasoning_content is correctly copied in the main loop (run_agent.py::_copy_reasoning_content_for_api) and that this functionality is extended to auxiliary tasks.
  • Check for similar issues in related components, such as those mentioned in #9571 and #11096.

Example

No code snippet is provided as the issue requires a review of the existing codebase rather than introducing new code.

Notes

The issue seems to be related to the handling of reasoning_content in auxiliary tasks and the context compressor. Ensuring that this content is properly passed through these components should resolve the HTTP 400 error.

Recommendation

Apply a workaround by modifying agent/auxiliary_client.py to include reasoning_content in auxiliary task payloads when necessary, and review the context compressor to prevent loss of this content. This should mitigate the issue until a permanent fix can be implemented.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) [4 pull requests, 6 comments, 6 participants]