hermes - ✅(Solved) Fix [Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) [4 pull requests, 6 comments, 6 participants]

hermes2026-04-24 15:15:13

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#15213•Fetched 2026-04-25 06:23:47

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

commented ×6cross-referenced ×6labeled ×5subscribed ×2

Error Message

2026-04-24 16:33:46 INFO cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…) 2026-04-24 16:33:47 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:33:47 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:41:22 INFO agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:41:38 INFO agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro) 2026-04-24 16:48:44 INFO agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com 2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error: Error code: 400 - {'error': {'message': 'The reasoning_content in the thinking mode must be passed back to the API.', 'type': 'invalid_request_error', 'code': 'invalid_request_error'}}

Note the 6-minute gap between the last auxiliary call (title_generation at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model deepseek-v4-pro, thinking enabled) with history user → assistant(+reasoning_content) → user returns HTTP 200. So _copy_reasoning_content_for_api itself works — the issue is elsewhere in the request-assembly chain.

agent/auxiliary_client.py — auxiliary tasks (title_generation, vision_analyze, session_search) assemble their own minimal messages payload and may not carry reasoning_content even when the selected model requires it. Likely related: #9571 (GLM 5.1 title_generation produces empty content because reasoning eats the max_tokens: 30 budget).
Context compressor (agent/context_engine.py) — when rebuilding assistant messages from summaries, tool_calls may survive while the matching reasoning_content is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be thinking."

#9571 — title_generation breaks on reasoning model (GLM 5.1), same auxiliary path
#11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
#13927 — HTTP 400 with OpenRouter when tools are enabled

Root Cause

agent/auxiliary_client.py — auxiliary tasks (title_generation, vision_analyze, session_search) assemble their own minimal messages payload and may not carry reasoning_content even when the selected model requires it. Likely related: #9571 (GLM 5.1 title_generation produces empty content because reasoning eats the max_tokens: 30 budget).
Context compressor (agent/context_engine.py) — when rebuilding assistant messages from summaries, tool_calls may survive while the matching reasoning_content is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be thinking."

Fix Action

Fixed

Fixed by PR: fix: support DeepSeek v4-flash/v4-pro thinking mode toggle and fix reasoning_content 400 error (https://github.com/NousResearch/hermes-agent/pull/15228)
Fixed by PR: fix(deepseek): inject empty reasoning_content on replay for OpenRouter DeepSeek (https://github.com/NousResearch/hermes-agent/pull/15325)
Fixed by PR: fix: add DeepSeek reasoning_content echo for tool-call messages (fixes #15353) (https://github.com/NousResearch/hermes-agent/pull/15354)
Fixed by PR: fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages (https://github.com/NousResearch/hermes-agent/pull/15478)

PR fix notes

PR #15228: fix: support DeepSeek v4-flash/v4-pro thinking mode toggle and fix reasoning_content 400 error

Repository: NousResearch/hermes-agent
Author: ruxme
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15228

Description (problem / solution / changelog)

Summary

Two fixes for DeepSeek direct API (api.deepseek.com) users who use deepseek-v4-flash / deepseek-v4-pro with thinking mode.

Changes

1. DeepSeek thinking mode toggle (`is_deepseek` flag + `thinking` extra_body)

DeepSeek v4-flash/v4-pro uses a different API format than OpenRouter for controlling thinking mode. While OpenRouter uses extra_body["reasoning"] = {"enabled": true, "effort": "medium"}, DeepSeek's native API expects:

{"thinking": {"type": "enabled"}}
{"thinking": {"type": "disabled"}}

This PR adds an is_deepseek flag threaded through run_agent.py -> chat_completions.py, which injects the correct thinking parameter. The thinking mode is controlled by the existing reasoning_effort config:

`reasoning_effort`	Result
`''` (empty) / `'medium'`	`thinking: {"type": "enabled"}`
`'none'`	`thinking: {"type": "disabled"}`

2. Fix `reasoning_content` HTTP 400 error on multi-turn conversations

When thinking mode is enabled, DeepSeek requires reasoning_content on every assistant message, including plain-text responses. The existing _copy_reasoning_content_for_api() only patched tool-call messages (source_msg.get("tool_calls")), causing HTTP 400 errors on multi-turn conversations where the final assistant response has no tool calls.

Fix: Remove the and source_msg.get("tool_calls") condition for the DeepSeek branch, so reasoning_content is set (defaulting to empty string) on all assistant messages.

Files changed

run_agent.py — _is_deepseek flag, pass is_deepseek to transport, fix _copy_reasoning_content_for_api for all assistant messages
agent/transports/chat_completions.py — accept is_deepseek param, inject thinking extra_body

Testing

Verified DeepSeek API accepts thinking: {"type": "enabled"} → returns reasoning_content ✅
Verified DeepSeek API accepts thinking: {"type": "disabled"} → no reasoning_content ✅
Verified multi-turn conversations no longer get 400 errors ✅

Changed files

agent/transports/chat_completions.py (modified, +16/-2)
run_agent.py (modified, +21/-4)

PR #15325: fix(deepseek): inject empty reasoning_content on replay for OpenRouter DeepSeek

Repository: NousResearch/hermes-agent
Author: ukint-vs
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15325

Description (problem / solution / changelog)

DeepSeek v4 in thinking mode 400s on multi-turn replay when any assistant message in history lacks reasoning_content:

HTTP 400: The reasoning_content in the thinking mode must be passed back to the API.

Hits three paths:

Compressed histories (compressor synthesizes assistant messages without the field).
Sessions persisted before thinking-mode support.
The _handle_max_iterations summary turn, which builds its own api_messages and was missing the _copy_reasoning_content_for_api call.

This PR covers OpenRouter-routed DeepSeek (deepseek/*) — #15228 handles direct api.deepseek.com; they co-exist.

Fix

run_agent.py::_copy_reasoning_content_for_api — add a DeepSeek branch that injects reasoning_content="" as a placeholder. Detection mirrors the Kimi branch's shape: host is openrouter.ai with a deepseek/ model slug, OR host is api.deepseek.com. Deliberately scoped — a bare "deepseek" in model substring would wrongly fire on Bedrock, NIM, and third-party hosts we haven't validated.
run_agent.py::_handle_max_iterations — missing _copy_reasoning_content_for_api call in the summary-turn api_messages builder.
isinstance(dict) guard on reasoning_config mirrors line 7342.

Tests

New file tests/run_agent/test_deepseek_reasoning_content.py, 12 cases:

Happy path (enabled injects, disabled skips, non-deepseek skips)
Real reasoning wins over the default
reasoning_config variants (None, missing enabled, non-dict)
tool_calls turn (key divergence from Kimi branch — #15228)
Direct api.deepseek.com path
Regression: Bedrock deepseek.v3.2 + NIM deepseek-ai/deepseek-v3.2 must NOT be injected

Full suite: 994 passed.

Refs

Covers OpenRouter side of #15213
Complements #15228 (direct API)
Supersedes #14973 on modern SDKs (openai 2.x + pydantic 2.x with extra='allow' already exposes model_extra via __getattr__)

Tested on

macOS 15 · Python 3.11.15 · openai 2.30.0 · pydantic 2.12.5 · live OpenRouter DeepSeek V4 Flash traffic, no 400s after patch.

Three commits on the branch for reviewability — squash at merge is fine.

Changed files

run_agent.py (modified, +37/-0)
tests/run_agent/test_deepseek_reasoning_content.py (added, +231/-0)

PR #15354: fix: add DeepSeek reasoning_content echo for tool-call messages (fixes #15353)

Repository: NousResearch/hermes-agent
Author: chen1749144759
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15354

Description (problem / solution / changelog)

Summary

DeepSeek V4 thinking mode requires reasoning_content on every assistant message that includes tool_calls. When missing, replay causes HTTP 400.

Closes #15353 Related: #14938 #14933 #15213

Changes

1. Merge DeepSeek into `needs_tool_reasoning_echo` check

In _copy_reasoning_content_for_api(), replaced the Kimi-only detection with a combined check covering:

provider == "deepseek"
"deepseek" in model (case-insensitive)
api.deepseek.com base URL (custom provider)

This handles already-poisoned persisted sessions by injecting empty reasoning_content on replay.

2. Store `reasoning_content` on new tool-call messages

Added _needs_deepseek_tool_reasoning() helper method, wired into _build_assistant_message(). When a DeepSeek tool-call message is created without reasoning text (common for streaming tool-only turns), stores reasoning_content="" instead of omitting the field. Prevents future session poisoning at the source.

3. Fix `_handle_max_iterations` path

Added missing call to _copy_reasoning_content_for_api() in the max-iterations flush path. Previously only the main loop and flush_memories() had this call.

Test Plan

Verified in state.db: new tool-call messages store reasoning_content
Previously poisoned messages handled by replay fix
Tested on Rocky Linux 9.7 with deepseek-v4-pro via custom provider

Diff

1 file changed: run_agent.py (+30, -3)

Changed files

run_agent.py (modified, +30/-3)

PR #15478: fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages

Repository: NousResearch/hermes-agent
Author: yes999zc
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15478

Description (problem / solution / changelog)

Problem

DeepSeek V4 thinking mode requires reasoning_content on every assistant message, not just tool-call turns. The existing fix (#15250) only covered the tool-call path.

When an assistant message is a plain text reply (no tool_calls) and reasoning is empty, _copy_reasoning_content_for_api skips padding entirely, causing DeepSeek to reject the next request with:

The reasoning_content in the thinking mode must be passed back to the API.

Fix

Remove the source_msg.get("tool_calls") and guard in _copy_reasoning_content_for_api so all DeepSeek/Kimi assistant messages get reasoning_content="" when needed.

Changes

run_agent.py: broaden condition from tool_calls + provider to just provider
test_deepseek_reasoning_content_echo.py: update test to expect padding on plain assistant turns

Verification

pytest tests/run_agent/test_deepseek_reasoning_content_echo.py -v — 21/21 passed.

Fixes #15213

Changed files

run_agent.py (modified, +1/-1)
tests/run_agent/test_deepseek_reasoning_content_echo.py (modified, +3/-3)

Code Example

2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled

---

RAW_BUFFERClick to expand / collapse

Bug Description

Hermes correctly preserves reasoning_content in the main loop (run_agent.py::_copy_reasoning_content_for_api). I verified this with a two-turn round-trip test against deepseek-v4-pro — it passes.

In production, however, a long-running cron job consistently fails with HTTP 400 reasoning_content must be passed back to the API after several auxiliary calls (title_generation, vision_analyze, auxiliary auto-detect). The error happens ~6 minutes after the last title_generation call, with no user intervention between.

This looks like the same passthrough is missing in auxiliary_client and/or the context compressor path, not in the main loop.

Steps to Reproduce

Set the main model to deepseek-v4-pro via custom provider (https://api.deepseek.com, api: openai-completions). Thinking is enabled by default on v4-pro.
Register a cron job that runs a non-trivial multi-step task (with at least one tool call + at least one vision/title/session-search auxiliary trigger).
Let the job run for 15–30 minutes.
Observe Non-retryable client error: Error code: 400 ... The reasoning_content in the thinking mode must be passed back to the API. in the log.

Expected Behavior

Either:

reasoning_content is preserved along the auxiliary / compression / cron paths the same way it is in the main loop, or
auxiliary calls default to thinking: disabled (they don't need CoT for title generation / vision descriptions / session search anyway).

Actual Behavior

HTTP 400 on a DeepSeek v4-pro call somewhere in the cron → auxiliary → main-loop chain; the session becomes "poisoned" and cannot recover without clearing history.

Affected Component

CLI (interactive chat)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

2026-04-24 16:33:46 INFO  cron.scheduler: Running job 'qwen-auto-setup' (d33d5e95…)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:33:47 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:22 INFO  agent.auxiliary_client: Vision auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:41:38 INFO  agent.auxiliary_client: Auxiliary auto-detect: using main provider custom (deepseek-v4-pro)
2026-04-24 16:48:44 INFO  agent.auxiliary_client: Auxiliary title_generation: using auto (deepseek-v4-pro) at https://api.deepseek.com
2026-04-24 16:55:18 ERROR [cron_d33d5e95…] root: Non-retryable client error:
  Error code: 400 - {'error': {'message': 'The `reasoning_content` in the thinking mode must be passed back to the API.',
                               'type': 'invalid_request_error', 'code': 'invalid_request_error'}}


Note the 6-minute gap between the last auxiliary call (`title_generation` at 16:48:44) and the 400 (at 16:55:18) — main-loop tool calls happened in between but are at DEBUG level, not in this excerpt. Full log can be attached on request.

Two-turn round-trip test (model `deepseek-v4-pro`, thinking enabled) with history `user → assistant(+reasoning_content) → user` returns HTTP 200. So `_copy_reasoning_content_for_api` itself works — the issue is elsewhere in the request-assembly chain.

1. **`agent/auxiliary_client.py`** — auxiliary tasks (`title_generation`, `vision_analyze`, `session_search`) assemble their own minimal `messages` payload and may not carry `reasoning_content` even when the selected model requires it. Likely related: #9571 (GLM 5.1 `title_generation` produces empty content because reasoning eats the `max_tokens: 30` budget).
2. **Context compressor** (`agent/context_engine.py`) — when rebuilding assistant messages from summaries, `tool_calls` may survive while the matching `reasoning_content` is lost. Analogous to #11096 for Anthropic extended thinking: "The final block in an assistant message cannot be `thinking`."

- #9571 — `title_generation` breaks on reasoning model (GLM 5.1), same auxiliary path
- #11096 — HTTP 400 on compressed assistant messages (Anthropic extended thinking)
- #13927 — HTTP 400 with OpenRouter when tools are enabled

Operating System

Ubuntu: 24.04.2

Python Version

3.11.8

Hermes Version

0.11.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

I'd like to fix this myself and submit a PR

extent analysis

TL;DR

The issue can be fixed by ensuring that reasoning_content is properly passed through auxiliary tasks and the context compressor path.

Guidance

Review agent/auxiliary_client.py to ensure that auxiliary tasks (title_generation, vision_analyze, session_search) include reasoning_content in their payload when the selected model requires it.
Investigate the context compressor (agent/context_engine.py) to prevent loss of reasoning_content when rebuilding assistant messages from summaries.
Verify that the reasoning_content is correctly copied in the main loop (run_agent.py::_copy_reasoning_content_for_api) and that this functionality is extended to auxiliary tasks.
Check for similar issues in related components, such as those mentioned in #9571 and #11096.

Example

No code snippet is provided as the issue requires a review of the existing codebase rather than introducing new code.

Notes

The issue seems to be related to the handling of reasoning_content in auxiliary tasks and the context compressor. Ensuring that this content is properly passed through these components should resolve the HTTP 400 error.

Recommendation

Apply a workaround by modifying agent/auxiliary_client.py to include reasoning_content in auxiliary task payloads when necessary, and review the context compressor to prevent loss of this content. This should mitigate the issue until a permanent fix can be implemented.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #tool integration #LLM response #prompt template #agent execution

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix [Bug]: HTTP 400 "reasoning_content must be passed back" with deepseek-v4-pro in cron/auxiliary path (thinking mode works in main loop, breaks elsewhere) [4 pull requests, 6 comments, 6 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #15228: fix: support DeepSeek v4-flash/v4-pro thinking mode toggle and fix reasoning_content 400 error

Description (problem / solution / changelog)

Summary

Changes

1. DeepSeek thinking mode toggle (is_deepseek flag + thinking extra_body)

2. Fix reasoning_content HTTP 400 error on multi-turn conversations

Files changed

Testing

Changed files

PR #15325: fix(deepseek): inject empty reasoning_content on replay for OpenRouter DeepSeek

Description (problem / solution / changelog)

Fix

Tests

Refs

Tested on

Changed files

PR #15354: fix: add DeepSeek reasoning_content echo for tool-call messages (fixes #15353)

Description (problem / solution / changelog)

Summary

Changes

1. Merge DeepSeek into needs_tool_reasoning_echo check

2. Store reasoning_content on new tool-call messages

3. Fix _handle_max_iterations path

Test Plan

Diff

Changed files

PR #15478: fix: DeepSeek/Kimi thinking mode requires reasoning_content on ALL assistant messages

Description (problem / solution / changelog)

Problem

Fix

Changes

Verification

Changed files

Code Example

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Affected Component

Messaging Platform (if gateway-related)

Debug Report

Operating System

Python Version

Hermes Version

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Proposed Fix (optional)

Are you willing to submit a PR for this?

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

1. DeepSeek thinking mode toggle (`is_deepseek` flag + `thinking` extra_body)

2. Fix `reasoning_content` HTTP 400 error on multi-turn conversations

1. Merge DeepSeek into `needs_tool_reasoning_echo` check

2. Store `reasoning_content` on new tool-call messages

3. Fix `_handle_max_iterations` path