hermes - ✅(Solved) Fix [Feature]: Native reasoning_effort support for NVIDIA NIM (integrate.api.nvidia.com) [1 pull requests, 1 participants]

hermes · 2026-05-04 19:40:17

GitHub stats

NousResearch/hermes-agent#19883•Fetched 2026-05-05 06:04:34

Author

sherman-yang

Participants

sherman-yang

Timeline (top)

labeled ×4, cross-referenced ×1

NVIDIA NIM (integrate.api.nvidia.com) hosts several reasoning-capable model families (DeepSeek V4 Pro/Flash, Kimi K2 Thinking, GPT-OSS 120B, Qwen3-thinking, Nemotron 3) that gate their <think> chain on the top-level reasoning_effort request field. Hermes already detects is_nvidia_nim in chat_completions.build_kwargs() (used for the max_tokens=16384 default), but does not propagate reasoning_effort to NVIDIA NIM the way it does for Kimi, TokenHub, and LM Studio.

Net effect: every NIM-hosted reasoning model is silently locked into "Non-think" mode in Hermes, even when the user has agent.reasoning_effort: high configured. The model returns reasoning_content: null and the agent loses the structural accuracy gain it was trained for.

Root Cause

Fix Action

Fixed

Fixed by PR: feat(transport): plumb reasoning_effort to NVIDIA NIM (https://github.com/NousResearch/hermes-agent/pull/19888)

PR fix notes

PR #19888: feat(transport): plumb reasoning_effort to NVIDIA NIM

Repository: NousResearch/hermes-agent
Author: sherman-yang
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/19888

Description (problem / solution / changelog)

Summary

NVIDIA NIM hosts reasoning-capable model families (DeepSeek V4 Pro/Flash, Kimi K2 Thinking, GPT-OSS 120B, Qwen3-thinking variants, Nemotron 3) that activate thinking mode via the top-level reasoning_effort request field. Hermes already detects is_nvidia_nim and uses it for max_tokens defaulting and extra_body assembly, but the reasoning_effort plumbing was never added — so every NIM-hosted reasoning model is silently stuck in Non-think mode.

This PR adds an is_nvidia_nim branch in chat_completions.build_kwargs() that mirrors the existing Kimi block. After this change, configuring agent.reasoning_effort: high actually reaches NIM, reasoning_content populates, and display.show_reasoning: true finally renders thinking blocks for these models.

Closes #19883.

NIM `reasoning_effort` contract (verified empirically)

Verified against https://integrate.api.nvidia.com/v1/chat/completions with deepseek-ai/deepseek-v4-flash:

| Request | `reasoning_content` |
| --- | --- |
| (no field) | `null` |
| `reasoning_effort: "high"` | populated |
| `reasoning_effort: "max"` | populated |
| `extra_body: {thinking: {type:"max"}}` | `null` (NIM doesn't accept) |

NIM accepts: low, medium, high, max. The nested extra_body.thinking toggle that DeepSeek's native API uses is not translated by NIM — only the top-level reasoning_effort works on this route.

Hermes → NIM effort mapping

Same vocabulary translation as #14958's DeepSeek-native mapping, applied here for the NIM route:

| Hermes `reasoning_effort` | → NIM `reasoning_effort` |
| --- | --- |
| `low` | `low` |
| `medium` | `medium` |
| `high` | `high` |
| `xhigh` | `max` |
| `max` | `max` |
| (thinking disabled) | omitted |
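
The mapping table above can be sketched as a small lookup. This is a minimal illustration, not the PR's code: the names `NIM_EFFORT_MAP` and `to_nim_effort` are hypothetical, and falling back to `high` for a missing or unknown value is an assumption inferred from the "default high" and "unknown-value fallback" cases named in the test description.

```python
from typing import Optional

# Hermes effort level -> NIM effort level, following the mapping table.
NIM_EFFORT_MAP = {
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "max",  # NIM has no "xhigh"; it is translated to "max"
    "max": "max",
}


def to_nim_effort(effort: Optional[str], enabled: bool = True) -> Optional[str]:
    """Return the value to send as NIM's reasoning_effort, or None to omit it.

    Hypothetical helper: the fallback to "high" is an assumption.
    """
    if not enabled:
        # Thinking disabled: the field is left out of the request entirely.
        return None
    if effort is None:
        return "high"
    return NIM_EFFORT_MAP.get(effort.strip().lower(), "high")
```

Returning `None` for the disabled case keeps the "omitted" row of the table explicit: the caller adds the field only when a value comes back.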

Changes

agent/transports/chat_completions.py — new is_nvidia_nim reasoning_effort branch immediately after the existing is_kimi block, structurally identical for easy review (+22 lines).
tests/agent/transports/test_chat_completions.py — new TestChatCompletionsNvidiaNimReasoning covering default high, explicit effort levels, xhigh → max mapping, enabled: False opt-out, unknown-value fallback, and no-op for non-NIM routes (+77 lines).

Test plan

pytest tests/agent/transports/test_chat_completions.py::TestChatCompletionsNvidiaNimReasoning -v — 6/6 pass
pytest tests/agent/transports/test_chat_completions.py (full file, 67 tests) — all pass; existing Kimi/TokenHub/LM-Studio/etc. paths unaffected
End-to-end smoke test with deepseek-v4-flash on NIM — reasoning_content populates, Hermes' Reasoning block renders
Confirmed extra_body.thinking does NOT work on NIM (sanity check that the top-level reasoning_effort is the correct lever)

Related

#14958 — feat(deepseek): plumb reasoning_effort and thinking toggle to V4 API (covers the DeepSeek native API, complementary to this PR which covers the NIM hosting route)
#11243 — Same gap requested for Mistral AI's api.mistral.ai/v1
#18742 — Related symptom: Kimi K2.5 via aggregators with no max_tokens/reasoning_effort

Changed files

agent/transports/chat_completions.py (modified, +37/-0)
tests/agent/transports/test_chat_completions.py (modified, +120/-0)

Summary

Reproduction

# ~/.hermes/config.yaml
model:
  default: deepseek-ai/deepseek-v4-flash
  provider: nvidia
agent:
  reasoning_effort: high
display:
  show_reasoning: true

hermes chat -Q -q "Solve: smallest prime greater than 17. Show your reasoning."

Inspect the resulting ~/.hermes/sessions/session_*.json:

{ "role": "assistant", "reasoning_content": null, "content": "19" }

reasoning_content is null: Hermes never sent reasoning_effort to NIM.
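
The session check above can be scripted. The sketch below makes no claim about the real layout of Hermes session files: it walks whatever JSON it finds and collects every object whose `role` is `assistant`. The function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Tuple


def iter_assistant_messages(node: Any) -> Iterator[dict]:
    """Yield every dict with role == "assistant", at any nesting depth."""
    if isinstance(node, dict):
        if node.get("role") == "assistant":
            yield node
        for value in node.values():
            yield from iter_assistant_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_assistant_messages(item)


def summarize(data: Any) -> Tuple[int, int]:
    """Return (messages with non-empty reasoning_content, total assistant messages)."""
    messages = list(iter_assistant_messages(data))
    populated = sum(1 for m in messages if m.get("reasoning_content"))
    return populated, len(messages)


if __name__ == "__main__":
    # Path taken from the reproduction steps; the glob yields nothing if it is absent.
    sessions_dir = Path.home() / ".hermes" / "sessions"
    for path in sorted(sessions_dir.glob("session_*.json")):
        populated, total = summarize(json.loads(path.read_text(encoding="utf-8")))
        print(f"{path.name}: {populated}/{total} assistant messages carry reasoning_content")
```

A session showing `0/N` for a NIM-hosted reasoning model matches the symptom described here.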

Verification (raw API)

Calling https://integrate.api.nvidia.com/v1/chat/completions directly with the same model:

| Request body | `reasoning_content` |
| --- | --- |
| (no `reasoning_effort`) | `null` |
| `reasoning_effort: "high"` | populated (~100–200 chars) |
| `reasoning_effort: "max"` | populated (~140–250 chars) |
| `extra_body: {thinking: {type: "max"}}` | `null` (NIM rejects nested thinking toggle; only DeepSeek's native API accepts that) |

So the activator on NIM is unambiguously the top-level reasoning_effort field.
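
A raw-API check along these lines could be reproduced with the standard library alone. This sketch rests on several assumptions: that the endpoint takes a Bearer token, that the response follows the OpenAI-style `choices[0].message` shape, and that the helper names (`build_nim_request`, `reasoning_content_of`) are purely illustrative. It builds the request without sending it.

```python
import json
import urllib.request
from typing import Optional

NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"


def build_nim_request(
    prompt: str,
    model: str = "deepseek-ai/deepseek-v4-flash",
    reasoning_effort: Optional[str] = None,
    api_key: str = "",
) -> urllib.request.Request:
    """Build (but do not send) a chat-completions request for NIM."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning_effort is not None:
        # Top-level field, not nested under extra_body.thinking.
        body["reasoning_effort"] = reasoning_effort
    return urllib.request.Request(
        NIM_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumption: Bearer auth
        },
        method="POST",
    )


def reasoning_content_of(response_json: dict) -> Optional[str]:
    """Extract reasoning_content, assuming an OpenAI-style response shape."""
    return response_json["choices"][0]["message"].get("reasoning_content")
```

To run the comparison, pass each request to `urllib.request.urlopen`, parse the body with `json.load`, and compare `reasoning_content_of(...)` with and without `reasoning_effort` set.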

Why Hermes' existing branches don't cover this

Looking at agent/transports/chat_completions.py::build_kwargs(), top-level reasoning_effort is currently emitted only for:

is_kimi (Kimi/Moonshot direct)
is_tokenhub (Tencent TokenHub)
is_lmstudio (LM Studio, gated by supports_reasoning)

is_nvidia_nim is a recognized flag (max_tokens default, extra_body assembly) but the reasoning_effort branch was never added.

Related work

#14958 (open) plumbs reasoning_effort for the DeepSeek native API (api.deepseek.com). It does not address the same model family hosted on NIM, which is a different base_url.
#11243 (open) requests the same for Mistral AI (api.mistral.ai/v1).
#18742 (open) reports a related symptom for Kimi K2.5 via aggregators.

Proposed fix

Mirror the existing is_kimi block: add an is_nvidia_nim branch that emits the top-level reasoning_effort from reasoning_config.effort, defaults to high, honors the enabled=False opt-out, and accepts NIM's effort vocabulary (low / medium / high / max).

PR submitted alongside this issue.

extent analysis

TL;DR

To fix the issue, add a new branch in agent/transports/chat_completions.py::build_kwargs() to emit the top-level reasoning_effort field for is_nvidia_nim models.

Guidance

Identify the build_kwargs() function in agent/transports/chat_completions.py and locate the existing branches for is_kimi, is_tokenhub, and is_lmstudio.
Add a new branch for is_nvidia_nim that emits the top-level reasoning_effort field, using the reasoning_config.effort value with sensible defaults (high) and an enabled=False opt-out.
Ensure the new branch accepts NIM's effort vocabulary (low / medium / high / max).
Verify the fix by checking the reasoning_content field in the response from the NVIDIA NIM API.

Example

if is_nvidia_nim:
    kwargs['reasoning_effort'] = reasoning_config.effort
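
The two-line example above omits the default, the opt-out, and the `xhigh` translation described in the guidance. A fuller sketch, under stated assumptions, might look like the following. `ReasoningConfig` and `add_nim_reasoning_effort` are hypothetical stand-ins rather than Hermes' actual types or the PR's code, and the fallback to `high` for unknown values is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class ReasoningConfig:
    """Hypothetical stand-in for the reasoning_config object."""
    effort: Optional[str] = None
    enabled: bool = True


_NIM_EFFORTS = {"low", "medium", "high", "max"}


def add_nim_reasoning_effort(
    kwargs: Dict[str, Any],
    reasoning_config: Optional[ReasoningConfig],
    is_nvidia_nim: bool,
) -> Dict[str, Any]:
    """Add top-level reasoning_effort to kwargs for NIM routes only."""
    if not is_nvidia_nim:
        return kwargs  # no-op for every other route
    if reasoning_config is not None and not reasoning_config.enabled:
        return kwargs  # opt-out: omit the field
    effort = (reasoning_config.effort if reasoning_config else None) or "high"
    effort = effort.lower()
    if effort == "xhigh":
        effort = "max"  # NIM's vocabulary has no xhigh
    if effort not in _NIM_EFFORTS:
        effort = "high"  # assumed fallback for unknown values
    kwargs["reasoning_effort"] = effort
    return kwargs
```

For instance, `add_nim_reasoning_effort({}, ReasoningConfig("xhigh"), True)` yields `{"reasoning_effort": "max"}`, while the same call with `is_nvidia_nim=False` leaves the dict untouched.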

Notes

This fix only applies to the NVIDIA NIM API and does not address other APIs or models. The proposed fix is based on the existing implementation for other models and may require further testing and validation.

Recommendation

Apply the proposed fix by adding the new branch in agent/transports/chat_completions.py::build_kwargs() to emit the top-level reasoning_effort field for is_nvidia_nim models. This lets the NVIDIA NIM API receive the configured reasoning_effort value and return a populated reasoning_content.
