hermes: [Feature] Native reasoning_effort support for NVIDIA NIM (integrate.api.nvidia.com) (solved; 1 pull request, 1 participant)

GitHub stats
NousResearch/hermes-agent#19883 (fetched 2026-05-05 06:04:34)
Comments: 0
Participants: 1
Timeline events: 5 (labeled ×4, cross-referenced ×1)
Reactions: 0

Root Cause

NVIDIA NIM (integrate.api.nvidia.com) hosts several reasoning-capable model families (DeepSeek V4 Pro/Flash, Kimi K2 Thinking, GPT-OSS 120B, Qwen3-thinking, Nemotron 3) that gate their <think> chain on the top-level reasoning_effort request field. Hermes already detects is_nvidia_nim in chat_completions.build_kwargs() (used for the max_tokens=16384 default), but does not propagate reasoning_effort to NVIDIA NIM the way it does for Kimi, TokenHub, and LM Studio.

Net effect: every NIM-hosted reasoning model is silently locked into "Non-think" mode in Hermes, even when the user has agent.reasoning_effort: high configured. The model returns reasoning_content: null and the agent loses the structural accuracy gain it was trained for.

Fix Action

Fixed

PR fix notes

PR #19888: feat(transport): plumb reasoning_effort to NVIDIA NIM

Description (problem / solution / changelog)

Summary

NVIDIA NIM hosts reasoning-capable model families (DeepSeek V4 Pro/Flash, Kimi K2 Thinking, GPT-OSS 120B, Qwen3-thinking variants, Nemotron 3) that activate thinking mode via the top-level reasoning_effort request field. Hermes already detects is_nvidia_nim and uses it for max_tokens defaulting and extra_body assembly, but the reasoning_effort plumbing was never added — so every NIM-hosted reasoning model is silently stuck in Non-think mode.

This PR adds an is_nvidia_nim branch in chat_completions.build_kwargs() that mirrors the existing Kimi block. After this change, configuring agent.reasoning_effort: high actually reaches NIM, reasoning_content populates, and display.show_reasoning: true finally renders thinking blocks for these models.

Closes #19883.

NIM reasoning_effort contract (verified empirically)

Verified against https://integrate.api.nvidia.com/v1/chat/completions with deepseek-ai/deepseek-v4-flash:

| Request | reasoning_content |
|---|---|
| (no field) | null |
| reasoning_effort: "high" | populated |
| reasoning_effort: "max" | populated |
| extra_body: {thinking: {type: "max"}} | null (NIM doesn't accept) |

NIM accepts: low, medium, high, max. The nested extra_body.thinking toggle that DeepSeek's native API uses is not translated by NIM — only the top-level reasoning_effort works on this route.

Hermes → NIM effort mapping

Same vocabulary translation as #14958's DeepSeek-native mapping, applied here for the NIM route:

| Hermes reasoning_effort | NIM reasoning_effort |
|---|---|
| low | low |
| medium | medium |
| high | high |
| xhigh | max |
| max | max |
| (thinking disabled) | omitted |
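
The mapping above can be read as a small lookup. The following is a minimal sketch, not the PR's actual code: the names `HERMES_TO_NIM_EFFORT` and `to_nim_effort` are hypothetical, and falling back to `high` for missing or unrecognized values is an assumption (the PR mentions a default of high and an unknown-value fallback but does not show how either is implemented).

```python
from typing import Optional

# Hermes effort level -> NIM effort level, taken from the mapping table.
HERMES_TO_NIM_EFFORT = {
    "low": "low",
    "medium": "medium",
    "high": "high",
    "xhigh": "max",  # NIM has no "xhigh"; "max" is the closest accepted level
    "max": "max",
}


def to_nim_effort(effort: Optional[str], enabled: bool = True) -> Optional[str]:
    """Return the value to send as NIM's top-level reasoning_effort.

    Returns None when thinking is disabled, meaning the field should be omitted.
    """
    if not enabled:
        return None
    if effort is None:
        return "high"  # assumed default
    # Assumed fallback: unknown values are treated as "high".
    return HERMES_TO_NIM_EFFORT.get(effort.strip().lower(), "high")
```

Returning `None` for the disabled case keeps the "omit the field" decision with the caller, which only sets the request key when a value comes back.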

Changes

  • agent/transports/chat_completions.py — new is_nvidia_nim reasoning_effort branch immediately after the existing is_kimi block, structurally identical for easy review (+22 lines).
  • tests/agent/transports/test_chat_completions.py — new TestChatCompletionsNvidiaNimReasoning covering default high, explicit effort levels, xhigh → max mapping, enabled: False opt-out, unknown-value fallback, and no-op for non-NIM routes (+77 lines).

Test plan

  • pytest tests/agent/transports/test_chat_completions.py::TestChatCompletionsNvidiaNimReasoning -v — 6/6 pass
  • pytest tests/agent/transports/test_chat_completions.py (full file, 67 tests) — all pass; existing Kimi/TokenHub/LM-Studio/etc. paths unaffected
  • End-to-end smoke test with deepseek-v4-flash on NIM — reasoning_content populates, Hermes' Reasoning block renders
  • Confirmed extra_body.thinking does NOT work on NIM (sanity check that the top-level reasoning_effort is the correct lever)

Related

  • #14958 — feat(deepseek): plumb reasoning_effort and thinking toggle to V4 API (covers the DeepSeek native API, complementary to this PR which covers the NIM hosting route)
  • #11243 — Same gap requested for Mistral AI's api.mistral.ai/v1
  • #18742 — Related symptom: Kimi K2.5 via aggregators with no max_tokens/reasoning_effort

Changed files

  • agent/transports/chat_completions.py (modified, +37/-0)
  • tests/agent/transports/test_chat_completions.py (modified, +120/-0)

Original issue text

Reproduction

# ~/.hermes/config.yaml
model:
  default: deepseek-ai/deepseek-v4-flash
  provider: nvidia
agent:
  reasoning_effort: high
display:
  show_reasoning: true

Then run:

hermes chat -Q -q "Solve: smallest prime greater than 17. Show your reasoning."

Inspect the resulting ~/.hermes/sessions/session_*.json:

{ "role": "assistant", "reasoning_content": null, "content": "19" }

reasoning_content is null because Hermes never sent reasoning_effort to NIM.
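
The manual inspection can be scripted. This is a minimal sketch under stated assumptions: the layout of the session file is not documented here, so the helper (a hypothetical name, `iter_assistant_messages`) walks the parsed JSON recursively and looks for any object with `"role": "assistant"`, rather than relying on a particular key. The `"messages"` key in the sample is only an illustration.

```python
import json
from typing import Any, Iterator, Tuple


def iter_assistant_messages(node: Any) -> Iterator[dict]:
    """Yield every dict with role == "assistant" found anywhere in parsed JSON."""
    if isinstance(node, dict):
        if node.get("role") == "assistant":
            yield node
        for value in node.values():
            yield from iter_assistant_messages(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_assistant_messages(item)


def count_missing_reasoning(session: Any) -> Tuple[int, int]:
    """Return (assistant messages without reasoning_content, total assistant messages)."""
    messages = list(iter_assistant_messages(session))
    missing = sum(1 for m in messages if not m.get("reasoning_content"))
    return missing, len(messages)


# In-memory example; for a real session file, parse it with json.load() instead.
sample = json.loads(
    '{"messages": [{"role": "user", "content": "hi"},'
    ' {"role": "assistant", "reasoning_content": null, "content": "19"}]}'
)
print(count_missing_reasoning(sample))  # (1, 1)
```

A result where the first number equals the second means no assistant turn carried any reasoning, which is the symptom described in this issue.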

Verification (raw API)

Calling https://integrate.api.nvidia.com/v1/chat/completions directly with the same model:

| Request body | reasoning_content |
|---|---|
| (no reasoning_effort) | null |
| reasoning_effort: "high" | populated (~100–200 chars) |
| reasoning_effort: "max" | populated (~140–250 chars) |
| extra_body: {thinking: {type: "max"}} | null (NIM rejects nested thinking toggle; only DeepSeek's native API accepts that) |

So the activator on NIM is unambiguously the top-level reasoning_effort field.
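
To make the comparison concrete, here is a minimal sketch of the request bodies involved. The helper names `build_nim_body` and `extract_reasoning` are hypothetical, and the response layout (`choices[0].message.reasoning_content`) is assumed to follow the usual OpenAI-style chat completions shape; no network call is made.

```python
import json
from typing import Optional

NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"


def build_nim_body(model: str, prompt: str, reasoning_effort: Optional[str] = None) -> dict:
    """Build a chat completions body with reasoning_effort at the top level."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    if reasoning_effort is not None:
        # Top-level field, not nested under a "thinking" object.
        body["reasoning_effort"] = reasoning_effort
    return body


def extract_reasoning(response: dict) -> Optional[str]:
    """Read reasoning_content from the first choice (assumed OpenAI-style layout)."""
    choices = response.get("choices") or [{}]
    return choices[0].get("message", {}).get("reasoning_content")


# The body that activated thinking in the table above (send it to NIM_URL).
body = build_nim_body("deepseek-ai/deepseek-v4-flash", "Smallest prime > 17?", "high")
print(json.dumps(body, indent=2))
```

Posting this body with an API key in the `Authorization` header and passing the parsed response to `extract_reasoning` should distinguish the populated rows from the null ones.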

Why Hermes' existing branches don't cover this

Looking at agent/transports/chat_completions.py::build_kwargs(), top-level reasoning_effort is currently emitted only for:

  • is_kimi (Kimi/Moonshot direct)
  • is_tokenhub (Tencent TokenHub)
  • is_lmstudio (LM Studio, gated by supports_reasoning)

is_nvidia_nim is a recognized flag (max_tokens default, extra_body assembly) but the reasoning_effort branch was never added.

Related work

  • #14958 (open) plumbs reasoning_effort for the DeepSeek native API (api.deepseek.com). It does not address the same model family hosted on NIM, which is a different base_url.
  • #11243 (open) requests the same for Mistral AI (api.mistral.ai/v1).
  • #18742 (open) reports a related symptom for Kimi K2.5 via aggregators.

Proposed fix

Mirror the existing is_kimi block: add an is_nvidia_nim branch immediately after it that emits top-level reasoning_effort from reasoning_config.effort, defaults to high, honors the enabled=False opt-out, and accepts NIM's effort vocabulary (low / medium / high / max).
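
A minimal sketch of what such a branch could look like, written as a standalone function so it can be run in isolation. This is not the code from the PR: `apply_nim_reasoning` is a hypothetical name, `reasoning_config` is assumed here to be a plain dict with optional `enabled` and `effort` keys, and the fallback to `high` for unrecognized values is an assumption.

```python
from typing import Any, Dict, Optional

# Effort values the issue reports NIM as accepting.
NIM_EFFORTS = {"low", "medium", "high", "max"}


def apply_nim_reasoning(
    kwargs: Dict[str, Any],
    is_nvidia_nim: bool,
    reasoning_config: Optional[Dict[str, Any]],
) -> Dict[str, Any]:
    """Add top-level reasoning_effort to kwargs for NIM routes only."""
    if not is_nvidia_nim:
        return kwargs  # other providers are left untouched
    config = reasoning_config or {}
    if config.get("enabled") is False:
        return kwargs  # explicit opt-out: omit the field entirely
    effort = str(config.get("effort") or "high").lower()  # default: high
    if effort == "xhigh":
        effort = "max"  # NIM has no "xhigh" level
    if effort not in NIM_EFFORTS:
        effort = "high"  # assumed fallback for unknown values
    kwargs["reasoning_effort"] = effort
    return kwargs


print(apply_nim_reasoning({"model": "m"}, True, {"effort": "xhigh"}))
```

Keeping the non-NIM case as an early return mirrors the "no-op for non-NIM routes" behavior the PR's tests are described as covering.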

PR submitted alongside this issue.

extent analysis

TL;DR

To fix the issue, add a new branch in agent/transports/chat_completions.py::build_kwargs() to emit the top-level reasoning_effort field for is_nvidia_nim models.

Guidance

  • Identify the build_kwargs() function in agent/transports/chat_completions.py and locate the existing branches for is_kimi, is_tokenhub, and is_lmstudio.
  • Add a new branch for is_nvidia_nim that emits the top-level reasoning_effort field, using the reasoning_config.effort value with sensible defaults (high) and an enabled=False opt-out.
  • Ensure the new branch accepts NIM's effort vocabulary (low / medium / high / max).
  • Verify the fix by checking the reasoning_content field in the response from the NVIDIA NIM API.

Example

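if is_nvidia_nim and reasoning_config.enabled:
    # Default to "high"; NIM has no "xhigh", so send "max" instead.
    effort = reasoning_config.effort or "high"
    kwargs["reasoning_effort"] = "max" if effort == "xhigh" else effort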

Notes

This fix only applies to the NVIDIA NIM API and does not address other APIs or models. The proposed fix is based on the existing implementation for other models and may require further testing and validation.

Recommendation

Apply the proposed fix by adding the new branch in agent/transports/chat_completions.py::build_kwargs() to emit the top-level reasoning_effort field for is_nvidia_nim models. This lets the NVIDIA NIM API receive the configured reasoning_effort value and return a populated reasoning_content.
