hermes - Support provider-scoped agent.reasoning_effort overrides


Feature Description

Add a way to configure reasoning effort per provider/model route instead of only as a single profile-wide agent.reasoning_effort value.

Today Hermes reads one global profile setting:

agent:
  reasoning_effort: none | minimal | low | medium | high | xhigh

That works when a profile always uses one provider, but it becomes awkward for profiles that switch between providers or use a fallback chain with mixed reasoning semantics.

Motivation

Different providers need different reasoning behavior:

  • Qwen3/Qwen3.6 served through vLLM/custom endpoints often needs thinking disabled for normal agent usage to avoid visible <think>/thinking leakage, extra token use, and tool/output parsing issues. For these routes we want:

    agent:
      reasoning_effort: none

    along with provider-specific request extras such as think: false or chat_template_kwargs.enable_thinking: false.

  • OpenAI/Codex reasoning is provider-native/hidden/structured. For Codex as the main model we generally want a normal reasoning budget, e.g.:

    agent:
      reasoning_effort: medium

With the current global setting, a profile configured with Qwen/vLLM as primary and Codex as fallback has to choose one value for both. If it sets none to keep Qwen safe, a Codex fallback (or a later switch back to Codex as primary) inherits disabled reasoning. If it sets medium for Codex, Qwen thinking may be enabled unless custom-provider logic compensates.

Proposed Solution

Support provider-scoped and/or route-scoped reasoning overrides, with a clear precedence order.

Example shape:

agent:
  reasoning_effort: medium       # default for routes without an override
  reasoning_effort_by_provider:
    custom: none                 # local/vLLM/Ollama-compatible custom endpoints
    openai-codex: medium
    openai: medium

Potential model/route-specific variant:

agent:
  reasoning_effort: medium
  reasoning_overrides:
    - provider: custom
      base_url: https://local-vllm.example.com/v1
      model_pattern: '^qwen3|^qwen36|qwen3\.6'
      reasoning_effort: none
    - provider: openai-codex
      reasoning_effort: medium
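
To make the route-specific variant concrete, here is a minimal sketch of how such rules might be matched against the selected route. The function name find_route_override, the first-match-wins ordering, the case-insensitive regex search, and the trailing-slash normalization of base_url are all assumptions made for illustration; none of them are existing Hermes behavior.

```python
import re
from typing import Mapping, Optional, Sequence


def find_route_override(
    overrides: Optional[Sequence[Mapping[str, str]]],
    provider: str,
    model: str = "",
    base_url: str = "",
) -> Optional[str]:
    """Return the reasoning_effort of the first rule matching the route.

    Hypothetical helper: a rule matches when every field it specifies
    (provider, base_url, model_pattern) agrees with the route. Fields the
    rule omits are treated as wildcards.
    """
    for rule in overrides or []:
        if "provider" in rule and rule["provider"] != provider:
            continue
        # Assumption: ignore a trailing slash when comparing base URLs.
        if "base_url" in rule and rule["base_url"].rstrip("/") != base_url.rstrip("/"):
            continue
        pattern = rule.get("model_pattern")
        # Assumption: model_pattern is a regex searched case-insensitively.
        if pattern and not re.search(pattern, model, re.IGNORECASE):
            continue
        return rule.get("reasoning_effort")
    return None  # no rule matched; the caller falls back to broader settings


overrides = [
    {
        "provider": "custom",
        "base_url": "https://local-vllm.example.com/v1",
        "model_pattern": r"^qwen3|^qwen36|qwen3\.6",
        "reasoning_effort": "none",
    },
    {"provider": "openai-codex", "reasoning_effort": "medium"},
]
print(find_route_override(overrides, "custom", "qwen3-32b", "https://local-vllm.example.com/v1"))
```

Returning None for "no match" keeps it distinct from the string "none", which is an explicit request to disable reasoning.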

Suggested precedence:

  1. Explicit per-request/slash-command override, if present
  2. Model/route-specific override
  3. Provider-specific override
  4. Existing global agent.reasoning_effort
  5. Provider default
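
The precedence order above could be sketched as a "first value that is set wins" lookup. This is only an illustration under assumptions: resolve_reasoning_effort and its parameters are hypothetical names, route_override stands for the result of whatever model/route matching is used, and treating an empty string as "unset" mirrors the current .get("reasoning_effort", "") default.

```python
from typing import Mapping, Optional

VALID_EFFORTS = ("none", "minimal", "low", "medium", "high", "xhigh")


def _normalize(value: Optional[str]) -> Optional[str]:
    """Map None/empty to None (unset); validate anything else."""
    if value is None:
        return None
    text = str(value).strip().lower()
    if not text:
        return None
    if text not in VALID_EFFORTS:
        raise ValueError(f"unknown reasoning_effort: {value!r}")
    return text


def resolve_reasoning_effort(
    agent_cfg: Mapping,
    provider: str,
    *,
    request_override: Optional[str] = None,
    route_override: Optional[str] = None,
    provider_default: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical resolver following the suggested precedence."""
    by_provider = agent_cfg.get("reasoning_effort_by_provider") or {}
    candidates = (
        request_override,                    # 1. per-request/slash-command
        route_override,                      # 2. model/route-specific
        by_provider.get(provider),           # 3. provider-specific
        agent_cfg.get("reasoning_effort"),   # 4. existing global setting
        provider_default,                    # 5. provider default
    )
    for candidate in candidates:
        effort = _normalize(candidate)
        # The string "none" is an explicit setting and stops the search;
        # only Python None or an empty string means "not configured".
        if effort is not None:
            return effort
    return None


cfg = {
    "reasoning_effort": "medium",
    "reasoning_effort_by_provider": {"custom": "none"},
}
print(resolve_reasoning_effort(cfg, "custom"))
print(resolve_reasoning_effort(cfg, "openai-codex"))
```

A config that only sets agent.reasoning_effort resolves to that value for every provider, which is the backward-compatible behavior the acceptance criteria ask for.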

Implementation Notes

  • The agent currently parses CLI_CONFIG["agent"].get("reasoning_effort", "") once into self.reasoning_config and passes that into AIAgent.
  • Runtime provider resolution already knows the effective provider/model/base URL. That is probably the right point to resolve the effective reasoning config for the selected route.
  • Fallback attempts should resolve reasoning against the fallback provider/model, not blindly inherit the primary route's reasoning setting.
  • Custom/Qwen handling should remain provider/model-specific. Qwen/vLLM chat_template_kwargs.enable_thinking=false should not be sent to OpenAI/Codex providers.
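
As a rough sketch of the last two notes, each attempt in a fallback chain could recompute its own reasoning setting and request extras instead of reusing the primary route's. Everything here is hypothetical: effort_for, request_extras, plan_attempts, the "qwen3" model check, and the route tuples are illustrative, and the only extra shown is the chat_template_kwargs.enable_thinking flag mentioned above.

```python
import re
from typing import List, Mapping, Optional, Tuple

# Assumption: a simple name check is enough to spot Qwen3-family models.
QWEN_PATTERN = re.compile(r"qwen3", re.IGNORECASE)


def effort_for(agent_cfg: Mapping, provider: str) -> Optional[str]:
    """Provider override first, then the global agent.reasoning_effort."""
    by_provider = agent_cfg.get("reasoning_effort_by_provider") or {}
    return by_provider.get(provider) or agent_cfg.get("reasoning_effort") or None


def request_extras(provider: str, model: str, effort: Optional[str]) -> dict:
    """Emit Qwen/vLLM thinking controls only for custom Qwen routes."""
    if provider == "custom" and effort == "none" and QWEN_PATTERN.search(model):
        return {"chat_template_kwargs": {"enable_thinking": False}}
    return {}  # never send Qwen-specific extras to other providers


def plan_attempts(agent_cfg: Mapping, routes: List[Tuple[str, str]]) -> List[dict]:
    """Resolve reasoning per route, primary first, then fallbacks."""
    plans = []
    for provider, model in routes:
        effort = effort_for(agent_cfg, provider)  # recomputed for every route
        plans.append(
            {
                "provider": provider,
                "model": model,
                "reasoning_effort": effort,
                "extras": request_extras(provider, model, effort),
            }
        )
    return plans


cfg = {
    "reasoning_effort": "medium",
    "reasoning_effort_by_provider": {"custom": "none"},
}
# Placeholder model names; the second one stands in for any Codex model.
for plan in plan_attempts(cfg, [("custom", "qwen3-32b"), ("openai-codex", "codex-model")]):
    print(plan)
```

With this shape, the Qwen primary gets reasoning disabled plus its thinking flag, while the Codex fallback resolves to medium and carries no Qwen-specific extras.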

Alternatives Considered

  • Manually editing agent.reasoning_effort every time a profile switches between Qwen/vLLM and Codex. This works but is easy to forget and creates surprising behavior after provider switches.
  • Having the custom provider always disable Qwen thinking independent of agent.reasoning_effort. That is safer for Qwen, but it removes user control and still does not solve Codex fallback inheriting none.
  • Separate profiles per provider. This avoids mixed config, but defeats the value of profile fallback chains and quick provider switching.

Acceptance Criteria

  • A profile can use Qwen/vLLM/custom with reasoning disabled while using OpenAI/Codex with reasoning enabled in the same config.
  • Switching primary provider or falling back to another provider recomputes the effective reasoning config for that provider/model route.
  • Existing configs with only agent.reasoning_effort keep current behavior.
  • Qwen/vLLM-specific thinking controls are only emitted for relevant custom/Qwen routes, not for OpenAI/Codex.
