hermes - Support provider-scoped agent.reasoning_effort overrides

Feature Description

Add a way to configure reasoning effort per provider/model route instead of only as a single profile-wide agent.reasoning_effort value.

Today Hermes reads one global profile setting:

```
agent:
  reasoning_effort: none | minimal | low | medium | high | xhigh
```

That works when a profile always uses one provider, but it becomes awkward for profiles that switch between providers or use a fallback chain with mixed reasoning semantics.

Motivation

Different providers need different reasoning behavior:

Qwen3/Qwen3.6 served through vLLM/custom endpoints often needs thinking disabled for normal agent usage to avoid visible <think>/thinking leakage, extra token use, and tool/output parsing issues. For these routes we want:
```
agent:
  reasoning_effort: none
```
and provider-specific request extras like think: false / chat_template_kwargs.enable_thinking: false.
OpenAI/Codex reasoning is provider-native/hidden/structured. For Codex as the main model we generally want a normal reasoning budget, e.g.:
```
agent:
  reasoning_effort: medium
```

With the current global setting, a profile configured with Qwen/vLLM as primary and Codex as fallback has to choose one value. If it uses none to keep Qwen safe, a Codex fallback, or a later switch back to Codex as primary, inherits disabled reasoning. If it uses medium for Codex, Qwen thinking may be enabled unless custom-provider logic compensates.

Proposed Solution

Support provider-scoped and/or route-scoped reasoning overrides, with a clear precedence order.

Example shape:

```
agent:
  reasoning_effort: medium       # default for routes without an override
  reasoning_effort_by_provider:
    custom: none                 # local/vLLM/Ollama-compatible custom endpoints
    openai-codex: medium
    openai: medium
```

Potential model/route-specific variant:

```
agent:
  reasoning_effort: medium
  reasoning_overrides:
    - provider: custom
      base_url: https://local-vllm.example.com/v1
      model_pattern: '^qwen3|^qwen36|qwen3\.6'
      reasoning_effort: none
    - provider: openai-codex
      reasoning_effort: medium
```
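
A minimal sketch of how such a reasoning_overrides list might be matched against the selected route. The function name match_route_override, the first-match-wins rule, the case-insensitive regex search, and the trailing-slash normalization of base_url are all assumptions made for illustration; none of them are existing Hermes behavior.

```python
import re
from typing import Optional


def match_route_override(overrides, provider, model, base_url=None) -> Optional[str]:
    """Return the reasoning_effort of the first matching entry, or None.

    Hypothetical helper: an entry matches when every field it specifies
    (provider, base_url, model_pattern) agrees with the selected route.
    Fields an entry omits are treated as wildcards.
    """
    for entry in overrides:
        if entry.get("provider") not in (None, provider):
            continue
        wanted_url = entry.get("base_url")
        if wanted_url and (base_url or "").rstrip("/") != wanted_url.rstrip("/"):
            continue
        pattern = entry.get("model_pattern")
        if pattern and not re.search(pattern, model or "", re.IGNORECASE):
            continue
        return entry.get("reasoning_effort")
    return None


# The override list from the example config, as it would look once parsed.
OVERRIDES = [
    {
        "provider": "custom",
        "base_url": "https://local-vllm.example.com/v1",
        "model_pattern": r"^qwen3|^qwen36|qwen3\.6",
        "reasoning_effort": "none",
    },
    {"provider": "openai-codex", "reasoning_effort": "medium"},
]
```

Under these assumptions, a Qwen3 model on the listed vLLM endpoint resolves to none, while a non-Qwen model on the same endpoint matches no entry and falls through to the next precedence level.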

Suggested precedence:

1. Explicit per-request/slash-command override, if present
2. Model/route-specific override
3. Provider-specific override
4. Existing global agent.reasoning_effort
5. Provider default
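
The precedence above could be sketched roughly as follows. This is only an illustration under assumptions: resolve_reasoning_effort, its parameters, and the compact route matcher are hypothetical names, and treating an empty string as "not set" is inferred from the current .get("reasoning_effort", "") default rather than confirmed.

```python
import re


def _first_route_match(overrides, provider, model, base_url):
    # Hypothetical compact matcher: omitted fields act as wildcards.
    for entry in overrides:
        if entry.get("provider", provider) != provider:
            continue
        if entry.get("base_url") and entry["base_url"] != base_url:
            continue
        pattern = entry.get("model_pattern")
        if pattern and not re.search(pattern, model or "", re.IGNORECASE):
            continue
        return entry.get("reasoning_effort")
    return None


def resolve_reasoning_effort(agent_cfg, provider, model, base_url=None,
                             request_override=None, provider_default=None):
    """Return the first value that is set, walking the suggested precedence."""
    candidates = (
        request_override,                                               # 1
        _first_route_match(agent_cfg.get("reasoning_overrides") or [],
                           provider, model, base_url),                  # 2
        (agent_cfg.get("reasoning_effort_by_provider") or {}).get(provider),  # 3
        agent_cfg.get("reasoning_effort"),                              # 4
        provider_default,                                               # 5
    )
    for value in candidates:
        if value:  # None and "" both mean "not set at this level"
            return str(value).strip().lower()
    return None
```

Note that the string "none" is a real setting and is returned as-is; only a missing or empty value causes the resolver to move on to the next level. A config containing only agent.reasoning_effort resolves to that value for every route, which matches the backward-compatibility requirement.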

Implementation Notes

The agent currently parses CLI_CONFIG["agent"].get("reasoning_effort", "") once into self.reasoning_config and passes that into AIAgent.
Runtime provider resolution already knows the effective provider/model/base URL. That is probably the right point to resolve the effective reasoning config for the selected route.
Fallback attempts should resolve reasoning against the fallback provider/model, not blindly inherit the primary route's reasoning setting.
Custom/Qwen handling should remain provider/model-specific. Qwen/vLLM chat_template_kwargs.enable_thinking=false should not be sent to OpenAI/Codex providers.
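
As a rough illustration of the last two notes, the sketch below recomputes reasoning parameters for each route in a fallback chain and only attaches the Qwen/vLLM thinking switch on custom Qwen routes. Everything here is hypothetical: build_reasoning_params, run_with_fallback, the request_extras key, the send callback, and the "model name contains qwen" check are stand-ins, not the actual Hermes code paths.

```python
def build_reasoning_params(agent_cfg, provider, model):
    """Resolve reasoning settings for one route (provider override, then global)."""
    by_provider = agent_cfg.get("reasoning_effort_by_provider") or {}
    effort = by_provider.get(provider) or agent_cfg.get("reasoning_effort") or None
    params = {"reasoning_effort": effort, "request_extras": {}}

    # Assumption: a Qwen model behind the "custom" provider is detected by name.
    is_custom_qwen = provider == "custom" and "qwen" in (model or "").lower()
    if is_custom_qwen and effort == "none":
        # Only custom/Qwen routes get the chat-template switch.
        params["request_extras"]["chat_template_kwargs"] = {"enable_thinking": False}
    return params


def run_with_fallback(agent_cfg, routes, send):
    """Try each (provider, model) route in order, resolving params per route."""
    last_error = None
    for provider, model in routes:
        # Recomputed for every attempt, so a fallback never inherits the
        # primary route's reasoning setting.
        params = build_reasoning_params(agent_cfg, provider, model)
        try:
            return send(provider, model, params)
        except RuntimeError as exc:
            last_error = exc
    raise RuntimeError("all routes failed") from last_error
```

With custom mapped to none and a global default of medium, a failed Qwen attempt followed by a Codex fallback would send medium to Codex with no Qwen-specific extras attached.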

Alternatives Considered

Manually editing agent.reasoning_effort every time a profile switches between Qwen/vLLM and Codex. This works but is easy to forget and creates surprising behavior after provider switches.
Having the custom provider always disable Qwen thinking independent of agent.reasoning_effort. That is safer for Qwen, but it removes user control and still does not solve Codex fallback inheriting none.
Separate profiles per provider. This avoids mixed config, but defeats the value of profile fallback chains and quick provider switching.

Acceptance Criteria

A profile can use Qwen/vLLM/custom with reasoning disabled while using OpenAI/Codex with reasoning enabled in the same config.
Switching primary provider or falling back to another provider recomputes the effective reasoning config for that provider/model route.
Existing configs with only agent.reasoning_effort keep current behavior.
Qwen/vLLM-specific thinking controls are only emitted for relevant custom/Qwen routes, not for OpenAI/Codex.
