hermes: [Feature] per-platform LLM request_overrides (extra_body / reasoning_effort / service_tier) [1 pull request]

hermes · 2026-05-28 17:24:06

Fix Action

Fixed

Fixed by PR: feat(agent): per-platform request_overrides via platform_request_overrides (https://github.com/NousResearch/hermes-agent/pull/34007)

Code Example

platform_request_overrides:
  api_server:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
  telegram:
    reasoning_effort: high

Problem or Use Case

Hermes currently supports a single global custom_providers[].extra_body for a given base_url, but offers no way to vary OpenAI-compatible request fields per platform. When the same AIAgent configuration drives multiple surfaces (e.g. cli, telegram, api_server), every platform shares the same extra_body, reasoning_effort, and service_tier.

Concrete cases this leaves stranded:

A latency-sensitive api_server integration (custom HTTP frontend, mobile app, real-time UI) wants reasoning_effort: minimal while the CLI keeps the default for deeper interactive work.
Hybrid-thinking models (Qwen3 / GLM-4.6 / Hunyuan family on llama.cpp / vLLM) accept chat_template_kwargs.enable_thinking, so users may want thinking ON for CLI / messaging replies but OFF for an api_server endpoint behind a low-latency client. Measured on Qwen3.6-35B-A3B + llama.cpp: a one-tool-call turn drops from ~10.1s / 244 completion tokens to ~1.5s / 25 completion tokens when enable_thinking: false is sent only to api_server.
A deployment wants service_tier: priority applied only to cron-scheduled / batched workloads.

There is currently no way to express any of these without either (a) running a separate Hermes process per surface or (b) editing the global config and degrading the other surfaces.

Proposed Solution

Add a new top-level config key platform_request_overrides (mirroring the shape of platform_toolsets) that lets users layer request_overrides-shaped dicts per platform:

platform_request_overrides:
  api_server:
    extra_body:
      chat_template_kwargs:
        enable_thinking: false
  telegram:
    reasoning_effort: high

Resolution order (high → low):

1. Caller-supplied request_overrides (highest; preserves the existing contract for auxiliary clients, kanban workers, delegated subagents).
2. platform_request_overrides[<platform>] (new layer).
3. custom_providers[].extra_body (existing global, lowest).

extra_body is shallow-merged at the second level, so a platform that sets one nested key (chat_template_kwargs) doesn't erase sibling keys (reasoning_effort) supplied by a custom-provider entry. Top-level keys (service_tier, reasoning_effort) are replaced wholesale. The feature is a no-op when the key is absent, so it is backward-compatible with every existing config.
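One way to read the layering and merge rules above is the Python sketch below. The function name resolve_request_overrides and its parameters are hypothetical and not taken from the PR. The sketch also assumes that a caller-supplied extra_body merges the same way as the platform layer, which the issue does not spell out.

```python
from copy import deepcopy
from typing import Any, Mapping, Optional


def resolve_request_overrides(
    provider_extra_body: Optional[Mapping[str, Any]],
    platform_overrides: Optional[Mapping[str, Any]],
    caller_overrides: Optional[Mapping[str, Any]],
) -> dict:
    """Layer overrides from lowest to highest precedence (hypothetical helper)."""
    resolved: dict = {}

    # Lowest layer: the global custom_providers[].extra_body.
    if provider_extra_body:
        resolved["extra_body"] = deepcopy(dict(provider_extra_body))

    # Higher layers are applied in order, so later layers win.
    for layer in (platform_overrides, caller_overrides):
        for key, value in (layer or {}).items():
            if key == "extra_body" and isinstance(value, Mapping):
                # Merge the keys inside extra_body so siblings survive.
                # Assumption: a nested dict such as chat_template_kwargs
                # is replaced as a whole, not merged any deeper.
                merged = dict(resolved.get("extra_body", {}))
                merged.update(deepcopy(dict(value)))
                resolved["extra_body"] = merged
            else:
                # Top-level keys (service_tier, reasoning_effort) replace wholesale.
                resolved[key] = deepcopy(value)
    return resolved


provider = {"reasoning_effort": "low", "chat_template_kwargs": {"enable_thinking": True}}
platform = {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}}
resolved = resolve_request_overrides(provider, platform, None)
# resolved["extra_body"] keeps reasoning_effort and flips enable_thinking to False.
```

With all three arguments empty or None the function returns an empty dict, which matches the no-op behaviour described for configs that lack the new key.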

Scope: the override applies to the main conversational LLM call (OpenAI chat completions / Anthropic messages / Codex responses) for every platform that has a matching entry. It does NOT cover auxiliary model calls (those already have auxiliary.<task>.extra_body) or non-conversational tool requests (embeddings, image-gen, TTS / STT; those are tool implementations with their own config).
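The scope rule above could be expressed as a small guard, sketched here in Python. The call-kind labels, the MAIN_CALL_KINDS set, and the function name are all hypothetical; only the platform_request_overrides key comes from the proposal.

```python
from typing import Any, Mapping

# Hypothetical labels for the three main conversational call paths.
MAIN_CALL_KINDS = frozenset(
    {"chat_completions", "anthropic_messages", "codex_responses"}
)


def platform_overrides_for_call(
    config: Mapping[str, Any], platform: str, call_kind: str
) -> dict:
    """Return the platform layer for a call, or {} when it is out of scope."""
    # Auxiliary and tool requests (embeddings, image-gen, TTS / STT) are skipped.
    if call_kind not in MAIN_CALL_KINDS:
        return {}
    # A missing key or a platform without an entry is a no-op.
    table = config.get("platform_request_overrides") or {}
    return dict(table.get(platform) or {})


config = {"platform_request_overrides": {"telegram": {"reasoning_effort": "high"}}}
telegram_chat = platform_overrides_for_call(config, "telegram", "chat_completions")
telegram_embed = platform_overrides_for_call(config, "telegram", "embeddings")
```

Here telegram_chat carries the reasoning_effort entry while telegram_embed is empty, because the embeddings request is not one of the main conversational calls.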

Alternatives Considered

Adding to PlatformConfig.extra (the gateway-side per-platform settings slot): wrong fit — request_overrides are an agent-init concern that applies equally to CLI runs without the gateway. The existing platform_toolsets precedent (also top-level, also keyed by platform name, also agent-affecting) is the closer analogue.
Per-base_url match on custom_providers[].model (the existing extension point): doesn't cover the case where multiple platforms hit the same base_url and need different params. Per-base_url and per-platform are orthogonal axes.
Per-call caller overrides only: works for tightly-controlled code paths but pushes config sprawl into every adapter. Adjacent feature requests (#32813, #31589) confirm users want a declarative YAML knob.
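To illustrate why per-base_url and per-platform are described as orthogonal axes, the Python sketch below has two platforms hit the same base_url and end up with different parameters. The URL, the data, and the function name are made up for illustration and do not come from the PR.

```python
# Hypothetical provider defaults, keyed by base_url (the existing axis).
provider_extra_body = {
    "http://localhost:8080/v1": {"chat_template_kwargs": {"enable_thinking": True}},
}

# Hypothetical platform layer, keyed by platform name (the proposed axis).
platform_request_overrides = {
    "api_server": {"extra_body": {"chat_template_kwargs": {"enable_thinking": False}}},
}


def effective_extra_body(base_url: str, platform: str) -> dict:
    """Start from the provider entry, then apply the platform's extra_body."""
    body = dict(provider_extra_body.get(base_url, {}))
    platform_layer = platform_request_overrides.get(platform, {})
    body.update(platform_layer.get("extra_body", {}))
    return body


url = "http://localhost:8080/v1"
cli_body = effective_extra_body(url, "cli")                # thinking stays on
api_server_body = effective_extra_body(url, "api_server")  # thinking turned off
```

Matching on base_url alone cannot separate cli_body from api_server_body, since both share the same provider entry; the platform key is what distinguishes them.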

Feature Type

Configuration option

Scope

Medium (few files, < 300 lines)

Contribution

I'd like to implement this myself and submit a PR

Branch feat/platform-request-overrides is ready: 3 files changed (+396 LOC: +94 prod, +302 tests + docs). 18 new tests pass via scripts/run_tests.sh. Naming and architecture follow the _custom_provider_extra_body_for_agent precedent.

Related work

Open PRs in adjacent territory reviewed before designing this:

#12427 — chat_template_kwargs.enable_thinking=false for llama.cpp / vLLM (the global version of the same knob this enables per-platform)
#31589 — provider-scoped agent.reasoning_effort overrides (same shape, scoped to provider rather than platform)
#32813 — per-auxiliary reasoning_effort configuration (same shape on the auxiliary axis)
#33329 — channel-model bindings (closest architectural precedent for per-surface LLM-param config)
#21554 — config-driven providers.<name>.extra_body override (related extension of the global path)
#33249 — api_server forward request extra_body (touches the same code area)
#29858 — custom-provider extra_body pass-through (merged baseline this PR extends)
