hermes: API server ignores per-platform model config (no way to run api_server on a different model than the global default)

hermes, 2026-05-29 12:39:32


The API server platform (gateway/platforms/api_server.py) always uses the global model.default — there is no way to run the API server on a different (e.g. cheaper/faster) model than the rest of the gateway. _resolve_gateway_model() ignores any per-platform configuration.

This is a feature gap rather than a crash: operators who want, say, the HTTP API server on Sonnet while CLI/Discord stay on Opus have no supported knob.

Root Cause

_resolve_gateway_model() (in gateway/run.py) only ever reads model.default / model.model. It has no platform dimension, so every gateway platform that constructs a temporary agent, including the API server, shares one model.

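Fix / Workaround

Add an opt-in platform parameter to _resolve_gateway_model(). When it is supplied and platform_models.<platform> exists in config.yaml, that model wins over model.default; only api_server._create_agent opts in.

The change comes with 8 new regression tests in tests/test_empty_model_fallback.py::TestResolveGatewayModelPlatformOverride covering opt-in isolation (no platform= → global default), matching/non-matching platforms, bare-string and dict override shapes, and empty/missing/malformed platform_models. One existing monkeypatch in tests/gateway/test_api_server.py was widened from lambda: to lambda *a, **k: to accept the new optional arg.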
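The reason the existing monkeypatch had to be widened can be shown with a small sketch. Everything here is hypothetical and stands in for the real test code: `create_agent_model` only imitates a caller that now passes a config and a `platform=` keyword, and the two lambdas imitate the old and new patch shapes.

```python
def create_agent_model(resolver) -> str:
    """Hypothetical caller that passes the new arguments to the resolver."""
    user_config = {"model": {"default": "claude-opus-4-8"}}
    return resolver(user_config, platform="api_server")


narrow_stub = lambda: "stub-model"        # old patch shape: accepts no arguments
wide_stub = lambda *a, **k: "stub-model"  # widened patch shape: accepts anything

try:
    create_agent_model(narrow_stub)
    narrow_ok = True
except TypeError:
    # A zero-argument lambda rejects the config and platform arguments.
    narrow_ok = False

wide_result = create_agent_model(wide_stub)
print(narrow_ok, wide_result)
```

A stub that accepts `*a, **k` keeps working whether or not the caller passes the optional argument, which is presumably why that shape was chosen for the patched test.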


Summary

This is a feature gap rather than a crash: operators who want, say, the HTTP API server on Sonnet while CLI/Discord stay on Opus have no supported knob.

Current behaviour

gateway/platforms/api_server.py::APIServerAdapter._create_agent resolves the model with:

model = _resolve_gateway_model()

_resolve_gateway_model(config=None) (in gateway/run.py) only ever reads model.default / model.model. There is no platform dimension, so every gateway platform that constructs a temporary agent shares one model.
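To make the gap concrete, here is a minimal sketch of the behaviour described above. `resolve_model_current` is a hypothetical stand-in, not the code in gateway/run.py; the dict lookup order and the empty-string fallback are assumptions inferred from "model.default / model.model".

```python
def resolve_model_current(cfg: dict) -> str:
    """Hypothetical stand-in for the pre-fix resolver: no platform dimension."""
    model_cfg = cfg.get("model", {})
    if isinstance(model_cfg, str):
        return model_cfg
    if isinstance(model_cfg, dict):
        # Assumed lookup order, based on "model.default / model.model".
        return model_cfg.get("default") or model_cfg.get("model") or ""
    return ""  # assumed fallback; the real default is not shown in the issue


config = {
    "model": {"default": "claude-opus-4-8"},
    # Present in the config, but nothing reads it before the fix.
    "platform_models": {"api_server": "claude-sonnet-4-6"},
}

# The API server gets the global default regardless of platform_models.
api_server_model = resolve_model_current(config)
print(api_server_model)
```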

Proposed fix

Add an opt-in platform parameter to _resolve_gateway_model(). When supplied and platform_models.<platform> exists in config.yaml, that model wins over model.default. Every existing call site omits the argument and is byte-for-byte unchanged — only api_server._create_agent opts in.

Config shape (additive, optional):

model:
  default: claude-opus-4-8
platform_models:
  api_server:
    default: claude-sonnet-4-6   # or a bare string: api_server: claude-sonnet-4-6

Note: provider/credentials still come from the global runtime config, so the override must name a model compatible with the active provider. (A future enhancement could thread a per-platform provider too.)
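The lookup order proposed above can be exercised in isolation with a sketch like the following. `resolve_model` is a hypothetical standalone function that mirrors the override branch from the diff; the global fallback at the end (dict lookup and empty-string default) is an assumption, since that part of the real function is not shown in full.

```python
from typing import Optional


def resolve_model(cfg: dict, platform: Optional[str] = None) -> str:
    """Hypothetical standalone version of the proposed lookup."""
    if platform:
        platform_models = cfg.get("platform_models")
        if isinstance(platform_models, dict):
            override = platform_models.get(platform)
            if isinstance(override, str) and override:
                return override  # bare-string shape
            if isinstance(override, dict):
                model = override.get("default") or override.get("model")
                if model:
                    return model  # mapping shape
    # Global fallback; the dict branch and "" default are assumptions.
    model_cfg = cfg.get("model", {})
    if isinstance(model_cfg, str):
        return model_cfg
    if isinstance(model_cfg, dict):
        return model_cfg.get("default") or model_cfg.get("model") or ""
    return ""


base = {"model": {"default": "claude-opus-4-8"}}
bare = {**base, "platform_models": {"api_server": "claude-sonnet-4-6"}}
mapping = {**base, "platform_models": {"api_server": {"default": "claude-sonnet-4-6"}}}
malformed = {**base, "platform_models": ["not", "a", "dict"]}

results = {
    "no platform arg": resolve_model(bare),
    "bare string": resolve_model(bare, platform="api_server"),
    "mapping": resolve_model(mapping, platform="api_server"),
    "other platform": resolve_model(bare, platform="discord"),
    "malformed": resolve_model(malformed, platform="api_server"),
}
for case, model in results.items():
    print(f"{case}: {model}")
```

Only the two cases that pass `platform="api_server"` against a well-formed override resolve to the Sonnet model; every other case falls through to the global default, which is the opt-in isolation the proposal relies on.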

Diff

diff --git a/gateway/platforms/api_server.py b/gateway/platforms/api_server.py
index a18630f85..92585e6bd 100644
--- a/gateway/platforms/api_server.py
+++ b/gateway/platforms/api_server.py
@@ -963,9 +963,9 @@ class APIServerAdapter(BasePlatformAdapter):
 
         runtime_kwargs = _resolve_runtime_agent_kwargs()
         reasoning_config = GatewayRunner._load_reasoning_config()
-        model = _resolve_gateway_model()
 
         user_config = _load_gateway_config()
+        model = _resolve_gateway_model(user_config, platform="api_server")
         enabled_toolsets = sorted(_get_platform_tools(user_config, "api_server"))
 
         max_iterations = int(os.getenv("HERMES_MAX_ITERATIONS", "90"))
diff --git a/gateway/run.py b/gateway/run.py
index a2e41c609..f82a276a1 100644
--- a/gateway/run.py
+++ b/gateway/run.py
@@ -1443,14 +1443,39 @@ def _load_gateway_runtime_config() -> dict:
     return expanded if isinstance(expanded, dict) else {}
 
 
-def _resolve_gateway_model(config: dict | None = None) -> str:
+def _resolve_gateway_model(config: dict | None = None, platform: str | None = None) -> str:
     """Read model from config.yaml — single source of truth.
 
     Without this, temporary AIAgent instances (e.g. /compress) fall
     back to the hardcoded default which fails when the active provider is
     openai-codex.
+
+    Per-platform override (opt-in): when ``platform`` is supplied AND
+    ``platform_models.<platform>`` is set in config.yaml, that model wins
+    over the global ``model.default``. This lets a single platform (e.g.
+    the API server) run a cheaper/faster model without affecting any other
+    platform. Callers that omit ``platform`` — every existing call site —
+    are completely unaffected and resolve the global default as before.
+
+    The override value may be a bare model string, or a mapping with a
+    ``default`` (or ``model``) key. Any ``provider`` key in the mapping is
+    NOT consumed here — provider/credentials still come from the global
+    runtime config, so a platform override must name a model that works
+    with the active provider.
     """
     cfg = config if config is not None else _load_gateway_config()
+
+    if platform:
+        platform_models = cfg.get("platform_models")
+        if isinstance(platform_models, dict):
+            override = platform_models.get(platform)
+            if isinstance(override, str) and override:
+                return override
+            if isinstance(override, dict):
+                model = override.get("default") or override.get("model")
+                if model:
+                    return model
+
     model_cfg = cfg.get("model", {})
     if isinstance(model_cfg, str):
         return model_cfg

Tests

With the fix applied, the full tests/gateway/test_api_server.py, tests/gateway/test_api_server_toolset.py, and tests/test_empty_model_fallback.py suites pass (182 passed), and the other _resolve_gateway_model consumers (compress, fast, discord, session_info; 31 tests) are unaffected.

(Heads up: that test suite leaks file descriptors via aiohttp test apps and hits OSError: [Errno 24] Too many open files under a low ulimit -n. Raising the limit with ulimit -n 4096 makes it green; this is unrelated to the change but worth a separate look.)
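For anyone who prefers to raise the descriptor limit from Python rather than the shell, a rough equivalent of ulimit -n 4096 using the standard-library `resource` module (Unix only) might look like this. `raise_nofile_limit` is a hypothetical helper, not part of the Hermes test suite, and it only works around the symptom rather than fixing the leak.

```python
import resource


def raise_nofile_limit(wanted: int = 4096) -> int:
    """Try to raise the soft open-file limit to `wanted`; return the resulting soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Never ask for more than the hard limit allows.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some environments refuse the change; keep the current limit.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


new_soft = raise_nofile_limit(4096)
print("soft RLIMIT_NOFILE:", new_soft)
```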

Environment

Hermes Agent, local checkout of main
Python 3.11
macOS 26.5
