hermes: Named custom provider stale_timeout_seconds ignored because runtime provider is normalized to `custom`




Summary

When model.provider is a named custom provider such as custom:sub2api-openai, the configured non-stream stale timeout under providers.<name>.stale_timeout_seconds is ignored at runtime. Hermes falls back to the implicit 90s stale watchdog instead.

Reproduction shape

Profile config:

model:
  default: gpt-5.4
  provider: custom:sub2api-openai
providers:
  sub2api-openai:
    base_url: https://sub2api.tegical.com/v1
    api_mode: codex_responses
    stale_timeout_seconds: 150

Observed runtime behavior in gateway/agent logs:

Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.4
⚠️ No response from provider for 90s (non-streaming, model: gpt-5.4). Aborting call.

This still happens after the profile gateway is restarted and the updated config is confirmed on disk.

What I verified

I verified all of the following on a live profile before filing this:

  1. config.yaml contains providers.sub2api-openai.stale_timeout_seconds: 150.
  2. The running profile loads the correct .env and config.yaml.
  3. The post-restart logs still show threshold 90s, so this is not just stale old log output.
  4. A direct runtime reproduction shows the provider identity is being normalized:
{
  "provider": "custom",
  "requested_provider": "custom:sub2api-openai",
  "api_mode": "codex_responses",
  "base_url": "https://sub2api.tegical.com/v1"
}
  5. Initializing AIAgent from that runtime payload yields a stale-timeout base of 90.0 with uses_implicit_default=true, even though the named provider config contains 150.

Suspected root cause

The stale-timeout lookup path uses the normalized runtime provider category (custom) instead of the configured named provider id (sub2api-openai).

Relevant code paths:

  • run_agent.py
    • _resolved_api_call_stale_timeout_base() calls get_provider_stale_timeout(self.provider, self.model)
  • hermes_cli/timeouts.py
    • get_provider_stale_timeout() looks up config["providers"][provider_id]
  • hermes_cli/runtime_provider.py
    • runtime resolution returns both:
      • provider: "custom"
      • requested_provider: "custom:sub2api-openai"
  • gateway/run.py
    • _resolve_runtime_agent_kwargs() forwards runtime.get("provider") into AIAgent, but does not preserve/forward requested_provider

Because of that, the lookup appears to search providers.custom instead of providers.sub2api-openai, so the configured stale timeout is missed and the agent falls back to the implicit 90s default.
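
The miss can be illustrated in isolation. The sketch below is a simplified stand-in for the lookup described above, not the actual Hermes code: lookup_stale_timeout, resolve_base, and IMPLICIT_STALE_TIMEOUT are hypothetical names, and only the config values and the 90s default come from this report.

```python
from typing import Any, Dict, Optional, Tuple

# Assumed constant name; the 90s value is taken from the observed logs.
IMPLICIT_STALE_TIMEOUT = 90.0

# Mirrors the profile config from the reproduction.
CONFIG: Dict[str, Any] = {
    "model": {"default": "gpt-5.4", "provider": "custom:sub2api-openai"},
    "providers": {
        "sub2api-openai": {
            "base_url": "https://sub2api.tegical.com/v1",
            "api_mode": "codex_responses",
            "stale_timeout_seconds": 150,
        }
    },
}


def lookup_stale_timeout(config: Dict[str, Any], provider_id: str) -> Optional[float]:
    """Hypothetical stand-in for a providers.<id>.stale_timeout_seconds lookup."""
    entry = config.get("providers", {}).get(provider_id) or {}
    value = entry.get("stale_timeout_seconds")
    return float(value) if value is not None else None


def resolve_base(config: Dict[str, Any], provider_id: str) -> Tuple[float, bool]:
    """Return (timeout_base, uses_implicit_default) for the given provider id."""
    configured = lookup_stale_timeout(config, provider_id)
    if configured is None:
        return IMPLICIT_STALE_TIMEOUT, True
    return configured, False


# Normalized runtime category: no providers.custom entry, so the default wins.
print(resolve_base(CONFIG, "custom"))
# Configured named provider key: the 150s setting is found.
print(resolve_base(CONFIG, "sub2api-openai"))
```

Under these assumptions, passing the normalized category reproduces the reported result (90.0 with the implicit default flagged), while passing the named key returns the configured 150.0.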

Expected behavior

If the selected provider is a named custom provider, provider-scoped runtime settings like:

  • stale_timeout_seconds
  • likely also any other providers.<named-custom-provider>.* lookups in similar paths

should resolve against the named provider entry (sub2api-openai here), not the generic runtime category custom.

Related / possibly adjacent

This may be adjacent to, but not identical to:

  • #25249
  • #28869

Those are about custom-provider timeout handling/normalization, while this report is specifically about the runtime lookup path losing the named provider identity and then resolving provider-scoped config against custom instead of the configured named provider key.

Workaround

A profile-level env override works around this:

HERMES_API_CALL_STALE_TIMEOUT=150

After adding that env var and restarting the profile gateway, the resolved stale timeout base becomes 150.0 again.
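
The workaround is consistent with the env var being consulted before the provider-scoped config. The sketch below shows that precedence as an assumption inferred from the observed behavior; stale_timeout_base and the source labels are hypothetical, and the real resolution order in Hermes may differ.

```python
from typing import Mapping, Optional, Tuple


def stale_timeout_base(
    env: Mapping[str, str],
    configured: Optional[float],
    implicit_default: float = 90.0,
) -> Tuple[float, str]:
    """Resolve the stale-timeout base and report which source supplied it.

    Assumed precedence: env override, then provider config, then default.
    """
    raw = env.get("HERMES_API_CALL_STALE_TIMEOUT", "").strip()
    if raw:
        try:
            value = float(raw)
        except ValueError:
            value = 0.0  # Unparseable override: fall through to other sources.
        if value > 0:
            return value, "env"
    if configured is not None:
        return float(configured), "provider_config"
    return implicit_default, "implicit_default"


# With the bug, the provider-scoped value is never found (configured=None),
# so only the env override can lift the base above the default.
print(stale_timeout_base({"HERMES_API_CALL_STALE_TIMEOUT": "150"}, None))
print(stale_timeout_base({}, None))
```

Passing os.environ as the env argument would apply the same logic to the live process environment.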

Suggested fix directions

Any of these would solve the problem:

  1. Preserve a provider config id / requested provider id on the runtime payload all the way into AIAgent, and use that for provider-config lookups.
  2. Teach timeout/config lookup helpers to prefer the named custom provider id when provider == "custom" but requested_provider is available.
  3. More generally, avoid losing the distinction between:
    • runtime transport/provider category (custom)
    • persisted config provider key (sub2api-openai)
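
Direction 2 could look roughly like the following. This is a minimal sketch, assuming requested_provider keeps the custom:<name> form shown in the runtime payload; provider_config_key is a hypothetical helper, not an existing Hermes function.

```python
from typing import Optional


def provider_config_key(provider: str, requested_provider: Optional[str] = None) -> str:
    """Pick the providers.<key> name to use for provider-scoped config lookups.

    Falls back to the runtime provider when no named custom provider is known.
    """
    if provider == "custom" and requested_provider:
        prefix, sep, name = requested_provider.partition(":")
        # Only accept the explicit "custom:<name>" shape with a non-empty name.
        if prefix == "custom" and sep and name:
            return name
    return provider


# Runtime payload from the reproduction.
runtime = {"provider": "custom", "requested_provider": "custom:sub2api-openai"}
print(provider_config_key(runtime["provider"], runtime.get("requested_provider")))
```

A helper like this keeps the runtime category (custom) available for transport decisions while config lookups use the persisted key (sub2api-openai), which is the distinction direction 3 asks to preserve.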

Impact

This makes named custom providers look correctly configured on disk while still timing out at the default 90s in production, which is confusing to diagnose and leads users to believe their provider-specific timeout settings are being honored when they are not.
