hermes - [Bug]: Local claude-cli custom provider timeout is reported as Empty response and fallback loops [4 pull requests]

Error Message

{"request_id":"1dcdfd75-...","model":"claude-opus-4-7","latency_s":127.411,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"e5743941-...","model":"claude-opus-4-7","latency_s":120.014,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}

Fix Action

Fixed

Bug Description

When Hermes routes a selected claude-cli model through an OpenAI-compatible local shim/custom provider, long Claude CLI turns that exceed the shim's internal 120s per-turn timeout surface to the user as "Empty response from model" and trigger retry/fallback behavior. In the observed setup, fallback can select the same claude-cli path again, so retries loop against the same timeout rather than recovering.

This is not the same class as intentional thinking-only or group-silence empty responses. The underlying provider process is producing/continuing work, but the local OpenAI-compatible shim times out first.

Environment

  • Hermes model picker entry: claude-cli / claude-opus-4-7
  • Routing path: custom_providers.claude-cli -> OpenAI-compatible endpoint -> local shim at http://127.0.0.1:7891/v1
  • Shim process: persistent Claude CLI subprocess
  • Claude CLI args observed in live child process include:
    • -p
    • --output-format stream-json
    • --input-format stream-json
    • --include-partial-messages
    • --verbose
    • --permission-mode dontAsk
    • --model claude-opus-4-7
    • --resume <session-id>

Local evidence from the shim implementation

In the local shim, the hard timeout is fixed:

SUBPROCESS_TIMEOUT = 120
PROCESS_IDLE_TIMEOUT = 1800
...
content, usage, finish_reason, tool_calls = self._read_response(SUBPROCESS_TIMEOUT)
...
raise RuntimeError("claude CLI turn timed out")

Idle eviction after 1800 seconds forces cold/resumed spawns, which are more likely to exceed the 120s per-turn budget when context is heavy or MCP/tool state reconnects.
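
The cold-versus-warm distinction could be expressed by choosing the per-turn budget from the child process state. The following is a minimal sketch, not the shim's actual code: `turn_budget` and `COLD_TURN_TIMEOUT` (and its value of 300) are hypothetical, and only `SUBPROCESS_TIMEOUT = 120` comes from the excerpt above.

```python
SUBPROCESS_TIMEOUT = 120  # warm-turn budget in seconds, taken from the shim excerpt
COLD_TURN_TIMEOUT = 300   # hypothetical budget for the first turn after spawn/resume


def turn_budget(turns_since_spawn: int) -> int:
    """Return the per-turn timeout in seconds.

    The first turn after a spawn or resume gets the longer budget, since it
    may need to reload heavy context or reconnect MCP/tool state.
    """
    if turns_since_spawn < 0:
        raise ValueError("turns_since_spawn must be non-negative")
    return COLD_TURN_TIMEOUT if turns_since_spawn == 0 else SUBPROCESS_TIMEOUT
```

With a helper like this, the read call in the excerpt would receive `turn_budget(n)` in place of the fixed constant, so only cold turns pay for the longer wait.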

Observed Log Pattern

The shim log shows timeout failures at roughly 120s and 127s, each followed immediately by a successful turn. This suggests the model/CLI can keep working and that the wrapper budget is too short:

{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"1dcdfd75-...","model":"claude-opus-4-7","latency_s":127.411,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"170feabf-...","model":"claude-opus-4-7","latency_s":3.515,"status":"ok","prompt_tokens":6,"completion_tokens":46,"has_tool_calls":false}
...
{"event":"idle_evict","session_id":"38869d6c-...","idle_s":1800}
{"event":"spawn","model":"claude-opus-4-7","resume":true,"session_id":"38869d6c-...","has_system_prompt":true}
{"request_id":"e5743941-...","model":"claude-opus-4-7","latency_s":120.014,"status":"error","error_class":"RuntimeError","error":"claude CLI turn timed out"}
{"request_id":"2d5bfaed-...","model":"claude-opus-4-7","latency_s":81.105,"status":"ok","prompt_tokens":14,"completion_tokens":2951,"has_tool_calls":true}
{"request_id":"8d56ffbf-...","model":"claude-opus-4-7","latency_s":52.101,"status":"ok","prompt_tokens":9,"completion_tokens":2916,"has_tool_calls":true}
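
To check a longer shim log for the same pattern, a small script can pair each timed-out request with the request logged right after it. This is a sketch that assumes the log is one JSON object per line, as in the excerpt above; the function name is hypothetical, and the sample records are trimmed to the fields the function reads.

```python
import json


def timeouts_followed_by_success(lines):
    """Return request_ids of timed-out turns whose next logged request succeeded."""
    requests = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip elisions such as "..."
        record = json.loads(line)
        if "request_id" in record:  # ignore spawn / idle_evict events
            requests.append(record)

    hits = []
    for current, following in zip(requests, requests[1:]):
        timed_out = (
            current.get("status") == "error"
            and "timed out" in current.get("error", "")
        )
        if timed_out and following.get("status") == "ok":
            hits.append(current["request_id"])
    return hits


SAMPLE = [
    '{"event":"spawn","model":"claude-opus-4-7","resume":true}',
    '{"request_id":"1dcdfd75-...","latency_s":127.411,"status":"error","error":"claude CLI turn timed out"}',
    '{"request_id":"170feabf-...","latency_s":3.515,"status":"ok"}',
    "...",
    '{"event":"idle_evict","idle_s":1800}',
    '{"request_id":"e5743941-...","latency_s":120.014,"status":"error","error":"claude CLI turn timed out"}',
    '{"request_id":"2d5bfaed-...","latency_s":81.105,"status":"ok"}',
]

print(timeouts_followed_by_success(SAMPLE))
```

On the trimmed sample, both timed-out requests are reported, matching the observation that each failure is followed by a normal turn.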

Steps to Reproduce

  1. Configure a local OpenAI-compatible custom provider that wraps Claude CLI with a 120s per-turn timeout.
  2. Select that provider/model via Hermes model picker.
  3. Let the shim idle long enough to evict the warm child process, or use a heavy resumed context/tool-call turn.
  4. Send a message that causes the Claude CLI turn to take longer than 120 seconds.
  5. Observe Hermes reporting Empty response from model / retrying, even though the underlying failure is a provider timeout.

Expected Behavior

Hermes should classify this as a provider timeout/failure, not as a model empty-content response. It should do one or more of the following:

  • surface a clear provider timeout error with the provider/model name and elapsed timeout,
  • allow custom providers to advertise/request longer per-request timeout budgets,
  • avoid switching fallback to the same provider/model path that just timed out,
  • optionally mark the first turn after spawn/resume as eligible for a longer timeout budget.
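
The first option above could look roughly like the sketch below. It is not Hermes code: `ProviderTimeoutError` and `classify_provider_error` are hypothetical names, and it assumes the shim's error text (for example "claude CLI turn timed out") reaches the caller, which the issue does not show.

```python
class ProviderTimeoutError(RuntimeError):
    """Hypothetical error type for a provider that ran out of time."""

    def __init__(self, provider: str, model: str, elapsed_s: float, budget_s: float):
        self.provider = provider
        self.model = model
        self.elapsed_s = elapsed_s
        self.budget_s = budget_s
        super().__init__(
            f"provider {provider}/{model} timed out after {elapsed_s:.1f}s "
            f"(budget {budget_s:.0f}s)"
        )


def classify_provider_error(provider, model, error_text, elapsed_s, budget_s):
    """Map a provider-reported error string to an exception, never to empty content."""
    if "timed out" in error_text.lower():
        return ProviderTimeoutError(provider, model, elapsed_s, budget_s)
    return RuntimeError(f"provider {provider}/{model} failed: {error_text}")


err = classify_provider_error(
    "claude-cli", "claude-opus-4-7", "claude CLI turn timed out", 127.411, 120
)
print(err)
```

Keeping the timeout as its own exception type lets the retry logic treat it differently from a genuinely empty model reply.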

Actual Behavior

Hermes wraps the local provider timeout as an empty response, retries several times, and can fall back to the same claude-cli provider path. This looks like Claude produced an empty answer, but the real error is RuntimeError: claude CLI turn timed out from the shim.

Proposed Fixes

  1. Preserve custom-provider timeout/errors as timeout/error classes rather than normalizing them into empty-content retries.
  2. Add/configure per-custom-provider request timeout metadata, for example timeout_s or request_timeout_ms, and thread it into the provider call path.
  3. Detect fallback self-selection: if current provider/model and fallback provider/model resolve to the same endpoint/model, skip or pick a different fallback.
  4. Consider first-turn-after-spawn / resumed-session timeout budgets separately from warm-turn budgets.
  5. Improve logs/user-visible errors so Empty response from model is reserved for genuinely empty model output, not transport/provider timeout.
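
Fix 3 (fallback self-selection) can be sketched as a comparison of normalized endpoint/model pairs. Everything here is illustrative: `Route`, `route_key`, and `pick_fallback` are hypothetical names rather than Hermes APIs, and the second candidate URL and model are made up for the example.

```python
from typing import NamedTuple, Optional, Sequence


class Route(NamedTuple):
    base_url: str
    model: str


def route_key(route: Route) -> tuple:
    """Normalize a route so a trailing slash or stray whitespace does not hide a match."""
    return (route.base_url.strip().rstrip("/"), route.model.strip())


def pick_fallback(failed: Route, candidates: Sequence[Route]) -> Optional[Route]:
    """Return the first candidate that does not resolve to the failed route."""
    failed_key = route_key(failed)
    for candidate in candidates:
        if route_key(candidate) != failed_key:
            return candidate
    return None  # no distinct fallback: surface the original error instead


failed = Route("http://127.0.0.1:7891/v1", "claude-opus-4-7")
candidates = [
    Route("http://127.0.0.1:7891/v1/", "claude-opus-4-7"),  # same path, skipped
    Route("https://fallback.example/v1", "other-model"),    # hypothetical alternative
]
print(pick_fallback(failed, candidates))
```

Returning `None` when every candidate resolves to the failed route gives the caller a clear signal to stop retrying and report the timeout.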

Related

  • Different from #13248, which covers intentional empty responses in group-chat/Slack addressing semantics.
