hermes - Backend health-aware fallback for Anthropic-compatible facades and auxiliary title routing

Summary

Hermes should treat Anthropic-compatible localhost/custom backends as explicit anthropic_messages runtimes across main chat, auxiliary calls, and delegation, and it should stop retrying a backend that already failed earlier in the same turn.

Today the failure mode looks like this:

  1. Main chat uses a Claude-compatible localhost facade, e.g. provider=anthropic, base_url=http://127.0.0.1:8082.
  2. The facade crashes with HTTP 500: Claude Code process exited with code 1.
  3. Hermes falls back for the user-visible reply.
  4. Async title generation immediately retries the same broken backend and emits ⚠ Auxiliary title generation failed ....
  5. On the next turn Hermes restores the primary and repeats the same failure cycle.

That is the wrong behavior. Routing should be driven by backend health rather than by retries keyed on the exact model string.

Problem

Current fallback and auxiliary routing do not share a durable notion of:

  • which backend failed
  • whether that backend should be avoided for the rest of the turn
  • whether that backend should be put on short cooldown across turns

The code already has pieces of the puzzle:

  • primary runtime snapshot / restoration in run_agent.py
  • ordered fallback chain in _try_activate_fallback()
  • auxiliary auto-resolution in agent/auxiliary_client.py
  • async title generation in agent/title_generator.py

But there is no shared backend identity / health layer tying them together.

Proposed direction: Option B

Implement capability/provider-family routing with backend health-aware fallback.

Core idea

Track health by backend identity, not exact model slug.

The meaningful unit is something like:

  • provider
  • api_mode
  • base_url
  • optional model_family

For this bug, the important distinction is that:

  • anthropic + anthropic_messages + http://127.0.0.1:8082

is one backend, regardless of whether the selected model string is claude-opus-4-7 or something else.
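A minimal sketch of that idea, assuming the runtime is available as a plain dict with provider, api_mode, base_url, and model keys. The dict shape and the identity_from_runtime helper are hypothetical and not taken from the Hermes codebase. It shows two different model strings collapsing to a single identity:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None


def identity_from_runtime(runtime: dict) -> BackendIdentity:
    """Hypothetical helper: build an identity and deliberately ignore the model string."""
    return BackendIdentity(
        provider=runtime["provider"].strip().lower(),
        api_mode=runtime["api_mode"].strip().lower(),
        # Drop a trailing slash so one backend is not split into two identities.
        base_url=runtime["base_url"].strip().rstrip("/"),
    )


opus = identity_from_runtime({
    "provider": "anthropic",
    "api_mode": "anthropic_messages",
    "base_url": "http://127.0.0.1:8082",
    "model": "claude-opus-4-7",
})
other = identity_from_runtime({
    "provider": "anthropic",
    "api_mode": "anthropic_messages",
    "base_url": "http://127.0.0.1:8082/",
    "model": "some-other-model",
})
same_backend = opus == other  # equal and hash-equal, so one health record covers both
```

Because the dataclass is frozen, instances are hashable and can be used directly as dictionary or set keys in a health registry.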

Concrete implementation

1. Add backend health registry

Create agent/backend_health.py with:

  • BackendIdentity
  • BackendHealth
  • process-local registry
  • helpers to:
    • build identity from runtime
    • record success/failure
    • check short cooldown / temporary down state
    • produce a compact summary for logs

Suggested identity fields:

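from __future__ import annotations  # allows `str | None` on Python < 3.10

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen instances are hashable, so they can key a registry
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None  # optional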
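One possible shape for the registry is sketched below. The class name BackendHealthRegistry, the failure threshold, and the cooldown durations are all illustrative assumptions; the issue does not specify them. The clock is injectable so that cooldown expiry can be tested without sleeping:

```python
from __future__ import annotations

import time
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None


@dataclass
class BackendHealth:
    failure_streak: int = 0
    down_until: float = 0.0  # monotonic timestamp; 0.0 means "not cooling down"


class BackendHealthRegistry:
    """Process-local registry. Threshold and cooldown values are assumptions."""

    def __init__(self, base_cooldown=30.0, max_cooldown=300.0, threshold=2,
                 clock=time.monotonic):
        self._health: dict[BackendIdentity, BackendHealth] = {}
        self._base = base_cooldown
        self._max = max_cooldown
        self._threshold = threshold
        self._clock = clock

    def record_failure(self, ident: BackendIdentity) -> None:
        health = self._health.setdefault(ident, BackendHealth())
        health.failure_streak += 1
        if health.failure_streak >= self._threshold:
            # Escalate: double the cooldown for each failure past the threshold, capped.
            extra = health.failure_streak - self._threshold
            cooldown = min(self._base * 2 ** extra, self._max)
            health.down_until = self._clock() + cooldown

    def record_success(self, ident: BackendIdentity) -> None:
        self._health.pop(ident, None)  # a success clears the failure streak

    def is_down(self, ident: BackendIdentity) -> bool:
        health = self._health.get(ident)
        return health is not None and self._clock() < health.down_until

    def summary(self) -> str:
        """Compact one-line summary for logs."""
        now = self._clock()
        parts = [
            f"{i.provider}@{i.base_url}: streak={h.failure_streak} "
            f"down_for={max(0.0, h.down_until - now):.0f}s"
            for i, h in self._health.items()
        ]
        return "; ".join(parts) or "all backends healthy"
```

This sketch is not thread-safe. If async title generation runs on another thread, the registry would likely need a lock around its dictionary updates.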

2. Mark failing backends during the turn

In AIAgent / run_agent.py:

  • add turn-local failure tracking
  • when the main runtime fails in the retry loop, record the current backend identity
  • use that turn-local set to prevent same-turn auxiliary retries into the same backend
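The turn-local tracking described above could look roughly like this. TurnState, call_with_tracking, and may_use_for_auxiliary are hypothetical names, and the real retry loop in run_agent.py would presumably catch narrower exception types than the bare Exception used here:

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None


@dataclass
class TurnState:
    """Hypothetical per-turn container; create a fresh one for every user turn."""
    failed_backends: set[BackendIdentity] = field(default_factory=set)


def call_with_tracking(ident, call, turn, max_attempts=2):
    """Run `call` up to `max_attempts` times, recording `ident` on any failure."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for _ in range(max_attempts):
        try:
            return call()
        except Exception as exc:  # assumption: real code catches transport/5xx errors only
            turn.failed_backends.add(ident)
            last_error = exc
    raise last_error


def may_use_for_auxiliary(ident, turn):
    """Auxiliary calls must avoid any backend that already failed this turn."""
    return ident not in turn.failed_backends
```

Keeping the set on a per-turn object means it resets naturally at the next turn, while longer-lived state belongs in the cross-turn cooldown.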

3. Respect backend cooldown across turns

Extend _restore_primary_runtime() so it does not blindly restore a primary backend that has just been marked temporarily down.

A rate-limit cooldown already exists. This change generalizes the same concept to repeated backend crashes, transport failures, and repeated 5xx responses.
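The restore decision itself can be quite small. The function below is a sketch under assumed names (pick_runtime_after_turn, a down_until mapping from backend key to monotonic expiry time) and does not reflect the actual signature of _restore_primary_runtime():

```python
import time


def pick_runtime_after_turn(primary, current, down_until, now=None):
    """Return the backend key to use for the next turn.

    `primary` and `current` are hashable backend keys; `down_until` maps a key
    to the monotonic time at which its cooldown ends. All names are illustrative.
    """
    now = time.monotonic() if now is None else now
    if down_until.get(primary, 0.0) > now:
        return current  # primary is still cooling down: stay on the fallback
    return primary  # cooldown absent or expired: restore the primary
```

Passing `now` explicitly keeps the function deterministic in tests, which matches the two restore cases listed under the suggested tests.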

4. Make fallback selection health-aware

Keep existing fallback config compatibility, but skip fallback entries whose backend identity is currently unhealthy.

That means _try_activate_fallback() should prefer the next healthy candidate instead of retrying a backend that is already known-bad.
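As an illustration, health-aware selection over an ordered chain might be sketched as follows. The entry dict shape, backend_key, and next_healthy_fallback are assumptions made for this example rather than the existing fallback config format:

```python
def backend_key(entry):
    """Hypothetical: identify a fallback entry by backend, not by model string."""
    return (
        entry["provider"],
        entry.get("api_mode", ""),
        entry.get("base_url", "").rstrip("/"),
    )


def next_healthy_fallback(chain, unhealthy, start=0):
    """Return (index, entry) for the first healthy entry at or after `start`.

    `unhealthy` is a set of backend keys currently marked down. Returns None
    when every remaining entry points at an unhealthy backend.
    """
    for index in range(start, len(chain)):
        if backend_key(chain[index]) not in unhealthy:
            return index, chain[index]
    return None
```

Because entries are compared by backend key, two chain entries that differ only in model string are skipped together when their shared backend is down.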

5. Stop title generation from retrying a backend that already failed this turn

Plumb backend exclusion context through:

  • gateway/run.py
  • agent/title_generator.py
  • agent/auxiliary_client.py

This lets maybe_auto_title(), generate_title(), and call_llm() receive an exclusion list or a main_runtime hint and avoid a backend that already died earlier in the turn.
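A rough sketch of the exclusion behavior on the title path follows. The function name generate_title_sketch and its parameters are hypothetical and chosen to avoid implying the real generate_title() or call_llm() signatures:

```python
def generate_title_sketch(candidates, call_backend, excluded=frozenset()):
    """Try candidate backends in order, skipping excluded ones.

    Returns the title string, or None when no healthy backend produced one.
    Returning None without a warning reflects that titles are cosmetic.
    `call_backend` is an assumed callable taking a backend key and returning a title.
    """
    for backend in candidates:
        if backend in excluded:
            continue  # this backend already failed earlier in the turn
        try:
            return call_backend(backend)
        except Exception:  # assumption: any failure just moves on to the next candidate
            continue
    return None
```

The caller would pass the turn-local failed set as `excluded`, so a facade that crashed during the main reply is never contacted again for the title.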

6. Keep Anthropic-compatible localhost/custom routing explicit

Where api_mode=anthropic_messages is configured, auxiliary resolution should honor that explicitly instead of relying on hostname/path heuristics.

This should work consistently for:

  • main chat
  • auxiliary text tasks
  • title generation
  • delegation
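The precedence rule in this step (explicit configuration first, heuristics only as a fallback) can be sketched as below. The runtime dict shape, the resolve_api_mode name, the hostname heuristic, and the "chat_completions" default are all placeholders for illustration, not values confirmed by the Hermes codebase:

```python
from urllib.parse import urlparse

DEFAULT_API_MODE = "chat_completions"  # placeholder default, not a confirmed Hermes value


def resolve_api_mode(runtime):
    """Honor an explicitly configured api_mode before any URL heuristic."""
    explicit = (runtime.get("api_mode") or "").strip()
    if explicit:
        return explicit
    # Heuristic fallback, used only when nothing is configured (illustrative).
    host = urlparse(runtime.get("base_url") or "").hostname or ""
    if host == "anthropic.com" or host.endswith(".anthropic.com"):
        return "anthropic_messages"
    return DEFAULT_API_MODE
```

With this ordering, a localhost facade configured with api_mode=anthropic_messages is routed correctly even though its hostname gives no hint. Calling the same resolver from main chat, auxiliary tasks, title generation, and delegation would keep the four paths consistent.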

Acceptance criteria

  • Claude-compatible localhost facades can remain the primary runtime.
  • Anthropic-compatible localhost/custom backends are routed consistently as anthropic_messages where configured.
  • If the facade crashes repeatedly, Hermes stops trying it on every message and uses a healthy fallback while the cooldown is active.
  • Title generation does not retry a backend that already failed earlier in the same turn.
  • Auxiliary auto-resolution skips excluded/down backends.
  • Fallback remains backwards-compatible with existing config.
  • Unconfigured providers are not spuriously attempted as “fallbacks”.

Suggested tests

tests/run_agent/test_provider_fallback.py

  • skips unhealthy fallback backend entries
  • falls through to next healthy entry

tests/run_agent/test_primary_runtime_restore.py

  • _restore_primary_runtime() stays on fallback while primary backend cooldown is active
  • restores primary after cooldown expires

tests/agent/test_title_generator.py

  • title generation receives backend exclusions
  • no callback warning on the cosmetic title path when the cause is “backend already failed this turn” or “no healthy backend available”

tests/agent/test_auxiliary_client.py

  • auto auxiliary resolution skips excluded/down backends
  • explicit anthropic_messages custom runtime is honored

new: tests/agent/test_backend_health.py

  • identity normalization
  • cooldown escalation / expiry
  • success clears failure streak

Notes

This should be implemented incrementally:

  1. backend health registry + main fallback integration
  2. title generation + auxiliary exclusion plumbing
  3. explicit Anthropic-compatible auxiliary/delegation routing cleanup
