hermes - 💡(How to fix) Fix Backend health-aware fallback for Anthropic-compatible facades and auxiliary title routing

Hermes should treat Anthropic-compatible localhost/custom backends as explicit anthropic_messages runtimes across main chat, auxiliary calls, and delegation, and it should stop retrying a backend that already failed earlier in the same turn.

Today the failure mode looks like this:

- Main chat uses a Claude-compatible localhost facade, e.g. `provider=anthropic`, `base_url=http://127.0.0.1:8082`.
- The facade crashes with `HTTP 500: Claude Code process exited with code 1`.
- Hermes falls back for the user-visible reply.
- Async title generation immediately retries the same broken backend and emits `⚠ Auxiliary title generation failed ...`.
- On the next turn Hermes restores the primary and repeats the same failure cycle.

That is the wrong behavior. Routing should be driven by backend health, not by retries keyed on the exact model string.

Problem

Current fallback and auxiliary routing do not share a durable notion of:

- which backend failed
- whether that backend should be avoided for the rest of the turn
- whether that backend should be put on short cooldown across turns

The code already has pieces of the puzzle:

- primary runtime snapshot / restoration in `run_agent.py`
- ordered fallback chain in `_try_activate_fallback()`
- auxiliary auto-resolution in `agent/auxiliary_client.py`
- async title generation in `agent/title_generator.py`

But there is no shared backend identity / health layer tying them together.

Proposed direction: Option B

Implement capability/provider-family routing with backend health-aware fallback.

Core idea

Track health by backend identity, not exact model slug.

The meaningful unit is something like:

- `provider`
- `api_mode`
- `base_url`
- optional `model_family`

For this bug, the important distinction is that:

anthropic + anthropic_messages + http://127.0.0.1:8082

is one backend, regardless of whether the selected model string is claude-opus-4-7 or something else.

Concrete implementation

1. Add backend health registry

Create agent/backend_health.py with:

- `BackendIdentity`
- `BackendHealth`
- process-local registry
- helpers to:
  - build identity from runtime
  - record success/failure
  - check short cooldown / temporary down state
  - produce a compact summary for logs

Suggested identity fields:

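```python
from __future__ import annotations  # allows `str | None` on Python < 3.10

from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can key a registry
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None
```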
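For the identity to act as a stable key, equivalent spellings of the same backend should compare equal. The following is a minimal sketch of the "build identity from runtime" helper, not the actual Hermes code. It assumes the runtime is available as a plain dict with `provider`, `api_mode`, and `base_url` keys; `normalize_base_url` and `identity_from_runtime` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class BackendIdentity:
    provider: str
    api_mode: str
    base_url: str
    model_family: str | None = None


def normalize_base_url(base_url: str) -> str:
    """Lowercase scheme and host, drop trailing slashes, keep port and path."""
    parts = urlsplit(base_url.strip())
    host = (parts.hostname or "").lower()
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit strips
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    return f"{parts.scheme.lower()}://{netloc}{parts.path.rstrip('/')}"


def identity_from_runtime(runtime: dict) -> BackendIdentity:
    """Build an identity from a runtime dict (assumed shape).

    The model slug is deliberately ignored, so two models served by the
    same facade map to the same backend.
    """
    return BackendIdentity(
        provider=runtime["provider"].strip().lower(),
        api_mode=runtime["api_mode"].strip().lower(),
        base_url=normalize_base_url(runtime["base_url"]),
    )
```

With this normalization, `http://127.0.0.1:8082/` and `HTTP://127.0.0.1:8082` resolve to the same identity regardless of the selected model string.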

2. Mark failing backends during the turn

In AIAgent / run_agent.py:

- add turn-local failure tracking
- when the main runtime fails in the retry loop, record the current backend identity
- use that turn-local set to prevent same-turn auxiliary retries into the same backend
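The turn-local tracking described above could be as small as a set that is cleared at the start of each turn. This is a sketch under assumptions: `TurnFailureTracker` and its method names are hypothetical, and the identity can be any hashable value (for example a frozen `BackendIdentity`).

```python
from __future__ import annotations

from typing import Hashable


class TurnFailureTracker:
    """Remembers which backend identities failed during the current turn."""

    def __init__(self) -> None:
        self._failed: set[Hashable] = set()

    def begin_turn(self) -> None:
        """Forget failures from the previous turn."""
        self._failed.clear()

    def mark_failed(self, identity: Hashable) -> None:
        """Record that this backend failed in the retry loop."""
        self._failed.add(identity)

    def has_failed(self, identity: Hashable) -> bool:
        return identity in self._failed

    def exclusions(self) -> frozenset[Hashable]:
        """Immutable snapshot that is safe to hand to async auxiliary tasks."""
        return frozenset(self._failed)
```

Returning a frozen snapshot means an async title task keeps the exclusions from the turn that spawned it, even if the next turn has already reset the tracker.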

3. Respect backend cooldown across turns

Extend _restore_primary_runtime() so it does not blindly restore a primary backend that has just been marked temporarily down.

Rate-limit cooldown already exists. This should generalize the same concept for repeated backend crashes / transport failures / repeated 5xxs.
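One possible shape for the cross-turn cooldown is a process-local registry with an escalating cooldown per backend. This is only a sketch: the class and method names, the 30-second base, the doubling schedule, and starting the cooldown on the first failure are all assumptions rather than existing Hermes behavior. The clock is injectable so expiry can be tested without sleeping.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable, Hashable


@dataclass
class BackendHealth:
    failure_streak: int = 0
    down_until: float = 0.0  # clock timestamp; 0.0 means "not cooling down"


class BackendHealthRegistry:
    """Process-local health state keyed by any hashable backend identity."""

    def __init__(
        self,
        base_cooldown: float = 30.0,
        max_cooldown: float = 600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._base = base_cooldown
        self._max = max_cooldown
        self._clock = clock
        self._state: dict[Hashable, BackendHealth] = {}

    def record_failure(self, identity: Hashable) -> float:
        """Extend the failure streak and return the cooldown applied, in seconds."""
        health = self._state.setdefault(identity, BackendHealth())
        health.failure_streak += 1
        # Double the cooldown per consecutive failure, capped at max_cooldown.
        cooldown = min(self._base * 2 ** (health.failure_streak - 1), self._max)
        health.down_until = self._clock() + cooldown
        return cooldown

    def record_success(self, identity: Hashable) -> None:
        """A success clears the streak and any active cooldown."""
        self._state.pop(identity, None)

    def is_down(self, identity: Hashable) -> bool:
        """True while the backend is inside its cooldown window."""
        health = self._state.get(identity)
        return health is not None and self._clock() < health.down_until
```

Under this sketch, `_restore_primary_runtime()` would check `is_down(primary_identity)` and stay on the fallback while it returns `True`.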

4. Make fallback selection health-aware

Keep existing fallback config compatibility, but skip fallback entries whose backend identity is currently unhealthy.

That means _try_activate_fallback() should prefer the next healthy candidate instead of retrying a backend that is already known-bad.
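The selection rule can be sketched as a scan over the ordered fallback chain that skips entries whose backend is currently unhealthy. The entry shape (dicts with `provider`, `api_mode`, `base_url`) and the names `backend_key` and `next_healthy_fallback` are assumptions for illustration; the health check is passed in as a callable so the sketch does not depend on any particular registry.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

BackendKey = Tuple[str, str, str]  # (provider, api_mode, base_url)


def backend_key(entry: dict) -> BackendKey:
    """Derive the backend identity of a fallback entry; the model is ignored."""
    return (entry["provider"], entry.get("api_mode", ""), entry.get("base_url", ""))


def next_healthy_fallback(
    chain: list[dict],
    is_unhealthy: Callable[[BackendKey], bool],
) -> Optional[dict]:
    """Return the first entry, in configured order, whose backend is healthy."""
    for entry in chain:
        if not is_unhealthy(backend_key(entry)):
            return entry
    return None  # every configured fallback is down; the caller decides what to do
```

Because the chain is only filtered and never reordered, existing fallback configs keep their meaning when every backend is healthy.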

5. Stop title generation from retrying a backend that already failed this turn

Plumb backend exclusion context through:

- `gateway/run.py`
- `agent/title_generator.py`
- `agent/auxiliary_client.py`

That way, `maybe_auto_title()`, `generate_title()`, and `call_llm()` can receive an exclusion list or a `main_runtime` hint and avoid a backend that already died earlier in the turn.
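As a rough illustration of the exclusion plumbing, the title path could accept the set of backends that failed this turn and silently skip when nothing healthy remains. `title_with_exclusions` and `call_backend` are hypothetical stand-ins, not the real `generate_title()` or `call_llm()` signatures.

```python
from __future__ import annotations

from typing import Callable, Collection, Hashable, Optional, Sequence


def title_with_exclusions(
    prompt: str,
    candidates: Sequence[Hashable],
    excluded: Collection[Hashable],
    call_backend: Callable[[Hashable, str], str],
) -> Optional[str]:
    """Ask the first non-excluded backend for a title, or return None."""
    for backend in candidates:
        if backend in excluded:
            continue  # already failed earlier in this turn; do not retry it
        return call_backend(backend, prompt)
    # Titles are cosmetic: with no healthy backend, skip quietly, no warning.
    return None
```

Returning `None` instead of raising keeps the "no healthy backend available" case off the user-visible warning path.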

6. Keep Anthropic-compatible localhost/custom routing explicit

Where api_mode=anthropic_messages is configured, auxiliary resolution should honor that explicitly instead of relying on hostname/path heuristics.

This should work consistently for:

- main chat
- auxiliary text tasks
- title generation
- delegation
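The precedence rule can be stated in a few lines: a configured `api_mode` always wins, and any hostname/path guess is consulted only when nothing is configured. This is a sketch with assumed names; `resolve_api_mode` is hypothetical, and the heuristic is passed in as `guess_from_url` so no particular existing heuristic is implied.

```python
from __future__ import annotations

from typing import Callable, Mapping


def resolve_api_mode(
    runtime: Mapping[str, str],
    guess_from_url: Callable[[str], str],
) -> str:
    """Prefer the explicitly configured api_mode over any URL heuristic."""
    explicit = (runtime.get("api_mode") or "").strip()
    if explicit:
        return explicit
    # Nothing configured: fall back to whatever heuristic the caller supplies.
    return guess_from_url(runtime.get("base_url", ""))
```

Using one resolver for main chat, auxiliary tasks, title generation, and delegation is what would keep a localhost facade on `anthropic_messages` everywhere it is configured.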

Acceptance criteria

- Claude-compatible localhost facades can remain the primary runtime.
- Anthropic-compatible localhost/custom backends are routed consistently as `anthropic_messages` where configured.
- If the facade crashes repeatedly, Hermes stops retrying it on every message and uses a healthy fallback while the cooldown is active.
- Title generation does not retry a backend that already failed earlier in the same turn.
- Auxiliary auto-resolution skips excluded/down backends.
- Fallback remains backwards-compatible with existing config.
- Unconfigured providers are not spuriously attempted as “fallbacks”.

Suggested tests

`tests/run_agent/test_provider_fallback.py`

- skips unhealthy fallback backend entries
- falls through to next healthy entry

`tests/run_agent/test_primary_runtime_restore.py`

- `_restore_primary_runtime()` stays on fallback while primary backend cooldown is active
- restores primary after cooldown expires

`tests/agent/test_title_generator.py`

- title generation receives backend exclusions
- no callback warning is emitted on the cosmetic title path when the backend already failed this turn or no healthy backend is available

`tests/agent/test_auxiliary_client.py`

- auto auxiliary resolution skips excluded/down backends
- explicit `anthropic_messages` custom runtime is honored

new: `tests/agent/test_backend_health.py`

- identity normalization
- cooldown escalation / expiry
- success clears failure streak

Notes

This should be implemented incrementally:

- backend health registry + main fallback integration
- title generation + auxiliary exclusion plumbing
- explicit anthropic-compatible auxiliary/delegation routing cleanup
