hermes - 💡(How to fix) Fix [Feature]: Add per-auxiliary fallback policy tiers for critical surfaces

Error Message

Do not silently downgrade. If the primary provider/model is unavailable and no approved fallback is configured, halt with an actionable error instead of emitting low-confidence authority. 3. When policy is strong_required, only use explicit fallback_providers or models marked as strong/approved; otherwise return a clear error naming the auxiliary task and missing approved fallback.

Code Example

auxiliary:
  title_generation:
    provider: openrouter
    model: google/gemini-2.5-flash-lite
    fallback_policy: cheap_ok

  compression:
    provider: openai-codex
    model: gpt-5.5
    fallback_providers:
      - provider: openrouter
        model: qwen/qwen3.7-max
    fallback_policy: strong_required

  approval:
    provider: openai-codex
    model: gpt-5.5
    fallback_policy: fail_closed

Problem or Use Case

Hermes currently treats auxiliary fallback primarily as an availability problem: if the configured auxiliary provider fails, try another provider/model or auto-chain so the task can complete.

That is necessary, but it is not sufficient for several auxiliary surfaces. Some auxiliary calls are cosmetic or draft-only, while others become authority-bearing context, durable state, or user/client-facing output.

A concrete failure mode we hit while dogfooding Hermes:

Main model: openai-codex / gpt-5.5
Auxiliary fallback: cheap Gemini Flash-class model
The fallback kept the system available, but produced low-quality / hallucinated business-report output.
For a title or low-stakes triage draft, that kind of downgrade is tolerable. For compression, curator/wiki synthesis, operator/cron reports, approval reasoning, memory persistence, or client-facing report summaries, a bad fallback is worse than no fallback.

This also interacts with the active auxiliary fallback work in #22201 / #32408 / #32411: per-task fallback_providers solves where to fail over, but users also need a way to express whether this surface may downgrade at all.

Proposed Solution

Add a small per-auxiliary fallback policy/criticality contract, orthogonal to fallback_providers.

Example shape:

auxiliary:
  title_generation:
    provider: openrouter
    model: google/gemini-2.5-flash-lite
    fallback_policy: cheap_ok

  compression:
    provider: openai-codex
    model: gpt-5.5
    fallback_providers:
      - provider: openrouter
        model: qwen/qwen3.7-max
    fallback_policy: strong_required

  approval:
    provider: openai-codex
    model: gpt-5.5
    fallback_policy: fail_closed

Suggested policy meanings:

`cheap_ok`

Fallback may use cheap/fast models. Output should be cosmetic, draft-only, or easily reversible.

Examples:

title generation
short profile descriptions
low-stakes triage/specifier drafts
web extraction summaries where sources are attached and the output is not final authority

`strong_required`

Fallback is allowed only to configured/approved stronger models. Do not silently use the global cheap auxiliary default for this surface.

Examples:

compression / pre-compaction summaries
curator or durable source synthesis
operator briefs and cron digests
vision interpretation that affects decisions
report/result summaries for business workflows
client-facing or public-facing drafts/reports

`fail_closed`

Do not silently downgrade. If the primary provider/model is unavailable and no approved fallback is configured, halt with an actionable error instead of emitting low-confidence authority.

Examples:

approval / authorization reasoning
memory flush or persistence when state is uncertain
tool actions touching external systems or secrets
final public/client outputs where hallucinated content is worse than no output

Review-friendly implementation sketch

A narrowly scoped first PR could avoid changing every auxiliary call at once:

Add an optional fallback_policy field to auxiliary task config with default cheap_ok or current behavior-preserving default.
Centralize policy evaluation in agent/auxiliary_client.py, near the per-task resolution / fallback-chain logic.
When policy is strong_required, only use explicit fallback_providers or models marked as strong/approved; otherwise return a clear error naming the auxiliary task and missing approved fallback.
When policy is fail_closed, skip generic auto/cheap fallback and surface a clear actionable failure.
Log/report which policy was applied so debugging says auxiliary=compression policy=strong_required fallback=..., not just using auto.
Add focused tests for:
- cheap surface still falls back as before
- compression refuses an unapproved cheap fallback under strong_required
- approval refuses fallback under fail_closed
- explicit strong fallback remains allowed

The minimum useful version could be config + enforcement + tests only; docs/setup UI can follow after the behavior lands.

Alternatives Considered

Only add per-task fallback_providers (#32411). This improves resilience, but still allows users to accidentally route authority-bearing surfaces to a cheap model if they use a global/default fallback chain.
Hardcode a fixed list of trusted models. Simpler, but too opinionated across providers and deployments. A policy knob plus explicit fallback list is easier to review and keeps provider choice user-controlled.
Do nothing and rely on users to configure better models manually. This misses the failure mode: users often discover the quality downgrade only after bad output has already been persisted, summarized, or sent.

Feature Type

Configuration option / Performance & reliability

Scope

Medium: few files, likely under agent/auxiliary_client.py, default config/docs, and focused tests.

Contribution

I'd like to implement this myself and submit a PR if maintainers agree on the policy shape.

Related issues / PRs

#22201 — per-auxiliary fallback providers
#32408 — consolidation proposal for auxiliary fallback work
#32411 — implementation PR for fallback_providers
#31127 — Codex usage limits as auxiliary quota exhaustion
#34024 — user-facing reset details for Codex usage_limit_reached

This proposal is meant to complement that work, not duplicate it: fallback routing chooses the next model; fallback policy decides whether a given auxiliary surface is allowed to downgrade to that model at all.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature]: Add per-auxiliary fallback policy tiers for critical surfaces

Recommended Tools

GitHub issue graph ai analysis

Error Message

Fix Action

Fix / Workaround

Code Example

Problem or Use Case

Proposed Solution

`cheap_ok`

`strong_required`

`fail_closed`

Review-friendly implementation sketch

Alternatives Considered

Feature Type

Scope

Contribution

Related issues / PRs

Still need to ship something?

TRENDING