hermes - [Feature]: Add per-auxiliary fallback policy tiers for critical surfaces

Problem or Use Case

Hermes currently treats auxiliary fallback primarily as an availability problem: if the configured auxiliary provider fails, it tries another provider/model or the auto chain so the task can complete.

That is necessary, but it is not sufficient for several auxiliary surfaces. Some auxiliary calls are cosmetic or draft-only, while others become authority-bearing context, durable state, or user/client-facing output.

A concrete failure mode we hit while dogfooding Hermes:

  • Main model: openai-codex / gpt-5.5
  • Auxiliary fallback: cheap Gemini Flash-class model
  • The fallback kept the system available, but produced low-quality / hallucinated business-report output.
  • For a title or low-stakes triage draft, that kind of downgrade is tolerable. For compression, curator/wiki synthesis, operator/cron reports, approval reasoning, memory persistence, or client-facing report summaries, a bad fallback is worse than no fallback.

This also interacts with the active auxiliary fallback work in #22201 / #32408 / #32411: per-task fallback_providers solves where to fail over, but users also need a way to express whether a given surface may downgrade at all.

Proposed Solution

Add a small per-auxiliary fallback policy/criticality contract, orthogonal to fallback_providers.

Example shape:

auxiliary:
  title_generation:
    provider: openrouter
    model: google/gemini-2.5-flash-lite
    fallback_policy: cheap_ok

  compression:
    provider: openai-codex
    model: gpt-5.5
    fallback_providers:
      - provider: openrouter
        model: qwen/qwen3.7-max
    fallback_policy: strong_required

  approval:
    provider: openai-codex
    model: gpt-5.5
    fallback_policy: fail_closed
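To show how little new config surface this needs, here is a minimal Python sketch of loading and validating the shape above. All names (FallbackPolicy, FallbackTarget, AuxTaskConfig, parse_auxiliary_config) are hypothetical rather than existing Hermes code, the input is assumed to be the dict a YAML loader returns for the auxiliary: section, and defaulting an omitted policy to cheap_ok is an assumption taken from step 1 of the implementation sketch.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class FallbackPolicy(str, Enum):
    """The three proposed tiers (hypothetical type, not existing Hermes code)."""
    CHEAP_OK = "cheap_ok"
    STRONG_REQUIRED = "strong_required"
    FAIL_CLOSED = "fail_closed"


@dataclass(frozen=True)
class FallbackTarget:
    provider: str
    model: str


@dataclass(frozen=True)
class AuxTaskConfig:
    provider: str
    model: str
    fallback_providers: tuple[FallbackTarget, ...] = ()
    # Assumption: an omitted policy keeps today's permissive behavior.
    fallback_policy: FallbackPolicy = FallbackPolicy.CHEAP_OK


def parse_auxiliary_config(raw: dict[str, Any]) -> dict[str, AuxTaskConfig]:
    """Validate the loaded `auxiliary:` mapping and return one typed config per task."""
    tasks: dict[str, AuxTaskConfig] = {}
    for task, body in raw.items():
        value = body.get("fallback_policy", FallbackPolicy.CHEAP_OK.value)
        try:
            policy = FallbackPolicy(value)
        except ValueError:
            allowed = ", ".join(p.value for p in FallbackPolicy)
            # Reject typos loudly instead of guessing a tier.
            raise ValueError(
                f"auxiliary.{task}.fallback_policy={value!r} must be one of: {allowed}"
            ) from None
        targets = tuple(
            FallbackTarget(t["provider"], t["model"])
            for t in body.get("fallback_providers", [])
        )
        tasks[task] = AuxTaskConfig(body["provider"], body["model"], targets, policy)
    return tasks


# The YAML example above, as the dict a YAML loader would return for `auxiliary:`.
config = parse_auxiliary_config({
    "title_generation": {"provider": "openrouter",
                         "model": "google/gemini-2.5-flash-lite",
                         "fallback_policy": "cheap_ok"},
    "compression": {"provider": "openai-codex", "model": "gpt-5.5",
                    "fallback_providers": [{"provider": "openrouter",
                                            "model": "qwen/qwen3.7-max"}],
                    "fallback_policy": "strong_required"},
    "approval": {"provider": "openai-codex", "model": "gpt-5.5",
                 "fallback_policy": "fail_closed"},
})
```

Rejecting an unknown tier at load time, rather than treating it as cheap_ok, keeps a typo such as strong_requried from quietly reopening the cheap fallback path.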

Suggested policy meanings:

cheap_ok

Fallback may use cheap/fast models. Output should be cosmetic, draft-only, or easily reversible.

Examples:

  • title generation
  • short profile descriptions
  • low-stakes triage/specifier drafts
  • web extraction summaries where sources are attached and the output is not final authority

strong_required

Fallback is allowed only to explicitly configured or approved strong models. Do not silently use the global cheap auxiliary default for this surface.

Examples:

  • compression / pre-compaction summaries
  • curator or durable source synthesis
  • operator briefs and cron digests
  • vision interpretation that affects decisions
  • report/result summaries for business workflows
  • client-facing or public-facing drafts/reports

fail_closed

Do not silently downgrade. If the primary provider/model is unavailable and no approved fallback is configured, halt with an actionable error instead of emitting low-confidence authority.

Examples:

  • approval / authorization reasoning
  • memory flush or persistence when state is uncertain
  • tool actions touching external systems or secrets
  • final public/client outputs where hallucinated content is worse than no output
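One way to make the three tiers concrete is to treat them as a filter over whatever chain fallback routing proposes, which keeps policy orthogonal to fallback_providers. The Python sketch below is hypothetical: Candidate, its source and strong fields, permitted and gate are illustrative names, and the strong flag stands in for whatever "approved" marker Hermes might adopt. It also assumes that entries a user lists explicitly in a task's fallback_providers count as approved under every tier, including fail_closed.

```python
from dataclasses import dataclass
from typing import Literal

Policy = Literal["cheap_ok", "strong_required", "fail_closed"]


@dataclass(frozen=True)
class Candidate:
    """One entry proposed by fallback routing (hypothetical shape)."""
    provider: str
    model: str
    source: Literal["explicit", "auto"]  # "explicit" = listed in this task's fallback_providers
    strong: bool = False                 # assumed marker for approved/strong auto-chain models


def permitted(policy: Policy, candidate: Candidate) -> bool:
    """Decide whether a single routed candidate may serve an auxiliary task."""
    if candidate.source == "explicit":
        return True              # a fallback the user named for this task counts as approved
    if policy == "cheap_ok":
        return True              # generic auto/cheap fallback is fine for cosmetic output
    if policy == "strong_required":
        return candidate.strong  # only auto-chain models explicitly marked strong
    return False                 # fail_closed (and, defensively, any unknown value)


def gate(policy: Policy, routed: list[Candidate]) -> list[Candidate]:
    """Routing decides the order; policy only removes candidates, never adds or reorders."""
    return [c for c in routed if permitted(policy, c)]
```

For example, if routing proposes an explicit qwen/qwen3.7-max entry followed by an unmarked auto-chain google/gemini-2.5-flash-lite, gate("strong_required", ...) keeps only the Qwen entry. Under fail_closed with no explicit entries the result is an empty list, which the caller should turn into an actionable error rather than a silent downgrade.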

Review-friendly implementation sketch

A narrowly scoped first PR could avoid changing every auxiliary call at once:

  1. Add an optional fallback_policy field to the auxiliary task config, defaulting to cheap_ok or to whatever value preserves current behavior.
  2. Centralize policy evaluation in agent/auxiliary_client.py, near the per-task resolution / fallback-chain logic.
  3. When policy is strong_required, only use explicit fallback_providers or models marked as strong/approved; otherwise return a clear error naming the auxiliary task and missing approved fallback.
  4. When policy is fail_closed, skip generic auto/cheap fallback and surface a clear actionable failure.
  5. Log/report which policy was applied, so that debug output reads auxiliary=compression policy=strong_required fallback=... rather than just "using auto".
  6. Add focused tests for:
    • cheap surface still falls back as before
    • compression refuses an unapproved cheap fallback under strong_required
    • approval refuses fallback under fail_closed
    • explicit strong fallback remains allowed
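Steps 3 to 6 could be prototyped along these lines. This is a hypothetical, self-contained Python sketch rather than code from agent/auxiliary_client.py: resolve_auxiliary_fallback, AuxiliaryFallbackRefused, the Target tuple with its explicit/strong flags, the is_available callback and the logger name are all illustrative. It assumes fail_closed still honors fallbacks the user listed explicitly for the task (one reading of "no approved fallback is configured"), and the log lines follow the key=value format suggested in step 5.

```python
import logging
from typing import Callable, NamedTuple

log = logging.getLogger("auxiliary.fallback")  # hypothetical logger name


class Target(NamedTuple):
    provider: str
    model: str
    explicit: bool = False  # listed in this task's fallback_providers
    strong: bool = False    # auto-chain model marked strong/approved (assumed marker)


class AuxiliaryFallbackRefused(RuntimeError):
    """Raised instead of silently downgrading a policy-protected auxiliary task."""


def _allowed(policy: str, target: Target) -> bool:
    if target.explicit or policy == "cheap_ok":
        return True
    # strong_required accepts marked auto-chain models; fail_closed accepts none.
    return policy == "strong_required" and target.strong


def resolve_auxiliary_fallback(
    task: str,
    policy: str,
    chain: list[Target],
    is_available: Callable[[str, str], bool],
) -> Target:
    """Return the first permitted, available fallback for `task`, or raise."""
    for target in chain:
        if not _allowed(policy, target):
            log.info("auxiliary=%s policy=%s skipped=%s/%s reason=not_approved",
                     task, policy, target.provider, target.model)
            continue
        if is_available(target.provider, target.model):
            log.info("auxiliary=%s policy=%s fallback=%s/%s",
                     task, policy, target.provider, target.model)
            return target
    # Name the task and the missing piece so the operator knows what to configure.
    raise AuxiliaryFallbackRefused(
        f"auxiliary task {task!r} (policy={policy}) has no permitted fallback available; "
        f"configure an approved entry under auxiliary.{task}.fallback_providers "
        f"or restore the primary provider"
    )


def always_up(provider: str, model: str) -> bool:
    return True


# The four cases listed in step 6, as plain assertions.
flash = Target("openrouter", "google/gemini-2.5-flash-lite")
qwen = Target("openrouter", "qwen/qwen3.7-max", explicit=True)

assert resolve_auxiliary_fallback("title_generation", "cheap_ok", [flash], always_up) == flash
for task, policy in [("compression", "strong_required"), ("approval", "fail_closed")]:
    try:
        resolve_auxiliary_fallback(task, policy, [flash], always_up)
    except AuxiliaryFallbackRefused as exc:
        assert task in str(exc)
    else:
        raise AssertionError(f"{task} should not fall back to {flash.model}")
assert resolve_auxiliary_fallback("compression", "strong_required", [flash, qwen], always_up) == qwen
```

Skipped candidates are logged as well as the chosen one, so a refusal can be traced back to the policy rather than looking like a plain provider outage.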

The minimum useful version could be config + enforcement + tests only; docs/setup UI can follow after the behavior lands.

Alternatives Considered

  • Only add per-task fallback_providers (#32411). This improves resilience, but still allows users to accidentally route authority-bearing surfaces to a cheap model if they use a global/default fallback chain.
  • Hardcode a fixed list of trusted models. Simpler, but too opinionated across providers and deployments. A policy knob plus explicit fallback list is easier to review and keeps provider choice user-controlled.
  • Do nothing and rely on users to configure better models manually. This misses the failure mode: users often discover the quality downgrade only after bad output has already been persisted, summarized, or sent.

Feature Type

Configuration option / Performance & reliability

Scope

Medium: a few files, likely agent/auxiliary_client.py, the default config/docs, and focused tests.

Contribution

I'd like to implement this myself and submit a PR if maintainers agree on the policy shape.

Related issues / PRs

  • #22201 — per-auxiliary fallback providers
  • #32408 — consolidation proposal for auxiliary fallback work
  • #32411 — implementation PR for fallback_providers
  • #31127 — Codex usage limits as auxiliary quota exhaustion
  • #34024 — user-facing reset details for Codex usage_limit_reached

This proposal is meant to complement that work, not duplicate it: fallback routing chooses the next model; fallback policy decides whether a given auxiliary surface is allowed to downgrade to that model at all.