hermes - ✅(Solved) Fix Auxiliary compression auto-routing ignores fallback_providers after main model usage limit [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#15714 · Fetched 2026-04-26 05:25:38
Comments: 0
Participants: 1
Timeline: 4
Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×1

Error Message

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', 'message': 'The usage limit has been reached', 'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, 'resets_in_seconds': 1657}}. Inserted a fallback context marker.

Root Cause

Context compression is critical for long sessions. If it fails during provider fallback, Hermes preserves much less continuity exactly when the session is already under pressure. Aligning auxiliary auto-routing with fallback_providers should make long conversations significantly more robust.

Fix Action

Fix / Workaround

I recommend fixing this at the auxiliary routing layer, not only by documenting the workaround.

PR fix notes

PR #15871: fix(auxiliary): route compression fallback through config fallback_providers on usage limits (#15714)

Description (problem / solution / changelog)

Summary

  • Extend _is_payment_error to recognise usage_limit_reached (HTTP 429) as a quota-exhaustion error, not just a transient rate limit
  • Add _read_config_fallback_providers() that reads the user's fallback_providers list from config.yaml
  • Extend _try_payment_fallback to try each configured fallback_providers entry (via resolve_provider_client) after the standard OpenRouter → Nous → Codex chain, skipping any entry whose provider matches the one that failed

The bug

When auxiliary.compression.provider: auto and the main provider returns a 429 with type: usage_limit_reached, context compression fails silently with:

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', ...}}. Inserted a fallback context marker.

Two root causes:

  1. _is_payment_error checked for "credits", "billing", etc. but not "usage_limit_reached" — so should_fallback stayed False and the fallback chain was never entered.
  2. Even when should_fallback is True, _try_payment_fallback only iterates the hardcoded provider chain (OpenRouter / Nous / local / Codex / api-key). The user's fallback_providers entries in config.yaml — which are already working for main-agent chat — are never consulted.

The fix

_is_payment_error: add "usage_limit_reached" and "usage limit" to the keyword list checked on 429 responses. Normal rate-limit 429s (e.g. "Rate limit exceeded, please slow down") still return False.
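To illustrate the distinction drawn above, here is a minimal sketch of a keyword-based classifier for 429 responses. The function name `is_quota_exhaustion`, its signature, and the two keyword tuples are assumptions made for illustration; the real `_is_payment_error` in agent/auxiliary_client.py is not shown on this page and may check other status codes and terms.

```python
# Hypothetical stand-in for the classifier described above; not the real
# _is_payment_error. Only keywords named in the PR text are listed here.
USAGE_LIMIT_KEYWORDS = ("usage_limit_reached", "usage limit")
BILLING_KEYWORDS = ("credits", "billing")


def is_quota_exhaustion(status_code: int, message: str) -> bool:
    """Return True when a 429 looks like quota exhaustion, not plain throttling."""
    if status_code != 429:
        # This sketch only covers the 429 branch discussed in the PR.
        return False
    text = message.lower()
    return any(keyword in text for keyword in USAGE_LIMIT_KEYWORDS + BILLING_KEYWORDS)


if __name__ == "__main__":
    print(is_quota_exhaustion(429, "{'error': {'type': 'usage_limit_reached'}}"))
    print(is_quota_exhaustion(429, "Rate limit exceeded, please slow down"))
```

Matching on substrings keeps the check independent of the exact error payload shape, at the cost of possible false positives if a provider uses these words in an unrelated message.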

_read_config_fallback_providers: new private helper that reads config.yaml → fallback_providers, filters out malformed entries, and returns [] on any error.
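The filtering behaviour described above could look roughly like the following sketch. It works on an already-parsed config mapping, so loading config.yaml is left out. The name `read_fallback_providers` and the definition of "malformed" used here (not a mapping, or missing a non-empty string `provider`) are assumptions, not the PR's actual rules.

```python
from typing import Any


def read_fallback_providers(config: Any) -> list[dict]:
    """Return well-formed fallback_providers entries, or [] if anything is off."""
    try:
        entries = config.get("fallback_providers") or []
        if not isinstance(entries, list):
            return []
        return [
            entry
            for entry in entries
            # Assumed validity rule: a mapping with a non-empty string provider.
            if isinstance(entry, dict)
            and isinstance(entry.get("provider"), str)
            and entry["provider"].strip()
        ]
    except Exception:
        # Mirrors the "returns [] on any error" behaviour described above.
        return []
```

Returning an empty list rather than raising means a broken config can only disable the extra fallbacks; it cannot make the compression path fail in a new way.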

_try_payment_fallback: after the standard chain exhausts, iterate _read_config_fallback_providers() and call resolve_provider_client(fb_provider, fb_model) for each. Skip any entry whose provider ID matches skip_labels (the failed provider). Returns on first successful client.
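The iteration step can be sketched as below. The `resolve` callable stands in for `resolve_provider_client`, whose real signature and return type are not shown here; treating `None` or an exception as "try the next entry" is likewise an assumption.

```python
from typing import Callable, Iterable, Optional


def try_config_fallbacks(
    entries: Iterable[dict],
    skip_labels: set,
    resolve: Callable[[str, Optional[str]], Optional[object]],
) -> Optional[object]:
    """Return the first client that resolves, skipping the provider that failed."""
    for entry in entries:
        provider = entry.get("provider")
        if not provider or provider in skip_labels:
            continue  # never retry the provider that just hit its usage limit
        try:
            client = resolve(provider, entry.get("model"))
        except Exception:
            client = None  # assumption: a resolver error means "try the next entry"
        if client is not None:
            return client
    return None
```

With the config from the issue, `skip_labels` would contain `openai-codex`, so the `deepseek-v4` entry is the first one actually passed to the resolver.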

Test plan

  • Before (regression guard): git stash → new tests fail to collect with ImportError for _read_config_fallback_providers — confirms tests are not vacuous
  • After: 101/101 in tests/agent/test_auxiliary_client.py pass
  • New coverage: test_429_usage_limit_reached_type, test_429_usage_limit_phrase, test_429_plain_rate_limit_still_not_payment, test_falls_back_to_config_fallback_providers_when_standard_chain_empty, test_skips_config_fallback_provider_matching_failed_provider, TestReadConfigFallbackProviders (4 cases)
  • Adjacent suites: 1950/1966 pass on full tests/agent/ — 16 failures are pre-existing baselines on origin/main (anthropic_adapter + bedrock_adapter)

Related

  • Fixes #15714

🤖 Generated with Claude Code

Changed files

  • agent/auxiliary_client.py (modified, +48/-1)
  • tests/agent/test_auxiliary_client.py (modified, +130/-0)

Code Example

Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', 'message': 'The usage limit has been reached', 'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, 'resets_in_seconds': 1657}}. Inserted a fallback context marker.

---

model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex

fallback_providers:
  - provider: deepseek-v4
    model: deepseek-v4-flash
    reasoning_effort: xhigh

providers:
  deepseek-v4:
    name: DeepSeek V4 Official
    base_url: https://api.deepseek.com/v1
    key_env: DEEPSEEK_API_KEY
    transport: openai_chat
    default_model: deepseek-v4-flash

auxiliary:
  compression:
    provider: auto
    model: ""
    timeout: 120

---

auxiliary:
  compression:
    provider: deepseek-v4
    model: deepseek-v4-flash
    timeout: 120

---

usage_limit_reached
The usage limit has been reached
resets_in_seconds

Bug description

When the main chat model hits a rate/usage limit and Hermes switches the conversation to a configured fallback_providers entry, context compression can still fail because the compression summarizer is routed through auxiliary.compression and, in auto mode, it resolves back to the main model instead of the active fallback model or configured fallback providers.

Observed warning:

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', 'message': 'The usage limit has been reached', 'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, 'resets_in_seconds': 1657}}. Inserted a fallback context marker.

This degrades long-running sessions right when fallback is most important: the main agent can continue via fallback, but the compaction summary fails and Hermes inserts only a fallback context marker.
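The warning above embeds the provider's error payload as a Python-style dict, so its fields can be inspected when debugging. The sketch below is a hypothetical helper (`parse_usage_limit` is not part of Hermes) that extracts the payload with `ast.literal_eval` and converts `resets_at` to a UTC timestamp.

```python
import ast
from datetime import datetime, timezone
from typing import Optional

WARNING = (
    "⚠ Compression summary failed: Error code: 429 - {'error': {'type': "
    "'usage_limit_reached', 'message': 'The usage limit has been reached', "
    "'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, "
    "'resets_in_seconds': 1657}}. Inserted a fallback context marker."
)


def parse_usage_limit(message: str) -> Optional[dict]:
    """Extract the inner error dict from an 'Error code: 429 - {...}' message."""
    start, end = message.find("{"), message.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        # literal_eval safely parses the dict repr, including None values.
        payload = ast.literal_eval(message[start:end + 1])
    except (ValueError, SyntaxError):
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    return error if isinstance(error, dict) else None


if __name__ == "__main__":
    error = parse_usage_limit(WARNING)
    if error is not None:
        resets_at = datetime.fromtimestamp(error["resets_at"], tz=timezone.utc)
        print(error["type"], error["resets_in_seconds"], resets_at.isoformat())
```

Here `resets_in_seconds` is 1657, or roughly 28 minutes, which is long enough that retrying the same provider during an active session is unlikely to help.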

Environment

  • Hermes Agent: v0.11.0 (2026.4.23)
  • Main provider/model: openai-codex / gpt-5.5
  • Configured fallback provider/model: custom deepseek-v4 / deepseek-v4-flash
  • auxiliary.compression.provider: auto

Relevant config shape:

model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex

fallback_providers:
  - provider: deepseek-v4
    model: deepseek-v4-flash
    reasoning_effort: xhigh

providers:
  deepseek-v4:
    name: DeepSeek V4 Official
    base_url: https://api.deepseek.com/v1
    key_env: DEEPSEEK_API_KEY
    transport: openai_chat
    default_model: deepseek-v4-flash

auxiliary:
  compression:
    provider: auto
    model: ""
    timeout: 120

Steps to reproduce

  1. Configure a main provider/model that can return 429/usage limit errors.
  2. Configure a working fallback_providers entry.
  3. Keep auxiliary.compression.provider: auto.
  4. Run a long session until context compression is triggered while the main provider is rate/usage-limited.

Expected behavior

If the main model is rate/usage-limited and the conversation is already able to continue through fallback_providers, compression should also use an available fallback-capable route. The compression summary should be generated by the fallback provider rather than failing with a 429 from the exhausted main provider.

Actual behavior

The main conversation can fall back, but compression still attempts the main provider/model through auxiliary auto-routing and fails with usage_limit_reached. Hermes then inserts a fallback context marker instead of a real summary.

Current workaround

Pin compression explicitly to the fallback provider:

auxiliary:
  compression:
    provider: deepseek-v4
    model: deepseek-v4-flash
    timeout: 120

This works, but it is static and does not generalize to users with multiple fallback providers or changing fallback preference.

Recommended fix

I recommend fixing this at the auxiliary routing layer, not only by documenting the workaround.

Specifically, when auxiliary.<task>.provider: auto, agent/auxiliary_client.py should include configured fallback_providers in the auto/fallback resolution path for auxiliary LLM calls.

Suggested behavior:

  1. Try the active/main provider as today.
  2. If the call fails with rate/usage/quota exhaustion, try the configured fallback_providers in order.
  3. Then fall back to the existing auxiliary provider chain (openrouter, nous, local/custom, openai-codex, API-key providers).

This makes auxiliary tasks track the same resilience policy as the main agent and avoids requiring users to duplicate fallback config under every auxiliary.* task.
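The suggested ordering can be sketched as a small function that builds the candidate list once, de-duplicating providers that appear in more than one source. The name `build_aux_routes` and its parameters are hypothetical, and the built-in chain is passed in by the caller rather than taken from Hermes.

```python
def build_aux_routes(main: str, fallback_providers: list, builtin_chain: list) -> list:
    """Order candidates: main provider, then user fallbacks, then the built-in chain."""
    ordered, seen = [], set()
    candidates = [
        main,
        *(entry.get("provider") for entry in fallback_providers),
        *builtin_chain,
    ]
    for name in candidates:
        # Skip missing names and anything already queued earlier in the order.
        if name and name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


if __name__ == "__main__":
    print(
        build_aux_routes(
            "openai-codex",
            [{"provider": "deepseek-v4", "model": "deepseek-v4-flash"}],
            ["openrouter", "nous", "openai-codex"],
        )
    )
```

Note that this order differs from the merged PR, which tries the configured fallback_providers after the standard chain rather than before it.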

It would also be useful to treat OpenAI/Codex-style usage-limit errors as fallback-worthy. Current payment/credit detection appears to cover billing/credit terms, but this observed error uses:

usage_limit_reached
The usage limit has been reached
resets_in_seconds

So the fallback classifier should likely include these usage-limit/rate-limit patterns as transient/exhaustion signals that can trigger provider fallback in auto mode.

Code pointers

Likely areas:

  • agent/auxiliary_client.py
    • _resolve_auto(...)
    • _get_provider_chain()
    • _is_payment_error(...) or a broader auxiliary fallback classifier
    • call_llm(...) fallback handling
  • agent/context_compressor.py
    • _generate_summary(...) calls call_llm(task="compression", main_runtime=...)

Why this matters

Context compression is critical for long sessions. If it fails during provider fallback, Hermes preserves much less continuity exactly when the session is already under pressure. Aligning auxiliary auto-routing with fallback_providers should make long conversations significantly more robust.

extent analysis

TL;DR

The most likely fix is to update agent/auxiliary_client.py so that, when auxiliary.<task>.provider is auto, the configured fallback_providers are included in the auto/fallback resolution path for auxiliary LLM calls.

Guidance

  • Review the agent/auxiliary_client.py file, specifically the _resolve_auto(...), _get_provider_chain(), and call_llm(...) functions, to understand the current auto-routing logic.
  • Update the fallback classifier to include OpenAI/Codex-style usage-limit errors as fallback-worthy, by adding patterns like usage_limit_reached and resets_in_seconds to the error detection logic.
  • Modify the call_llm(...) function to try the configured fallback_providers in order when the main provider fails with a rate/usage/quota exhaustion error.
  • Test the updated code with multiple fallback providers and changing fallback preferences to ensure the fix is robust.

Example

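from typing import Callable, Optional, Sequence

# Illustrative only: the real _resolve_auto in agent/auxiliary_client.py
# may take different arguments and return a client rather than a name.
def _resolve_auto(
    task: str,
    providers: Sequence[str],
    try_provider: Callable[[str, str], bool],
    auxiliary_chain: Sequence[str] = (),
) -> Optional[str]:
    """Return the first provider that can serve `task`, or None if all fail.

    providers[0] is the active/main provider; the remaining entries are the
    configured fallback providers, in order. try_provider(task, provider) is
    a caller-supplied probe that returns True on success.
    """
    # Try the active/main provider first, then the configured fallbacks in order
    for provider in providers:
        if try_provider(task, provider):
            return provider

    # Fall back to the existing auxiliary provider chain, skipping repeats
    for provider in auxiliary_chain:
        if provider not in providers and try_provider(task, provider):
            return provider

    return None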

Notes

The fix requires updating the agent/auxiliary_client.py file, which may have dependencies on other parts of the codebase. Thorough testing is necessary to ensure the fix does not introduce new issues.

Recommendation

Apply the workaround by updating the agent/auxiliary_client.py file to include the


