hermes - ✅(Solved) Fix Auxiliary compression auto-routing ignores fallback_providers after main model usage limit [1 pull request, 1 participant]

hermes · 2026-04-25 16:15:34

GitHub stats

NousResearch/hermes-agent#15714 • Fetched 2026-04-26 05:25:38

Author

huacao59109

Participants

huacao59109

Timeline (top)

labeled ×3, cross-referenced ×1

Error Message

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', 'message': 'The usage limit has been reached', 'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, 'resets_in_seconds': 1657}}. Inserted a fallback context marker.

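Root Cause

In auto mode, auxiliary compression resolves back to the main provider, and the usage_limit_reached 429 is not classified as a fallback-worthy error, so the configured fallback_providers are never tried.

Context compression is critical for long sessions. If it fails during provider fallback, Hermes preserves much less continuity exactly when the session is already under pressure. Aligning auxiliary auto-routing with fallback_providers should make long conversations significantly more robust.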

Fix / Workaround

As a workaround, pin auxiliary.compression explicitly to the fallback provider. The issue author recommends fixing this at the auxiliary routing layer, not only by documenting the workaround.

PR fix notes

PR #15871: fix(auxiliary): route compression fallback through config fallback_providers on usage limits (#15714)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15871

Description (problem / solution / changelog)

Summary

Extend _is_payment_error to recognise usage_limit_reached (HTTP 429) as a quota-exhaustion error, not just a transient rate limit
Add _read_config_fallback_providers() that reads the user's fallback_providers list from config.yaml
Extend _try_payment_fallback to try each configured fallback_providers entry (via resolve_provider_client) after the standard OpenRouter → Nous → Codex chain, skipping any entry whose provider matches the one that failed

The bug

When auxiliary.compression.provider: auto and the main provider returns a 429 with type: usage_limit_reached, context compression fails silently with:

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', ...}}. Inserted a fallback context marker.

Two root causes:

_is_payment_error checked for "credits", "billing", etc. but not "usage_limit_reached" — so should_fallback stayed False and the fallback chain was never entered.
Even when should_fallback is True, _try_payment_fallback only iterates the hardcoded provider chain (OpenRouter / Nous / local / Codex / api-key). The user's fallback_providers entries in config.yaml — which are already working for main-agent chat — are never consulted.

The fix

_is_payment_error: add "usage_limit_reached" and "usage limit" to the keyword list checked on 429 responses. Normal rate-limit 429s (e.g. "Rate limit exceeded, please slow down") still return False.
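The keyword check described here can be sketched as follows. This is a minimal illustration and not the code from the PR: the function name `looks_like_quota_exhaustion` and the two keyword tuples are assumptions, and only the "usage_limit_reached" / "usage limit" keywords and the billing/credit terms are taken from the description.

```python
# Hypothetical sketch of the classification described above; not the PR's code.
BILLING_KEYWORDS = ("credits", "billing")  # terms the description says were already checked
USAGE_LIMIT_KEYWORDS = ("usage_limit_reached", "usage limit")  # terms the fix adds


def looks_like_quota_exhaustion(status_code: int, message: str) -> bool:
    """Return True when a 429 looks like exhausted quota rather than a transient rate limit."""
    if status_code != 429:
        return False
    text = message.lower()
    return any(keyword in text for keyword in BILLING_KEYWORDS + USAGE_LIMIT_KEYWORDS)
```

With this shape, a plain "Rate limit exceeded, please slow down" message on a 429 is still treated as transient, while the observed usage-limit payload is treated as exhaustion.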

_read_config_fallback_providers: new private helper that reads config.yaml → fallback_providers, filters out malformed entries, and returns [] on any error.
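A helper of this kind might filter entries roughly as below. This sketch works on an already-parsed config mapping rather than reading config.yaml, and both the name `clean_fallback_providers` and the rule for what counts as malformed (an entry must be a mapping with a non-empty string `provider`) are assumptions, since the PR description does not spell them out.

```python
from typing import Any


def clean_fallback_providers(config: Any) -> list[dict]:
    """Return well-formed fallback_providers entries from a parsed config, or [] on any problem."""
    try:
        entries = config.get("fallback_providers") or []
        if not isinstance(entries, list):
            return []
        return [
            entry
            for entry in entries
            if isinstance(entry, dict)
            and isinstance(entry.get("provider"), str)
            and entry["provider"].strip()
        ]
    except Exception:
        # Assumption: any unexpected config shape is treated as "no fallbacks configured"
        return []
```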

_try_payment_fallback: after the standard chain exhausts, iterate _read_config_fallback_providers() and call resolve_provider_client(fb_provider, fb_model) for each. Skip any entry whose provider ID matches skip_labels (the failed provider). Returns on first successful client.
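The iteration over configured entries could look something like the sketch below. The `resolve` callable stands in for resolve_provider_client, whose real signature and return type are not shown in the PR description, so its shape here (provider, model, returning a client or None) is an assumption, as is the function name `first_working_fallback`.

```python
from typing import Callable, Iterable, Optional


def first_working_fallback(
    entries: Iterable[dict],
    skip_labels: set[str],
    resolve: Callable[[str, Optional[str]], Optional[object]],
) -> Optional[object]:
    """Return the first client that `resolve` can build, skipping failed providers."""
    for entry in entries:
        provider = entry.get("provider")
        if not provider or provider in skip_labels:
            continue  # never retry the provider that just failed
        try:
            client = resolve(provider, entry.get("model"))
        except Exception:
            continue  # assumption: a resolver error means "try the next entry"
        if client is not None:
            return client
    return None
```

Returning on the first usable client keeps the user's ordering in fallback_providers meaningful: earlier entries are preferred whenever they can be resolved.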

Test plan

Before (regression guard): git stash → new tests fail to collect with ImportError for _read_config_fallback_providers — confirms tests are not vacuous
After: 101/101 in tests/agent/test_auxiliary_client.py pass
New coverage: test_429_usage_limit_reached_type, test_429_usage_limit_phrase, test_429_plain_rate_limit_still_not_payment, test_falls_back_to_config_fallback_providers_when_standard_chain_empty, test_skips_config_fallback_provider_matching_failed_provider, TestReadConfigFallbackProviders (4 cases)
Adjacent suites: 1950/1966 pass on full tests/agent/ — 16 failures are pre-existing baselines on origin/main (anthropic_adapter + bedrock_adapter)

Fixes #15714

🤖 Generated with Claude Code

Changed files

agent/auxiliary_client.py (modified, +48/-1)
tests/agent/test_auxiliary_client.py (modified, +130/-0)

Bug description

When the main chat model hits a rate/usage limit and Hermes switches the conversation to a configured fallback_providers entry, context compression can still fail. The compression summarizer is routed through auxiliary.compression, and in auto mode it resolves back to the main model instead of the active fallback model or the configured fallback providers.

Observed warning:

⚠ Compression summary failed: Error code: 429 - {'error': {'type': 'usage_limit_reached', 'message': 'The usage limit has been reached', 'plan_type': 'plus', 'resets_at': 1777132814, 'eligible_promo': None, 'resets_in_seconds': 1657}}. Inserted a fallback context marker.

This degrades long-running sessions right when fallback is most important: the main agent can continue via fallback, but the compaction summary fails and Hermes inserts only a fallback context marker.
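For anyone triaging such a warning, the embedded payload can be pulled out and inspected. The sketch below assumes the payload is printed as a Python dict literal (it uses single quotes and `None`, which suggests so); the function names `parse_usage_limit` and `reset_time_utc` are hypothetical and not part of Hermes.

```python
import ast
from datetime import datetime, timezone
from typing import Optional


def parse_usage_limit(warning: str) -> Optional[dict]:
    """Extract the error dict from a warning line; return None if it is not a usage-limit error."""
    start, end = warning.find("{"), warning.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        # Assumption: the payload is a Python literal, so literal_eval can read it safely
        payload = ast.literal_eval(warning[start:end + 1])
    except (ValueError, SyntaxError):
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict) or error.get("type") != "usage_limit_reached":
        return None
    return error


def reset_time_utc(error: dict) -> Optional[datetime]:
    """Convert the `resets_at` epoch seconds to an aware UTC datetime, if present."""
    resets_at = error.get("resets_at")
    if not isinstance(resets_at, (int, float)):
        return None
    return datetime.fromtimestamp(resets_at, tz=timezone.utc)
```

For the warning quoted above, `resets_at` of 1777132814 corresponds to 2026-04-25 16:00:14 UTC, which is the point after which the main provider should accept requests again.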

Environment

Hermes Agent: v0.11.0 (2026.4.23)
Main provider/model: openai-codex / gpt-5.5
Configured fallback provider/model: custom deepseek-v4 / deepseek-v4-flash
auxiliary.compression.provider: auto

Relevant config shape:

model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex

fallback_providers:
  - provider: deepseek-v4
    model: deepseek-v4-flash
    reasoning_effort: xhigh

providers:
  deepseek-v4:
    name: DeepSeek V4 Official
    base_url: https://api.deepseek.com/v1
    key_env: DEEPSEEK_API_KEY
    transport: openai_chat
    default_model: deepseek-v4-flash

auxiliary:
  compression:
    provider: auto
    model: ""
    timeout: 120

Steps to reproduce

Configure a main provider/model that can return 429/usage limit errors.
Configure a working fallback_providers entry.
Keep auxiliary.compression.provider: auto.
Run a long session until context compression is triggered while the main provider is rate/usage-limited.

Expected behavior

If the main model is rate/usage-limited and the conversation is already able to continue through fallback_providers, compression should also use an available fallback-capable route. The compression summary should be generated by the fallback provider rather than failing with a 429 from the exhausted main provider.

Actual behavior

The main conversation can fall back, but compression still attempts the main provider/model through auxiliary auto-routing and fails with usage_limit_reached. Hermes then inserts a fallback context marker instead of a real summary.

Current workaround

Pin compression explicitly to the fallback provider:

auxiliary:
  compression:
    provider: deepseek-v4
    model: deepseek-v4-flash
    timeout: 120

This works, but it is static and does not generalize to users with multiple fallback providers or changing fallback preferences.

Recommended fix

I recommend fixing this at the auxiliary routing layer, not only by documenting the workaround.

Specifically, when auxiliary.<task>.provider: auto, agent/auxiliary_client.py should include configured fallback_providers in the auto/fallback resolution path for auxiliary LLM calls.

Suggested behavior:

Try the active/main provider as today.
If the call fails with rate/usage/quota exhaustion, try the configured fallback_providers in order.
Then fall back to the existing auxiliary provider chain (openrouter, nous, local/custom, openai-codex, API-key providers).

This makes auxiliary tasks track the same resilience policy as the main agent and avoids requiring users to duplicate fallback config under every auxiliary.* task.
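The suggested ordering can be sketched as a small helper that builds the candidate list. The function name `auxiliary_candidates` and its parameters are hypothetical; only the ordering (main provider, then fallback_providers, then the existing auxiliary chain) comes from the suggestion itself.

```python
from typing import Iterable


def auxiliary_candidates(
    main_provider: str,
    fallback_providers: Iterable[str],
    builtin_chain: Iterable[str],
) -> list[str]:
    """Order providers as suggested: main, then configured fallbacks, then the built-in chain."""
    ordered: list[str] = []
    for provider in (main_provider, *fallback_providers, *builtin_chain):
        if provider and provider not in ordered:  # drop blanks and repeats, keep first position
            ordered.append(provider)
    return ordered
```

Deduplicating while preserving first position means a provider listed both as the main provider and in the built-in chain (as openai-codex is in the reported setup) is only attempted once.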

It would also be useful to treat OpenAI/Codex-style usage-limit errors as fallback-worthy. Current payment/credit detection appears to cover billing/credit terms, but this observed error uses:

usage_limit_reached
The usage limit has been reached
resets_in_seconds

So the fallback classifier should likely include these usage-limit/rate-limit patterns as transient/exhaustion signals that can trigger provider fallback in auto mode.

Code pointers

Likely areas:

agent/auxiliary_client.py
- _resolve_auto(...)
- _get_provider_chain()
- _is_payment_error(...) or a broader auxiliary fallback classifier
- call_llm(...) fallback handling
agent/context_compressor.py
- _generate_summary(...) calls call_llm(task="compression", main_runtime=...)

extent analysis

TL;DR

The most likely fix is to update agent/auxiliary_client.py so that, when auxiliary.<task>.provider is auto, the configured fallback_providers are included in the auto/fallback resolution path for auxiliary LLM calls.

Guidance

Review the agent/auxiliary_client.py file, specifically the _resolve_auto(...), _get_provider_chain(), and call_llm(...) functions, to understand the current auto-routing logic.
Update the fallback classifier to include OpenAI/Codex-style usage-limit errors as fallback-worthy, by adding patterns like usage_limit_reached and resets_in_seconds to the error detection logic.
Modify the call_llm(...) function to try the configured fallback_providers in order when the main provider fails with a rate/usage/quota exhaustion error.
Test the updated code with multiple fallback providers and changing fallback preferences to ensure the fix is robust.

Example

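from typing import Callable, Optional, Sequence

# Illustrative sketch only: these helper names are hypothetical, not Hermes APIs.
# `call(task, provider)` stands in for a real auxiliary LLM call and returns True on success.
Call = Callable[[str, str], bool]

def try_provider(task: str, provider: str, call: Call) -> bool:
    # Treat any exception (for example a 429 usage limit) as "this provider failed"
    try:
        return bool(call(task, provider))
    except Exception:
        return False

def try_auxiliary_chain(task: str, chain: Sequence[str], call: Call) -> Optional[str]:
    # Walk the existing auxiliary provider chain and return the first provider that works
    for provider in chain:
        if try_provider(task, provider, call):
            return provider
    return None

def _resolve_auto(task: str, providers: Sequence[str], chain: Sequence[str], call: Call) -> Optional[str]:
    # providers[0] is the active/main provider; providers[1:] are the configured
    # fallback providers, tried in order
    for provider in providers:
        if try_provider(task, provider, call):
            return provider

    # Fall back to the existing auxiliary provider chain
    return try_auxiliary_chain(task, chain, call)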

Notes

The fix requires updating the agent/auxiliary_client.py file, which may have dependencies on other parts of the codebase. Thorough testing is necessary to ensure the fix does not introduce new issues.

Recommendation

Apply the workaround by updating the agent/auxiliary_client.py file to include the
