hermes: Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota) [2 pull requests]


Fix Action

Fixed


Problem

When the auxiliary LLM provider (used for context compression, memory flush, web extraction, etc.) returns an HTTP 429 rate-limit response with a daily quota message such as "Too many tokens per day", the fallback chain in call_llm() does not activate. As a result, context compaction fails silently and conversation history is dropped without a summary.

Two root causes:

1. Daily rate limits not classified as fallback-worthy errors

_is_payment_error() checks for keywords like "credits", "insufficient funds", "billing", "payment required" — but daily token quota exhaustion (common with Bedrock, Vertex AI, and other cloud providers) uses different language like "Too many tokens per day" or "quota exceeded". These are functionally identical to credit exhaustion but don't trigger fallback.

Suggested fix: Add quota-related keywords to _is_payment_error() or create a separate _is_quota_error():

# In _is_payment_error or a new _is_quota_exhaustion check:
if any(kw in err_lower for kw in ("quota", "too many tokens", "rate limit exceeded",
                                    "daily limit", "tokens per day")):
    return True
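
As a minimal sketch of the separate-classifier option, the helper below matches an error message against daily-quota phrasing. The name `is_quota_error` and the `QUOTA_KEYWORDS` tuple are illustrative assumptions, not hermes-agent's actual code; the real `_is_payment_error()` may inspect the exception differently.

```python
# Hypothetical sketch: names are illustrative, not taken from hermes-agent.
QUOTA_KEYWORDS = (
    "quota",
    "too many tokens",
    "rate limit exceeded",
    "daily limit",
    "tokens per day",
)


def is_quota_error(message):
    """Return True if the error text looks like provider quota exhaustion."""
    # Normalise once so matching is case-insensitive; tolerate a missing message.
    err_lower = (message or "").lower()
    return any(kw in err_lower for kw in QUOTA_KEYWORDS)
```

Note that "rate limit exceeded" also matches short per-minute limits, where a retry with backoff may be preferable to switching providers. Whether to keep that keyword is a judgment call for the maintainers.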

2. Fallback chain gated on resolved_provider == "auto" only

In call_llm() (~line 2293):

is_auto = resolved_provider in ("auto", "", None)
if should_fallback and is_auto:

When a task resolves to a specific provider (e.g., "custom" for a LiteLLM proxy, or "openrouter"), the fallback chain is completely disabled. If that provider fails with a retriable error, call_llm raises instead of trying alternatives.

This is overly conservative. The intent is to respect explicit provider choice, but when the error is clearly "this provider can't serve right now" (payment, quota, connection), trying alternatives is better than failing entirely — especially for background tasks like context compression where the user didn't explicitly choose a provider.

Suggested fix: Allow fallback for quota, payment, and connection errors regardless of how the provider was resolved, or at minimum for tasks where the provider was resolved via the auto-detection chain rather than explicit user configuration:

# Allow fallback when the error is clearly about provider capacity,
# not about the request itself (4xx client errors etc.)
if should_fallback:
    # ... try fallback chain
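
The gating idea above can be sketched as a small standalone helper. Everything here is hypothetical: `call_with_fallback`, its parameters, and the `is_capacity_error` callback are stand-ins for illustration, and the real `call_llm()` has its own structure and provider resolution.

```python
# Hypothetical sketch: not the hermes-agent implementation.
def call_with_fallback(primary, fallbacks, is_capacity_error):
    """Call primary(); on a capacity error, try each fallback in order.

    primary and each item of fallbacks are zero-argument callables.
    is_capacity_error(exc) decides whether an exception means
    "this provider can't serve right now" (payment, quota, connection).
    """
    try:
        return primary()
    except Exception as exc:
        if not is_capacity_error(exc):
            raise  # request-level errors surface unchanged
        last_exc = exc

    for fallback in fallbacks:
        try:
            return fallback()
        except Exception as exc:
            if not is_capacity_error(exc):
                raise
            last_exc = exc

    # Every provider was out of capacity: re-raise the most recent error.
    raise last_exc
```

The decision to fall back depends only on the error class, not on whether the provider was resolved as "auto". A stricter variant could also skip the fallback loop when the user configured the provider explicitly.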

Impact

  • Context compaction fails silently when the primary provider hits daily limits
  • Middle conversation turns are dropped without summary
  • Agent loses task context and starts repeating work or acting confused
  • Affects any deployment whose provider enforces rate limits or daily quotas (Bedrock, Vertex AI, free-tier OpenRouter, etc.)

Environment

  • hermes-agent 0.8.0
  • LiteLLM proxy routing to Bedrock (daily token limit) with Anthropic as fallback
  • Context compressor calls call_llm(task="compression", ...) which resolves to the custom/litellm provider
