hermes - 💡(How to fix) Fix Auxiliary call_llm fallback doesn't trigger on provider rate limits (429 daily quota) [2 pull requests]

hermes · 2026-05-16 07:05:26

Error Message

When a task resolves to a specific provider (e.g., "custom" for a LiteLLM proxy, or "openrouter"), the fallback chain is completely disabled. If that provider fails with a retriable error, call_llm raises instead of trying alternatives.

Root Cause

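Two root causes: daily rate limits are not classified as fallback-worthy errors, and the fallback chain is gated on resolved_provider == "auto" only.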

Fix Action

Fixed

Fixed by PR: fix(auxiliary): detect quota keywords in _is_payment_error and allow fallback for explicit providers (https://github.com/NousResearch/hermes-agent/pull/26809)
Fixed by PR: fix(auxiliary): detect quota exhaustion as payment error; allow capacity-error fallback for explicit providers (https://github.com/NousResearch/hermes-agent/pull/26811)

Problem

When the auxiliary LLM provider (used for context compression, memory flush, web extraction, etc.) returns a 429 rate limit with a daily quota message like "Too many tokens per day", the fallback chain in call_llm() does not activate. This causes context compaction to silently fail, dropping conversation history without a summary.

Two root causes:

1. Daily rate limits not classified as fallback-worthy errors

_is_payment_error() checks for keywords like "credits", "insufficient funds", "billing", "payment required" — but daily token quota exhaustion (common with Bedrock, Vertex AI, and other cloud providers) uses different language like "Too many tokens per day" or "quota exceeded". These are functionally identical to credit exhaustion but don't trigger fallback.

Suggested fix: Add quota-related keywords to _is_payment_error() or create a separate _is_quota_error():

# In _is_payment_error or a new _is_quota_exhaustion check:
if any(kw in err_lower for kw in ("quota", "too many tokens", "rate limit exceeded",
                                    "daily limit", "tokens per day")):
    return True
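
The keyword check above can be sketched as a standalone classifier. This is a minimal illustration, not the hermes-agent implementation: the function names `is_payment_error` and `is_quota_error` and both keyword tuples are assumptions assembled from the phrases quoted in this issue, and the real `_is_payment_error()` may inspect more than the message text.

```python
# Hypothetical sketch: names and keyword lists are assumptions drawn from the
# issue text, not copied from the hermes-agent source.
PAYMENT_KEYWORDS = ("credits", "insufficient funds", "billing", "payment required")
QUOTA_KEYWORDS = (
    "quota",
    "too many tokens",
    "rate limit exceeded",
    "daily limit",
    "tokens per day",
)


def is_payment_error(message: str) -> bool:
    """Mimic the current behaviour: only billing-style wording matches."""
    err_lower = message.lower()
    return any(kw in err_lower for kw in PAYMENT_KEYWORDS)


def is_quota_error(message: str) -> bool:
    """Match quota-exhaustion wording such as 'Too many tokens per day'."""
    err_lower = message.lower()
    return any(kw in err_lower for kw in QUOTA_KEYWORDS)


def is_capacity_error(message: str) -> bool:
    """Treat payment and quota exhaustion alike for fallback purposes."""
    return is_payment_error(message) or is_quota_error(message)
```

One trade-off to weigh: a broad phrase such as "rate limit exceeded" also matches short-lived per-minute throttling, where a retry with backoff may be a better response than switching providers.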

2. Fallback chain gated on `resolved_provider == "auto"` only

In call_llm() (~line 2293):

is_auto = resolved_provider in ("auto", "", None)
if should_fallback and is_auto:
    ...  # fallback chain runs only when the provider was auto-resolved

This is overly conservative. The intent is to respect explicit provider choice, but when the error is clearly "this provider can't serve right now" (payment, quota, connection), trying alternatives is better than failing entirely — especially for background tasks like context compression where the user didn't explicitly choose a provider.

Suggested fix: Allow fallback for quota/payment/connection errors regardless of provider resolution source, or at minimum for tasks where the provider was resolved via auto-detection chain rather than explicit user config:

# Allow fallback when the error is clearly about provider capacity,
# not about the request itself (4xx client errors etc.)
if should_fallback:
    ...  # try fallback chain, even for an explicitly resolved provider
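
A rough sketch of how such a gate might behave is shown below. Everything in it is hypothetical: `ProviderError`, its `kind` field, `CAPACITY_KINDS`, and `call_with_fallback` are illustrative names, and the real call_llm() resolves providers and classifies errors differently. The point is only that capacity errors move on to the next provider while request errors are raised immediately.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type carrying a coarse failure kind."""

    def __init__(self, message: str, kind: str) -> None:
        super().__init__(message)
        self.kind = kind  # e.g. "payment", "quota", "connection", "request"


# Failure kinds meaning "this provider can't serve right now" (assumed set).
CAPACITY_KINDS = frozenset({"payment", "quota", "connection"})


def call_with_fallback(
    providers: Sequence[Callable[[str], str]], prompt: str
) -> str:
    """Try providers in order; only capacity errors trigger the next one."""
    last_error: Optional[ProviderError] = None
    for provider in providers:
        try:
            return provider(prompt)
        except ProviderError as err:
            if err.kind not in CAPACITY_KINDS:
                raise  # request-level problem: another provider won't help
            last_error = err
    if last_error is None:
        raise ValueError("no providers configured")
    raise last_error  # every provider was out of capacity
```

With this shape, a primary provider that raises a "quota" error hands the prompt to the next provider in the list, while a malformed request still fails fast instead of being retried elsewhere.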

Impact

- Context compaction fails silently when the primary provider hits daily limits
- Middle conversation turns are dropped without summary
- Agent loses task context and starts repeating work or acting confused
- Affects any deployment using provider rate limits (Bedrock, Vertex AI, free-tier OpenRouter, etc.)

Environment

- hermes-agent 0.8.0
- LiteLLM proxy routing to Bedrock (daily token limit) with Anthropic as fallback
- Context compressor calls call_llm(task="compression", ...) which resolves to the custom/litellm provider
