hermes - ✅ (Solved) Tight fallback-switch loop when multiple providers fail non-retryably can exhaust host memory [2 pull requests, 1 participant]

hermes • 2026-05-13 12:30:54

NousResearch/hermes-agent#24996•Fetched 2026-05-14 03:49:54

Author

drabekj

Participants

drabekj

Timeline (top)

labeled ×3, cross-referenced ×2, referenced ×1

When the fallback chain contains multiple providers and all of them return non-retryable errors back-to-back (e.g., one with depleted credits, one rate-limited, one rejecting the request), _try_activate_fallback is called repeatedly with no interval and no cumulative limit. With large session contexts (~80k tokens, 100+ messages), this produces a tight loop that re-marshals the full context per attempt and can exhaust host memory/swap.

On constrained hosts (e.g., self-hosted single-board hardware), this can drive load average into the dozens and require manual power-cycle to recover.

Error Message

HH:MM:SS ⚠️  API call failed (attempt 1/3): BadRequestError [HTTP 400]
HH:MM:SS    📝 Error: HTTP 400: This model only supports single tool-calls at once!
HH:MM:SS ⚠️  Non-retryable error (HTTP 400) — trying fallback...
HH:MM:SS ⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
HH:MM:SS    📝 Error: HTTP 402: Insufficient credits.
HH:MM:SS ⚠️  Rate limited — switching to fallback provider...
HH:MM:SS ⚠️  API call failed (attempt 1/3): RateLimitError [HTTP 429]
... (loops within sub-second windows)
HH:MM:SS ⚠️  Skipping session persistence for large failed session to prevent growth loop.

Root Cause

In run_agent.py, two call sites of HermesAgent._try_activate_fallback (defined at ~line 6285) loop back immediately with retry_count = 0 after activation:

Rate-limit-triggered switch (~line 10417)
Non-retryable client error switch (~line 10677)

Both reset the retry counter and continue, with no minimum interval between activations and no cap on activations in a time window.

Fix Action

Fix proposed (neither PR below was merged when this page was fetched)

Proposed in PR: fix(agent): add circuit-breaker to _try_activate_fallback to prevent tight retry loops (https://github.com/NousResearch/hermes-agent/pull/24998), state: open
Proposed in PR: fix(run_agent): circuit-breaker on fallback activations to prevent memory storm (fixes #24996) (https://github.com/NousResearch/hermes-agent/pull/25059), state: closed, not merged

PR fix notes

PR #24998: fix(agent): add circuit-breaker to _try_activate_fallback to prevent tight retry loops

Repository: NousResearch/hermes-agent
Author: drabekj
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/24998

Description (problem / solution / changelog)

Summary

Adds a circuit-breaker to AIAgent._try_activate_fallback so that when every provider in the fallback chain returns non-retryable errors back-to-back, the function self-throttles and ultimately trips, preventing host memory exhaustion from a runaway retry loop.

Two protections, both at the top of the function:

Throttle: enforce ≥2s between consecutive activations
Breaker: ≥5 activations in a 60s rolling window → return False, which existing call sites already handle as "chain exhausted"

Thresholds exposed as class constants (_FALLBACK_THROTTLE_INTERVAL_S, _FALLBACK_BREAKER_WINDOW_S, _FALLBACK_BREAKER_LIMIT) for easy tuning.
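
The two protections can be tried in isolation with a minimal sketch. This is not the PR's code: FallbackBreaker, FakeClock and simulate_storm are hypothetical names, and the injectable clock and sleep parameters are an assumption added so the behavior can be observed without waiting.

```python
import time
from typing import Callable, List, Tuple


class FallbackBreaker:
    """Hypothetical standalone version of the throttle + breaker logic."""

    def __init__(
        self,
        throttle_interval_s: float = 2.0,
        window_s: float = 60.0,
        limit: int = 5,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.throttle_interval_s = throttle_interval_s
        self.window_s = window_s
        self.limit = limit
        self._clock = clock
        self._sleep = sleep
        self._activations: List[float] = []

    def allow(self) -> bool:
        """Return True if another fallback activation may proceed."""
        now = self._clock()
        # Keep only activations that are still inside the rolling window.
        self._activations = [t for t in self._activations if now - t < self.window_s]
        if len(self._activations) >= self.limit:
            return False  # breaker tripped: caller treats this as "chain exhausted"
        if self._activations:
            elapsed = now - self._activations[-1]
            if elapsed < self.throttle_interval_s:
                # Throttle: wait out the remainder of the minimum interval.
                self._sleep(self.throttle_interval_s - elapsed)
        self._activations.append(self._clock())
        return True


class FakeClock:
    """Deterministic clock whose sleep() advances time instead of blocking."""

    def __init__(self) -> None:
        self.now = 0.0
        self.slept: List[float] = []

    def monotonic(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.slept.append(seconds)
        self.now += seconds


def simulate_storm(calls: int = 8) -> Tuple[List[bool], List[float], float]:
    """Fire back-to-back activation requests and report what the breaker did."""
    fake = FakeClock()
    breaker = FallbackBreaker(clock=fake.monotonic, sleep=fake.sleep)
    results = [breaker.allow() for _ in range(calls)]
    return results, fake.slept, fake.now
```

With the default thresholds, eight back-to-back requests yield five allowed activations separated by four 2-second sleeps (8 simulated seconds in total), after which every further request is refused until the oldest activation leaves the 60-second window.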

Motivation

Refs #24996.

When the primary provider returns a non-retryable error (HTTP 402 for depleted credits, HTTP 400 for a rejected request, etc.) and the providers in the fallback chain also fail non-retryably (rate-limited, model-incompatible request shape), the existing code resets retry_count = 0 and continues the outer loop. There is no minimum interval between activations and no cap on their number. On a large accumulated session context (~80k tokens), each iteration re-marshals the context, so hundreds of activations can pile up within a single second of wall-clock time and exhaust RAM and swap on constrained hosts (single-board / self-hosted setups).

The existing #1630 guard ("Skipping session persistence for large failed session") prevents context growth across persisted-restart cycles. This PR addresses a complementary failure mode: rapid in-process activation churn.

How it works

_FALLBACK_THROTTLE_INTERVAL_S = 2.0
_FALLBACK_BREAKER_WINDOW_S = 60.0
_FALLBACK_BREAKER_LIMIT = 5

def _try_activate_fallback(self, reason=None) -> bool:
    # ... docstring ...
    if not hasattr(self, "_fallback_activations"):
        self._fallback_activations = []
    now = time.monotonic()
    # Drop activations older than the breaker window
    self._fallback_activations = [t for t in self._fallback_activations
                                  if now - t < self._FALLBACK_BREAKER_WINDOW_S]
    if len(self._fallback_activations) >= self._FALLBACK_BREAKER_LIMIT:
        self._emit_status("🛑 Circuit-breaker tripped: ...")
        return False
    if self._fallback_activations:
        elapsed = now - self._fallback_activations[-1]
        if elapsed < self._FALLBACK_THROTTLE_INTERVAL_S:
            time.sleep(self._FALLBACK_THROTTLE_INTERVAL_S - elapsed)
    self._fallback_activations.append(time.monotonic())
    # ... existing logic (rate-limit cooldown etc.) continues ...

Return value False reaches existing handlers cleanly — every if self._try_activate_fallback(): continue call site already has the fall-through "fallback exhausted" path.
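
The call-site pattern described above can be sketched as follows. This is a simplified illustration, not the loop in run_agent.py: run_with_fallback, NonRetryableError and demo are hypothetical names, and the limit of 3 retries is taken from the "attempt 1/3" text in the log.

```python
from typing import Callable, Optional, Tuple


class NonRetryableError(Exception):
    """Hypothetical stand-in for errors such as HTTP 400 or HTTP 402."""


def run_with_fallback(
    call_provider: Callable[[], str],
    try_activate_fallback: Callable[[], bool],
    max_retries: int = 3,
) -> Optional[str]:
    """Retry loop whose fallback branch relies on a boolean return value."""
    retry_count = 0
    while retry_count < max_retries:
        try:
            return call_provider()
        except NonRetryableError:
            if try_activate_fallback():
                retry_count = 0  # fresh retry budget for the new provider
                continue
            # False means "chain exhausted", which now also covers a tripped breaker.
            return None
        except Exception:
            retry_count += 1  # ordinary retryable failure
    return None


def demo() -> Tuple[Optional[str], int, int]:
    """Every provider fails non-retryably; activation is refused after 5 switches."""
    calls = {"provider": 0, "activate": 0}

    def always_failing_provider() -> str:
        calls["provider"] += 1
        raise NonRetryableError("HTTP 402: Insufficient credits.")

    def capped_activation() -> bool:
        calls["activate"] += 1
        return calls["activate"] <= 5

    result = run_with_fallback(always_failing_provider, capped_activation)
    return result, calls["provider"], calls["activate"]
```

Because the breaker reuses the existing False path, the loop ends on the sixth activation request without any change to the call site itself.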

Notes / open questions

Recursive intra-chain skips (e.g., the existing return self._try_activate_fallback() for invalid/duplicate entries) will increment the activation counter under this implementation. In practice this is harmless: those recursive calls are cheap (no API call) and rare (mis-configured fallback entries). If maintainers prefer, the counter could be scoped to only external entries via a recursion-depth flag.
No tests added yet — happy to add one modeled on tests/run_agent/test_1630_context_overflow_loop.py if the approach is accepted.
Thresholds are conservative defaults (2s interval, 5/60s limit). Tuning may be warranted based on the expected legitimate fallback velocity.
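
The recursion-depth idea from the first note could look roughly like this. It is a sketch under assumptions: FallbackChain, try_activate and the _internal flag are hypothetical, and the throttle and breaker checks are left out so that only the counting behavior is visible.

```python
import time
from typing import List, Optional


class FallbackChain:
    """Hypothetical chain that counts only externally requested activations."""

    def __init__(self, entries: List[Optional[str]]) -> None:
        self.entries = entries
        self.index = 0
        self.active: Optional[str] = None
        self.activations: List[float] = []

    def try_activate(self, _internal: bool = False) -> bool:
        # Record the activation only for calls coming from outside;
        # recursive skips over bad entries pass _internal=True.
        if not _internal:
            self.activations.append(time.monotonic())
        if self.index >= len(self.entries):
            return False  # chain exhausted
        entry = self.entries[self.index]
        self.index += 1
        if not entry or entry == self.active:
            # Invalid or duplicate entry: skip it without touching the counter.
            return self.try_activate(_internal=True)
        self.active = entry
        return True
```

For a chain such as [None, "", "provider-a", "provider-a", "provider-b"], the first external call skips two invalid entries yet records a single activation, so mis-configured entries no longer consume breaker budget.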

Closes / Refs

Refs #24996.

Changed files

run_agent.py (modified, +35/-0)

PR #25059: fix(run_agent): circuit-breaker on fallback activations to prevent memory storm (fixes #24996)

Repository: NousResearch/hermes-agent
Author: shanewas
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/25059

Description (problem / solution / changelog)

Fixes #24996

What broke

When all fallback providers fail back-to-back (402, 429, 400), _try_activate_fallback() is called with no interval and no cumulative cap. With large sessions (~80k tokens, 100+ messages), this produces a tight loop that re-marshals full context per attempt and exhausts host memory/swap. On constrained hosts this can drive load into the dozens and require manual power-cycle recovery.

Root cause

_try_activate_fallback() at ~line 8620 had no throttle between activations and no cap on how many times it could fire in a time window.

Fix

Added a per-instance circuit-breaker at the top of _try_activate_fallback():

Throttle: ≥2s between consecutive activations (sleeps if called sooner)
Breaker: ≥5 activations in 60s → returns False (chain-exhausted signal existing code already handles cleanly)
Status messages emitted for both throttle sleeps and breaker trips

Tests

8 new unit tests in tests/run_agent/test_fallback_circuit_breaker.py
All 30 fallback + circuit-breaker tests pass
Failure in test_async_httpx_del_neuter.py confirmed as pre-existing (present on main, unrelated to this change)
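
A test for this kind of breaker does not need real waiting if the clock is patched. The sketch below is not taken from tests/run_agent/test_fallback_circuit_breaker.py, whose contents are not shown here. StubAgent is a hypothetical stand-in that reproduces only the throttle and breaker logic, and the test names are invented.

```python
import time
import unittest
from unittest import mock


class StubAgent:
    """Hypothetical stand-in carrying only the throttle + breaker logic."""

    _FALLBACK_THROTTLE_INTERVAL_S = 2.0
    _FALLBACK_BREAKER_WINDOW_S = 60.0
    _FALLBACK_BREAKER_LIMIT = 5

    def __init__(self) -> None:
        self._fallback_activations = []

    def _try_activate_fallback(self) -> bool:
        now = time.monotonic()
        self._fallback_activations = [
            t for t in self._fallback_activations
            if now - t < self._FALLBACK_BREAKER_WINDOW_S
        ]
        if len(self._fallback_activations) >= self._FALLBACK_BREAKER_LIMIT:
            return False
        if self._fallback_activations:
            elapsed = now - self._fallback_activations[-1]
            if elapsed < self._FALLBACK_THROTTLE_INTERVAL_S:
                time.sleep(self._FALLBACK_THROTTLE_INTERVAL_S - elapsed)
        self._fallback_activations.append(time.monotonic())
        return True  # stand-in for the real provider switch


class BreakerTests(unittest.TestCase):
    def setUp(self) -> None:
        self.now = 0.0

        def fake_sleep(seconds: float) -> None:
            self.now += seconds  # advance the fake clock instead of blocking

        mock.patch("time.monotonic", side_effect=lambda: self.now).start()
        self.sleep_mock = mock.patch("time.sleep", side_effect=fake_sleep).start()
        self.addCleanup(mock.patch.stopall)
        self.agent = StubAgent()

    def test_throttle_sleeps_between_activations(self) -> None:
        self.assertTrue(self.agent._try_activate_fallback())
        self.assertTrue(self.agent._try_activate_fallback())
        self.sleep_mock.assert_called_once_with(2.0)

    def test_breaker_trips_at_limit(self) -> None:
        results = [self.agent._try_activate_fallback() for _ in range(6)]
        self.assertEqual(results, [True] * 5 + [False])

    def test_breaker_recovers_after_window(self) -> None:
        for _ in range(6):
            self.agent._try_activate_fallback()
        self.now += 60.0  # let every recorded activation age out of the window
        self.assertTrue(self.agent._try_activate_fallback())
```

Saved as a module, the tests can be run with python -m unittest. Patching time.monotonic and time.sleep keeps the suite fast and deterministic even though the logic under test sleeps for seconds at a time.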

Files changed

run_agent.py — circuit-breaker added to _try_activate_fallback()
tests/run_agent/test_fallback_circuit_breaker.py — 8 new tests

Changed files

run_agent.py (modified, +29/-0)
tests/run_agent/test_fallback_circuit_breaker.py (added, +247/-0)

Code Example

    def _try_activate_fallback(self) -> bool:
        """..."""
+       # Circuit-breaker: prevent tight fallback-switch loops when every
+       # provider fails back-to-back.  Two protections: throttle (>=2s
+       # between activations) + breaker (>=5 activations in 60s -> return
+       # False, exhausting the chain cleanly).
+       import time as _cb_time
+       if not hasattr(self, "_fallback_activations"):
+           self._fallback_activations = []
+       _now = _cb_time.monotonic()
+       self._fallback_activations = [t for t in self._fallback_activations if _now - t < 60.0]
+       if len(self._fallback_activations) >= 5:
+           try:
+               self._emit_status(
+                   "🛑 Circuit-breaker tripped: 5 fallback activations in 60s. "
+                   "Aborting to prevent retry storm."
+               )
+           except Exception:
+               pass
+           return False
+       if self._fallback_activations:
+           _elapsed = _now - self._fallback_activations[-1]
+           if _elapsed < 2.0:
+               _wait = 2.0 - _elapsed
+               try:
+                   self._emit_status(f"⏸️ Circuit-breaker: sleeping {_wait:.1f}s before fallback switch")
+               except Exception:
+                   pass
+               _cb_time.sleep(_wait)
+       self._fallback_activations.append(_cb_time.monotonic())

        if self._fallback_index >= len(self._fallback_chain):
            return False

RAW_BUFFER

Summary

On constrained hosts (e.g., self-hosted single-board hardware), this can drive load average into the dozens and require manual power-cycle to recover.

Reproduction

1. Configure a fallback chain with at least 2 providers.
2. Arrange for all providers to fail non-retryably in quick succession, for example:
   - Primary returns HTTP 402 (e.g., OpenRouter with no credits)
   - Fallback A returns HTTP 429 (rate-limited)
   - Fallback B returns HTTP 400 (e.g., model rejects the request shape, like multi-tool-call when not supported)
3. Trigger a request with a large accumulated context (~80k tokens, 100+ messages).

Observed

Hundreds of "switching to fallback provider" / "trying fallback" log events within a single wall-clock second, followed by sustained memory pressure. Sample (sanitized):

HH:MM:SS ⚠️  API call failed (attempt 1/3): BadRequestError [HTTP 400]
HH:MM:SS    📝 Error: HTTP 400: This model only supports single tool-calls at once!
HH:MM:SS ⚠️  Non-retryable error (HTTP 400) — trying fallback...
HH:MM:SS ⚠️  API call failed (attempt 1/3): APIStatusError [HTTP 402]
HH:MM:SS    📝 Error: HTTP 402: Insufficient credits.
HH:MM:SS ⚠️  Rate limited — switching to fallback provider...
HH:MM:SS ⚠️  API call failed (attempt 1/3): RateLimitError [HTTP 429]
... (loops within sub-second windows)
HH:MM:SS ⚠️  Skipping session persistence for large failed session to prevent growth loop.

The existing "Skipping session persistence" guard (#1630) handles intra-request context overflow well, but a separate failure mode involves repeated activations across messages with no minimum interval.
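
To check whether a given log shows the same churn, fallback events can be counted per wall-clock second. This is a rough sketch: it assumes each line starts with an HH:MM:SS timestamp and contains one of the two phrases from the sample above. fallback_events_per_second and busiest_second are hypothetical helper names.

```python
import re
from collections import Counter
from typing import Iterable, Optional, Tuple

# Matches a leading HH:MM:SS timestamp followed by either fallback phrase.
FALLBACK_RE = re.compile(
    r"^(\d{2}:\d{2}:\d{2})\s.*(?:trying fallback|switching to fallback provider)"
)


def fallback_events_per_second(lines: Iterable[str]) -> Counter:
    """Count fallback-switch log events, keyed by their timestamp."""
    counts: Counter = Counter()
    for line in lines:
        match = FALLBACK_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def busiest_second(counts: Counter) -> Optional[Tuple[str, int]]:
    """Return the (timestamp, count) pair with the most events, if any."""
    if not counts:
        return None
    return counts.most_common(1)[0]
```

A count in the hundreds for a single timestamp would match the behavior reported here, whereas a throttled agent should show at most one activation every two seconds.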

Root cause

In run_agent.py, two call sites of HermesAgent._try_activate_fallback (defined at ~line 6285) loop back immediately with retry_count = 0 after activation:

Rate-limit-triggered switch (~line 10417)
Non-retryable client error switch (~line 10677)

Both reset the retry counter and continue, with no minimum interval between activations and no cap on activations in a time window.

Proposed fix

Add a per-instance circuit-breaker at the top of _try_activate_fallback:

Throttle: enforce ≥2s between consecutive activations
Breaker: ≥5 activations in 60s → return False (chain-exhausted signal that existing code already handles cleanly)

Diff:

    def _try_activate_fallback(self) -> bool:
        """..."""
+       # Circuit-breaker: prevent tight fallback-switch loops when every
+       # provider fails back-to-back.  Two protections: throttle (>=2s
+       # between activations) + breaker (>=5 activations in 60s -> return
+       # False, exhausting the chain cleanly).
+       import time as _cb_time
+       if not hasattr(self, "_fallback_activations"):
+           self._fallback_activations = []
+       _now = _cb_time.monotonic()
+       self._fallback_activations = [t for t in self._fallback_activations if _now - t < 60.0]
+       if len(self._fallback_activations) >= 5:
+           try:
+               self._emit_status(
+                   "🛑 Circuit-breaker tripped: 5 fallback activations in 60s. "
+                   "Aborting to prevent retry storm."
+               )
+           except Exception:
+               pass
+           return False
+       if self._fallback_activations:
+           _elapsed = _now - self._fallback_activations[-1]
+           if _elapsed < 2.0:
+               _wait = 2.0 - _elapsed
+               try:
+                   self._emit_status(f"⏸️ Circuit-breaker: sleeping {_wait:.1f}s before fallback switch")
+               except Exception:
+                   pass
+               _cb_time.sleep(_wait)
+       self._fallback_activations.append(_cb_time.monotonic())

        if self._fallback_index >= len(self._fallback_chain):
            return False

Happy to submit as a PR if the approach seems acceptable. Tests could be modeled on tests/run_agent/test_1630_context_overflow_loop.py.

Environment

Hermes commit: 627abbb1
Linux
