hermes - ✅(Solved) Fix _restore_primary_runtime() doesn't check credential cooldown — burns retries every turn while provider is exhausted [3 pull requests, 1 comments, 2 participants]

hermes2026-04-24 18:33:19

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#15298•Fetched 2026-04-25 06:23:05

View on GitHub

Comments

Participants

Timeline

Reactions

Author

mordekai-lab

Participants

alt-glitch

mordekai-lab

Timeline (top)

cross-referenced ×3labeled ×3commented ×1

_restore_primary_runtime() unconditionally restores the primary provider at the start of every new turn, even when the primary provider's credential is still in exhaustion cooldown. This causes wasted retries on every turn until the cooldown expires.

Root Cause

Fix Action

Fixed

Fixed by PR: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297) (https://github.com/NousResearch/hermes-agent/pull/15414)
Fixed by PR: fix(agent): check credential pool exhaustion before restoring primary runtime (#15298) (https://github.com/NousResearch/hermes-agent/pull/15434)
Fixed by PR: fix(credential-pool): exponential backoff on repeated exhaustion (#15296) (https://github.com/NousResearch/hermes-agent/pull/15455)

PR fix notes

PR #15414: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15414

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15297`. Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages like `"The service may be temporarily overloaded, please try again later"` to signal server-wide overload, not per-credential rate limiting.

The two conditions need different recovery strategies:

Reason	Correct strategy
`rate_limit`	Retry same credential once (the limit may have reset), then rotate
`overloaded`	Skip retry, rotate immediately (the whole endpoint is saturated)

Before this fix:

`error_classifier.py` mapped every 429 to `FailoverReason.rate_limit` regardless of message body.
`FailoverReason.overloaded` already existed as an enum value (and was produced for 503/529) but no production path emitted it for 429.
`_recover_with_credential_pool` had no handler for `overloaded` — an `overloaded` classification fell through to the default no-op `return False, has_retried_429` line.

Net effect: every overload-language 429 burned a `has_retried_429` slot on the same saturated credential before the rotation happened. Cron jobs (one turn each) often used their entire execution on that wasted retry.

Fix — two narrow, additive changes

1. `agent/error_classifier.py` — disambiguate 429s on message body

New `_OVERLOADED_PATTERNS` list of provider-language overload phrases:

```python _OVERLOADED_PATTERNS = [ "temporarily overloaded", "server is overloaded", "server overloaded", "service overloaded", "service is overloaded", "service may be temporarily overloaded", "upstream overloaded", "is overloaded, please try again", "at capacity", "over capacity", "currently overloaded", ] ```

The 429 branch now checks the error body and emits `FailoverReason.overloaded` when matched, preserving `rate_limit` for everything else. Phrases are deliberately narrow and provider-language-flavoured so normal rate-limit messages (`"you have been rate-limited"`, `"Too Many Requests"`) don't hit this bucket.

2. `run_agent.py::_recover_with_credential_pool` — handle `overloaded`

New branch with the same shape as the existing `billing` handler: rotate immediately via `mark_exhausted_and_rotate(...)`, no retry-on-same-credential first. The 503/529 `overloaded` classifications produced by the existing code now also flow through this branch as a side benefit.

Related Issue

Fixes #15297

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Test plan

15 new tests in 2 files — all green on py3.11 venv (145 total in the affected test files)
Regression guards verified: temporarily reverted the classifier's overload branch; 10 of the 11 new classifier tests correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 145 pass.
Pre-existing tests still green:
- `test_error_classifier.py` — 102/102 (was 95, +7 new)
- `test_credential_pool_routing.py` — 13/13 (was 9, +4 new)

Test coverage detail

Classifier (`tests/agent/test_error_classifier.py` — 7 new):

`test_429_zai_temporarily_overloaded` — exact #15297 Z.AI message
`test_429_overload_phrases_route_to_overloaded` — 9-phrase parametrised matrix (server/service/upstream overloaded, at/over capacity, `currently overloaded`, `is overloaded, please try again`, …)
`test_429_plain_rate_limit_remains_rate_limit` — "Rate limit reached for requests per minute" 429 → still `rate_limit` (regression guard against disambiguation silently broadening)
`test_429_too_many_requests_remains_rate_limit` — literal HTTP 429 reason phrase → still `rate_limit`
`test_503_overloaded_path_unchanged` — sanity check that the existing 503 → `overloaded` path didn't regress

Pool handler (`tests/agent/test_credential_pool_routing.py` — 4 new):

`test_overloaded_rotates_immediately_on_first_failure` — the fix
`test_overloaded_does_not_block_on_has_retried_flag` — overloaded rotates regardless of retry-flag state
`test_overloaded_pool_exhaustion_returns_false` — single-entry pool returns `recovered=False` so outer fallback takes over
`test_rate_limit_still_uses_retry_first` — negative control: rate_limit must KEEP its retry-first semantics, otherwise normal per-key throttles would double-rotate and burn the pool

Not in scope

The companion #15298 (`_restore_primary_runtime` cooldown check) and #15296 (credential pool exponential backoff) issues — same reporter, related but independent diffs. Kept this PR scoped to the classifier + handler so review stays small.
Other status codes that providers misuse for overload (e.g. some providers return 502 for cold-start latency rather than crash) — only addressing the documented 429 case here.

Changed files

agent/error_classifier.py (modified, +37/-1)
run_agent.py (modified, +21/-3)
tests/agent/test_credential_pool_routing.py (modified, +131/-0)
tests/agent/test_error_classifier.py (modified, +79/-0)

PR #15434: fix(agent): check credential pool exhaustion before restoring primary runtime (#15298)

Repository: NousResearch/hermes-agent
Author: Tranquil-Flow
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15434

Description (problem / solution / changelog)

Summary

Fixes #15298.

_restore_primary_runtime() checks a 60-second rate-limit timer (_rate_limited_until) but does not consult the credential pool's exhaustion state. After the 60-second timer expires, it attempts to restore the primary provider every turn even if the credential pool still marks that provider as exhausted. This burns retries and generates noise during extended outages.

Changes

run_agent.py: After the _rate_limited_until check in _restore_primary_runtime(), added a guard that consults self._credential_pool.has_available(). If the pool exists for the primary provider and reports all credentials exhausted, restoration is skipped. This integrates with the existing CredentialPool architecture (which already tracks per-credential exhaustion with cooldowns) rather than adding another independent timer.

Validation

Tests: tests/run_agent/test_primary_runtime_restore.py — 4 new tests in TestCredentialPoolExhaustionGate:

Pool exhausted, same provider: restoration blocked
Pool has credentials: restoration proceeds
Pool for different provider: not blocked
No pool at all: not blocked

Tested on macOS (Python 3.11).

Changed files

run_agent.py (modified, +10/-0)
tests/run_agent/test_primary_runtime_restore.py (modified, +95/-0)

PR #15455: fix(credential-pool): exponential backoff on repeated exhaustion (#15296)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15455

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15296`. The pool's `exhausted_ttl` returned a flat `EXHAUSTED_TTL*` constant regardless of how many consecutive times the same credential had failed. When a provider was overloaded for hours, the pool cycled through:

TTL expires → credential cleared back to `ok`
Provider tried again → fails with the same error (3 retries wasted)
Credential re-exhausted with the same flat TTL
Wait for TTL to expire again → repeat

For cron jobs (each run = one turn), every execution burned its retry budget on the same dead credential until the upstream actually recovered.

Fix

New `consecutive_failures: int = 0` field on `PooledCredential`
`_mark_exhausted` increments it from the prior value (defensive `getattr` + `or 0` so legacy on-disk entries that predate the field default to 0 via the dataclass and get promoted to 1 on first failure)
`_exhausted_ttl(error_code, consecutive_failures=0)` now applies `base_ttl * min(2 ** (failures - 1), 8)` — capped at 8× so a long-running outage doesn't push the cooldown into days. The default-arg keeps backward compat with any caller still using the old single-arg signature.
`_exhausted_until` passes `entry.consecutive_failures` through
Resets to 0 happen on real success markers (refresh produces fresh tokens, anthropic claude_code sync from credentials file, nous adopts newer tokens from auth.json) and on operator-driven `reset_statuses()`
The cooldown auto-clear path (`_available_entries(clear_expired=True)` flipping `last_status` back to `STATUS_OK` when the TTL elapses) deliberately does NOT reset the counter — that's the entire point of the fix. If the same credential exhausts again right after the cooldown expires, the upstream is still down and the next cooldown should be longer.

Backoff progression

consecutive_failures	multiplier	cooldown (default 1h base)
0 (default)	1×	1h
1 (first)	1×	1h
2	2×	2h
3	4×	4h
4	8×	8h
5+	8×	8h (cap)

The 8h cap matters: without it, a credential that's been failing for a week could end up on a multi-day cooldown that survives operator intervention (refreshing the upstream provider, swapping API keys, etc.).

Related Issue

Fixes #15296

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Test plan

21 new tests in `tests/agent/test_credential_pool_backoff.py` — all green on py3.11 venv
All 44 pre-existing `test_credential_pool.py` tests still pass — backward compat preserved (the default-argument keeps the single-arg signature working)
Verified regression guards: temporarily reverted the `_mark_exhausted` increment to a no-op; 4 tests in `TestMarkExhaustedIncrement` correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 30 backoff tests pass.

Test coverage detail

`TestExhaustedTtl` (8 cases):

Backward-compat default (no `consecutive_failures` arg → flat base TTL)
`consecutive_failures=1` is 1× (not 2×) — prevents off-by-one on the very first failure
8-row parametrised matrix on 429 progression: 1× → 2× → 4× → 8× → cap
Same matrix on 402 (non-429 cooldowns also back off)
`consecutive_failures=0` defensive default
Negative count clamped to base (corrupted on-disk entry safety)
Cap constant pinned to 8 so a refactor flagging this test prompts a docstring update

`TestExhaustedUntil` (4 cases):

First failure → 1× base TTL after `last_status_at`
Fourth failure → 8× base TTL
20 failures → still 8× (cap holds)
Explicit `last_error_reset_at` from upstream wins over backoff (provider-supplied resets are absolute truth)

`TestMarkExhaustedIncrement` (4 cases):

0 → 1 on first exhaustion
2 → 3 on consecutive (the streak persists)
Round-trips through pool reload (persistence)
Legacy on-disk entry without the field → loads as 0, increments to 1

`TestCounterResetSemantics` (3 cases):

`reset_statuses()` zeroes the counter (operator path)
`clear_expired` does NOT zero the counter — the bug-fix invariant; without this assertion a future refactor could silently break the backoff
`to_dict`/`from_dict` round-trips the field

Companion PRs

This is one of three related credential-pool resilience fixes filed by @mordekai-lab:

#15297 — error classifier disambiguates 429 "overloaded" from rate_limit (my open PR #15414)
#15298 — `_restore_primary_runtime` cooldown check (Tranquil-Flow's #15434)
#15296 — this PR

The three are independent and can land in any order.

Not in scope

Per-provider tuning of the cap or base TTL — currently uses the same 8× cap for every provider. A future enhancement could let providers configure their own backoff curve, but that's a bigger surface than this user-facing regression needs to unblock.

Changed files

agent/credential_pool.py (modified, +108/-6)
tests/agent/test_credential_pool_backoff.py (added, +409/-0)

Code Example

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False
    # ← No cooldown check here — always restores
    rt = self._primary_runtime
    # ... restores model, provider, client, etc.

---

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False

    # Check if primary credential is still in cooldown
    pool = self._credential_pool
    if pool is not None:
        rt_provider = self._primary_runtime.get("provider", "")
        if rt_provider and pool.provider == rt_provider:
            current_entry = pool.peek()
            if current_entry and current_entry.last_status == "exhausted":
                cooldown_until = _exhausted_until(current_entry)
                if cooldown_until is not None and time.time() < cooldown_until:
                    # Stay on fallback — primary still cooling down
                    return False

    rt = self._primary_runtime
    # ... existing restore logic

RAW_BUFFERClick to expand / collapse

Summary

How It Happens

Turn N: Primary provider returns 429 → credential marked exhausted → fallback activated
Turn N+1: _restore_primary_runtime() runs at top of run_conversation() → restores primary provider
Turn N+1: Primary tried again → 429 (3 retries burned) → fallback activated again
Repeat every turn until cooldown expires

In long-lived gateway sessions (agent caching), this means every single message burns 3 retries on the exhausted provider before falling back. For cron jobs it's worse — each run only gets one turn, so the retry burns the entire execution.

Affected Code

run_agent.py — _restore_primary_runtime() (line ~6490):

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False
    # ← No cooldown check here — always restores
    rt = self._primary_runtime
    # ... restores model, provider, client, etc.

The docstring correctly explains why restoration exists (transient failures shouldn't permanently pin to fallback), but it doesn't account for sustained cooldown periods.

Proposed Fix

Before restoring, check if the primary provider's credential pool entry is still in cooldown:

def _restore_primary_runtime(self) -> bool:
    if not self._fallback_activated:
        return False

    # Check if primary credential is still in cooldown
    pool = self._credential_pool
    if pool is not None:
        rt_provider = self._primary_runtime.get("provider", "")
        if rt_provider and pool.provider == rt_provider:
            current_entry = pool.peek()
            if current_entry and current_entry.last_status == "exhausted":
                cooldown_until = _exhausted_until(current_entry)
                if cooldown_until is not None and time.time() < cooldown_until:
                    # Stay on fallback — primary still cooling down
                    return False

    rt = self._primary_runtime
    # ... existing restore logic

This preserves the existing behavior for transient failures (cooldown expired → restore works as before) while avoiding the retry burn during sustained outages.

Environment

Hermes-agent latest, gateway mode
Credential pool enabled with fill_first strategy
Observed during extended provider overload (429 errors lasting 1+ hours)

extent analysis

TL;DR

Implement a cooldown check in _restore_primary_runtime() to prevent restoring the primary provider during sustained outages.

Guidance

Modify the _restore_primary_runtime() function to check if the primary provider's credential is still in cooldown before restoring it.
Verify the fix by testing the conversation flow during an extended provider overload (e.g., 429 errors lasting 1+ hours) and checking that retries are not burned on the exhausted provider.
Ensure the credential_pool is properly configured and the _exhausted_until() function is correctly calculating the cooldown expiration time.
Review the run_agent.py code to ensure the proposed fix is correctly integrated and does not introduce any new issues.

Example

The proposed fix code snippet provided in the issue body can be used as a starting point:

def _restore_primary_runtime(self) -> bool:
    # ...
    if current_entry and current_entry.last_status == "exhausted":
        cooldown_until = _exhausted_until(current_entry)
        if cooldown_until is not None and time.time() < cooldown_until:
            # Stay on fallback — primary still cooling down
            return False
    # ...

Notes

This fix assumes that the credential_pool and _exhausted_until() function are correctly implemented and functioning as expected. Additional testing and verification may be necessary to ensure the fix works as intended in all scenarios.

Recommendation

Apply the proposed workaround by implementing the cooldown check in _restore_primary_runtime() to prevent retry burns during sustained outages, as it directly addresses the identified issue and preserves the existing behavior for transient failures.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#request error #file not found #serialization error #model compatibility #GPU setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix _restore_primary_runtime() doesn't check credential cooldown — burns retries every turn while provider is exhausted [3 pull requests, 1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #15414: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Description (problem / solution / changelog)

What does this PR do?

Fix — two narrow, additive changes

1. `agent/error_classifier.py` — disambiguate 429s on message body

2. `run_agent.py::_recover_with_credential_pool` — handle `overloaded`

Related Issue

Type of Change

Test plan

Test coverage detail

Not in scope

Changed files

PR #15434: fix(agent): check credential pool exhaustion before restoring primary runtime (#15298)

Description (problem / solution / changelog)

Summary

Changes

Validation

Changed files

PR #15455: fix(credential-pool): exponential backoff on repeated exhaustion (#15296)

Description (problem / solution / changelog)

What does this PR do?

Fix

Backoff progression

Related Issue

Type of Change

Test plan

Test coverage detail

Companion PRs

Not in scope

Changed files

Code Example

Summary

How It Happens

Affected Code

Proposed Fix

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING