hermes - ✅(Solved) Fix Error classifier: 429 'temporarily overloaded' misclassified as rate_limit — triggers wrong recovery path [2 pull requests, 1 comment, 2 participants]

hermes • 2026-04-24 18:33:10

GitHub stats

NousResearch/hermes-agent#15297•Fetched 2026-04-25 06:23:07

Author

mordekai-lab

Participants

alt-glitch

mordekai-lab

Timeline (top)

labeled ×4, cross-referenced ×2, commented ×1

The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.

Some providers (e.g. Z.AI/Zhipu) return HTTP 429 with messages like:

HTTP 429: The service may be temporarily overloaded, please try again later

This is a server-side overload — the entire provider endpoint is struggling, not just this API key hitting a per-key quota. The recovery strategy should be different:

| Reason | Correct Behavior |
| --- | --- |
| `rate_limit` | Retry same credential once, then rotate to next key |
| `overloaded` | Skip retry, rotate immediately (the whole provider is down) |

Error Message

HTTP 429: The service may be temporarily overloaded, please try again later

The error response carries no retry_after header or resets_at field.

Root Cause

`agent/error_classifier.py` maps every HTTP 429 to `FailoverReason.rate_limit` without inspecting the message body, so a server-side overload 429 takes the retry-first recovery path meant for per-key rate limits. `FailoverReason.overloaded` already exists as an enum value (and is produced for 503/529), but the 429 path never emits it, and the credential-pool recovery code in `run_agent.py` has no handler for it.

Fix Action

Fix proposed; both PRs below were open and unmerged when this page was fetched.

Fixing PR: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297) (https://github.com/NousResearch/hermes-agent/pull/15414)
Related PR for companion issue #15296: fix(credential-pool): exponential backoff on repeated exhaustion (#15296) (https://github.com/NousResearch/hermes-agent/pull/15455)

PR fix notes

PR #15414: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15414

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15297`. Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages like `"The service may be temporarily overloaded, please try again later"` to signal server-wide overload, not per-credential rate limiting.

The two conditions need different recovery strategies:

| Reason | Correct strategy |
| --- | --- |
| `rate_limit` | Retry same credential once (the limit may have reset), then rotate |
| `overloaded` | Skip retry, rotate immediately (the whole endpoint is saturated) |

Before this fix:

- `error_classifier.py` mapped every 429 to `FailoverReason.rate_limit` regardless of message body.
- `FailoverReason.overloaded` already existed as an enum value (and was produced for 503/529) but no production path emitted it for 429.
- `_recover_with_credential_pool` had no handler for `overloaded`: an `overloaded` classification fell through to the default no-op `return False, has_retried_429` line.

Net effect: every overload-language 429 burned a `has_retried_429` slot on the same saturated credential before the rotation happened. Cron jobs (one turn each) often used their entire execution on that wasted retry.

Fix — two narrow, additive changes

1. `agent/error_classifier.py` — disambiguate 429s on message body

New `_OVERLOADED_PATTERNS` list of provider-language overload phrases:

```python
_OVERLOADED_PATTERNS = [
    "temporarily overloaded",
    "server is overloaded",
    "server overloaded",
    "service overloaded",
    "service is overloaded",
    "service may be temporarily overloaded",
    "upstream overloaded",
    "is overloaded, please try again",
    "at capacity",
    "over capacity",
    "currently overloaded",
]
```

The 429 branch now checks the error body and emits `FailoverReason.overloaded` when matched, preserving `rate_limit` for everything else. Phrases are deliberately narrow and provider-language-flavoured so normal rate-limit messages (`"you have been rate-limited"`, `"Too Many Requests"`) don't hit this bucket.
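The disambiguation described above can be sketched in a few lines. This is only an illustration: `classify_429` and the trimmed `FailoverReason` enum are hypothetical stand-ins rather than the real `agent/error_classifier.py` API, and case-insensitive matching is an assumption. Only the phrase list comes from the PR description.

```python
from enum import Enum


class FailoverReason(Enum):
    # Trimmed stand-in for the real enum: only the two values discussed here.
    rate_limit = "rate_limit"
    overloaded = "overloaded"


# Phrase list copied from the PR description.
_OVERLOADED_PATTERNS = [
    "temporarily overloaded",
    "server is overloaded",
    "server overloaded",
    "service overloaded",
    "service is overloaded",
    "service may be temporarily overloaded",
    "upstream overloaded",
    "is overloaded, please try again",
    "at capacity",
    "over capacity",
    "currently overloaded",
]


def classify_429(message: str) -> FailoverReason:
    """Hypothetical helper: choose a reason for an HTTP 429 from its message body."""
    lowered = message.lower()  # assumption: matching ignores case
    if any(pattern in lowered for pattern in _OVERLOADED_PATTERNS):
        return FailoverReason.overloaded
    # Anything that does not use overload language stays a plain rate limit.
    return FailoverReason.rate_limit
```

With this sketch, the Z.AI message from the issue maps to `overloaded`, while "Too Many Requests" and "Rate limit reached for requests per minute" stay `rate_limit`.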

2. `run_agent.py::_recover_with_credential_pool` — handle `overloaded`

New branch with the same shape as the existing `billing` handler: rotate immediately via `mark_exhausted_and_rotate(...)`, no retry-on-same-credential first. The 503/529 `overloaded` classifications produced by the existing code now also flow through this branch as a side benefit.
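To make the difference between the two recovery paths concrete, here is a minimal sketch under stated assumptions. `ToyPool` and `recover` are hypothetical names, the reasons are plain strings instead of the real enum, and the exact values returned for the retry flag are guesses; only the `(recovered, has_retried_429)` tuple shape and the `mark_exhausted_and_rotate` name come from the PR text.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPool:
    """Hypothetical stand-in for the credential pool: an ordered list of key names."""
    keys: list
    index: int = 0
    exhausted: set = field(default_factory=set)

    def mark_exhausted_and_rotate(self) -> bool:
        """Mark the current key exhausted; return True if another key is usable."""
        self.exhausted.add(self.keys[self.index])
        for i, key in enumerate(self.keys):
            if key not in self.exhausted:
                self.index = i
                return True
        return False


def recover(reason: str, pool: ToyPool, has_retried_429: bool):
    """Return (recovered, has_retried_429) for one failed request."""
    if reason == "rate_limit":
        if not has_retried_429:
            # First per-key throttle: let the caller retry the same credential once.
            return False, True
        # Second throttle in a row: give up on this key and rotate.
        return pool.mark_exhausted_and_rotate(), False
    if reason == "overloaded":
        # The whole endpoint is saturated: rotate at once, whatever the flag says.
        return pool.mark_exhausted_and_rotate(), False
    # Unknown reasons fall through to the no-op default.
    return False, has_retried_429
```

In this sketch a single-entry pool returns `recovered=False` for `overloaded`, which is the case where an outer fallback would have to take over.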

Related Issue

Fixes #15297

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Test plan

- 15 new tests in 2 files, all green on py3.11 venv (145 total in the affected test files)
- Regression guards verified: temporarily reverted the classifier's overload branch; 10 of the 11 new classifier tests correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 145 pass.
- Pre-existing tests still green:
  - `test_error_classifier.py`: 102/102 (was 95, +7 new)
  - `test_credential_pool_routing.py`: 13/13 (was 9, +4 new)

Test coverage detail

Classifier (`tests/agent/test_error_classifier.py` — 7 new):

- `test_429_zai_temporarily_overloaded`: exact #15297 Z.AI message
- `test_429_overload_phrases_route_to_overloaded`: 9-phrase parametrised matrix (server/service/upstream overloaded, at/over capacity, `currently overloaded`, `is overloaded, please try again`, …)
- `test_429_plain_rate_limit_remains_rate_limit`: "Rate limit reached for requests per minute" 429 → still `rate_limit` (regression guard against disambiguation silently broadening)
- `test_429_too_many_requests_remains_rate_limit`: literal HTTP 429 reason phrase → still `rate_limit`
- `test_503_overloaded_path_unchanged`: sanity check that the existing 503 → `overloaded` path didn't regress

Pool handler (`tests/agent/test_credential_pool_routing.py` — 4 new):

- `test_overloaded_rotates_immediately_on_first_failure`: the fix
- `test_overloaded_does_not_block_on_has_retried_flag`: overloaded rotates regardless of retry-flag state
- `test_overloaded_pool_exhaustion_returns_false`: single-entry pool returns `recovered=False` so outer fallback takes over
- `test_rate_limit_still_uses_retry_first`: negative control; rate_limit must KEEP its retry-first semantics, otherwise normal per-key throttles would double-rotate and burn the pool

Not in scope

- The companion #15298 (`_restore_primary_runtime` cooldown check) and #15296 (credential pool exponential backoff) issues: same reporter, related but independent diffs. Kept this PR scoped to the classifier + handler so review stays small.
- Other status codes that providers misuse for overload (e.g. some providers return 502 for cold-start latency rather than crash): only the documented 429 case is addressed here.

Changed files

agent/error_classifier.py (modified, +37/-1)
run_agent.py (modified, +21/-3)
tests/agent/test_credential_pool_routing.py (modified, +131/-0)
tests/agent/test_error_classifier.py (modified, +79/-0)

PR #15455: fix(credential-pool): exponential backoff on repeated exhaustion (#15296)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15455

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15296`. The pool's `exhausted_ttl` returned a flat `EXHAUSTED_TTL*` constant regardless of how many consecutive times the same credential had failed. When a provider was overloaded for hours, the pool cycled through:

1. TTL expires → credential cleared back to `ok`
2. Provider tried again → fails with the same error (3 retries wasted)
3. Credential re-exhausted with the same flat TTL
4. Wait for TTL to expire again → repeat

For cron jobs (each run = one turn), every execution burned its retry budget on the same dead credential until the upstream actually recovered.
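A small back-of-the-envelope simulation shows why the flat TTL hurts during a long outage. The function below is a hypothetical sketch, not code from the PR: it assumes a 1h base TTL, assumes the credential is tried again the moment its cooldown expires, and counts how many of those attempts land inside the outage.

```python
def wasted_probes(
    outage_hours: float,
    base_ttl_hours: float = 1.0,
    backoff: bool = False,
    cap: int = 8,
) -> int:
    """Count cooldown expiries that fall inside the outage (each is a doomed retry)."""
    elapsed = 0.0
    failures = 1  # the failure at t=0 starts the first cooldown
    probes = 0
    while True:
        multiplier = min(2 ** (failures - 1), cap) if backoff else 1
        elapsed += base_ttl_hours * multiplier
        if elapsed >= outage_hours:
            return probes
        # Cooldown expired while the provider is still down: another wasted attempt.
        probes += 1
        failures += 1


flat = wasted_probes(24)                    # 23 doomed retries with a flat 1h TTL
backed_off = wasted_probes(24, backoff=True)  # 5 with doubling capped at 8x
```

Under these assumptions a 24h outage costs 23 wasted attempts with the flat TTL and 5 with capped exponential backoff (cooldowns expiring at hours 1, 3, 7, 15, and 23).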

Fix

- New `consecutive_failures: int = 0` field on `PooledCredential`
- `_mark_exhausted` increments it from the prior value (defensive `getattr` + `or 0` so legacy on-disk entries that predate the field default to 0 via the dataclass and get promoted to 1 on first failure)
- `_exhausted_ttl(error_code, consecutive_failures=0)` now applies `base_ttl * min(2 ** (failures - 1), 8)`, capped at 8× so a long-running outage doesn't push the cooldown into days. The default argument keeps backward compatibility with any caller still using the old single-arg signature.
- `_exhausted_until` passes `entry.consecutive_failures` through
- Resets to 0 happen on real success markers (refresh produces fresh tokens, anthropic claude_code sync from credentials file, nous adopts newer tokens from auth.json) and on operator-driven `reset_statuses()`
- The cooldown auto-clear path (`_available_entries(clear_expired=True)` flipping `last_status` back to `STATUS_OK` when the TTL elapses) deliberately does NOT reset the counter; that's the entire point of the fix. If the same credential exhausts again right after the cooldown expires, the upstream is still down and the next cooldown should be longer.
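The counter semantics in the list above can be illustrated with a toy class. `ToyCredential` and its method names are hypothetical simplifications of `PooledCredential`, and the status strings are assumed values; the point is only which transitions touch `consecutive_failures`.

```python
from dataclasses import dataclass

STATUS_OK = "ok"                # assumed status values, for illustration only
STATUS_EXHAUSTED = "exhausted"


@dataclass
class ToyCredential:
    """Hypothetical, trimmed stand-in for PooledCredential."""
    last_status: str = STATUS_OK
    consecutive_failures: int = 0

    def mark_exhausted(self) -> None:
        # Increment from the prior value; `or 0` guards against None in old data.
        prior = getattr(self, "consecutive_failures", 0) or 0
        self.consecutive_failures = prior + 1
        self.last_status = STATUS_EXHAUSTED

    def clear_expired(self) -> None:
        # Cooldown elapsed: usable again, but the failure streak is kept on purpose.
        self.last_status = STATUS_OK

    def reset_status(self) -> None:
        # Operator reset or a real success marker: the streak starts over.
        self.last_status = STATUS_OK
        self.consecutive_failures = 0


cred = ToyCredential()
cred.mark_exhausted()   # streak: 1
cred.clear_expired()    # status back to ok, streak still 1
cred.mark_exhausted()   # streak: 2, so the next cooldown is longer
```

Keeping the streak across `clear_expired` is what lets a second exhaustion right after the cooldown produce a longer cooldown instead of repeating the same one.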

Backoff progression

| consecutive_failures | multiplier | cooldown (default 1h base) |
| --- | --- | --- |
| 0 (default) | 1× | 1h |
| 1 (first) | 1× | 1h |
| 2 | 2× | 2h |
| 3 | 4× | 4h |
| 4 | 8× | 8h |
| 5+ | 8× | 8h (cap) |

The 8h cap matters: without it, a credential that's been failing for a week could end up on a multi-day cooldown that survives operator intervention (refreshing the upstream provider, swapping API keys, etc.).
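The progression in the table follows from `base_ttl * min(2 ** (failures - 1), 8)` once counts below 1 are clamped to 1. The sketch below is a hypothetical reduction of `_exhausted_ttl`: it drops the `error_code` argument, assumes a 1h base, and adds the clamp as an assumption so that 0 and negative counts match the table and the described tests.

```python
BASE_TTL_SECONDS = 3600  # assumption: 1h base, as in the table above
MAX_MULTIPLIER = 8       # cap quoted in the PR description


def exhausted_ttl(consecutive_failures: int = 0, base_ttl: int = BASE_TTL_SECONDS) -> int:
    """Cooldown in seconds: base_ttl * min(2 ** (failures - 1), 8)."""
    # Clamp to 1 so that 0 or negative counts fall back to the plain base TTL.
    failures = max(int(consecutive_failures), 1)
    return base_ttl * min(2 ** (failures - 1), MAX_MULTIPLIER)


# Cooldown in hours for a few failure counts, including the defensive cases.
hours = [exhausted_ttl(n) // 3600 for n in (0, 1, 2, 3, 4, 5, 20, -3)]
# hours == [1, 1, 2, 4, 8, 8, 8, 1]
```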

Related Issue

Fixes #15296

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✅ Tests (adding or improving test coverage)

Test plan

- 21 new tests in `tests/agent/test_credential_pool_backoff.py`, all green on py3.11 venv
- All 44 pre-existing `test_credential_pool.py` tests still pass; backward compat preserved (the default argument keeps the single-arg signature working)
- Verified regression guards: temporarily reverted the `_mark_exhausted` increment to a no-op; 4 tests in `TestMarkExhaustedIncrement` correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 30 backoff tests pass.

Test coverage detail

`TestExhaustedTtl` (8 cases):

- Backward-compat default (no `consecutive_failures` arg → flat base TTL)
- `consecutive_failures=1` is 1× (not 2×), which prevents an off-by-one on the very first failure
- 8-row parametrised matrix on 429 progression: 1× → 2× → 4× → 8× → cap
- Same matrix on 402 (non-429 cooldowns also back off)
- `consecutive_failures=0` defensive default
- Negative count clamped to base (corrupted on-disk entry safety)
- Cap constant pinned to 8 so a refactor flagging this test prompts a docstring update

`TestExhaustedUntil` (4 cases):

- First failure → 1× base TTL after `last_status_at`
- Fourth failure → 8× base TTL
- 20 failures → still 8× (cap holds)
- Explicit `last_error_reset_at` from upstream wins over backoff (provider-supplied resets are absolute truth)
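The precedence these cases describe can be sketched as a pure function. `exhausted_until` below is a hypothetical simplification of `_exhausted_until`: it takes plain timestamps instead of a pool entry, assumes a 1h base TTL, and treats a provider-supplied reset time as overriding the computed backoff.

```python
from typing import Optional


def exhausted_until(
    last_status_at: float,
    consecutive_failures: int,
    base_ttl: float = 3600.0,
    last_error_reset_at: Optional[float] = None,
) -> float:
    """Return the timestamp at which the credential becomes usable again."""
    if last_error_reset_at is not None:
        # A reset time supplied by the provider wins over any computed backoff.
        return last_error_reset_at
    failures = max(consecutive_failures, 1)      # clamp 0/negative to the base case
    multiplier = min(2 ** (failures - 1), 8)     # doubling, capped at 8x
    return last_status_at + base_ttl * multiplier
```

For a credential marked exhausted at `t=1000.0`, this gives `4600.0` after the first failure and `29800.0` after the fourth (and after the twentieth, since the cap holds).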

`TestMarkExhaustedIncrement` (4 cases):

- 0 → 1 on first exhaustion
- 2 → 3 on consecutive (the streak persists)
- Round-trips through pool reload (persistence)
- Legacy on-disk entry without the field → loads as 0, increments to 1

`TestCounterResetSemantics` (3 cases):

- `reset_statuses()` zeroes the counter (operator path)
- `clear_expired` does NOT zero the counter. This is the bug-fix invariant; without this assertion a future refactor could silently break the backoff
- `to_dict`/`from_dict` round-trips the field
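The persistence behaviour mentioned in these test groups (round-tripping the counter and loading legacy entries that predate it) might look roughly like this. `StoredCredential` and its `key_id` field are hypothetical; the real `PooledCredential` has more fields and its own serialization code.

```python
from dataclasses import asdict, dataclass


@dataclass
class StoredCredential:
    """Hypothetical minimal record; only the counter matters for this sketch."""
    key_id: str
    consecutive_failures: int = 0

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "StoredCredential":
        return cls(
            key_id=data["key_id"],
            # Legacy entries lack the field (or hold None): treat both as 0.
            consecutive_failures=int(data.get("consecutive_failures") or 0),
        )


legacy = StoredCredential.from_dict({"key_id": "k1"})  # written before the field existed
current = StoredCredential("k2", consecutive_failures=3)
restored = StoredCredential.from_dict(current.to_dict())
```

Here `legacy.consecutive_failures` loads as 0, and `restored` compares equal to `current`, so the streak survives a save and reload.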

Companion PRs

This is one of three related credential-pool resilience fixes filed by @mordekai-lab:

- #15297: error classifier disambiguates 429 "overloaded" from rate_limit (my open PR #15414)
- #15298: `_restore_primary_runtime` cooldown check (Tranquil-Flow's #15434)
- #15296: this PR

The three are independent and can land in any order.

Not in scope

Per-provider tuning of the cap or base TTL — currently uses the same 8× cap for every provider. A future enhancement could let providers configure their own backoff curve, but that's a bigger surface than this user-facing regression needs to unblock.

Changed files

agent/credential_pool.py (modified, +108/-6)
tests/agent/test_credential_pool_backoff.py (added, +409/-0)

Current Behavior

`agent/error_classifier.py` (line ~551):

```python
if status_code == 429:
    return result_fn(
        FailoverReason.rate_limit,  # ← always rate_limit
        retryable=True,
        should_rotate_credential=True,
        should_fallback=True,
    )
```

The message body is not inspected to distinguish overload from rate limiting.

Additionally, `FailoverReason.overloaded` exists as an enum value but is never produced by the 429 classification path, and `_handle_credential_failover()` in `run_agent.py` has no handler for it, so it falls through to the default no-op return.

Proposed Fix

- In `error_classifier.py`: inspect the error message for overload patterns ("temporarily overloaded", "server is overloaded", "capacity", etc.) and classify as `FailoverReason.overloaded` instead of `rate_limit`
- In `run_agent.py`: add an `overloaded` handler in `_handle_credential_failover()` that skips the retry-on-same-credential step and rotates immediately (same behavior as `billing`)

Environment

- Hermes-agent latest
- Observed with Z.AI provider returning HTTP 429: "The service may be temporarily overloaded, please try again later"
- No retry_after header or resets_at field in the error response

extent analysis

TL;DR

Inspect the error message for overload patterns and classify as FailoverReason.overloaded instead of rate_limit to apply the correct recovery strategy.

Guidance

- Inspect the error message in `agent/error_classifier.py` for keywords like "temporarily overloaded", "server is overloaded", or "capacity" to distinguish between rate limiting and server-side overload.
- Update the classification logic to return `FailoverReason.overloaded` when an overload pattern is detected.
- Add an `overloaded` handler in `_handle_credential_failover()` in `run_agent.py` to skip retrying on the same credential and rotate immediately.
- Verify the fix by testing with a provider that returns an HTTP 429 response with an overload message.

Example

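```python
if status_code == 429:
    # Assumes `response`, `result_fn`, and `FailoverReason` come from the
    # surrounding classifier code. Lower-casing makes the match case-insensitive.
    error_message = response.text.lower()
    overload_phrases = ("temporarily overloaded", "server is overloaded")
    if any(phrase in error_message for phrase in overload_phrases):
        # Server-side overload: rotate immediately instead of retrying this key.
        return result_fn(
            FailoverReason.overloaded,
            retryable=False,
            should_rotate_credential=True,
            should_fallback=True,
        )
    # Ordinary per-key rate limit: keep the retry-first behavior.
    return result_fn(
        FailoverReason.rate_limit,
        retryable=True,
        should_rotate_credential=True,
        should_fallback=True,
    )
```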

Notes

The proposed fix assumes that the error message contains a clear indication of server-side overload. If the error message is not reliable, additional logic may be needed to determine the correct recovery strategy.

Recommendation

Apply the fix by inspecting the error message and updating the classification logic, so that each HTTP 429 response gets the recovery strategy that matches its cause: retry-first for per-key rate limits, immediate rotation for server-side overload.
