hermes - ✅(Solved) Fix Error classifier: 429 'temporarily overloaded' misclassified as rate_limit — triggers wrong recovery path [2 pull requests, 1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#15297Fetched 2026-04-25 06:23:07
View on GitHub
Comments
1
Participants
2
Timeline
7
Reactions
0
Timeline (top)
labeled ×4cross-referenced ×2commented ×1

The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.

Some providers (e.g. Z.AI/Zhipu) return HTTP 429 with messages like:

HTTP 429: The service may be temporarily overloaded, please try again later

This is a server-side overload — the entire provider endpoint is struggling, not just this API key hitting a per-key quota. The recovery strategy should be different:

ReasonCorrect Behavior
rate_limitRetry same credential once, then rotate to next key
overloadedSkip retry, rotate immediately (the whole provider is down)

Error Message

The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.

  1. In error_classifier.py: inspect the error message for overload patterns ("temporarily overloaded", "server is overloaded", "capacity", etc.) and classify as FailoverReason.overloaded instead of rate_limit
  • No retry_after header or resets_at field in the error response

Root Cause

The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.

Some providers (e.g. Z.AI/Zhipu) return HTTP 429 with messages like:

HTTP 429: The service may be temporarily overloaded, please try again later

This is a server-side overload — the entire provider endpoint is struggling, not just this API key hitting a per-key quota. The recovery strategy should be different:

ReasonCorrect Behavior
rate_limitRetry same credential once, then rotate to next key
overloadedSkip retry, rotate immediately (the whole provider is down)

Fix Action

Fixed

PR fix notes

PR #15414: fix(error-classifier): route 429 'overloaded' messages to FailoverReason.overloaded (#15297)

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15297`. Some providers — notably Z.AI / Zhipu — return HTTP 429 with messages like `"The service may be temporarily overloaded, please try again later"` to signal server-wide overload, not per-credential rate limiting.

The two conditions need different recovery strategies:

ReasonCorrect strategy
`rate_limit`Retry same credential once (the limit may have reset), then rotate
`overloaded`Skip retry, rotate immediately (the whole endpoint is saturated)

Before this fix:

  • `error_classifier.py` mapped every 429 to `FailoverReason.rate_limit` regardless of message body.
  • `FailoverReason.overloaded` already existed as an enum value (and was produced for 503/529) but no production path emitted it for 429.
  • `_recover_with_credential_pool` had no handler for `overloaded` — an `overloaded` classification fell through to the default no-op `return False, has_retried_429` line.

Net effect: every overload-language 429 burned a `has_retried_429` slot on the same saturated credential before the rotation happened. Cron jobs (one turn each) often used their entire execution on that wasted retry.

Fix — two narrow, additive changes

1. `agent/error_classifier.py` — disambiguate 429s on message body

New `_OVERLOADED_PATTERNS` list of provider-language overload phrases:

```python _OVERLOADED_PATTERNS = [ "temporarily overloaded", "server is overloaded", "server overloaded", "service overloaded", "service is overloaded", "service may be temporarily overloaded", "upstream overloaded", "is overloaded, please try again", "at capacity", "over capacity", "currently overloaded", ] ```

The 429 branch now checks the error body and emits `FailoverReason.overloaded` when matched, preserving `rate_limit` for everything else. Phrases are deliberately narrow and provider-language-flavoured so normal rate-limit messages (`"you have been rate-limited"`, `"Too Many Requests"`) don't hit this bucket.

2. `run_agent.py::_recover_with_credential_pool` — handle `overloaded`

New branch with the same shape as the existing `billing` handler: rotate immediately via `mark_exhausted_and_rotate(...)`, no retry-on-same-credential first. The 503/529 `overloaded` classifications produced by the existing code now also flow through this branch as a side benefit.

Related Issue

Fixes #15297

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Test plan

  • 15 new tests in 2 files — all green on py3.11 venv (145 total in the affected test files)
  • Regression guards verified: temporarily reverted the classifier's overload branch; 10 of the 11 new classifier tests correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 145 pass.
  • Pre-existing tests still green:
    • `test_error_classifier.py` — 102/102 (was 95, +7 new)
    • `test_credential_pool_routing.py` — 13/13 (was 9, +4 new)

Test coverage detail

Classifier (`tests/agent/test_error_classifier.py` — 7 new):

  • `test_429_zai_temporarily_overloaded` — exact #15297 Z.AI message
  • `test_429_overload_phrases_route_to_overloaded` — 9-phrase parametrised matrix (server/service/upstream overloaded, at/over capacity, `currently overloaded`, `is overloaded, please try again`, …)
  • `test_429_plain_rate_limit_remains_rate_limit` — "Rate limit reached for requests per minute" 429 → still `rate_limit` (regression guard against disambiguation silently broadening)
  • `test_429_too_many_requests_remains_rate_limit` — literal HTTP 429 reason phrase → still `rate_limit`
  • `test_503_overloaded_path_unchanged` — sanity check that the existing 503 → `overloaded` path didn't regress

Pool handler (`tests/agent/test_credential_pool_routing.py` — 4 new):

  • `test_overloaded_rotates_immediately_on_first_failure` — the fix
  • `test_overloaded_does_not_block_on_has_retried_flag` — overloaded rotates regardless of retry-flag state
  • `test_overloaded_pool_exhaustion_returns_false` — single-entry pool returns `recovered=False` so outer fallback takes over
  • `test_rate_limit_still_uses_retry_first` — negative control: rate_limit must KEEP its retry-first semantics, otherwise normal per-key throttles would double-rotate and burn the pool

Not in scope

  • The companion #15298 (`_restore_primary_runtime` cooldown check) and #15296 (credential pool exponential backoff) issues — same reporter, related but independent diffs. Kept this PR scoped to the classifier + handler so review stays small.
  • Other status codes that providers misuse for overload (e.g. some providers return 502 for cold-start latency rather than crash) — only addressing the documented 429 case here.

Changed files

  • agent/error_classifier.py (modified, +37/-1)
  • run_agent.py (modified, +21/-3)
  • tests/agent/test_credential_pool_routing.py (modified, +131/-0)
  • tests/agent/test_error_classifier.py (modified, +79/-0)

PR #15455: fix(credential-pool): exponential backoff on repeated exhaustion (#15296)

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15296`. The pool's `exhausted_ttl` returned a flat `EXHAUSTED_TTL*` constant regardless of how many consecutive times the same credential had failed. When a provider was overloaded for hours, the pool cycled through:

  1. TTL expires → credential cleared back to `ok`
  2. Provider tried again → fails with the same error (3 retries wasted)
  3. Credential re-exhausted with the same flat TTL
  4. Wait for TTL to expire again → repeat

For cron jobs (each run = one turn), every execution burned its retry budget on the same dead credential until the upstream actually recovered.

Fix

  • New `consecutive_failures: int = 0` field on `PooledCredential`
  • `_mark_exhausted` increments it from the prior value (defensive `getattr` + `or 0` so legacy on-disk entries that predate the field default to 0 via the dataclass and get promoted to 1 on first failure)
  • `_exhausted_ttl(error_code, consecutive_failures=0)` now applies `base_ttl * min(2 ** (failures - 1), 8)` — capped at 8× so a long-running outage doesn't push the cooldown into days. The default-arg keeps backward compat with any caller still using the old single-arg signature.
  • `_exhausted_until` passes `entry.consecutive_failures` through
  • Resets to 0 happen on real success markers (refresh produces fresh tokens, anthropic claude_code sync from credentials file, nous adopts newer tokens from auth.json) and on operator-driven `reset_statuses()`
  • The cooldown auto-clear path (`_available_entries(clear_expired=True)` flipping `last_status` back to `STATUS_OK` when the TTL elapses) deliberately does NOT reset the counter — that's the entire point of the fix. If the same credential exhausts again right after the cooldown expires, the upstream is still down and the next cooldown should be longer.

Backoff progression

consecutive_failuresmultipliercooldown (default 1h base)
0 (default)1h
1 (first)1h
22h
34h
48h
5+8h (cap)

The 8h cap matters: without it, a credential that's been failing for a week could end up on a multi-day cooldown that survives operator intervention (refreshing the upstream provider, swapping API keys, etc.).

Related Issue

Fixes #15296

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Test plan

  • 21 new tests in `tests/agent/test_credential_pool_backoff.py` — all green on py3.11 venv
  • All 44 pre-existing `test_credential_pool.py` tests still pass — backward compat preserved (the default-argument keeps the single-arg signature working)
  • Verified regression guards: temporarily reverted the `_mark_exhausted` increment to a no-op; 4 tests in `TestMarkExhaustedIncrement` correctly failed with clear assertion messages pointing at the regressed invariant. Restored fix → all 30 backoff tests pass.

Test coverage detail

`TestExhaustedTtl` (8 cases):

  • Backward-compat default (no `consecutive_failures` arg → flat base TTL)
  • `consecutive_failures=1` is 1× (not 2×) — prevents off-by-one on the very first failure
  • 8-row parametrised matrix on 429 progression: 1× → 2× → 4× → 8× → cap
  • Same matrix on 402 (non-429 cooldowns also back off)
  • `consecutive_failures=0` defensive default
  • Negative count clamped to base (corrupted on-disk entry safety)
  • Cap constant pinned to 8 so a refactor flagging this test prompts a docstring update

`TestExhaustedUntil` (4 cases):

  • First failure → 1× base TTL after `last_status_at`
  • Fourth failure → 8× base TTL
  • 20 failures → still 8× (cap holds)
  • Explicit `last_error_reset_at` from upstream wins over backoff (provider-supplied resets are absolute truth)

`TestMarkExhaustedIncrement` (4 cases):

  • 0 → 1 on first exhaustion
  • 2 → 3 on consecutive (the streak persists)
  • Round-trips through pool reload (persistence)
  • Legacy on-disk entry without the field → loads as 0, increments to 1

`TestCounterResetSemantics` (3 cases):

  • `reset_statuses()` zeroes the counter (operator path)
  • `clear_expired` does NOT zero the counter — the bug-fix invariant; without this assertion a future refactor could silently break the backoff
  • `to_dict`/`from_dict` round-trips the field

Companion PRs

This is one of three related credential-pool resilience fixes filed by @mordekai-lab:

  • #15297 — error classifier disambiguates 429 "overloaded" from rate_limit (my open PR #15414)
  • #15298 — `_restore_primary_runtime` cooldown check (Tranquil-Flow's #15434)
  • #15296 — this PR

The three are independent and can land in any order.

Not in scope

  • Per-provider tuning of the cap or base TTL — currently uses the same 8× cap for every provider. A future enhancement could let providers configure their own backoff curve, but that's a bigger surface than this user-facing regression needs to unblock.

Changed files

  • agent/credential_pool.py (modified, +108/-6)
  • tests/agent/test_credential_pool_backoff.py (added, +409/-0)

Code Example

HTTP 429: The service may be temporarily overloaded, please try again later

---

if status_code == 429:
    return result_fn(
        FailoverReason.rate_limit,  # ← always rate_limit
        retryable=True,
        should_rotate_credential=True,
        should_fallback=True,
    )
RAW_BUFFERClick to expand / collapse

Summary

The error classifier treats all HTTP 429 responses as FailoverReason.rate_limit, regardless of whether the 429 indicates a per-key rate limit or a server-side overload. This causes the wrong recovery strategy to be used.

Some providers (e.g. Z.AI/Zhipu) return HTTP 429 with messages like:

HTTP 429: The service may be temporarily overloaded, please try again later

This is a server-side overload — the entire provider endpoint is struggling, not just this API key hitting a per-key quota. The recovery strategy should be different:

ReasonCorrect Behavior
rate_limitRetry same credential once, then rotate to next key
overloadedSkip retry, rotate immediately (the whole provider is down)

Current Behavior

agent/error_classifier.py (line ~551):

if status_code == 429:
    return result_fn(
        FailoverReason.rate_limit,  # ← always rate_limit
        retryable=True,
        should_rotate_credential=True,
        should_fallback=True,
    )

The message body is not inspected to distinguish overload from rate limiting.

Additionally, FailoverReason.overloaded exists as an enum value but is never produced by the 429 classification path, and _handle_credential_failover() in run_agent.py has no handler for it — it falls through to the default no-op return.

Proposed Fix

  1. In error_classifier.py: inspect the error message for overload patterns ("temporarily overloaded", "server is overloaded", "capacity", etc.) and classify as FailoverReason.overloaded instead of rate_limit
  2. In run_agent.py: add an overloaded handler in _handle_credential_failover() that skips the retry-on-same-credential step and rotates immediately (same behavior as billing)

Environment

  • Hermes-agent latest
  • Observed with Z.AI provider returning HTTP 429: "The service may be temporarily overloaded, please try again later"
  • No retry_after header or resets_at field in the error response

extent analysis

TL;DR

Inspect the error message for overload patterns and classify as FailoverReason.overloaded instead of rate_limit to apply the correct recovery strategy.

Guidance

  • Inspect the error message in agent/error_classifier.py for keywords like "temporarily overloaded", "server is overloaded", or "capacity" to distinguish between rate limiting and server-side overload.
  • Update the classification logic to return FailoverReason.overloaded when an overload pattern is detected.
  • Add an overloaded handler in _handle_credential_failover() in run_agent.py to skip retrying on the same credential and rotate immediately.
  • Verify the fix by testing with a provider that returns an HTTP 429 response with an overload message.

Example

if status_code == 429:
    error_message = response.text
    if "temporarily overloaded" in error_message or "server is overloaded" in error_message:
        return result_fn(
            FailoverReason.overloaded,
            retryable=False,
            should_rotate_credential=True,
            should_fallback=True,
        )
    else:
        return result_fn(
            FailoverReason.rate_limit,
            retryable=True,
            should_rotate_credential=True,
            should_fallback=True,
        )

Notes

The proposed fix assumes that the error message contains a clear indication of server-side overload. If the error message is not reliable, additional logic may be needed to determine the correct recovery strategy.

Recommendation

Apply the workaround by inspecting the error message and updating the classification logic, as this will allow for the correct recovery strategy to be applied based on the type of HTTP 429 response received.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix Error classifier: 429 'temporarily overloaded' misclassified as rate_limit — triggers wrong recovery path [2 pull requests, 1 comments, 2 participants]