hermes - ✅(Solved) Fix "overloaded" server errors classified as rate_limit, exhausting credential pool [2 pull requests, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#14038 (fetched 2026-04-23 07:47:12)
Comments: 2 · Participants: 2 · Timeline events: 8 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, cross-referenced ×2

Error Message

HTTP 200 with code 1305: "The service may be temporarily overloaded, please try again later"

Fix Action

Fixed

PR fix notes

PR #14055: fix(error_classifier): classify 'overloaded' message as FailoverReason.overloaded

Description (problem / solution / changelog)

Closes #14038

Summary

When a provider (e.g. Z.AI) returns a 'temporarily overloaded' error (HTTP 200 with code 1305, or HTTP 400), it was being classified as rate_limit with should_rotate_credential=True. After 2 failures, the single API key was marked exhausted, causing all further retries to fail immediately.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check before the rate_limit check at both sites in agent/error_classifier.py where error messages are classified. Overloaded errors now get FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.

Changes

  • agent/error_classifier.py: Added overloaded pattern check before the rate_limit check at two sites (~line 594 and ~line 736)

Root cause

'Overloaded' error messages from providers like Z.AI were matching as generic rate limits. The should_rotate_credential=True flag then caused the credential pool to mark the API key as exhausted after just 2 transient errors.

Changed files

  • agent/error_classifier.py (modified, +18/-1)

PR #14261: fix(agent): classify 'overloaded' messages as server overload, not rate_limit

Description (problem / solution / changelog)

Closes #14038

Changed files

  • agent/error_classifier.py (modified, +8/-0)
  • gateway/run.py (modified, +1/-1)


Bug Description

When a provider (e.g., Z.AI) returns a "temporarily overloaded" error (HTTP 200 with code 1305), Hermes classifies it as rate_limit with should_rotate_credential=True. This causes the credential pool to mark the API key as "exhausted" after just 2 errors, making all further retries useless.

Steps to Reproduce

  1. Configure a provider that occasionally returns overloaded errors (e.g., Z.AI with a single API key)
  2. Trigger multiple requests during peak load
  3. Provider returns: HTTP 200: The service may be temporarily overloaded, please try again later
  4. After 2 errors, the single API key is marked exhausted
  5. All subsequent retries fail immediately with no valid credential
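The exhaustion described in the steps above can be reproduced with a toy model. The sketch below is not the Hermes credential pool: `CredentialPool`, `report_error`, `simulate`, and the threshold constant are hypothetical names, and the threshold of 2 is taken only from the behavior reported in this issue.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: the issue reports exhaustion after 2 errors.
MAX_FAILURES_BEFORE_EXHAUSTED = 2


@dataclass
class CredentialPool:
    """Toy stand-in for a credential pool; not the Hermes implementation."""

    keys: list[str]
    failures: dict[str, int] = field(default_factory=dict)

    def available(self) -> list[str]:
        # A key stays usable until it has reached the failure threshold.
        return [
            k for k in self.keys
            if self.failures.get(k, 0) < MAX_FAILURES_BEFORE_EXHAUSTED
        ]

    def report_error(self, key: str, should_rotate_credential: bool) -> None:
        # Only errors flagged for rotation count against the key.
        if should_rotate_credential:
            self.failures[key] = self.failures.get(key, 0) + 1


def simulate(should_rotate_credential: bool, errors: int = 2) -> int:
    """Return how many keys remain usable after `errors` overloaded responses."""
    pool = CredentialPool(keys=["single-api-key"])
    for _ in range(errors):
        pool.report_error("single-api-key", should_rotate_credential)
    return len(pool.available())


if __name__ == "__main__":
    print(simulate(should_rotate_credential=True))   # misclassified as rate_limit
    print(simulate(should_rotate_credential=False))  # classified as overloaded
```

With a single key, two errors flagged for rotation leave no usable credential, while the same two errors without the flag leave the key available for retries.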

Expected Behavior

"Overloaded" errors should be classified as server-side issues (FailoverReason.overloaded), NOT as rate limits. The credential is valid — the server is just busy. Rotating credentials is counterproductive and exhausts the pool unnecessarily.

Suggested Fix

In agent/error_classifier.py, add an overloaded check before the rate_limit check in both _classify_by_message functions:

# Overloaded patterns — server-side overload, NOT a credential/billing issue.
# Must come before rate_limit check to avoid rotating credentials unnecessarily.
if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )

# Rate limit patterns
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
    ...

Also add overloaded patterns to _RATE_LIMIT_PATTERNS:

_RATE_LIMIT_PATTERNS = [
    # ... existing patterns ...
    "servicequotaexceededexception",
    "overloaded",
    "temporarily overloaded",
]
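To see why the ordering matters, here is a minimal self-contained sketch of the check order. It is an illustration under stated assumptions, not the code in agent/error_classifier.py: the `Classification` dataclass, the `classify_by_message` function, the `unknown` enum member, and the "rate limit" pattern are hypothetical stand-ins, and the real pattern list is longer.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Only the members needed here; the real enum may differ.
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"  # hypothetical fallback member


@dataclass(frozen=True)
class Classification:
    """Hypothetical result type standing in for whatever result_fn returns."""

    reason: FailoverReason
    retryable: bool = False
    should_rotate_credential: bool = False


# Illustrative subset; "rate limit" is an assumed entry.
_RATE_LIMIT_PATTERNS = [
    "rate limit",
    "servicequotaexceededexception",
    "overloaded",
]


def classify_by_message(message: str) -> Classification:
    error_msg = message.lower()

    # Server-side overload is checked first. "temporarily overloaded"
    # contains "overloaded", so a single substring test covers both.
    if "overloaded" in error_msg:
        return Classification(FailoverReason.overloaded, retryable=True)

    # Rate limits are the only branch here that rotates credentials.
    if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
        return Classification(
            FailoverReason.rate_limit,
            retryable=True,
            should_rotate_credential=True,
        )

    return Classification(FailoverReason.unknown)
```

Because the overloaded branch returns early, an "overloaded" entry in the pattern list never reaches the rate_limit branch in this sketch. Swapping the two checks would reintroduce the credential rotation described in the bug.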

Related

Retry parameters (max_retries, base_delay, max_delay) are hardcoded in run_agent.py. Making them configurable via config.yaml would help users tune retry behavior for providers with volatile availability without editing source code (changes are lost on hermes update).
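One possible shape for configurable retry parameters is sketched below. Everything beyond the three parameter names is an assumption: the `retry` config key, the `RetryConfig` and `load_retry_config` names, the default values, and the use of exponential backoff are illustrative and not taken from run_agent.py. The `config` argument stands for the mapping a YAML parser would return for config.yaml.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryConfig:
    # Placeholder defaults, not the values hardcoded in run_agent.py.
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 30.0


def load_retry_config(config: dict) -> RetryConfig:
    """Build a RetryConfig from an already-parsed config mapping.

    The "retry" section name is hypothetical; missing keys fall back
    to the defaults above.
    """
    section = config.get("retry") or {}
    defaults = RetryConfig()
    return RetryConfig(
        max_retries=int(section.get("max_retries", defaults.max_retries)),
        base_delay=float(section.get("base_delay", defaults.base_delay)),
        max_delay=float(section.get("max_delay", defaults.max_delay)),
    )


def backoff_delay(attempt: int, cfg: RetryConfig) -> float:
    """Exponential backoff for a 0-based attempt number, capped at max_delay."""
    return min(cfg.base_delay * (2 ** attempt), cfg.max_delay)
```

Keeping the defaults in one dataclass means an empty or missing section reproduces the built-in behavior, so user overrides live in the config file rather than in edited source.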

Environment

  • Provider: Z.AI (api.z.ai) GLM Coding Max Plan
  • Error: HTTP 200 with code 1305 "The service may be temporarily overloaded, please try again later"

extent analysis

TL;DR

Modify the error classification in agent/error_classifier.py to correctly handle "temporarily overloaded" errors as server-side issues, not rate limits.

Guidance

  • Update the _classify_by_message functions in agent/error_classifier.py to check for "overloaded" patterns before rate limit checks.
  • Add "overloaded" and "temporarily overloaded" patterns to the _RATE_LIMIT_PATTERNS list, but ensure they are handled as server-side overload cases.
  • Consider making retry parameters (max_retries, base_delay, max_delay) configurable via config.yaml to improve flexibility for providers with volatile availability.
  • Verify the changes by triggering multiple requests during peak load and checking that the credential pool does not exhaust after two errors.

Example

if "overloaded" in error_msg or "temporarily overloaded" in error_msg:
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )

Notes

The suggested fix assumes that the "overloaded" patterns are correctly identified and handled as server-side issues. Additionally, making retry parameters configurable may require further changes to the run_agent.py file.

Recommendation

Apply the fix in agent/error_classifier.py as suggested so that "temporarily overloaded" errors are classified as server-side overload and no longer trigger unnecessary credential rotation.
