hermes - ✅(Solved) Fix "overloaded" server errors classified as rate_limit, exhausting credential pool [2 pull requests, 2 comments, 2 participants]

hermes · 2026-04-22 15:00:52

GitHub stats

NousResearch/hermes-agent#14038 · Fetched 2026-04-23 07:47:12

Author: seniaxls

Participants: alt-glitch, seniaxls

Timeline (top): labeled ×4, commented ×2, cross-referenced ×2

Error Message

HTTP 200 (code 1305): "The service may be temporarily overloaded, please try again later"

Fix Action

Fixed

Fixed by PR: fix(error_classifier): classify 'overloaded' message as FailoverReason.overloaded (https://github.com/NousResearch/hermes-agent/pull/14055)
Fixed by PR: fix(agent): classify 'overloaded' messages as server overload, not rate_limit (https://github.com/NousResearch/hermes-agent/pull/14261)

PR fix notes

PR #14055: fix(error_classifier): classify 'overloaded' message as FailoverReason.overloaded

Repository: NousResearch/hermes-agent
Author: ms-alan
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14055

Description (problem / solution / changelog)

Closes #14038

Summary

When a provider (e.g. Z.AI) returns a 'temporarily overloaded' error (HTTP 200 with code 1305, or HTTP 400), it was classified as rate_limit with should_rotate_credential=True. After 2 failures, the single API key was marked exhausted, causing all further retries to fail immediately.

The fix adds an 'overloaded' / 'temporarily overloaded' pattern check before the rate_limit check in both message-classification functions. Overloaded errors are now classified as FailoverReason.overloaded (retryable, should_fallback) instead of rate_limit, preventing unnecessary credential rotation.
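Why the order of the checks matters can be shown with a simplified classifier. This is a minimal sketch under assumptions: `ClassifiedError`, `classify_message`, the `overloaded_first` toggle, and the pattern list are hypothetical stand-ins, not the actual code in agent/error_classifier.py. The pattern list mirrors the issue's suggestion of including "overloaded" in _RATE_LIMIT_PATTERNS, the overloaded flags follow the PR description (retryable, should_fallback), and the rate_limit flag follows the bug report (should_rotate_credential=True).

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Hypothetical subset of reasons; the real enum may define more members.
    overloaded = "overloaded"
    rate_limit = "rate_limit"
    unknown = "unknown"


@dataclass(frozen=True)
class ClassifiedError:
    reason: FailoverReason
    retryable: bool = False
    should_fallback: bool = False
    should_rotate_credential: bool = False


# Hypothetical pattern list, including the entries the issue proposes adding.
_RATE_LIMIT_PATTERNS = [
    "rate limit",
    "servicequotaexceededexception",
    "overloaded",
    "temporarily overloaded",
]


def classify_message(message: str, overloaded_first: bool = True) -> ClassifiedError:
    """Classify an error message; overloaded_first=False reproduces the old order."""
    error_msg = message.lower()
    # The fix: match overload before rate limits. "temporarily overloaded"
    # contains "overloaded", so one substring test covers both phrasings.
    if overloaded_first and "overloaded" in error_msg:
        return ClassifiedError(
            FailoverReason.overloaded, retryable=True, should_fallback=True
        )
    if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
        # Rate limits are blamed on the key, so credential rotation is requested.
        return ClassifiedError(
            FailoverReason.rate_limit, should_rotate_credential=True
        )
    return ClassifiedError(FailoverReason.unknown)


msg = "The service may be temporarily overloaded, please try again later"
print(classify_message(msg, overloaded_first=False).reason)  # FailoverReason.rate_limit
print(classify_message(msg).reason)  # FailoverReason.overloaded
```

With the old ordering the overload message falls through to the rate-limit branch and requests rotation; with the overloaded check first, the same message never reaches that branch, so listing "overloaded" in the rate-limit patterns becomes harmless.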

Changes

agent/error_classifier.py: Added the overloaded pattern check before the rate_limit check in both message-classification functions (~line 594 and ~line 736)

Root cause

'Overloaded' error messages from providers like Z.AI were matching the generic rate-limit patterns. The resulting should_rotate_credential=True flag caused the credential pool to mark the API key as exhausted after just 2 transient errors.
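The exhaustion mechanism can be reproduced with a toy credential pool. This is a hypothetical sketch: `CredentialPool`, `exhaust_after`, and `report_failure` are not Hermes APIs, and the real pool may count failures differently or restore keys after a cooldown. Only the threshold of 2 and the role of should_rotate_credential come from the issue.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class CredentialPool:
    """Toy pool: a key is exhausted after `exhaust_after` rotation-triggering failures."""

    keys: List[str]
    exhaust_after: int = 2
    failures: Dict[str, int] = field(default_factory=dict)
    exhausted: Set[str] = field(default_factory=set)

    def current(self) -> Optional[str]:
        # First key not yet marked exhausted, or None when no key is left.
        for key in self.keys:
            if key not in self.exhausted:
                return key
        return None

    def report_failure(self, key: str, should_rotate_credential: bool) -> None:
        # Only errors blamed on the credential count toward exhaustion.
        if not should_rotate_credential:
            return
        self.failures[key] = self.failures.get(key, 0) + 1
        if self.failures[key] >= self.exhaust_after:
            self.exhausted.add(key)


# Before the fix: overload errors arrive flagged like rate limits (rotate=True).
pool = CredentialPool(keys=["zai-key"])
for _ in range(2):
    pool.report_failure("zai-key", should_rotate_credential=True)
print(pool.current())  # None: every later retry has no valid credential

# After the fix: overload errors carry rotate=False, so the key survives.
pool = CredentialPool(keys=["zai-key"])
for _ in range(5):
    pool.report_failure("zai-key", should_rotate_credential=False)
print(pool.current())  # zai-key
```

With a single key there is nothing to rotate to, which is why two misclassified overload errors are enough to leave every subsequent retry without a credential.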

Changed files

agent/error_classifier.py (modified, +18/-1)

PR #14261: fix(agent): classify 'overloaded' messages as server overload, not rate_limit

Repository: NousResearch/hermes-agent
Author: ms-alan
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14261

Description (problem / solution / changelog)

Closes #14038

Changed files

agent/error_classifier.py (modified, +8/-0)
gateway/run.py (modified, +1/-1)

Code Example

# Overloaded patterns: server-side overload, NOT a credential/billing issue.
# Must come before the rate_limit check to avoid rotating credentials unnecessarily.
# error_msg is assumed to be lowercased already; "temporarily overloaded"
# contains "overloaded", so a single substring test covers both phrasings.
if "overloaded" in error_msg:
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )

# Rate limit patterns
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
    ...


Bug Description

When a provider (e.g., Z.AI) returns a "temporarily overloaded" error (HTTP 200 with code 1305), Hermes classifies it as rate_limit with should_rotate_credential=True. This causes the credential pool to mark the API key as "exhausted" after just 2 errors, making all further retries useless.

Steps to Reproduce

1. Configure a provider that occasionally returns overloaded errors (e.g., Z.AI with a single API key).
2. Trigger multiple requests during peak load.
3. Provider returns: HTTP 200: The service may be temporarily overloaded, please try again later
4. After 2 errors, the single API key is marked exhausted.
5. All subsequent retries fail immediately with no valid credential.

Expected Behavior

"Overloaded" errors should be classified as server-side issues (FailoverReason.overloaded), NOT as rate limits. The credential is valid — the server is just busy. Rotating credentials is counterproductive and exhausts the pool unnecessarily.

Suggested Fix

In agent/error_classifier.py, add an overloaded check before the rate_limit check in both _classify_by_message functions:

# Overloaded patterns: server-side overload, NOT a credential/billing issue.
# Must come before the rate_limit check to avoid rotating credentials unnecessarily.
# error_msg is assumed to be lowercased already; "temporarily overloaded"
# contains "overloaded", so a single substring test covers both phrasings.
if "overloaded" in error_msg:
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )

# Rate limit patterns
if any(p in error_msg for p in _RATE_LIMIT_PATTERNS):
    ...

Also add overloaded patterns to _RATE_LIMIT_PATTERNS:

_RATE_LIMIT_PATTERNS = [
    # ... existing patterns ...
    "servicequotaexceededexception",
    "overloaded",
    "temporarily overloaded",
]

Related

Retry parameters (max_retries, base_delay, max_delay) are hardcoded in run_agent.py. Making them configurable via config.yaml would let users tune retry behavior for providers with volatile availability without editing source code, since local source edits are lost on hermes update.
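One way the suggested configurability could look, sketched under assumptions: the `retry` section name, the `RetryConfig` class, and the default values are hypothetical (the issue only names max_retries, base_delay, and max_delay), and the plain dictionary stands in for an already-parsed config.yaml. The delay formula is ordinary capped exponential backoff, not necessarily what run_agent.py implements.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class RetryConfig:
    # Hypothetical defaults; the actual hardcoded values in run_agent.py may differ.
    max_retries: int = 3
    base_delay: float = 1.0
    max_delay: float = 30.0

    @classmethod
    def from_config(cls, config: Mapping[str, Any]) -> "RetryConfig":
        """Read an optional `retry` section, falling back to defaults per key."""
        section = config.get("retry") or {}
        defaults = cls()
        return cls(
            max_retries=int(section.get("max_retries", defaults.max_retries)),
            base_delay=float(section.get("base_delay", defaults.base_delay)),
            max_delay=float(section.get("max_delay", defaults.max_delay)),
        )

    def delay(self, attempt: int) -> float:
        """Capped exponential backoff: base_delay * 2**attempt, at most max_delay."""
        return min(self.max_delay, self.base_delay * (2 ** attempt))


# Stand-in for the result of parsing a config.yaml that contains a `retry:` section.
cfg = RetryConfig.from_config({"retry": {"max_retries": 6, "base_delay": 2}})
print(cfg)  # RetryConfig(max_retries=6, base_delay=2.0, max_delay=30.0)
print([cfg.delay(a) for a in range(cfg.max_retries)])  # [2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

Falling back per key means a user can override only the value they care about (here max_retries and base_delay) while the rest keep their defaults, and a missing or empty section leaves behavior unchanged.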

Environment

Provider: Z.AI (api.z.ai) GLM Coding Max Plan
Error: HTTP 200 with code 1305 "The service may be temporarily overloaded, please try again later"

Extent analysis

TL;DR

Modify the error classification in agent/error_classifier.py to correctly handle "temporarily overloaded" errors as server-side issues, not rate limits.

Guidance

Update the _classify_by_message functions in agent/error_classifier.py to check for "overloaded" patterns before rate limit checks.
Add "overloaded" and "temporarily overloaded" to the _RATE_LIMIT_PATTERNS list as suggested, but keep the overloaded check ahead of the rate-limit check so these messages are still classified as server-side overload rather than rate limits.
Consider making retry parameters (max_retries, base_delay, max_delay) configurable via config.yaml to improve flexibility for providers with volatile availability.
Verify the changes by triggering multiple requests during peak load and checking that the credential pool does not exhaust after two errors.
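The verification can also be approximated offline instead of waiting for peak load. This self-contained sketch stubs a provider that answers with the Z.AI overload message several times before succeeding. `is_overloaded`, `call_with_retries`, `flaky_provider`, and the "error:" reply prefix are hypothetical simplifications, not Hermes APIs; in particular, treating every non-overload error as a credential failure is a deliberate shortcut for the test.

```python
from typing import Callable, Iterator

OVERLOAD_MSG = "The service may be temporarily overloaded, please try again later"


def is_overloaded(message: str) -> bool:
    # Mirrors the fix: any message containing "overloaded" is a server-side overload.
    return "overloaded" in message.lower()


def call_with_retries(
    send: Callable[[], str], max_retries: int = 5, exhaust_after: int = 2
) -> str:
    """Retry `send`; only non-overload failures count against the single API key."""
    key_failures = 0
    for _ in range(max_retries + 1):
        reply = send()
        if not reply.startswith("error:"):
            return reply
        if not is_overloaded(reply):
            key_failures += 1
            if key_failures >= exhaust_after:
                raise RuntimeError("credential exhausted")
    raise RuntimeError("retries exhausted")


def flaky_provider(overloads: int) -> Callable[[], str]:
    # Returns `overloads` overload errors, then a successful reply.
    replies: Iterator[str] = iter([f"error: {OVERLOAD_MSG}"] * overloads + ["ok"])
    return lambda: next(replies)


print(call_with_retries(flaky_provider(overloads=3)))  # ok
```

Three overload errors exceed the issue's two-error threshold, yet the key is never charged for them, so the fourth attempt succeeds.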

Example

if "overloaded" in error_msg:  # also matches "temporarily overloaded"
    return result_fn(
        FailoverReason.overloaded,
        retryable=True,
    )

Notes

The suggested fix relies on plain substring matching: any error message containing "overloaded" is treated as a server-side overload. Making the retry parameters configurable would also require changes to run_agent.py, where they are currently hardcoded.

Recommendation

Apply the workaround by modifying the agent/error_classifier.py file as suggested, to correctly classify "temporarily overloaded" errors and prevent unnecessary credential rotation.
