hermes - ✅ (Solved) Fix feat: reduce default 429 cooldown from 1 hour to 3 minutes for better UX [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#18697 · Fetched 2026-05-03 04:54:53
Comments: 0 · Participants: 1 · Timeline events: 7 · Reactions: 1
Timeline (top): labeled ×4, cross-referenced ×1, referenced ×1, subscribed ×1

We've been running with a 3-minute cooldown for GLM-5.1 in production for 2+ weeks with zero issues. Rate limits resolve within 1-2 minutes in practice.

Related: #15298 (cooldown not checked on restore), #17929 (single-key RuntimeError)

Root Cause

The 1-hour default cooldown after a 429 response is especially painful because:

  • Rate limits are often burst-based (e.g., "20 req/min") — after 1 minute the limit resets, but the user is still blocked for 59 more minutes
  • The agent logs show "no available entries (all exhausted or empty)" with no clear explanation
  • Users think the service is broken, not rate-limited
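
The first point can be made concrete with a little arithmetic. The sketch below is only an illustration using the numbers quoted in this issue (a 60-second burst window, the 1-hour default, and the proposed 3-minute value); the helper name `extra_blocked_minutes` is hypothetical and is not part of hermes-agent.

```python
# Illustrative numbers taken from the issue text, not measured values.
BURST_WINDOW_SECONDS = 60        # e.g. a "20 req/min" limit resets after one minute
OLD_COOLDOWN_SECONDS = 60 * 60   # previous default: 1 hour
NEW_COOLDOWN_SECONDS = 60 * 3    # proposed default: 3 minutes


def extra_blocked_minutes(cooldown_seconds: int, window_seconds: int) -> float:
    """Minutes a key stays blocked after the provider limit has already reset.

    Hypothetical helper for illustration only.
    """
    return max(0, cooldown_seconds - window_seconds) / 60


old_wait = extra_blocked_minutes(OLD_COOLDOWN_SECONDS, BURST_WINDOW_SECONDS)
new_wait = extra_blocked_minutes(NEW_COOLDOWN_SECONDS, BURST_WINDOW_SECONDS)
print(f"old default: {old_wait:.0f} extra minutes blocked")
print(f"new default: {new_wait:.0f} extra minutes blocked")
```

With these inputs the old default leaves the user blocked for 59 minutes after the provider limit has reset, while the 3-minute value leaves 2 minutes of slack.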

Fix Action

Fixed

PR fix notes

PR #18713: fix: reduce default 429 cooldown from 1 hour to 3 minutes

Description (problem / solution / changelog)

Closes #18697

Problem

The default 429 rate-limit cooldown in agent/credential_pool.py is 1 hour. For users with a single API key, a single 429 response locks them out for an entire hour. Burst rate limits typically reset within 1-2 minutes.

We have been running with a 3-minute cooldown for GLM-5.1 in production for 2+ weeks with zero issues.

Fix

  • Changed EXHAUSTED_TTL_429_SECONDS from 60 * 60 (1 hour) to 60 * 3 (3 minutes)
  • Updated comments to reflect the change
  • Provider-supplied reset_at timestamps still override this default
  • EXHAUSTED_TTL_DEFAULT_SECONDS (for 402/billing) remains at 1 hour — unchanged
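
How the bullets above fit together can be sketched as follows. This is a minimal sketch under stated assumptions, not the code in agent/credential_pool.py: only the two constant names and the `reset_at` concept come from the PR description, while the function `exhausted_until` and its signature are hypothetical.

```python
from typing import Optional

# Constant names and values as described in the PR notes.
EXHAUSTED_TTL_429_SECONDS = 60 * 3        # 3 minutes, used for 429 rate limits
EXHAUSTED_TTL_DEFAULT_SECONDS = 60 * 60   # 1 hour, used for e.g. 402/billing errors


def exhausted_until(status_code: int, now: float,
                    reset_at: Optional[float] = None) -> float:
    """Return the epoch time at which a credential may be retried.

    Hypothetical helper: the real pool may structure this differently.
    """
    # A provider-supplied reset timestamp overrides the built-in defaults.
    if reset_at is not None:
        return reset_at
    # Otherwise pick the TTL by status code: short for 429, long for the rest.
    if status_code == 429:
        return now + EXHAUSTED_TTL_429_SECONDS
    return now + EXHAUSTED_TTL_DEFAULT_SECONDS
```

Passing `now` in explicitly, rather than calling the clock inside the function, keeps the sketch easy to exercise in tests.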

Testing

All existing tests pass (tests/agent/test_credential_pool.py — 25/25 passed).

Changed files

  • agent/credential_pool.py (modified, +4/-3)

Code Example

Before (agent/credential_pool.py):

EXHAUSTED_TTL_429_SECONDS = 60 * 60  # 1 hour

After:

EXHAUSTED_TTL_429_SECONDS = 60 * 3  # 3 minutes

Proposed alternative (config.yaml):

credential_pool:
  exhausted_ttl_429_seconds: 180
RAW_BUFFER

Problem

The default 429 rate-limit cooldown in agent/credential_pool.py is 1 hour:

EXHAUSTED_TTL_429_SECONDS = 60 * 60  # 1 hour

For users with a single API key (common with custom providers like Zhipu/GLM-5.1, Moonshot, DeepSeek, etc.), a single 429 response locks them out for an entire hour with no way to retry.

This is especially painful because:

  • Rate limits are often burst-based (e.g., "20 req/min") — after 1 minute the limit resets, but the user is still blocked for 59 more minutes
  • The agent logs show "no available entries (all exhausted or empty)" with no clear explanation
  • Users think the service is broken, not rate-limited

Proposed Fix

Reduce the default to 3 minutes:

EXHAUSTED_TTL_429_SECONDS = 60 * 3  # 3 minutes

This matches typical burst-rate-limit reset windows while still preventing retry storms.

Alternatively, make this configurable via config.yaml:

credential_pool:
  exhausted_ttl_429_seconds: 180
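
If the configurable route were taken, the lookup could look roughly like the sketch below. It assumes the YAML has already been parsed into a dictionary; the key `credential_pool.exhausted_ttl_429_seconds` is the one proposed in this issue and is not confirmed to exist in hermes-agent, and the function `resolve_429_ttl` is hypothetical.

```python
from typing import Any, Mapping

DEFAULT_EXHAUSTED_TTL_429_SECONDS = 60 * 3  # fallback: 3 minutes


def resolve_429_ttl(config: Mapping[str, Any]) -> int:
    """Read the proposed config key, falling back to the built-in default.

    Hypothetical helper: `config` is the already-parsed config.yaml content.
    """
    # A missing or empty `credential_pool` section is treated as "no override".
    pool_cfg = config.get("credential_pool") or {}
    value = pool_cfg.get("exhausted_ttl_429_seconds",
                         DEFAULT_EXHAUSTED_TTL_429_SECONDS)
    # Reject booleans, non-integers, and non-positive values.
    if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
        return DEFAULT_EXHAUSTED_TTL_429_SECONDS
    return value
```

Falling back to the default on invalid input is one possible design choice; a stricter implementation might raise a configuration error instead.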

Context

We've been running with a 3-minute cooldown for GLM-5.1 in production for 2+ weeks with zero issues. Rate limits resolve within 1-2 minutes in practice.

Related: #15298 (cooldown not checked on restore), #17929 (single-key RuntimeError)

extent analysis

TL;DR

Reduce the default 429 rate-limit cooldown in agent/credential_pool.py to 3 minutes to prevent users from being locked out for an entire hour.

Guidance

  • Consider reducing the EXHAUSTED_TTL_429_SECONDS value to 3 minutes (180 seconds) to match typical burst-rate-limit reset windows.
  • Alternatively, make this value configurable via config.yaml to allow for more flexibility.
  • Verify the change by testing the rate-limiting behavior and ensuring that users are not locked out for an entire hour after receiving a 429 response.
  • Monitor the agent logs to ensure that the "no available entries" error is resolved and that users can retry after the cooldown period.
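
The verification step above can be sketched with a fake clock, so that no test has to wait three real minutes. The `Entry` class and its methods are hypothetical stand-ins for a pooled credential, not the actual hermes-agent classes; only the constant name and value come from the issue.

```python
from dataclasses import dataclass
from typing import Optional

EXHAUSTED_TTL_429_SECONDS = 60 * 3  # 3 minutes, as in the fix


@dataclass
class Entry:
    """Hypothetical stand-in for one pooled credential."""
    exhausted_until: Optional[float] = None

    def mark_rate_limited(self, now: float) -> None:
        # Record when the credential may be used again after a 429.
        self.exhausted_until = now + EXHAUSTED_TTL_429_SECONDS

    def is_available(self, now: float) -> bool:
        # Available if never exhausted, or if the cooldown has elapsed.
        return self.exhausted_until is None or now >= self.exhausted_until


def check_cooldown_expires() -> bool:
    """Return True if the entry is blocked during the cooldown and free after it."""
    entry = Entry()
    entry.mark_rate_limited(now=0.0)
    blocked_during = not entry.is_available(now=179.0)
    free_after = entry.is_available(now=180.0)
    return blocked_during and free_after
```

A check of this shape could sit alongside the existing tests in tests/agent/test_credential_pool.py, adapted to whatever clock and entry types the real pool uses.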

Example

EXHAUSTED_TTL_429_SECONDS = 60 * 3  # 3 minutes

or

credential_pool:
  exhausted_ttl_429_seconds: 180

Notes

The proposed fix is based on the context that a 3-minute cooldown has been successfully used in production for 2+ weeks with zero issues. However, it's essential to test and verify the change to ensure it works as expected in different scenarios.

Recommendation

Apply the fix from PR #18713, which reduces the default 429 rate-limit cooldown to 3 minutes. The reporter has run this value in production for 2+ weeks with zero issues.
