hermes - Fix fallback_providers not activated when 429 follows prior timeout recovery



Bug: fallback_providers not activated when primary model hits 429 after prior timeout recovery

Environment

  • Hermes Agent: latest main (May 26, 2026)
  • Provider: zai (Zhipu AI, 智谱, GLM Coding Plan endpoint)
  • Primary model: glm-5.1
  • Fallback model: glm-4.7 (same provider, same base_url, different model)
  • Platform: macOS, Telegram gateway

Config

provider: zai
model: glm-5.1
base_url: https://open.bigmodel.cn/api/coding/paas/v4

fallback_providers:
  - provider: zai
    model: glm-4.7
    base_url: https://open.bigmodel.cn/api/coding/paas/v4
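For anyone reproducing this, the config above can be modeled as an ordered chain of targets. The sketch below is hypothetical: `build_fallback_chain` and `Target` are illustrative names, not Hermes Agent APIs, and it assumes the YAML has already been parsed into a dict. It makes one detail explicit: the primary and the fallback share `provider` and `base_url` and differ only in `model`.

```python
from typing import NamedTuple


class Target(NamedTuple):
    # Hypothetical record for one entry in the chain (not a Hermes Agent type).
    provider: str
    model: str
    base_url: str


def build_fallback_chain(config: dict) -> list[Target]:
    """Return the primary target followed by each fallback, in config order.

    Hypothetical helper: assumes the YAML config was already loaded into a dict.
    """
    primary = Target(config["provider"], config["model"], config["base_url"])
    fallbacks = [
        Target(fb["provider"], fb["model"], fb["base_url"])
        for fb in config.get("fallback_providers", [])
    ]
    return [primary, *fallbacks]


config = {
    "provider": "zai",
    "model": "glm-5.1",
    "base_url": "https://open.bigmodel.cn/api/coding/paas/v4",
    "fallback_providers": [
        {
            "provider": "zai",
            "model": "glm-4.7",
            "base_url": "https://open.bigmodel.cn/api/coding/paas/v4",
        },
    ],
}

chain = build_fallback_chain(config)
for target in chain:
    print(target.provider, target.model)
```

In this sketch the two entries compare unequal only because of `model`, so anything that identifies a target should look at all three fields.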

What happened

When the primary model (glm-5.1) hit HTTP 429 rate limits, the configured fallback to glm-4.7 was never activated for the main gateway session. All retries stayed on glm-5.1 until the retry budget was exhausted, then the error was returned to the user.

However, a cron job session running at the same time did successfully fall back:

2026-05-26 16:00:48 INFO [cron_006bda9176a7...] agent.chat_completion_helpers: Fallback activated: glm-5.1 → glm-4.7 (zai)

Timeline (main session 20260526_041128_140c21)

  1. 3x timeout → _try_recover_primary_transport() rebuilds the OpenAI client, resets retry_count = 0
  2. 3x HTTP 429 → all on glm-5.1, no fallback attempted
  3. "API call failed after 3 retries" → error returned to user
  4. User sends follow-up → another 3x HTTP 429 on glm-5.1, same result
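Steps 1 and 2 of the timeline can be illustrated with a simplified model of the attempt counter. Only the budget of 3 and the reset-on-recovery rule come from this report; `attempt_labels` is a hypothetical function, not Hermes Agent code. It shows why the first 429 is logged as "attempt 1/3" right after the third timeout: the transport recovery hands the 429s a fresh budget, all of which is then spent on glm-5.1.

```python
MAX_RETRIES = 3  # retry budget seen in agent.log ("attempt n/3")


def attempt_labels(errors: list[str]) -> list[str]:
    """Label each failed call on the primary model as "n/3".

    Hypothetical model of the timeline: when the budget is used up by
    timeouts, transport recovery is assumed to succeed and retry_count is
    reset to 0, so the following errors start a fresh budget.
    """
    labels = []
    retry_count = 0
    for err in errors:
        retry_count += 1
        labels.append(f"{retry_count}/{MAX_RETRIES}")
        if retry_count == MAX_RETRIES and err == "APITimeoutError":
            # Stand-in for _try_recover_primary_transport() succeeding.
            retry_count = 0
    return labels


errors = ["APITimeoutError"] * 3 + ["RateLimitError"] * 3
for err, label in zip(errors, attempt_labels(errors)):
    print(label, err)
```

This sketch labels every failure as "n/3"; in the real agent.log the third 429 appears as the final "API call failed after 3 retries" ERROR line instead.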

Evidence

agent.log — all requests show model=glm-5.1, never glm-4.7:

16:06:27 WARNING API call failed (attempt 1/3) error_type=RateLimitError provider=zai model=glm-5.1
16:06:31 WARNING API call failed (attempt 2/3) error_type=RateLimitError provider=zai model=glm-5.1
16:06:36 ERROR   API call failed after 3 retries. HTTP 429 | provider=zai model=glm-5.1

No "Fallback activated" entry exists for this session.
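The agent.log check above can be scripted. This is a hypothetical helper that assumes the line format quoted in this report and that the lines have already been filtered down to a single session; it collects every `model=` value and records whether a "Fallback activated" line appears.

```python
import re

# Matches the "model=..." field in the agent.log lines quoted above.
MODEL_RE = re.compile(r"\bmodel=(\S+)")


def models_and_fallback(lines: list[str]) -> tuple[set[str], bool]:
    """Return (models seen, whether a fallback was logged) for one session.

    Hypothetical helper; assumes the log format shown in this report.
    """
    models: set[str] = set()
    fell_back = False
    for line in lines:
        if "Fallback activated" in line:
            fell_back = True
        match = MODEL_RE.search(line)
        if match:
            models.add(match.group(1))
    return models, fell_back


main_session = [
    "16:06:27 WARNING API call failed (attempt 1/3) error_type=RateLimitError provider=zai model=glm-5.1",
    "16:06:31 WARNING API call failed (attempt 2/3) error_type=RateLimitError provider=zai model=glm-5.1",
    "16:06:36 ERROR   API call failed after 3 retries. HTTP 429 | provider=zai model=glm-5.1",
]
print(models_and_fallback(main_session))  # ({'glm-5.1'}, False)
```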

errors.log — shows the timeout → 429 transition:

16:06:21 WARNING API call failed (attempt 3/3) error_type=APITimeoutError model=glm-5.1
16:06:27 WARNING API call failed (attempt 1/3) error_type=RateLimitError model=glm-5.1
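To spot the same timeout → 429 transition in other sessions, a small scan over errors.log is enough. The function name and regex are illustrative assumptions based only on the two lines quoted above; it returns the timestamp of the last timeout and of the first rate-limit error at the first such switch.

```python
import re
from typing import Optional, Tuple

# Leading timestamp plus the error_type field, as in the errors.log lines above.
ERROR_TYPE_RE = re.compile(r"^(\S+) .*\berror_type=(\S+)")


def timeout_to_429_transition(lines: list[str]) -> Optional[Tuple[str, str]]:
    """Return (last timeout time, first rate-limit time) at the first switch
    from APITimeoutError to RateLimitError, or None if there is no switch.

    Hypothetical helper; assumes the errors.log format shown in this report.
    """
    prev = None  # (timestamp, error_type) of the previous matching line
    for line in lines:
        match = ERROR_TYPE_RE.match(line)
        if not match:
            continue
        current = (match.group(1), match.group(2))
        if prev and prev[1] == "APITimeoutError" and current[1] == "RateLimitError":
            return prev[0], current[0]
        prev = current
    return None


errors_log = [
    "16:06:21 WARNING API call failed (attempt 3/3) error_type=APITimeoutError model=glm-5.1",
    "16:06:27 WARNING API call failed (attempt 1/3) error_type=RateLimitError model=glm-5.1",
]
print(timeout_to_429_transition(errors_log))  # ('16:06:21', '16:06:27')
```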

Root cause analysis

I traced through agent/conversation_loop.py and believe the issue is:

  1. After three timeouts, _try_recover_primary_transport() succeeds and resets retry_count = 0 (line ~2897).
  2. The subsequent 429s trigger _recover_with_credential_pool() (line ~2065), which returns (False, True): it sets has_retried_429 = True but does not recover.
  3. The early fallback check at line ~2456 (is_rate_limited and _fallback_index < len(_fallback_chain)) calls _pool_may_recover_from_rate_limit(), which correctly returns False (single-credential pool).
  4. However, _try_activate_fallback() at line ~2468 does not appear to fire, possibly because the combination of primary_recovery_attempted = True and the earlier credential-pool interaction leaves the retry state in a condition where the fallback path is skipped.
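The analysis above can be restated as a pair of predicates. Field names mirror identifiers mentioned in this report, but `RetryState`, `suspected_should_fallback` and `should_fallback` are hypothetical: the first encodes the suspected (unconfirmed) gating, the second the invariant a fix would need, namely that only the rate-limit status, pool recoverability and remaining fallback entries decide.

```python
from dataclasses import dataclass


@dataclass
class RetryState:
    # Hypothetical stand-in for the retry loop's local state; names mirror
    # identifiers from the issue, the structure itself is illustrative.
    is_rate_limited: bool
    pool_may_recover: bool  # result of _pool_may_recover_from_rate_limit()
    fallback_index: int  # _fallback_index
    fallback_chain_len: int  # len(_fallback_chain)
    primary_recovery_attempted: bool = False
    has_retried_429: bool = False


def should_fallback(s: RetryState) -> bool:
    """Proposed invariant: flags left over from earlier recovery are ignored."""
    return (
        s.is_rate_limited
        and not s.pool_may_recover
        and s.fallback_index < s.fallback_chain_len
    )


def suspected_should_fallback(s: RetryState) -> bool:
    """Hypothetical reconstruction of the suspected bug: stale recovery flags
    veto the fallback even though the pool cannot recover."""
    if s.primary_recovery_attempted and s.has_retried_429:
        return False
    return should_fallback(s)


# Main session: timeout recovery happened, then the credential pool was tried.
main = RetryState(True, False, 0, 1, primary_recovery_attempted=True, has_retried_429=True)
# Cron session: 429 on the first request, no prior recovery.
cron = RetryState(True, False, 0, 1)
print(suspected_should_fallback(main), should_fallback(main))  # False True
```

A regression test built along these lines would pin down the expected behavior whichever flag turns out to be responsible.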

The cron session succeeded because it hit 429 directly (no prior timeout), so the simpler path through the fallback logic worked correctly.

Expected behavior

When the primary model is rate-limited (429) and fallback_providers is configured with a different model under the same provider, the agent should switch to the fallback model after the retry budget is exhausted.
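The expected behavior can be expressed as a toy retry loop that both scenarios should pass. This is a sketch under stated assumptions, not the Hermes Agent loop: `run_request` is hypothetical, only the primary model fails, fallbacks always succeed, transport recovery is allowed once, and any budget exhausted after that moves to the next entry in the chain.

```python
def run_request(primary_errors: list[str], chain: list[str], max_retries: int = 3) -> list[str]:
    """Return the model used for each call until one succeeds or the chain runs out.

    Hypothetical sketch of the expected behavior described in this report.
    primary_errors lists the error types the primary model returns, in order;
    once it is exhausted the primary succeeds. Fallback models always succeed.
    """
    calls = []
    errors = iter(primary_errors)
    index = 0  # position in chain
    retry_count = 0
    recovered = False  # transport recovery is attempted at most once
    while index < len(chain):
        calls.append(chain[index])
        err = next(errors, None) if index == 0 else None
        if err is None:
            return calls
        retry_count += 1
        if retry_count < max_retries:
            continue
        if err == "APITimeoutError" and not recovered:
            recovered = True
            retry_count = 0  # fresh budget on the same model after recovery
            continue
        # Budget spent (for example on 429s): switch to the next model.
        index += 1
        retry_count = 0
    return calls


chain = ["glm-5.1", "glm-4.7"]
cron_like = run_request(["RateLimitError"] * 3, chain)
main_like = run_request(["APITimeoutError"] * 3 + ["RateLimitError"] * 3, chain)
print(cron_like[-1], main_like[-1])  # glm-4.7 glm-4.7
```

The point of the second scenario is that a prior timeout recovery changes how many calls stay on glm-5.1, but not whether the request eventually reaches glm-4.7.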

Workaround

None found. The fallback mechanism works in simple scenarios but breaks when a timeout precedes the 429.

Related code

  • agent/conversation_loop.py: retry loop, fallback checks at lines ~2456 and ~2900
  • agent/chat_completion_helpers.py: try_activate_fallback() at line ~740
  • run_agent.py: _pool_may_recover_from_rate_limit() at line ~239
