hermes - 💡(How to fix) Fix [Bug] Interactive CLI session does not auto-fallback on Codex 429 'usage_limit_reached', while cron jobs with the same fallback chain do [1 comments, 2 participants]

GitHub stats: NousResearch/hermes-agent#20465, fetched 2026-05-06 06:36:43 · Comments: 1 · Participants: 2 · Timeline: 6 events (labeled ×5, commented ×1) · Reactions: 1




[Bug] Interactive CLI session does not auto-fallback on Codex 429 usage_limit_reached, while cron jobs with the same fallback chain do

Summary

When the primary provider is openai-codex/gpt-5.5 and Codex returns HTTP 429 usage_limit_reached (the periodic 5-hour quota wall, not a billing error), an interactive CLI session exhausts its 3 retries against Codex and then shows the user "API call failed after 3 retries: HTTP 429: The usage limit has been reached". The configured fallback_providers chain is never activated on this path, even though hermes fallback list confirms that it is loaded.

The exact same fallback_providers chain does activate successfully for cron jobs running concurrently against the same Codex quota.

Environment

  • Hermes Agent: tip of main at commit 0d41e94ca (feat(i18n): add French (fr) locale support, 2026-05-05) — also reproduced on the v0.12.0 release (87b113c2e).
  • Install: WSL2 Ubuntu 24.04 on Windows 11.
  • Primary: openai-codex / gpt-5.5 (ChatGPT Plus subscription, OAuth via hermes auth).
  • Fallback chain (verified via hermes fallback list): one entry, tested with three configurations — all show identical broken behavior in interactive sessions:
    1. provider: custom + base_url: http://host.docker.internal:11434/v1 + api_key: ollama + model: qwen3.6:35b-a3b-q4_K_M (local Ollama on Windows host)
    2. provider: ollama-local (named provider defined in providers: block) + model: qwen3.6:35b-a3b-q4_K_M
    3. provider: openrouter + model: openai/gpt-5.5

Reproduction

  1. Configure model.provider: openai-codex / model.default: gpt-5.5 and any working fallback_providers chain (verified loaded by hermes fallback list).
  2. Use Hermes interactively until the Codex 5-hour quota window is hit. Error returned by Codex:
    {"error": {"type": "usage_limit_reached", "message": "The usage limit has been reached", "plan_type": "plus", "resets_at": <epoch>, "resets_in_seconds": <seconds>}}
  3. Send another message in the interactive chat. Hermes retries 3 times against Codex, then surfaces:
    API call failed after 3 retries: HTTP 429: The usage limit has been reached
  4. The fallback model is never tried; provider=openai-codex is reported in the error log.
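The 429 body in step 2 carries enough structure to tell a quota wall apart from a transient rate limit. The following is a minimal sketch, assuming only the payload shape shown above; the helper names (CodexLimit, parse_codex_429, should_skip_retries) are hypothetical and not part of Hermes. The idea it illustrates: when error.type is usage_limit_reached, retrying against Codex cannot succeed until resets_in_seconds elapses, so a client could hand off to the fallback chain immediately instead of burning its retries.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class CodexLimit:
    """Parsed view of a Codex 429 error body (hypothetical helper type)."""

    error_type: str
    message: str
    resets_in_seconds: Optional[int]


def parse_codex_429(body: str) -> Optional[CodexLimit]:
    """Return a CodexLimit if `body` matches the error shape in the report, else None."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return None
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return None
    resets = error.get("resets_in_seconds")
    return CodexLimit(
        error_type=str(error.get("type", "")),
        message=str(error.get("message", "")),
        resets_in_seconds=int(resets) if isinstance(resets, (int, float)) else None,
    )


def should_skip_retries(status: int, body: str) -> bool:
    """True when retrying the primary is pointless: a quota wall won't clear in seconds."""
    if status != 429:
        return False
    limit = parse_codex_429(body)
    return limit is not None and limit.error_type == "usage_limit_reached"


# Placeholder numbers stand in for <epoch> and <seconds> from the report.
sample = (
    '{"error": {"type": "usage_limit_reached", "message": "The usage limit has been reached",'
    ' "plan_type": "plus", "resets_at": 1778040000, "resets_in_seconds": 3600}}'
)
print(should_skip_retries(429, sample))  # True
```

A plain 429 with a different error type still returns False here, so ordinary rate limiting would keep its normal retry behaviour under this sketch.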

Evidence — divergence between cron and interactive paths

Same chain, same Codex 429, same wall-clock window:

# Cron jobs — fallback activates correctly
2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)
2026-05-06 01:00:51 INFO [cron_8bc7a04c68f9_20260506_010049] root: Fallback activated: gpt-5.5 → openai/gpt-5.5 (openrouter)
2026-05-06 01:30:50 INFO [cron_8bc7a04c68f9_20260506_013049] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)
2026-05-06 01:45:44 INFO [cron_8bc7a04c68f9_20260506_014544] root: Fallback activated: gpt-5.5 → qwen3.6:35b-a3b-q4_K_M (ollama-local)

# Interactive sessions — fallback never fires (same chain, same time window)
2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:44:27 ERROR [20260506_014356_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
2026-05-06 01:48:09 ERROR [20260506_014753_<id>] root: API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5 msgs=2 tokens=~5,487
[…repeats every few minutes for the duration of the quota window…]

provider=openai-codex appearing in every interactive failure indicates that _try_activate_fallback() was either never called or returned False without emitting any of the three log lines I'd expect from it: "Fallback activated", "Fallback to <provider> failed: provider not configured", or "Failed to activate fallback".
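To check for the same divergence on another install, the log can be scanned for those outcomes, split by session kind. This is a minimal sketch, assuming log lines shaped like the excerpts in this report (session ID in square brackets after the level; cron session IDs start with "cron_", interactive ones do not). The function name classify is hypothetical, and which Hermes log file to feed it is left to the reader.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes the report's log shape: "<date> <time> LEVEL [session_id] logger: message".
SESSION_RE = re.compile(r"\b(?:INFO|ERROR|WARNING|DEBUG)\s+\[([^\]]+)\]")

# Outcome markers taken from the log messages quoted in this report.
OUTCOMES = {
    "activated": "Fallback activated",
    "not_configured": "provider not configured",
    "activation_failed": "Failed to activate fallback",
    "retries_exhausted": "API call failed after",
}


def classify(lines: Iterable[str]) -> Counter:
    """Count (session kind, outcome) pairs; lines without a session ID are skipped."""
    counts: Counter = Counter()
    for line in lines:
        match = SESSION_RE.search(line)
        if not match:
            continue
        kind = "cron" if match.group(1).startswith("cron_") else "interactive"
        for outcome, marker in OUTCOMES.items():
            if marker in line:
                counts[(kind, outcome)] += 1
    return counts


sample_lines = [
    "2026-05-06 00:30:31 INFO [cron_8bc7a04c68f9_20260506_003030] root: "
    "Fallback activated: gpt-5.5 → nemotron-3-nano:30b (custom)",
    "2026-05-06 01:42:12 ERROR [20260506_014139_<id>] root: API call failed after 3 retries. "
    "HTTP 429: The usage limit has been reached | provider=openai-codex model=gpt-5.5",
]
for (kind, outcome), n in sorted(classify(sample_lines).items()):
    print(kind, outcome, n)
```

On a log with this bug, one would expect "interactive retries_exhausted" counts with no matching "interactive activated", "interactive not_configured" or "interactive activation_failed" entries.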

The auxiliary client did log a successful fallback for one title_generation call earlier in the same session window:

2026-05-06 00:28:14 INFO agent.auxiliary_client: Auxiliary title_generation: rate limit on auto (Error code: 429 - {'error': {'type': 'usage_limit_reached', ...}}), trying fallback

…so aux-side fallback (post-PR #20294) appears to work; only the main-agent interactive retry-then-fallback handoff at run_agent.py around line 12879 does not produce any "trying fallback" / "Fallback activated" / "Failed to activate fallback" output for this 429 type.

What I tried

  • Verified hermes fallback list shows the chain loaded.
  • Migrated config from legacy fallback_model: (single dict) to fallback_providers: (list) — no change.
  • Tried three different fallback provider configs (above) — no change.
  • Restarted Hermes from a fresh shell (no /continue) — no change.
  • Ran hermes update -y to pull current main (0d41e94ca) — no change.
  • hermes -z "say pong" -m qwen3.6:35b-a3b-q4_K_M --provider ollama-local returns pong (confirms the fallback target itself is reachable and configured correctly).

Possibly related open issues

  • #19839 — apply fallback cooldown for all failover reasons (cooldown gating may be related, but doesn't explain "fallback never fires at all").
  • #17446 — Fallback announced but never sent (similar symptom shape, different trigger).
  • #19411 — Gateway fallback keeps primary model (gateway-specific; this report is CLI/interactive).
  • #15714 — Aux compression ignores fallback_providers (separate bug).

This report appears distinct: the main agent in interactive mode never produces any fallback-attempt log line for Codex usage_limit_reached 429s, while the cron path under the same agent code on the same chain consistently does.

Suggested diagnostic

A debug log line at the entry of _try_activate_fallback() and at every return False site in run_agent.py would let users with this symptom confirm in one repro which branch is failing. If the function isn't being called at all on Codex 429 in interactive mode, the divergence is upstream — likely in how codex_responses API errors propagate out of _run_codex_stream versus how the chat-completions retry loop catches them.

Severity

Effectively breaks the documented use case of "primary subscription provider with local Ollama as cost-free fallback". Users hit a 5-hour wall on every Codex quota cycle with no automatic recovery, despite the fallback chain being correctly configured per hermes fallback list.

extent analysis

TL;DR

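The fix most likely belongs in run_agent.py: in interactive mode, a Codex 429 usage_limit_reached error needs to reach the fallback activation path (as it already does on the cron path) instead of being surfaced to the user once the retries are exhausted.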

Guidance

  • Investigate the _try_activate_fallback() function in run_agent.py to determine why it's not being called or why it's returning False without attempting to activate the fallback provider.
  • Add debug log lines at the entry of _try_activate_fallback() and at every return False site to help diagnose the issue.
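  • Check how codex_responses API errors propagate out of _run_codex_stream, and compare that with how the chat-completions retry loop catches errors; if the Codex 429 never reaches the handler that triggers fallback, that would explain the missing log lines.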
  • Compare the error handling and fallback activation logic between the cron job path and the interactive mode to identify any discrepancies.
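One way to make the two paths behave alike is to normalize every provider error into a single exception type at the transport boundary, so the retry loop sees the same thing whether the failure came from a streaming Codex call or a chat-completions call. The sketch below illustrates that idea under stated assumptions: ProviderError, call_with_fallback and the zero-argument callables are all hypothetical names, and this is not Hermes's actual retry loop.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Normalized provider failure (hypothetical); every transport raises this type."""

    def __init__(self, status: int, error_type: str = "") -> None:
        super().__init__(f"HTTP {status}: {error_type}")
        self.status = status
        self.error_type = error_type


def call_with_fallback(
    primary: Callable[[], str],
    fallbacks: Sequence[Callable[[], str]],
    max_retries: int = 3,
) -> str:
    """Retry the primary, then walk the fallback chain instead of surfacing the error."""
    last_error: Optional[ProviderError] = None
    for _ in range(max_retries):
        try:
            return primary()
        except ProviderError as exc:
            last_error = exc
            # A quota wall does not clear between retries, so stop retrying early.
            if exc.status == 429 and exc.error_type == "usage_limit_reached":
                break
    # Same handoff regardless of which transport raised the error.
    for fallback in fallbacks:
        try:
            return fallback()
        except ProviderError as exc:
            last_error = exc
    if last_error is None:
        raise RuntimeError("no provider was attempted (max_retries=0 and empty chain)")
    raise last_error
```

The design choice is that the fallback decision depends only on the normalized error, so a bug like the one reported here could only arise if some transport let a differently-typed exception escape past the loop.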

Example

No verified fix is provided: the root cause has not been pinned down yet, so any change to run_agent.py has to start with investigating the existing code.

Notes

The issue appears specific to interactive mode with Codex 429 usage_limit_reached errors: the fallback provider is never activated despite being correctly configured. Because the cron path handles the same error with the same chain correctly, the fault most likely lies in the interactive path's error handling or fallback activation logic rather than in the provider configuration.

Recommendation

Patch run_agent.py so that the interactive retry loop, when it gives up on a Codex 429 usage_limit_reached, hands off to the configured fallback chain as the cron path does. First confirm which branch fails (for example with the debug logging suggested in the report), since the exact root cause is still unknown.
