hermes - Fix platform retry: rotate auth profiles within a single retry sequence before failing the request

Summary

Hermes's Python platform layer has its own API-call retry/fallback logic, separate from the openclaw runtime's model-fallback chain. Today this retry logic appears to rotate between configured ChatGPT/Codex auth profiles within a single user-request cycle, but the rotation is inconsistent: some retry bursts touch two profiles, while others retry a single exhausted profile four times. Filing this so the in-retry profile rotation contract is explicit, observable, and tested.

There is a parallel upstream issue against openclaw for the same conceptual gap in the openclaw runtime path: https://github.com/openclaw/openclaw/issues/79604. The two layers have different code paths but the same operator-visible failure mode.

Environment

  • hermes-agent running production gateway (/root/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main gateway run --replace)
  • Three OAuth profiles configured for openai-codex:
    • 1× ChatGPT Pro account (prolite plan_type)
    • 2× ChatGPT Team account profiles (team plan_type)
  • credential_pool_strategies.openai-codex: fill_first
  • Fallback chain into openclaw-runtime: openai-codex/gpt-5.5 → claude-cli/claude-opus-4-7 → openrouter/...

Observable behavior — partial rotation

Today at 18:41:08 EDT, the Python platform's retry logic emitted four 429s within the same second, covering two plan_type values (one prolite response, then three team responses):

18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'prolite','resets_at':1778538925}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 2/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 3/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}

Two distinct accounts (prolite and team) appeared in this single retry burst, which shows that the Python layer does rotate profiles at least some of the time. However:

  • All three Hermes codex profiles (1 prolite + 2 team) have last_status_at timestamps in /root/.hermes/auth.json indicating they were each touched independently, but the rotation pattern between them inside a single retry cycle is not consistent across runs.
  • Other runs in today's logs show only one plan_type cycling through 4 retries (no rotation; only retrying the same already-cooled profile).
  • The retry counter advances attempt 1/4 → 2/4 → 3/4 but doesn't cap rotation distinctly from the retry budget — a per-profile "tried once" counter would be cleaner than reusing the retry budget.
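
To compare bursts across runs, the log lines above can be classified mechanically. The following is a minimal sketch, not part of Hermes: the function names are hypothetical, and it assumes the Details payload is always printed as a Python dict repr on one line, as in the excerpt. It uses the (plan_type, resets_at) pair as a stand-in for profile identity, which is only a proxy because the log does not name the profile.

```python
import ast
import re

# Matches the "Details: {...}" payload on the log lines shown above.
DETAILS_RE = re.compile(r"Details:\s*(\{.*\})")


def burst_fingerprints(log_text):
    """Return a (plan_type, resets_at) pair for each failed attempt, in log order."""
    fingerprints = []
    for line in log_text.splitlines():
        match = DETAILS_RE.search(line)
        if match is None:
            continue
        # The payload is printed as a Python dict repr, so literal_eval can parse it.
        details = ast.literal_eval(match.group(1))
        fingerprints.append((details.get("plan_type"), details.get("resets_at")))
    return fingerprints


def classify_burst(log_text):
    """Label a retry burst as 'rotated', 'no_rotation', or 'empty'."""
    distinct = set(burst_fingerprints(log_text))
    if not distinct:
        return "empty"
    return "rotated" if len(distinct) > 1 else "no_rotation"
```

Applied to the 18:41:08 excerpt, this yields "rotated" with two distinct fingerprints. The three team responses share one resets_at value, which may mean a single team profile was retried three times while the second team profile was never tried.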

Operator-visible symptom

When the python retry exhausts without rotating cleanly through all profiles, the request bails out and the openclaw-runtime fallback chain is consulted. That fallback (claude-cli, openrouter) has its own latency and context-loss tax. The operator sees a slower or context-degraded reply when a healthy profile of the same provider was actually available.

Suggested behavior

Within a single user-request retry sequence, when an openai-codex profile returns usage_limit_reached or auth_invalid:

  1. Mark that profile in cooldown (the existing logic appears to do this).
  2. Re-resolve the active profile via fill_first selection, excluding the just-cooled profile.
  3. Re-run the API call against the new profile.
  4. Cap rotations at len(available_profiles) (or a hard MAX_PROFILE_ROTATIONS, e.g. 3).
  5. Only after exhausting all profiles, surface to the openclaw-runtime fallback chain.

The retry budget (e.g. attempt N/4) should be per profile, not shared across profiles — otherwise rotating profiles burns retries.
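
The five steps and the per-profile budget can be sketched as follows. This is an illustration under stated assumptions, not Hermes's actual code: ProfilePool, call_with_rotation, and the three exception classes are hypothetical names, fill_first is assumed to mean "first profile in configured order that is not cooled", and backoff sleeps between transient retries are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set


class ProfileLevelError(Exception):
    """Hypothetical: usage_limit_reached or auth_invalid for one profile."""


class TransientError(Exception):
    """Hypothetical: a transient failure such as a 5xx."""


class AllProfilesExhausted(Exception):
    """Signals the caller to hand off to the cross-provider fallback chain."""


@dataclass
class ProfilePool:
    profiles: List[str]                       # ordered; fill_first prefers the earliest
    cooled: Set[str] = field(default_factory=set)

    def fill_first(self, exclude: Set[str]) -> Optional[str]:
        """Return the first profile that is neither cooled nor excluded."""
        for profile in self.profiles:
            if profile not in self.cooled and profile not in exclude:
                return profile
        return None


def call_with_rotation(pool: ProfilePool,
                       call: Callable[[str], str],
                       per_profile_budget: int = 4) -> str:
    tried: Set[str] = set()
    for _ in range(len(pool.profiles)):            # step 4: cap rotations at profile count
        profile = pool.fill_first(exclude=tried)   # step 2: re-resolve, skipping tried ones
        if profile is None:
            break
        tried.add(profile)
        for attempt in range(1, per_profile_budget + 1):
            try:
                return call(profile)               # step 3: run the call on this profile
            except TransientError:
                if attempt == per_profile_budget:
                    raise                          # budget spent; transient errors never rotate
            except ProfileLevelError:
                pool.cooled.add(profile)           # step 1: mark the profile in cooldown
                break                              # rotate with a fresh budget
    raise AllProfilesExhausted("no usable profile left")  # step 5
```

Keeping the attempt counter inside the per-profile loop is what makes the budget per profile: a rotation starts a new count instead of consuming one of the four shared attempts.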

Suggested observability

Emit a structured log line for each profile rotation within a retry cycle:

{"event":"profile_rotation","provider":"openai-codex",
 "from_profile":"<sha>","to_profile":"<sha>",
 "reason":"rate_limit","attempt":2,"max":4,
 "remaining_profiles":1}

This gives operators a way to distinguish "rotated to a healthy profile" from "rotated to another exhausted profile" from "no rotation happened at all".
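
A minimal sketch of emitting that line with the standard library follows. The logger name, function names, and the choice of a truncated SHA-256 for the "<sha>" placeholders are assumptions for illustration; the issue does not specify how profiles are fingerprinted.

```python
import hashlib
import json
import logging

logger = logging.getLogger("hermes.platform.retry")  # hypothetical logger name


def profile_fingerprint(profile_id):
    """Short SHA-256 prefix so raw account identifiers stay out of the logs."""
    return hashlib.sha256(profile_id.encode("utf-8")).hexdigest()[:12]


def rotation_event(provider, from_profile, to_profile, reason,
                   attempt, max_attempts, remaining_profiles):
    """Build the JSON line with the same fields as the example above."""
    event = {
        "event": "profile_rotation",
        "provider": provider,
        "from_profile": profile_fingerprint(from_profile),
        "to_profile": profile_fingerprint(to_profile),
        "reason": reason,
        "attempt": attempt,
        "max": max_attempts,
        "remaining_profiles": remaining_profiles,
    }
    return json.dumps(event, separators=(",", ":"))


def log_rotation(**fields):
    """Emit one rotation event at INFO level and return the serialized line."""
    line = rotation_event(**fields)
    logger.info(line)
    return line
```

Emitting one line per rotation, rather than one per retry burst, lets an operator reconstruct the sequence of profiles tried by filtering on the event field.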

Reproduction

  1. Run a Hermes gateway with three openai-codex auth profiles configured.
  2. Force profile 0 into usage_limit_reached cooldown (rate-limit it).
  3. Send a request that triggers a Hermes platform-layer API call.
  4. Observe whether the second retry attempt uses profile 1 or repeats profile 0.

In today's logs, both behaviors appear at different times, which suggests the rotation is non-deterministic or path-dependent.

Suggested test coverage

In Hermes's API-call retry tests:

  1. rotate-then-succeed: 3 profiles, profile 0 returns 429; assert next attempt uses profile 1 and succeeds. Assert profile_rotation event is emitted.
  2. rotate-cap-honored: all profiles return 429; assert exactly N attempts (where N = profile count), no further retries against already-cooled profiles.
  3. per-profile-retry-budget: profile 0 returns transient 5xx (NOT a profile-level error); assert retries against the SAME profile up to budget, no rotation. Differentiate between profile-level and transient failures.
  4. fill_first-still-works: a separate request after profile 0 cooled down picks profile 1 cleanly via fill_first (regression check).
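
Cases 2 and 3 could be sketched along these lines. Everything here is hypothetical scaffolding: call_with_rotation is a compact stand-in for the code under test, ScriptedClient is an illustrative fake, and the exception classes stand for however Hermes distinguishes profile-level from transient failures.

```python
class ProfileLevelError(Exception):
    """Hypothetical: a 429 usage_limit_reached or auth_invalid."""


class TransientError(Exception):
    """Hypothetical: a transient 5xx."""


def call_with_rotation(profiles, call, budget=4):
    """Stand-in for the code under test: one pass over profiles in fill_first order."""
    for profile in profiles:
        for attempt in range(budget):
            try:
                return call(profile)
            except TransientError:
                if attempt == budget - 1:
                    raise              # budget spent on this profile; do not rotate
            except ProfileLevelError:
                break                  # rotate to the next profile
    raise RuntimeError("all profiles exhausted")


class ScriptedClient:
    """Fake API client that replays a scripted list of outcomes per profile."""

    def __init__(self, script):
        self.script = script           # profile -> list of results or exceptions
        self.calls = []                # profiles in the order they were called

    def __call__(self, profile):
        self.calls.append(profile)
        outcome = self.script[profile].pop(0)
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


def test_rotate_cap_honored():
    profiles = ["p0", "p1", "p2"]
    client = ScriptedClient({p: [ProfileLevelError("429")] for p in profiles})
    try:
        call_with_rotation(profiles, client)
    except RuntimeError:
        pass
    else:
        raise AssertionError("expected exhaustion after all profiles returned 429")
    # Exactly one attempt per profile, none repeated against a cooled profile.
    assert client.calls == ["p0", "p1", "p2"]


def test_per_profile_retry_budget():
    client = ScriptedClient({
        "p0": [TransientError("503"), TransientError("503"), "ok"],
        "p1": ["unused"],
    })
    assert call_with_rotation(["p0", "p1"], client) == "ok"
    # Transient failures retry the same profile; p1 is never touched.
    assert client.calls == ["p0", "p0", "p0"]
```

Recording the call order in the fake, rather than only the final result, is what lets the tests separate "rotated" from "retried the same profile".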

Impact

  • Reduces user-perceived latency when the preferred provider has multiple healthy profiles.
  • Maximizes utilization of paid auth pools (Pro/Team plan profiles) before paying the cross-provider context-loss tax.
  • Aligns Hermes's python platform retry path with the openclaw-runtime fallback contract (which is being extended to do the same; see openclaw/openclaw#79604).
  • No-op for installations with only one profile per provider.

Filed by

OpenClaw operator instance, with corroborating evidence from a paired Hermes deployment running openclaw 2026.5.7 against three openai-codex OAuth profiles. Cross-references companion issue at https://github.com/openclaw/openclaw/issues/79604.
