hermes - 💡(How to fix) Fix platform retry: rotate auth profiles within a single retry sequence before failing the request

Root Cause

Hermes's Python platform layer has its own API-call retry/fallback logic, separate from the openclaw runtime's model-fallback chain. This retry logic appears to rotate between configured ChatGPT/Codex auth profiles within a single user-request cycle, but the rotation looks incomplete or inconsistently applied. This issue is filed so that the in-retry profile rotation contract becomes explicit, observable, and tested.

There is a parallel upstream issue against openclaw for the same conceptual gap in the openclaw runtime path: https://github.com/openclaw/openclaw/issues/79604. The two layers have different code paths but the same operator-visible failure mode.

Summary

Environment

hermes-agent running production gateway (/root/.hermes/hermes-agent/venv/bin/python -m hermes_cli.main gateway run --replace)
Three OAuth profiles configured for openai-codex:
- 1× ChatGPT Pro account (prolite plan_type)
- 2× ChatGPT Team account profiles (team plan_type)
credential_pool_strategies.openai-codex: fill_first
Fallback chain into openclaw-runtime: openai-codex/gpt-5.5 → claude-cli/claude-opus-4-7 → openrouter/...

Observable behavior — partial rotation

Today at 18:41:08 EDT, the Python platform's retry logic emitted four 429s within the same second, carrying two distinct plan_type values (one prolite, then three team):

18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'prolite','resets_at':1778538925}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 2/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 3/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}

Two distinct accounts (prolite and team) appeared in this single retry burst, which shows that the Python layer does rotate profiles at least some of the time. Note also that attempt 1/4 is logged twice, once for each plan_type. However:

- All three Hermes codex profiles (1 prolite + 2 team) have last_status_at timestamps in /root/.hermes/auth.json indicating that each was touched independently, but the rotation pattern between them inside a single retry cycle is not consistent across runs.
- Other runs in today's logs show only one plan_type across all 4 retries: no rotation, just repeated attempts against the same profile that was already in cooldown.
- The retry counter advances attempt 1/4 → 2/4 → 3/4, but rotation has no cap of its own separate from the retry budget. A per-profile "tried once" counter would be cleaner than reusing the retry budget.
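To tell the two behaviors apart across many runs, the burst can be classified mechanically. The following is a minimal sketch, assuming the log format matches the excerpt above and that the Details payload is always a Python dict literal; the function names are hypothetical and not part of Hermes. Because it keys on plan_type only, it cannot distinguish the two team profiles from each other.

```python
import ast
import re

# Matches the "Details" continuation line shown in the log excerpt.
DETAILS_RE = re.compile(r"Details:\s*(\{.*\})")


def plan_types_in_burst(log_text: str) -> list[str]:
    """Return the plan_type of each logged 429, in log order."""
    plan_types = []
    for line in log_text.splitlines():
        match = DETAILS_RE.search(line)
        if match is None:
            continue
        # Assumption: the payload is printed as a Python dict literal.
        details = ast.literal_eval(match.group(1))
        plan_types.append(details.get("plan_type", "unknown"))
    return plan_types


def classify_burst(log_text: str) -> str:
    """Label a burst by how many distinct plan_type values it contains."""
    seen = plan_types_in_burst(log_text)
    if not seen:
        return "no-429"
    return "rotated" if len(set(seen)) > 1 else "no-rotation"


SAMPLE = """\
18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'prolite','resets_at':1778538925}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 1/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 2/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
18:41:08 python[2844586]: ⚠️ API call failed (attempt 3/4): RateLimitError [HTTP 429]
   📋 Details: {'type':'usage_limit_reached','plan_type':'team','resets_at':1778295076}
"""

if __name__ == "__main__":
    print(plan_types_in_burst(SAMPLE))
    print(classify_burst(SAMPLE))
```

A burst labeled "no-rotation" corresponds to the runs where a single plan_type repeats for the whole retry budget.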

Operator-visible symptom

When the Python retry exhausts its budget without rotating cleanly through all profiles, the request fails over to the openclaw-runtime fallback chain. That fallback (claude-cli, openrouter) adds latency and loses context. The operator sees a slower or context-degraded reply even though a healthy profile of the same provider was available.

Suggested behavior

Within a single user-request retry sequence, when an openai-codex profile returns usage_limit_reached or auth_invalid:

1. Mark that profile in cooldown (the existing logic appears to do this).
2. Re-resolve the active profile via fill_first selection, excluding the just-cooled profile.
3. Re-run the API call against the new profile.
4. Cap rotations at len(available_profiles) (or a hard MAX_PROFILE_ROTATIONS, e.g. 3).
5. Only after exhausting all profiles, surface to the openclaw-runtime fallback chain.

The retry budget (e.g. attempt N/4) should be per profile, not shared across profiles — otherwise rotating profiles burns retries.
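The steps above can be sketched as a small loop. This is an illustration under stated assumptions, not Hermes's actual implementation: ProfilePool, call_with_rotation, and the three exception classes are hypothetical names, cooldown is modeled as a plain set with no expiry, and re-raising once a profile's transient budget is spent is one possible choice that the text above does not prescribe.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProfileLevelError(Exception):
    """Hypothetical: usage_limit_reached or auth_invalid for one profile."""


class TransientError(Exception):
    """Hypothetical: 5xx or network failure that is not the profile's fault."""


class AllProfilesExhausted(Exception):
    """Signals the caller to hand off to the cross-provider fallback chain."""


@dataclass
class ProfilePool:
    profiles: list[str]  # listed in fill_first priority order
    cooled: set[str] = field(default_factory=set)

    def resolve(self) -> Optional[str]:
        # fill_first: pick the first profile that is not in cooldown.
        return next((p for p in self.profiles if p not in self.cooled), None)


def call_with_rotation(
    pool: ProfilePool,
    call: Callable[[str], str],
    retries_per_profile: int = 4,
) -> str:
    # Rotation cap: never try more profiles than are configured.
    for _ in range(len(pool.profiles)):
        profile = pool.resolve()
        if profile is None:
            break
        # The retry budget is counted per profile, not shared across profiles.
        for attempt in range(1, retries_per_profile + 1):
            try:
                return call(profile)
            except ProfileLevelError:
                pool.cooled.add(profile)  # step 1: mark cooldown
                break                     # steps 2-3: rotate and re-run
            except TransientError:
                if attempt == retries_per_profile:
                    raise  # budget spent on this profile; do not rotate
    raise AllProfilesExhausted("all profiles are in cooldown")
```

With three profiles where only the first is rate-limited, the call lands on the second profile after a single failed attempt, and the per-profile counter for the second profile starts again at 1.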

Suggested observability

Emit a structured log line for each profile rotation within a retry cycle:

{"event":"profile_rotation","provider":"openai-codex",
 "from_profile":"<sha>","to_profile":"<sha>",
 "reason":"rate_limit","attempt":2,"max":4,
 "remaining_profiles":1}

This gives operators a way to distinguish "rotated to a healthy profile" from "rotated to another exhausted profile" from "no rotation happened at all".
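One way to produce such a line is shown below. It is a minimal sketch using only the standard library; the logger name, the helper names, and the choice to truncate a SHA-256 digest to 12 hex characters for the "<sha>" fields are assumptions, not existing Hermes behavior.

```python
import hashlib
import json
import logging

# Assumption: the logger name is illustrative only.
logger = logging.getLogger("hermes.retry")


def profile_ref(profile_id: str) -> str:
    """Short hash so raw account identifiers never reach the logs."""
    return hashlib.sha256(profile_id.encode("utf-8")).hexdigest()[:12]


def rotation_event(
    provider: str,
    from_profile: str,
    to_profile: str,
    reason: str,
    attempt: int,
    max_attempts: int,
    remaining_profiles: int,
) -> str:
    """Serialize one profile_rotation event as a single JSON line."""
    return json.dumps(
        {
            "event": "profile_rotation",
            "provider": provider,
            "from_profile": profile_ref(from_profile),
            "to_profile": profile_ref(to_profile),
            "reason": reason,
            "attempt": attempt,
            "max": max_attempts,
            "remaining_profiles": remaining_profiles,
        },
        separators=(",", ":"),
    )


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO, format="%(message)s")
    logger.info(
        rotation_event("openai-codex", "profile-0", "profile-1", "rate_limit", 2, 4, 1)
    )
```

Emitting the event as one compact JSON line keeps it greppable alongside the existing human-readable retry warnings.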

Reproduction

1. Run a Hermes gateway with 3 codex auth profiles, all configured.
2. Force profile 0 into usage_limit_reached cooldown (rate-limit it).
3. Send a request that triggers a Hermes platform-layer API call.
4. Observe whether the second retry attempt uses profile 1 or repeats profile 0.

In today's logs, both behaviors appear at different times, which suggests the rotation is non-deterministic or path-dependent.

Suggested test coverage

In Hermes's API-call retry tests:

- rotate-then-succeed: 3 profiles, profile 0 returns 429; assert next attempt uses profile 1 and succeeds. Assert profile_rotation event is emitted.
- rotate-cap-honored: all profiles return 429; assert exactly N attempts (where N = profile count), no further retries against already-cooled profiles.
- per-profile-retry-budget: profile 0 returns transient 5xx (NOT a profile-level error); assert retries against the SAME profile up to budget, no rotation. Differentiate between profile-level and transient failures.
- fill_first-still-works: a separate request after profile 0 cooled down picks profile 1 cleanly via fill_first (regression check).
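All four cases need a test double that returns a scripted outcome per profile and records which profile each attempt used. A minimal sketch of such a fake follows; ScriptedBackend and the two exception classes are hypothetical names, and the real tests would pass the fake to whatever retry entry point Hermes exposes.

```python
from collections import deque


class Http429(Exception):
    """Hypothetical stand-in for a profile-level rate-limit error."""


class Http503(Exception):
    """Hypothetical stand-in for a transient server error."""


class ScriptedBackend:
    """Fake API backend: plays back scripted outcomes per profile."""

    def __init__(self, script: dict[str, list]):
        self._script = {p: deque(outcomes) for p, outcomes in script.items()}
        self.calls: list[str] = []  # profile used by each attempt, in order

    def __call__(self, profile: str):
        self.calls.append(profile)
        # An exhausted script raises IndexError, which flags an attempt
        # the test did not expect (e.g. a retry against a cooled profile).
        outcome = self._script[profile].popleft()
        if isinstance(outcome, Exception):
            raise outcome
        return outcome


if __name__ == "__main__":
    backend = ScriptedBackend({"p0": [Http429()], "p1": ["ok"]})
    try:
        backend("p0")
    except Http429:
        pass
    print(backend("p1"), backend.calls)
```

For rotate-cap-honored, for example, each profile would be scripted with exactly one Http429, and the test would assert that backend.calls lists each profile once.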

Impact

- Reduces user-perceived latency when the preferred provider has multiple healthy profiles.
- Maximizes utilization of paid auth pools (Pro/Team plan profiles) before paying the cross-provider context-loss tax.
- Aligns Hermes's Python platform retry path with the openclaw-runtime fallback contract (which is being extended to do the same; see openclaw/openclaw#79604).
- No-op for installations with only one profile per provider.

Filed by

OpenClaw operator instance, with corroborating evidence from a paired Hermes deployment running openclaw 2026.5.7 against three openai-codex OAuth profiles. Cross-references companion issue at https://github.com/openclaw/openclaw/issues/79604.
