hermes - 💡(How to fix) Fix bug(auxiliary): pooled Codex compression can stay pinned to exhausted auth after rotation [2 pull requests]

StepCodex · 2026-05-09T19:44:05Z

Fixed

- Fixed by PR: fix(auxiliary): rotate pooled auth after quota failures (https://github.com/NousResearch/hermes-agent/pull/22779)
- Fixed by PR: fix(auxiliary): rotate pooled auth after quota failures (salvage #22779) (https://github.com/NousResearch/hermes-agent/pull/22792)

Summary

agent/auxiliary_client.py caches pooled auxiliary clients by (provider, async_mode, base_url, api_key, api_mode, runtime_key, is_vision) but not by the active credential-pool entry id. For pooled OAuth providers (most visibly openai-codex on compression/title/session-search auxiliary tasks) this means a cached client can stay pinned to the credential that just hit 429 usage_limit_reached, even after the pool rotates to a different auth.

This shows up most clearly when auxiliary.<task>.provider is explicitly set to openai-codex: provider fallback is intentionally disabled for explicit providers, so the missing same-provider pool rotation/cache invalidation becomes user-visible immediately.

Related but not identical to:

#22212: broader platform retry / profile-rotation contract
#22415: open PR for Codex auxiliary quota rotation

Environment

- Hermes Agent on main
- auxiliary.compression.provider: openai-codex
- multiple pooled openai-codex OAuth credentials
- credential_pool_strategies.openai-codex: fill_first

Reproduction

1. Configure 2+ openai-codex pooled auth entries.
2. Set auxiliary.compression.provider (or another auxiliary task provider) explicitly to openai-codex.
3. Ensure the currently selected pool entry returns 429 usage_limit_reached.
4. Trigger context compression / summary generation repeatedly.

Observed behavior

- Main agent requests can recover because the main loop has credential-pool-aware retry/rotation.
- Auxiliary compression fails with repeated 429 usage_limit_reached.
- Logs show auxiliary failures without clean same-provider recovery:
  - Failed to generate context summary: Error code: 429 ... usage_limit_reached
  - API call failed after 3 retries. HTTP 429: The usage limit has been reached | provider=openai-codex ...
- Even when the pool rotates, the auxiliary cached client can remain pinned to the old exhausted credential unless the cache is explicitly evicted or the cache key changes.

Expected behavior

For pooled auxiliary providers:

1. Retry transient 429 once on the same credential if desired.
2. Mark the failed pool entry exhausted and rotate to the next credential.
3. Rebuild the auxiliary client against the new pool entry.
4. Ensure cache lookup cannot reuse the stale client after rotation.
5. Only after same-provider pool recovery fails should broader provider fallback be considered (and only for provider=auto).
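
The recovery order above can be sketched roughly as follows. This is a minimal illustration and not the Hermes implementation: QuotaError, PoolEntry, CredentialPool, call_with_pool_recovery and call_auxiliary are hypothetical names invented for this example, and the pool only imitates a fill_first-style selection. Clients are built per entry and never cached here, so step 4 holds trivially in this sketch.

```python
from dataclasses import dataclass
from typing import List, Optional


class QuotaError(Exception):
    """Hypothetical stand-in for an HTTP 429 usage_limit_reached response."""


@dataclass
class PoolEntry:
    entry_id: str
    exhausted: bool = False


@dataclass
class CredentialPool:
    entries: List[PoolEntry]

    def current(self) -> Optional[PoolEntry]:
        # fill_first-style selection: the first entry not marked exhausted.
        return next((e for e in self.entries if not e.exhausted), None)


def call_with_pool_recovery(pool, build_client, request, same_credential_retries=1):
    """Try each healthy pool entry in turn; raise once the pool is drained."""
    last_error = None
    while True:
        entry = pool.current()
        if entry is None:
            break
        client = build_client(entry)  # step 3: client is rebuilt per entry
        for _ in range(1 + same_credential_retries):  # step 1: optional retry
            try:
                return client(request)
            except QuotaError as exc:
                last_error = exc
        entry.exhausted = True  # step 2: mark exhausted, loop picks the next entry
    raise last_error or QuotaError("credential pool has no healthy entries")


def call_auxiliary(provider, pool, build_client, request, fallback=None):
    """Step 5: cross-provider fallback only for provider == "auto"."""
    try:
        return call_with_pool_recovery(pool, build_client, request)
    except QuotaError:
        if provider == "auto" and fallback is not None:
            return fallback(request)
        raise


attempts = []


def build_client(entry):
    def client(request):
        attempts.append(entry.entry_id)
        if entry.entry_id == "auth-a":  # simulate the exhausted credential
            raise QuotaError("429 usage_limit_reached")
        return f"{entry.entry_id}:{request}"
    return client


pool = CredentialPool([PoolEntry("auth-a"), PoolEntry("auth-b")])
result = call_auxiliary("openai-codex", pool, build_client, "summarize")
```

In this sketch an explicit provider such as openai-codex never reaches the fallback callable, which mirrors the statement that provider fallback is disabled for explicit providers.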

Root cause

Two bugs compound here:

1. Auxiliary path parity gap: call_llm() / async_call_llm() lacked same-provider credential-pool recovery for quota/rate-limit failures, unlike the main agent loop.
2. Cache pinning: _client_cache_key() did not include any discriminator for the currently active pooled credential, so auxiliary cache reuse could outlive pool rotation.
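
The cache-pinning half of the problem can be illustrated with a toy cache. Everything in this sketch is an assumption made for illustration: client_cache_key is a trimmed stand-in (the real _client_cache_key() takes more fields), ClientCache is hypothetical, and the base URL is a placeholder.

```python
BASE_URL = "https://example.invalid/v1"  # placeholder, not a real endpoint


def client_cache_key(provider, base_url, pool_entry_id=None):
    # Trimmed stand-in for the real key; only the fields needed for the demo.
    return (provider, base_url, pool_entry_id)


class ClientCache:
    """Builds a client on first lookup and reuses it for identical keys."""

    def __init__(self, build):
        self._build = build
        self._clients = {}

    def get(self, key):
        if key not in self._clients:
            self._clients[key] = self._build(key)
        return self._clients[key]


built_keys = []


def build(key):
    built_keys.append(key)
    return object()  # stand-in for a real client bound to one credential


cache = ClientCache(build)

# Without a pool-entry discriminator the key is identical before and after
# rotation, so the lookup returns the client bound to the old credential.
before_rotation = cache.get(client_cache_key("openai-codex", BASE_URL))
after_rotation = cache.get(client_cache_key("openai-codex", BASE_URL))
pinned = before_rotation is after_rotation

# With the active entry id in the key, rotation changes the key and the
# cache is forced to build a new client.
client_a = cache.get(client_cache_key("openai-codex", BASE_URL, "auth-a"))
client_b = cache.get(client_cache_key("openai-codex", BASE_URL, "auth-b"))
rebuilt = client_a is not client_b
```

Adding the discriminator to the key avoids the need for explicit eviction on rotation, at the cost of leaving the old client in the cache until something else clears it.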

Suggested fix

- Add same-provider pool recovery for auxiliary clients on 401/402/429 where appropriate.
- Include a non-mutating pool-entry discriminator (e.g. current/peek credential id) in the auxiliary client cache key for pooled providers, including provider="auto" when the resolved main provider comes from a pool.
- Add regression tests for:
  - explicit Codex auxiliary 429 -> rotate to next pooled auth without cross-provider fallback
  - async auxiliary parity
  - cached Codex auxiliary client rebuild when the active pool entry changes
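
One reason the discriminator should be non-mutating: if computing the cache key itself advanced the pool, every cache lookup would rotate credentials as a side effect. The sketch below assumes a hypothetical round-robin pool with peek() and select() methods. Neither the class nor the method names come from Hermes, and the issue's environment uses fill_first rather than round-robin.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RoundRobinPool:
    entry_ids: List[str]
    cursor: int = 0

    def peek(self) -> str:
        """Return the active entry id without changing pool state."""
        return self.entry_ids[self.cursor % len(self.entry_ids)]

    def select(self) -> str:
        """Return the active entry id and advance the cursor (mutating)."""
        entry_id = self.peek()
        self.cursor += 1
        return entry_id


# Key derived from peek(): stable across repeated lookups.
peek_pool = RoundRobinPool(["auth-a", "auth-b"])
keys_from_peek = [("openai-codex", peek_pool.peek()) for _ in range(3)]

# Key derived from select(): each lookup silently rotates the pool.
select_pool = RoundRobinPool(["auth-a", "auth-b"])
keys_from_select = [("openai-codex", select_pool.select()) for _ in range(3)]
```

With peek() the three keys are identical and the cursor stays at 0. With select() the keys alternate between entries and the cursor ends at 3, even though no request failed.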

Impact

- Fixes context compression/title/session-search failures where healthy pooled Codex auths exist.
- Prevents auxiliary tasks from getting “stuck” on an exhausted cached OAuth client.
- Brings auxiliary behavior closer to the main agent loop's credential-pool recovery semantics.
