hermes - ✅(Solved) Fix Single-key credential pool: rate-limit cooldown causes init-time RuntimeError, fallback chain never tried [1 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#17929 · Fetched 2026-05-01 05:55:03
Comments: 1 · Participants: 2 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×3, cross-referenced ×2, commented ×1

When a provider's credential_pool contains only one entry and that entry is currently in 429-cooldown (marked exhausted by mark_exhausted_and_rotate), AIAgent.__init__ raises RuntimeError before the fallback chain is constructed. As a result, every fresh agent (cron jobs, new gateway sessions, etc.) fails with a misleading "no API key was found" message even though both:

  • the credential exists in auth.json (status exhausted, not deleted), and
  • a valid fallback_providers entry is configured and has working credentials.

The eager-fallback safety at run_agent.py:11880 only protects in-flight requests, not init.
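
The failure mode can be illustrated with a minimal, self-contained model. Everything below (PoolEntry, CredentialPool, resolve_client) is a hypothetical stand-in rather than Hermes code; it only shows how a credential can still exist in storage while the pool reports nothing available.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PoolEntry:
    """Hypothetical credential entry; not the real auth.json schema."""
    api_key: str
    exhausted_until: float = 0.0  # epoch seconds; 0 means never exhausted


@dataclass
class CredentialPool:
    """Hypothetical stand-in for a provider's credential pool."""
    entries: List[PoolEntry] = field(default_factory=list)

    def available(self, now: float) -> List[PoolEntry]:
        # Entries in cooldown stay in the pool but are filtered out here.
        return [e for e in self.entries if e.exhausted_until <= now]

    def mark_exhausted(self, entry: PoolEntry, now: float, cooldown: float) -> None:
        entry.exhausted_until = now + cooldown


def resolve_client(pool: CredentialPool, now: float) -> Optional[str]:
    """Return a usable key, or None when nothing is currently available."""
    usable = pool.available(now)
    return usable[0].api_key if usable else None


pool = CredentialPool([PoolEntry("sk-example")])
pool.mark_exhausted(pool.entries[0], now=1000.0, cooldown=3600.0)

during_cooldown = resolve_client(pool, now=1500.0)  # None: key exists but is cooling down
after_cooldown = resolve_client(pool, now=5000.0)   # "sk-example": cooldown has elapsed
```

A caller that treats the None from the cooldown window as "no key configured" would produce exactly the kind of misleading error described above.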

Error Message

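Every fresh AIAgent(...) fails at init with:

    RuntimeError: Provider 'alibaba-coding-plan' is set in config.yaml but no API key was found.
    Set the ALIBABA_CODING_PLAN_API_KEY environment variable, or switch to a different provider with `hermes model`.

The message suggests setting an env var that, by design, is not where Hermes stores this provider's key. This leads to misdiagnosis: the user opens .env, sees no ALIBABA_CODING_PLAN_API_KEY, and concludes the key is gone.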

Root Cause

resolve_provider_client returns None when the pool has entries but none of them is currently available (the single entry is marked exhausted after a 429, not deleted). AIAgent.__init__ treats that None as a missing credential: the else: branch at run_agent.py:1441-1445 raises RuntimeError before the fallback chain is constructed, so a configured fallback_providers entry is never tried.

The eager-fallback safety at run_agent.py:11880 only protects in-flight requests, not init.

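Fix Action

Fix / Workaround

Before raising the "no API key was found" RuntimeError, have AIAgent.__init__ iterate the fallback chain and call resolve_provider_client on each entry; the first entry that resolves becomes the effective primary. The reporter's patch outline (working locally) appears under "Proposed fix" in the issue body below.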

PR fix notes

PR #17958: fix(agent): try fallback providers at init when primary credential pool is exhausted

Description (problem / solution / changelog)

Summary

When a provider's credential_pool contains only one entry and that entry is in 429-cooldown, resolve_provider_client returns None and AIAgent.__init__ raises a misleading RuntimeError suggesting the API key is missing — even when valid fallback_providers are configured.

This patch makes __init__ iterate the fallback chain before raising, mirroring the existing in-flight fallback logic in the request loop. If a fallback resolves, the agent initializes against it and sets _fallback_activated=True so _restore_primary_runtime can pick the primary back up after cooldown.

Changes

  • run_agent.py (~1465): Before raising RuntimeError for missing credentials, iterate fallback_model entries and try resolve_provider_client on each. If one resolves, use it as the effective primary.
  • run_agent.py (~1560): Preserve _fallback_activated flag set during init-time fallback (was unconditionally reset to False).
  • run_agent.py (~1501): Guard the "No provider configured" raise to skip when fallback was activated.
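
The flag-preservation change can be pictured with a tiny stand-in. The class below is hypothetical (it is not AIAgent, and the preserve_flag switch exists only to contrast the two behaviours); it shows why a flag set by an earlier init step is lost if a later step resets it unconditionally.

```python
class InitSketch:
    """Hypothetical miniature of an init sequence with two steps."""

    def __init__(self, fallback_used: bool, preserve_flag: bool):
        # Step 1: credential resolution may activate a fallback.
        self._fallback_activated = fallback_used

        # Step 2: later runtime-state setup.
        if not preserve_flag:
            # Unconditional reset: forgets that step 1 switched providers.
            self._fallback_activated = False
        # With preserve_flag=True the value from step 1 is left untouched.


old_behaviour = InitSketch(fallback_used=True, preserve_flag=False)
new_behaviour = InitSketch(fallback_used=True, preserve_flag=True)
```

In this sketch only new_behaviour still knows that a fallback is in use, which is the information a later "restore primary" step would need.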

Test plan

  • Added tests/run_agent/test_init_fallback_on_exhausted_pool.py with two tests:
    1. test_init_tries_fallback_when_primary_returns_none — verifies agent initializes with fallback provider
    2. test_init_raises_when_no_fallback_configured — verifies original error is preserved when no fallback exists
  • All 1166 existing tests/run_agent/ tests pass
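
As a rough illustration of what the two new tests check, here is a self-contained sketch. MiniAgent and exhausted_primary_resolver are hypothetical stand-ins written for this example; the real tests exercise AIAgent and resolve_provider_client, whose signatures are not reproduced here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical resolver type: returns (client, model); client is None on failure.
Resolver = Callable[[str, str], Tuple[Optional[object], Optional[str]]]


class MiniAgent:
    """Hypothetical miniature of the init path; not the real AIAgent."""

    def __init__(self, provider: str, model: str, resolver: Resolver,
                 fallback_model: Optional[List[Dict[str, str]]] = None):
        self._fallback_activated = False
        client, resolved = resolver(provider, model)
        if client is None:
            # Primary did not resolve: walk the fallback chain before giving up.
            for fb in fallback_model or []:
                client, resolved = resolver(fb["provider"], fb["model"])
                if client is not None:
                    provider, model = fb["provider"], fb["model"]
                    self._fallback_activated = True
                    break
            else:
                # No fallback configured, or none resolved.
                raise RuntimeError(f"Provider '{provider}' has no usable credential")
        self.provider = provider
        self.model = resolved or model
        self.client = client


def exhausted_primary_resolver(provider: str, model: str):
    # Pretend the primary pool is in cooldown and every other provider resolves.
    if provider == "alibaba-coding-plan":
        return None, None
    return object(), model


def test_init_tries_fallback_when_primary_returns_none():
    agent = MiniAgent(
        "alibaba-coding-plan", "qwen3.6-plus", exhausted_primary_resolver,
        fallback_model=[{"provider": "tencent-token-plan", "model": "kimi2.5"}],
    )
    assert agent.provider == "tencent-token-plan"
    assert agent.model == "kimi2.5"
    assert agent._fallback_activated is True


def test_init_raises_when_no_fallback_configured():
    try:
        MiniAgent("alibaba-coding-plan", "qwen3.6-plus", exhausted_primary_resolver)
    except RuntimeError:
        return
    raise AssertionError("expected RuntimeError")
```

The for/else form keeps the original error path intact: the raise only runs when the loop finishes without a break.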

Closes #17929

Changed files

  • run_agent.py (modified, +42/-10)
  • tests/run_agent/test_init_fallback_on_exhausted_pool.py (added, +69/-0)


Affected versions

hermes-agent HEAD 2d137074a (refactor(config): add cfg_get() helper; migrate 20 nested-get call sites (#17304)). The misleading branch dates back to whenever the explicit "fail fast instead of silently routing through OpenRouter" behavior was added.

Repro

  1. Configure a single-key pool for any provider with daily-quota rate limits, e.g. alibaba-coding-plan:
    model:
      default: qwen3.6-plus
      provider: alibaba-coding-plan
      fallback_providers:
        - provider: tencent-token-plan
          model: kimi2.5
  2. Drive enough traffic through cron to hit HTTP 429: usage allocated quota exceeded.
  3. After the 429, agent.credential_pool correctly logs "marking coding plan key exhausted (status=429), rotating" and "credential pool: no available entries (all exhausted or empty)".
  4. Every subsequent fresh AIAgent(...) (the next cron firing, the next gateway message that builds a new agent) fails with:
    RuntimeError: Provider 'alibaba-coding-plan' is set in config.yaml but no API key was found.
    Set the ALIBABA_CODING_PLAN_API_KEY environment variable, or switch to a different provider with `hermes model`.

Real call site: run_agent.py:1441-1445. The else: branch executes when resolve_provider_client returns None, which happens cleanly when the pool has entries but none are currently available().

Expected behavior

When a fallback chain is configured, init-time credential exhaustion should not abort. The agent should construct a client against the first usable fallback entry and surface an _emit_status notification ("⚠️ Primary rate-limited, using fallback X/Y").
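
A minimal sketch of the expected behavior follows. The helper name first_usable_fallback, the is_usable predicate, and the provider/model label format are all assumptions made for illustration; only the message prefix comes from the issue text, and the real _emit_status signature is not shown in the source.

```python
from typing import Callable, Dict, List, Optional

Fallback = Dict[str, str]


def first_usable_fallback(
    chain: List[Fallback],
    is_usable: Callable[[Fallback], bool],
    emit_status: Callable[[str], None],
) -> Optional[Fallback]:
    """Return the first usable fallback entry and announce it, else None."""
    for fb in chain:
        if is_usable(fb):
            # Label format is an assumption; the issue only writes "X/Y".
            label = f"{fb['provider']}/{fb['model']}"
            emit_status(f"⚠️ Primary rate-limited, using fallback {label}")
            return fb
    return None


messages: List[str] = []
chain = [
    {"provider": "provider-a", "model": "model-a"},  # placeholder entry, treated as unusable
    {"provider": "tencent-token-plan", "model": "kimi2.5"},
]
chosen = first_usable_fallback(
    chain,
    is_usable=lambda fb: fb["provider"] == "tencent-token-plan",
    emit_status=messages.append,
)
```

Passing the status sink as a callable keeps the selection logic testable without a running agent.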

Observed behavior

Init raises immediately. The user-facing error suggests setting an env var that — by design — is not where Hermes stores this provider's key, leading to misdiagnosis (the user opens .env, sees no ALIBABA_CODING_PLAN_API_KEY, and concludes the key is gone).

Proposed fix

In run_agent.py:__init__, before raising the "no API key was found" RuntimeError, iterate the fallback_model argument and try resolve_provider_client for each entry. If any resolves, use it as the effective primary client and set self._fallback_activated = True so the existing _restore_primary_runtime machinery can pick the primary back up after cooldown.

Patch outline (working locally):

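# Before the existing `raise RuntimeError(...)` for missing creds:
# Accept either a single fallback dict or a list of them.
for fb in (fallback_model if isinstance(fallback_model, list) else [fallback_model]):
    if not isinstance(fb, dict):
        continue  # skips None and malformed entries
    fb_client, fb_model = resolve_provider_client(
        fb["provider"], model=fb["model"], raw_codex=True,
        explicit_base_url=fb.get("base_url"), explicit_api_key=fb.get("api_key"),
    )
    if fb_client is None:
        continue  # this fallback did not resolve either; try the next one
    # The first fallback that resolves becomes the effective primary.
    self.provider = fb["provider"]
    self.model = fb_model or fb["model"]
    self._fallback_activated = True  # lets _restore_primary_runtime switch back after cooldown
    client_kwargs = {"api_key": fb_client.api_key, "base_url": str(fb_client.base_url)}
    break
else:
    # Loop ended without `break`: no fallback resolved, so keep the original error.
    raise RuntimeError(...)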

Secondary issue (will file separately if confirmed)

_pool_may_recover_from_rate_limit returns False for len(entries) == 1, which correctly triggers eager fallback in the request loop, but _recover_with_credential_pool for FailoverReason.rate_limit first burns one retry on the same exhausted credential (if not has_retried_429: return False, True). On a known single-credential pool this retry is wasted — quota won't reset within the retry window. Consider gating the "retry once on first 429" path on _pool_may_recover_from_rate_limit(pool).
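
The suggested gating can be sketched as a small decision function. The names and the string results below are hypothetical, and treating any pool with more than one entry as recoverable is an assumption (the issue only states that a single-entry pool is not); the real functions return different types.

```python
from typing import List


def pool_may_recover_from_rate_limit(entries: List[str]) -> bool:
    """Hypothetical mirror of the check: a single-entry pool cannot rotate."""
    return len(entries) > 1


def next_action_on_429(entries: List[str], has_retried_429: bool,
                       gate_first_retry: bool) -> str:
    """Decide what to do after a 429.

    gate_first_retry=False models the current behaviour (always retry once);
    gate_first_retry=True models the proposed gating.
    """
    can_recover = pool_may_recover_from_rate_limit(entries)
    if not has_retried_429 and (can_recover or not gate_first_retry):
        return "retry_same_credential"
    return "rotate" if can_recover else "fallback"


current = next_action_on_429(["only-key"], has_retried_429=False, gate_first_retry=False)
proposed = next_action_on_429(["only-key"], has_retried_429=False, gate_first_retry=True)
```

With a single-entry pool, the ungated path spends one retry on the exhausted credential, while the gated path goes straight to the fallback.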

Logs (anonymized)

14:35:59 INFO  root: Fallback activated: qwen3.6-plus → qwen3.6-plus (alibaba-coding-plan)
14:36:01 INFO  agent.credential_pool: marking coding plan key exhausted (status=429), rotating
14:36:01 INFO  agent.credential_pool: no available entries (all exhausted or empty)
14:36:12 ERROR root: API call failed after 3 retries. HTTP 429: usage allocated quota exceeded.
15:06:17 ERROR cron.scheduler: Job '...' failed: RuntimeError: Provider 'alibaba-coding-plan' is set
                in config.yaml but no API key was found.
[repeats every cron tick for 3+ hours until pool cooldown TTL elapses]

extent analysis

TL;DR

The proposed fix involves iterating through the fallback_model argument in AIAgent.__init__ to find a usable fallback entry when the primary credential is exhausted.

Guidance

  • Before raising the "no API key was found" RuntimeError, iterate through the fallback_model argument to try resolve_provider_client for each entry.
  • If a usable fallback entry is found, use it as the effective primary client and set self._fallback_activated = True.
  • Verify that the fallback chain is correctly configured and that the fallback_providers entry has working credentials.
  • Test the proposed fix by driving enough traffic through cron to hit the HTTP 429: usage allocated quota exceeded error and verifying that the agent constructs a client against the first usable fallback entry.

Notes

  • The proposed fix only helps when at least one fallback_providers entry is correctly configured and has working credentials; if none resolves, the original RuntimeError is still raised.

Recommendation

Apply the proposed workaround by iterating through the fallback_model argument in AIAgent.__init__ to find a usable fallback entry when the primary credential is exhausted.
