hermes - ✅(Solved) Fix [Bug] Credential pool not loaded when /model switches provider in gateway — causes fallback instead of rotation on 429 [1 pull requests, 1 participants]

GitHub stats
NousResearch/hermes-agent#16678 (fetched 2026-04-28 06:51:40)
Comments: 0 · Participants: 1 · Timeline events: 5 · Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #16701: fix(gateway): propagate credential_pool through /model session overrides (#16678)

Description (problem / solution / changelog)

Summary

When a gateway session uses /model to switch to a different provider, the new provider's credential_pool was dropped — leaving the original provider's pool live across the switch. A 429 on the new provider then either rotated against the old provider's credentials (which don't apply to the new endpoint) or skipped pool rotation entirely and fell through to the configured fallback model.

Fixes #16678.

The bug

hermes_cli.runtime_provider.resolve_runtime_provider() returns credential_pool alongside api_key / base_url / api_mode for the resolved provider. Three layers conspired to drop it on /model switches:

  1. ModelSwitchResult had no credential_pool field, so switch_model() discarded what resolve_runtime_provider returned for the target provider.
  2. The gateway's two /model handlers (regular switch at gateway/run.py:6116, picker callback at gateway/run.py:5988) stored model/provider/api_key/base_url/api_mode on _session_model_overrides[session_key] — but not the pool.
  3. _apply_session_model_override propagated everything except credential_pool into runtime_kwargs, so on the next turn the runtime still carried the startup-time pool from the original provider.

Net effect: agent._credential_pool (the rotation source for _recover_with_credential_pool at run_agent.py:5761) belonged to the old provider after a /model switch.
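
The three layers can be reproduced in a few lines. The sketch below is illustrative only: PreFixSwitchResult, fake_resolve_runtime_provider, the dict shape, and the list-of-keys pool are hypothetical stand-ins, not the project's actual types or signatures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PreFixSwitchResult:
    """Hypothetical stand-in for the pre-fix result type: no credential_pool field."""
    model: str
    provider: str
    api_key: Optional[str] = None
    base_url: Optional[str] = None
    api_mode: Optional[str] = None


def fake_resolve_runtime_provider(provider: str) -> dict:
    """Stand-in resolver; the returned dict shape is an assumption."""
    return {
        "provider": provider,
        "api_key": f"{provider}-key-1",
        "base_url": f"https://{provider}.example/v1",
        "api_mode": "chat",
        "credential_pool": [f"{provider}-key-1", f"{provider}-key-2"],
    }


def pre_fix_switch(model: str, provider: str) -> PreFixSwitchResult:
    runtime = fake_resolve_runtime_provider(provider)
    # Layer 1: the pool in `runtime` has nowhere to go, so it is discarded.
    return PreFixSwitchResult(
        model=model,
        provider=provider,
        api_key=runtime["api_key"],
        base_url=runtime["base_url"],
        api_mode=runtime["api_mode"],
    )


# Runtime as built at startup for the original provider.
runtime_kwargs = dict(fake_resolve_runtime_provider("provider_a"), model="model-a")

# Layers 2 and 3: the override carries and propagates everything except the pool.
result = pre_fix_switch("model-b", "provider_b")
for key in ("model", "provider", "api_key", "base_url", "api_mode"):
    value = getattr(result, key)
    if value is not None:
        runtime_kwargs[key] = value

# The provider changed, but the pool still belongs to provider_a.
print(runtime_kwargs["provider"], runtime_kwargs["credential_pool"])
```

After the switch, the runtime points at provider_b while its pool still holds provider_a's keys, which is the mismatch described above.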

The fix

  • hermes_cli/model_switch.py: add credential_pool to ModelSwitchResult and capture it from the runtime in switch_model() on both the explicit-provider path and the same-provider re-resolve path.
  • gateway/run.py: store credential_pool on _session_model_overrides[session_key] in both /model handler paths.
  • gateway/run.py: _apply_session_model_override now also propagates credential_pool, with one deviation from the existing skip-if-None rule: an explicit None overrides the prior pool instead of being skipped. A switched-to provider may legitimately have no pool (single env-var key, custom local endpoint), and leaving the old pool in place would rotate against the wrong provider's credentials.

The legacy override shape (no credential_pool key at all) is preserved by the if "credential_pool" in override: gate, so any pre-fix override sitting in memory keeps the existing runtime pool rather than getting silently blanked.
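
The propagation rule can be sketched as follows. This is a minimal illustration of the semantics described here, not the code in gateway/run.py: the function name, the plain-dict arguments, and the string pools are assumptions made for the example.

```python
from typing import Any, Dict

# Fields that follow the existing skip-if-None rule.
_COPIED_IF_SET = ("model", "provider", "api_key", "base_url", "api_mode")


def apply_session_model_override(
    runtime_kwargs: Dict[str, Any], override: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of runtime_kwargs with the session override applied."""
    merged = dict(runtime_kwargs)
    for key in _COPIED_IF_SET:
        value = override.get(key)
        if value is not None:
            merged[key] = value
    # Key presence, not truthiness, decides: an explicit None clears the
    # old pool, while a legacy override without the key leaves it alone.
    if "credential_pool" in override:
        merged["credential_pool"] = override["credential_pool"]
    return merged


startup = {"provider": "provider_a", "credential_pool": "pool_a"}

with_pool = apply_session_model_override(
    startup, {"provider": "provider_b", "credential_pool": "pool_b"}
)
cleared = apply_session_model_override(
    startup, {"provider": "provider_b", "credential_pool": None}
)
legacy = apply_session_model_override(startup, {"provider": "provider_b"})
```

The three calls correspond to the three cases: the new provider's pool replaces the old one, an explicit None removes it, and an override that predates the field keeps the startup pool.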

Test plan

  • tests/hermes_cli/test_model_switch_credential_pool.py: ModelSwitchResult.credential_pool round-trips through explicit-provider switches, same-provider re-resolves, and the no-pool case.
  • tests/gateway/test_session_model_override_routing.py: three new cases verify that _apply_session_model_override propagates the override pool, replaces an existing pool with None when explicitly cleared, and keeps the existing runtime pool when the override predates this field.
  • Regression guard: temporarily reverted the production change and confirmed all 5 new assertions fail on clean origin/main with the precise expected AttributeError: 'ModelSwitchResult' object has no attribute 'credential_pool' / pool-identity mismatch, then restored.
  • Adjacent suites: full tests/hermes_cli/test_model_switch_*.py, tests/run_agent/test_switch_model_*.py, and gateway session/model tests pass (101 + 18, all green).

Related

Fixes #16678.

Adjacent prior work on the same surface, none of which addressed this gap:

  • #15455 (merged) — exponential backoff on repeated pool exhaustion
  • #15414 (merged) — route 429 "overloaded" to FailoverReason.overloaded
  • #15787 (merged) — honor custom_providers context_length on /model switch

Changed files

  • gateway/run.py (modified, +24/-3)
  • hermes_cli/model_switch.py (modified, +22/-1)
  • tests/gateway/test_session_model_override_routing.py (modified, +115/-0)
  • tests/hermes_cli/test_model_switch_credential_pool.py (added, +145/-0)
RAW_BUFFER

When a gateway session uses /model to switch to a different provider than the global default, the credential pool for the new provider is not loaded. If the new provider hits a 429, pool rotation is skipped entirely and the session falls through to the configured fallback model instead of rotating to the next credential.
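
The expected ordering on a 429 (rotate within the pool first, fall back only when the pool is missing or exhausted) can be sketched like this. SimplePool and recover_from_429 are hypothetical names for illustration; they do not reproduce the project's pool class or _recover_with_credential_pool.

```python
from typing import List, Optional, Tuple


class SimplePool:
    """Hypothetical credential pool that tries each key once, in order."""

    def __init__(self, keys: List[str]) -> None:
        self._keys = list(keys)
        self._index = 0

    def current(self) -> str:
        return self._keys[self._index]

    def rotate(self) -> bool:
        """Advance to the next key; return False once all keys are used."""
        if self._index + 1 >= len(self._keys):
            return False
        self._index += 1
        return True


def recover_from_429(
    pool: Optional[SimplePool], fallback_model: str
) -> Tuple[str, str]:
    """Prefer rotating credentials; use the fallback model as a last resort."""
    if pool is not None and pool.rotate():
        return ("rotate", pool.current())
    return ("fallback", fallback_model)


pool = SimplePool(["key-1", "key-2"])
first = recover_from_429(pool, "fallback-model")   # rotates to key-2
second = recover_from_429(pool, "fallback-model")  # pool exhausted
no_pool = recover_from_429(None, "fallback-model") # nothing to rotate
```

With no pool loaded for the new provider, the session behaves like the no_pool case and goes straight to the fallback model, which matches the reported symptom.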

extent analysis

TL;DR

The issue can be addressed by ensuring the credential pool for the new provider is loaded when a session switches providers via the /model command.

Guidance

  • Verify that the credential pool loading mechanism is correctly triggered when switching providers to identify why it's not loading for the new provider.
  • Check the logic for handling 429 errors to ensure it doesn't bypass pool rotation, so the session rotates to the next credential instead of falling through to the fallback model.
  • Review the provider switching logic in the /model handlers to ensure it properly updates the credential pool for the new provider.
  • Consider adding logging or debugging statements to track the credential pool loading and rotation process when switching providers.

Notes

The exact fix may depend on the specific implementation details of the credential pool loading and provider switching mechanisms, which are not provided in the issue description.

Recommendation

Apply workaround: Modify the provider switching logic to ensure the credential pool for the new provider is loaded and pool rotation is correctly handled when encountering 429 errors, to prevent sessions from falling back to the configured fallback model unnecessarily.
