hermes - [Feature]: Multi-provider credential pools for cross-provider failover and rotation

GitHub stats
NousResearch/hermes-agent#11737 (fetched 2026-04-18 05:59:07)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Problem or Use Case

Hermes credential pools are currently scoped to a single provider. Each pool key (e.g., "custom:local", "openai-codex", "copilot") maps to credentials that connect to one API endpoint. When all credentials in a pool are exhausted (rate-limited, billing-quota-hit, or auth-failed), the agent errors out even if other providers have available capacity with compatible models.

This is painful for users who:

  1. Pay for multiple providers (OpenRouter, OpenAI, Anthropic, custom endpoints) and want to use them interchangeably for the same model family (e.g., GPT-5.4 via OpenRouter vs. OpenAI direct).
  2. Run custom providers that serve the same model through different gateways (e.g., ollama.com/v1 and a local Ollama instance both serving glm-5.1) and want automatic failover between them.
  3. Want cost optimization by routing to the cheapest available provider for a given model without manual config changes.
  4. Need resilience — when one provider goes down or rate-limits, seamlessly fall through to the next provider serving a compatible model.

Currently, fallback_model/fallback_providers provides provider-level failover, but it requires the agent to fully fail before trying the next provider. It does not pool credentials across providers for smart selection within a single turn. The provider_routing config (only/ignore/order) restricts which providers OpenRouter uses — it does not bridge across different provider types.

What is needed is a way to define a multi-provider credential pool that groups credentials from different providers (potentially with different base_url, api_mode, and api_key values) into a single failover-capable pool that the CredentialPool class can rotate through.

Proposed Solution

Config-level: Add pools section to config.yaml

pools:
  openai-compatible:
    strategy: round_robin      # fill_first | round_robin | random | least_used
    entries:
      - provider: openai-codex
        label: "Codex Plus"
        # Uses pool entry from auth.json (openai-codex)

      - provider: custom
        base_url: https://api.openai.com/v1
        api_key_env: OPENAI_API_KEY_DIRECT
        label: "OpenAI Direct"
        api_mode: chat_completions

      - provider: custom
        base_url: https://ollama.com/v1
        pool: custom:local      # Reference existing custom:local pool from auth.json
        label: "Ollama Cloud"
        api_mode: chat_completions

  anthropic-compatible:
    strategy: fill_first
    entries:
      - provider: anthropic
        # Uses pool entry from auth.json (anthropic)

      - provider: custom
        base_url: https://opencode.ai/zen/go/v1
        pool: custom:opencode-go
        label: "OpenCode Go"
        api_mode: anthropic_messages
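
A minimal sketch of how the pools section above might be validated once config.yaml has been loaded into a dict. The names PoolEntry, PoolDefinition, and parse_pools are hypothetical and not part of Hermes; the default strategy and the rule that custom entries need a base_url are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Strategy names taken from the proposal's config comment.
VALID_STRATEGIES = {"fill_first", "round_robin", "random", "least_used"}


@dataclass
class PoolEntry:
    # Hypothetical shape mirroring the keys used in the YAML example.
    provider: str
    label: Optional[str] = None
    base_url: Optional[str] = None
    api_key_env: Optional[str] = None
    pool: Optional[str] = None
    api_mode: Optional[str] = None


@dataclass
class PoolDefinition:
    name: str
    strategy: str
    entries: list


def parse_pools(config: dict) -> dict:
    """Turn the already-parsed 'pools' mapping into validated definitions."""
    pools = {}
    for name, raw in (config.get("pools") or {}).items():
        # Assumption: a missing strategy falls back to fill_first.
        strategy = raw.get("strategy", "fill_first")
        if strategy not in VALID_STRATEGIES:
            raise ValueError(f"pool {name!r}: unknown strategy {strategy!r}")
        # Unknown keys raise TypeError from the dataclass constructor.
        entries = [PoolEntry(**item) for item in raw.get("entries", [])]
        if not entries:
            raise ValueError(f"pool {name!r}: no entries defined")
        for entry in entries:
            # Assumption: a custom entry is unusable without an endpoint.
            if entry.provider == "custom" and not entry.base_url:
                raise ValueError(f"pool {name!r}: custom entry needs base_url")
        pools[name] = PoolDefinition(name, strategy, entries)
    return pools
```

Failing at load time keeps a typo in a strategy name or a missing base_url from surfacing only when the first request is routed.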

Model config integration

model:
  default: gpt-5.4
  provider: pool:openai-compatible   # Route through a multi-provider pool

  # Or, for single-model failover:
  fallback_providers:
    - provider: pool:openai-compatible
      model: gpt-5.4
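
Detecting the pool: prefix in a provider string could look roughly like this. parse_pool_reference is a hypothetical helper, and rejecting an empty pool name is an assumption rather than documented behavior.

```python
from typing import Optional

POOL_PREFIX = "pool:"


def parse_pool_reference(provider: str) -> Optional[str]:
    """Return the pool name for 'pool:<name>', or None for ordinary providers."""
    if not provider.startswith(POOL_PREFIX):
        # Plain provider keys such as "openai-codex" or "custom:local".
        return None
    name = provider[len(POOL_PREFIX):].strip()
    if not name:
        raise ValueError("provider 'pool:' is missing a pool name")
    return name
```

Because existing keys like custom:local also contain a colon, the check matches the literal pool: prefix and does not split on the first colon.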

Runtime behavior

When provider resolves to a pool: reference, resolve_runtime_provider() would:

  1. Load the named pool definition from config
  2. Iterate entries using the pool strategy (round_robin, least_used, etc.)
  3. For each entry, resolve credentials (pool reference from auth.json, env var, or explicit key)
  4. If the selected entry fails with an auth/rate-limit error, mark it exhausted and rotate to the next entry
  5. Return the full runtime dict (provider, api_key, base_url, api_mode, credential_pool) just like a single-provider pool does today
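
Steps 3 and 5 can be sketched as follows. This is an illustration under stated assumptions, not the Hermes implementation: resolve_entry and resolve_from_pool are hypothetical names, auth_pools stands in for auth.json as a plain mapping from pool key to a list of API keys, and the precedence (explicit key, then env var, then auth.json pool) is a guess. Entries are walked in fill_first order for simplicity.

```python
import os


def resolve_entry(entry: dict, auth_pools: dict, environ=os.environ):
    """Build a runtime dict for one pool entry, or None if no credential exists."""
    api_key = entry.get("api_key")  # explicit key, if the entry carries one
    if api_key is None and entry.get("api_key_env"):
        api_key = environ.get(entry["api_key_env"])
    if api_key is None:
        # Assumption: fall back to the auth.json pool named by 'pool',
        # or by the provider name when 'pool' is absent.
        pool_key = entry.get("pool") or entry["provider"]
        keys = auth_pools.get(pool_key) or []
        api_key = keys[0] if keys else None
    if api_key is None:
        return None
    return {
        "provider": entry["provider"],
        "api_key": api_key,
        "base_url": entry.get("base_url"),
        "api_mode": entry.get("api_mode"),
    }


def resolve_from_pool(entries, auth_pools, exhausted, environ=os.environ):
    """Return (index, runtime) for the first usable entry, or None."""
    for index, entry in enumerate(entries):
        if index in exhausted:
            continue  # step 4: entries marked exhausted are skipped
        runtime = resolve_entry(entry, auth_pools, environ)
        if runtime is not None:
            return index, runtime
    return None
```

Returning the index alongside the runtime dict lets the caller record which entry to mark exhausted if the request later fails.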

Implementation approach

The core change is in hermes_cli/runtime_provider.py — when resolve_runtime_provider() receives a pool:<name> provider string:

  • Load the pool definition from config
  • Create a MultiProviderPool that wraps multiple CredentialPool instances (or inline entries)
  • MultiProviderPool.select() iterates entries using the configured strategy
  • On mark_exhausted_and_rotate(), advance to the next entry in the pool
  • Each entry contributes a fully-resolved provider, base_url, api_key, api_mode tuple

The existing CredentialPool class stays unchanged — single-provider pools continue working as before. MultiProviderPool is additive.
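
One possible shape for the select() and mark_exhausted_and_rotate() pair, together with a caller loop. SketchMultiProviderPool, RotatableError, and call_with_rotation are hypothetical names used only for this illustration; entries are plain values here, and strategies, credential resolution, and any recovery of exhausted entries are deliberately left out.

```python
class RotatableError(Exception):
    """Stand-in for an auth, rate-limit, or quota failure."""


class SketchMultiProviderPool:
    def __init__(self, entries):
        self.entries = list(entries)
        self._index = 0          # entry currently in use
        self._exhausted = set()  # indices already marked exhausted

    def select(self):
        """Return the current entry, or None when every entry is exhausted."""
        if len(self._exhausted) == len(self.entries):
            return None
        return self.entries[self._index]

    def mark_exhausted_and_rotate(self):
        """Mark the current entry exhausted and move to the next usable one."""
        self._exhausted.add(self._index)
        for offset in range(1, len(self.entries) + 1):
            candidate = (self._index + offset) % len(self.entries)
            if candidate not in self._exhausted:
                self._index = candidate
                return self.entries[candidate]
        return None


def call_with_rotation(pool, send):
    """Try entries until one succeeds; 'send' is the caller's request function."""
    entry = pool.select()
    while entry is not None:
        try:
            return send(entry)
        except RotatableError:
            entry = pool.mark_exhausted_and_rotate()
    raise RuntimeError("all pool entries exhausted")
```

Keeping rotation inside the pool object means the request loop never needs to know how many providers sit behind it, which matches the additive design described above.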

Key challenges

  • Model compatibility: A pool entry should declare which models it serves, so the agent does not route a gpt-5.4 request to an Anthropic-only endpoint. Could use the existing custom_providers.models config pattern.
  • api_mode bridging: Entries in the same pool may need different api_mode values (e.g., chat_completions for OpenAI-compatible, anthropic_messages for Claude endpoints). The pool entry must carry its own api_mode.
  • Error classification: Not all errors should trigger pool rotation — only auth (401/403), rate-limit (429), and billing/quota (402) errors. Server errors (500/502/503) should use retry logic instead.
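
The classification rule and the model-compatibility check can be sketched in a few lines. The status codes come from the bullet above; the function names, the "raise" fallback for any other status, and the per-entry models key are assumptions for illustration.

```python
ROTATE_STATUSES = {401, 402, 403, 429}  # auth, billing/quota, rate-limit
RETRY_STATUSES = {500, 502, 503}        # server errors: retry the same entry


def classify_status(status_code: int) -> str:
    """Map an HTTP status to 'rotate', 'retry', or 'raise'."""
    if status_code in ROTATE_STATUSES:
        return "rotate"
    if status_code in RETRY_STATUSES:
        return "retry"
    # Assumption: anything else is surfaced to the caller unchanged.
    return "raise"


def entries_serving(entries, model: str):
    """Keep only entries that declare the requested model.

    Assumes each entry carries a hypothetical 'models' list; entries
    without one are excluded rather than assumed compatible.
    """
    return [entry for entry in entries if model in entry.get("models", [])]
```

Filtering by model before selection keeps a gpt-5.4 request from rotating onto an endpoint that cannot serve it.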

Alternatives Considered

  1. Use fallback_providers for cross-provider failover — This already exists but requires a full agent failure before trying the next provider. It does not pre-emptively rotate credentials or do smart selection. It is also a separate config that does not integrate with pool strategies.

  2. Multiple custom_providers entries pointing to different providers — The current custom_providers list only defines endpoint metadata and model lists. It does not create a failover pool. Each custom provider still resolves to one pool key in auth.json.

  3. Write a custom hook that catches errors and re-resolves — The hook system (gateway/hooks.py) fires agent:end events but cannot modify the in-flight request or switch providers mid-turn.

  4. Shell script wrapper — An external tool cannot inject credentials into a running Hermes agent mid-session.

Feature Type

Gateway / messaging improvement

Scope

Medium (2-3 files, ~300-500 lines)

Additional Context

I investigated the full credential pool and runtime provider resolution pipeline:

  • agent/credential_pool.py: CredentialPool class is provider-scoped (keyed by provider name like "openai-codex", "custom:local")
  • agent/credential_pool.py:346: get_pool_strategy() reads credential_pool_strategies from config, accepts fill_first, round_robin, random, least_used
  • hermes_cli/runtime_provider.py:704: resolve_runtime_provider() loads a single pool via load_pool(provider) and returns a single credential_pool reference
  • hermes_cli/runtime_provider.py:170-230: _resolve_runtime_from_pool_entry() builds the runtime dict with provider, base_url, api_key, api_mode, credential_pool
  • run_agent.py:611-1088: AIAgent.__init__ accepts fallback_model (single dict or list of provider dicts), but this is a separate failover chain, not pool-integrated
  • Existing provider_routing config only controls OpenRouter routing preferences, not cross-provider pooling

The custom:local pool in our deployment uses round_robin strategy across 4 Ollama API keys, which works great for same-provider key rotation. Extending this concept across providers is the natural next step.
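
For illustration, the four strategy names could be interpreted over a list of candidate entries as below. This is one reading of the names, not a description of what get_pool_strategy() or CredentialPool actually does; make_selector and the use_counts mapping are hypothetical.

```python
import random
from itertools import count

VALID_STRATEGIES = {"fill_first", "round_robin", "random", "least_used"}


def make_selector(strategy: str, rng=None):
    """Return a function that picks one candidate according to 'strategy'."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"unknown strategy {strategy!r}")
    rng = rng or random.Random()
    turn = count()  # number of round_robin picks made so far

    def select(candidates, use_counts=None):
        if not candidates:
            return None
        if strategy == "fill_first":
            # Always prefer the earliest candidate still available.
            return candidates[0]
        if strategy == "round_robin":
            return candidates[next(turn) % len(candidates)]
        if strategy == "random":
            return rng.choice(candidates)
        # least_used: fewest recorded uses wins; ties go to the earliest.
        counts = use_counts or {}
        return min(candidates, key=lambda item: counts.get(item, 0))

    return select
```

With candidates drawn from several providers instead of several keys, the same selection logic would apply unchanged, which is the extension the issue proposes.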

extent analysis

TL;DR

Implement a MultiProviderPool class that allows credential pooling across different providers, enabling smart selection and failover within a single turn.

Guidance

  1. Update hermes_cli/runtime_provider.py: Modify resolve_runtime_provider() to handle pool:<name> provider strings by loading the pool definition from config and creating a MultiProviderPool instance.
  2. Create MultiProviderPool class: Design a class that wraps multiple CredentialPool instances and iterates entries using a configured strategy (e.g., round-robin, least-used).
  3. Integrate with config.yaml: Add a pools section to config.yaml to define multi-provider credential pools, including strategies and entries with provider-specific details.
  4. Implement error classification: Distinguish between errors that should trigger pool rotation (auth, rate-limit, billing/quota) and those that require retry logic (server errors).

Example

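class MultiProviderPool:
    """Sketch only: rotates across entries that may belong to different providers."""

    def __init__(self, pool_name, strategy, entries):
        if not entries:
            raise ValueError(f"pool {pool_name!r} has no entries")
        self.pool_name = pool_name
        self.strategy = strategy   # only fill_first and round_robin are handled here
        self.entries = list(entries)
        self.exhausted = set()     # indices of entries marked exhausted
        self._cursor = 0           # where the next scan starts

    def select(self):
        # Return the next non-exhausted entry, or None if every entry is exhausted.
        for offset in range(len(self.entries)):
            index = (self._cursor + offset) % len(self.entries)
            if index not in self.exhausted:
                if self.strategy == "round_robin":
                    self._cursor = index + 1  # fill_first keeps scanning from 0
                # Resolving credentials into the full runtime dict is omitted.
                return self.entries[index]
        return None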

Notes

  • The implementation should ensure model compatibility between pool entries.
  • api_mode bridging may require additional logic to handle different api_mode values within the same pool.
  • Error classification is crucial to prevent unnecessary pool rotation.

Recommendation

Apply the proposed solution by implementing the MultiProviderPool class and updating hermes_cli/runtime_provider.py to support multi-provider credential pools. This will enable smart selection and failover across different providers, improving resilience and cost optimization.
