hermes - ✅(Solved) Fix Auxiliary client cache routes wrong model when multiple tasks share provider/base_url [1 pull requests, 2 comments, 3 participants]

hermes • 2026-04-27 05:12:02

NousResearch/hermes-agent#16387•Fetched 2026-04-28 06:53:43

agent/auxiliary_client.py caches auxiliary clients keyed by (provider, async_mode, base_url, api_key, api_mode, runtime_key) — model is intentionally omitted. On cache hits, _compat_model() then drops any caller-supplied model that contains / for non-OpenRouter clients and falls back to cached_default (the model that happened to be configured the first time the client was created).

Net effect: when several auxiliary tasks point at the same custom provider but different models, all tasks end up using whichever model was set on the first task that warmed the cache, regardless of auxiliary.<task>.model in config.yaml.

Tested on Hermes Agent v0.11.0 (post-update, includes the #15033 / commit b29287258 fix).

Workaround

Restart the gateway whenever the cache may have been primed with the wrong default. This only helps temporarily: after the restart, whichever auxiliary call runs first primes the cache again with its own model. The only durable fix is the source change to the cache key described in this issue.

PR fix notes

PR #16410: fix(auxiliary): include model in client cache key

Repository: NousResearch/hermes-agent
Author: vominh1919
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16410

Description (problem / solution / changelog)

Problem

agent/auxiliary_client._client_cache_key() omits model from the cache key. When multiple auxiliary tasks (vision, compression, title_generation) share the same provider but specify different models, all tasks receive whichever model first warmed the cache.

Reproduction (from #16387):

auxiliary:
  vision:
    provider: myrelay
    model: google/gemini-3.1-flash-image-preview
  compression:
    provider: myrelay
    model: google/gemini-3-flash-preview
  title_generation:
    provider: myrelay
    model: google/gemini-3.1-flash-lite-preview

After vision warms the cache, compression and title_generation also use flash-image-preview.

Fix

Add model to the cache key tuple so each unique (provider, model, ...) combination gets its own cache entry.

Before: (provider, async_mode, base_url, api_key, api_mode, runtime_key)
After: (provider, model, async_mode, base_url, api_key, api_mode, runtime_key)

Tests

New regression test in tests/agent/test_auxiliary_cache_key_model.py:

Different models produce different cache keys
Same model produces same key
None model equals empty string
Model is independent of provider
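
The four cases listed above could be exercised roughly as follows. This is a sketch, not the contents of tests/agent/test_auxiliary_cache_key_model.py. `cache_key` is a hypothetical stand-in that mirrors the key tuple from the PR description, with runtime_key fixed to () and the model normalisation (strip and lower-case) borrowed from the issue's suggested fix, which may differ from what the PR does. The provider name "otherrelay" is also made up for illustration.

```python
from typing import Optional


def cache_key(provider: str, model: Optional[str] = None, *, async_mode: bool = False,
              base_url: Optional[str] = None, api_key: Optional[str] = None,
              api_mode: Optional[str] = None) -> tuple:
    """Hypothetical stand-in for _client_cache_key with model in the key."""
    runtime_key = ()  # assumption: provider is never "auto" in this sketch
    return (provider, (model or "").strip().lower(), async_mode,
            base_url or "", api_key or "", api_mode or "", runtime_key)


def test_different_models_produce_different_keys():
    a = cache_key("myrelay", "google/gemini-3-flash-preview")
    b = cache_key("myrelay", "google/gemini-3.1-flash-lite-preview")
    assert a != b


def test_same_model_produces_same_key():
    model = "google/gemini-3-flash-preview"
    assert cache_key("myrelay", model) == cache_key("myrelay", model)


def test_none_model_equals_empty_string():
    assert cache_key("myrelay", None) == cache_key("myrelay", "")


def test_model_is_independent_of_provider():
    # The same model under two providers must not share a cache entry.
    model = "google/gemini-3-flash-preview"
    assert cache_key("myrelay", model) != cache_key("otherrelay", model)
```

The functions follow the pytest naming convention, so a test runner would collect them automatically; they can also be called directly.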

Fixes #16387
Fixes #14249

Changed files

agent/auxiliary_client.py (modified, +4/-1)
tests/agent/test_auxiliary_cache_key_model.py (added, +68/-0)

Auxiliary client cache routes wrong model when multiple tasks share provider/base_url

Reproduction

config.yaml:

providers:
  myrelay:
    name: myrelay
    base_url: https://example-relay.test/v1
    key_env: MYRELAY_API_KEY
    api_mode: chat_completions

auxiliary:
  vision:
    provider: myrelay
    model: google/gemini-3.1-flash-image-preview     # vision needs image-capable
  compression:
    provider: myrelay
    model: google/gemini-3-flash-preview              # text-only
  title_generation:
    provider: myrelay
    model: google/gemini-3.1-flash-lite-preview       # cheapest text-only

Once an auxiliary call goes through vision first (cache warmed with flash-image-preview), every subsequent compression and title_generation call also hits flash-image-preview. Logs confirm:

INFO agent.auxiliary_client: Auxiliary title_generation: using myrelay (google/gemini-3.1-flash-image-preview)
INFO agent.auxiliary_client: Auxiliary compression: using myrelay (google/gemini-3.1-flash-image-preview)

A fresh Python interpreter (no warm cache) confirms the config is read correctly:

title_generation: provider=myrelay model=google/gemini-3.1-flash-lite-preview
compression:      provider=myrelay model=google/gemini-3-flash-preview
vision:           provider=myrelay model=google/gemini-3.1-flash-image-preview

So the YAML is fine — the live gateway disagrees with the live config.

Root Cause — two issues compounding

Issue 1: `_client_cache_key` does not include `model`

agent/auxiliary_client.py:2186-2197:

def _client_cache_key(provider, *, async_mode, base_url=None, api_key=None,
                     api_mode=None, main_runtime=None) -> tuple:
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(...) if provider == "auto" else ()
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)
    # model field absent

Two distinct logical clients (different model selection) collapse to the same cache entry.
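
The collapse can be seen with a simplified copy of the key function. In this sketch `old_cache_key` is a hypothetical reduction of the function quoted above: runtime_key is hard-coded to () on the assumption that provider is not "auto", and the API key is a placeholder value.

```python
def old_cache_key(provider, *, async_mode, base_url=None, api_key=None, api_mode=None):
    # Simplified copy of the pre-fix key: there is no model argument at all.
    runtime_key = ()  # assumption: provider != "auto"
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "", runtime_key)


# Connection settings shared by every task that points at the same provider.
shared = dict(async_mode=False, base_url="https://example-relay.test/v1",
              api_key="placeholder-key", api_mode="chat_completions")

# Task -> model requested in config.yaml.
requested = {
    "vision": "google/gemini-3.1-flash-image-preview",
    "compression": "google/gemini-3-flash-preview",
    "title_generation": "google/gemini-3.1-flash-lite-preview",
}

# The requested model never reaches the key, so all three tasks share one entry.
keys = {task: old_cache_key("myrelay", **shared) for task in requested}
print(len(set(keys.values())))  # 1
```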

Issue 2: `_compat_model` silently swaps the requested model for `cached_default`

agent/auxiliary_client.py:2362-2369:

def _compat_model(client, model, cached_default) -> Optional[str]:
    """Drop OpenRouter-format model slugs (with '/') for non-OpenRouter clients.

    Mirrors the guard in resolve_provider_client() which is skipped on cache hits.
    """
    if model and "/" in model and not _is_openrouter_client(client):
        return cached_default     # user-requested model thrown away silently
    return model or cached_default

The docstring acknowledges the design rationale (cache hits skip the resolve_provider_client guard). But the fallback is too aggressive: any aggregator-style model slug (e.g. google/gemini-3.1-flash-lite-preview on a non-OpenRouter OpenAI-compatible base_url) gets reverted to whatever the first warm-up call left in cached_default, with no warning logged.

_is_openrouter_client only matches openrouter.ai, so legitimate OpenAI-compatible aggregators that do use vendor/model slugs in their public model IDs (LiteLLM passthrough, OpenRouter-format mirrors, third-party gateways like ofox, etc.) are all penalised.
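
To illustrate what a host-only match excludes, here is a minimal sketch. `looks_like_openrouter` is hypothetical: the real _is_openrouter_client takes a client object and its implementation is not shown in this issue, so the hostname comparison below is an assumption based on the statement that it only matches openrouter.ai. The localhost URL is an invented example of a local proxy.

```python
from urllib.parse import urlparse


def looks_like_openrouter(base_url: str) -> bool:
    """Hypothetical host check standing in for _is_openrouter_client."""
    host = (urlparse(base_url).hostname or "").lower()
    return host == "openrouter.ai" or host.endswith(".openrouter.ai")


candidates = [
    "https://openrouter.ai/api/v1",   # matched
    "https://example-relay.test/v1",  # relay from the reproduction, not matched
    "http://localhost:4000/v1",       # invented local proxy, not matched
]

results = {url: looks_like_openrouter(url) for url in candidates}
for url, matched in results.items():
    print(f"{url}: {matched}")
```

Under this assumption, any gateway outside openrouter.ai falls into the branch of _compat_model that discards a model slug containing "/".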

Combined symptom

First aux call: vision → builds OpenAI client at myrelay, cached_default=google/gemini-3.1-flash-image-preview.
Second call: title_generation requests google/gemini-3.1-flash-lite-preview. Cache key matches (same provider/base/key/api_mode). Cache hit returns the same (client, cached_default) pair. _compat_model sees / and returns cached_default → caller silently uses flash-image-preview.
Same for compression and any other task pointed at the provider.
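
The sequence above can be replayed with a toy cache. This is a simplified model under stated assumptions, not the code in agent/auxiliary_client.py: `run_calls` is a hypothetical helper, the key is reduced to the provider name (plus the model when the flag is set), and the client is treated as non-OpenRouter so every slug containing "/" takes the fallback branch on a cache hit.

```python
from typing import Dict, Tuple

IMAGE = "google/gemini-3.1-flash-image-preview"
LITE = "google/gemini-3.1-flash-lite-preview"
FLASH = "google/gemini-3-flash-preview"


def run_calls(key_includes_model: bool) -> Dict[str, str]:
    """Return the model each task ends up using after two rounds of calls."""
    cache: Dict[tuple, Tuple[object, str]] = {}
    used: Dict[str, str] = {}
    calls = [("vision", IMAGE), ("title_generation", LITE), ("compression", FLASH)] * 2
    for task, model in calls:
        key = ("myrelay", model) if key_includes_model else ("myrelay",)
        if key not in cache:
            # Cache miss: the requesting task's model becomes cached_default.
            cache[key] = (object(), model)
            used[task] = model
            continue
        _client, cached_default = cache[key]
        # Cache hit on a non-OpenRouter client: a slug containing "/" is dropped.
        used[task] = cached_default if "/" in model else model
    return used


before = run_calls(key_includes_model=False)
after = run_calls(key_includes_model=True)
print(before)  # every task maps to the image-preview model
print(after)   # every task maps to its own configured model
```

In this toy model the fallback branch is still present in the second run, but it returns the right model because each cache entry was created for exactly one model.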

Suggested Fix (preferred)

Include model in the cache key. Different model selections deserve different client entries.

def _client_cache_key(provider, *, async_mode, base_url=None, api_key=None,
                     api_mode=None, model=None, main_runtime=None) -> tuple:
    runtime = _normalize_main_runtime(main_runtime)
    runtime_key = tuple(...) if provider == "auto" else ()
    return (provider, async_mode, base_url or "", api_key or "", api_mode or "",
            (model or "").strip().lower(), runtime_key)

Update both call sites (around lines 2244 and 2409) to pass model=. The _compat_model guard then becomes redundant (each cache entry already corresponds to a single model) and can be removed for simplicity.

Alternative

Keep cache key as-is but make _compat_model honour the caller-supplied model when the underlying client's base_url is an OpenAI-compatible aggregator that legitimately uses vendor/model slugs. This requires a maintained allowlist (or detection heuristic) for "aggregator-style" base URLs, which is fragile. The cache-key fix is cleaner.
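
A minimal sketch of what the allowlist variant might look like follows. Everything here is hypothetical: `SLASH_SLUG_HOSTS`, `accepts_slash_slugs`, and a `compat_model` that receives a base_url rather than a client object are illustrations, not proposed patch contents.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allowlist of hosts known to accept vendor/model slugs.
SLASH_SLUG_HOSTS = {"openrouter.ai", "example-relay.test"}


def accepts_slash_slugs(base_url: str) -> bool:
    host = (urlparse(base_url).hostname or "").lower()
    return host in SLASH_SLUG_HOSTS


def compat_model(base_url: str, model: Optional[str], cached_default: str) -> str:
    """Variant of the guard keyed on base_url instead of the client object."""
    if model and "/" in model and not accepts_slash_slugs(base_url):
        return cached_default  # unlisted host: fall back as before
    return model or cached_default


lite = "google/gemini-3.1-flash-lite-preview"
image = "google/gemini-3.1-flash-image-preview"
print(compat_model("https://example-relay.test/v1", lite, image))     # lite model kept
print(compat_model("https://unlisted-gateway.test/v1", lite, image))  # falls back to image model
```

The second call shows the fragility mentioned above: a gateway that is missing from the list silently gets the old behaviour until someone adds it.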

Impact

Any user with multiple auxiliary tasks pointing at the same custom provider but different models silently runs all of them against whichever model warmed the cache first. Side effects:

The wrong model is billed for the task (cost and capability mismatch, e.g. paying image-preview rates for title generation).
Per-task cost/latency tuning in config.yaml has no effect for any task that hits an already-warmed cache entry.
Hard to diagnose: logs show the wrong model being used, but the config file looks correct.

extent analysis

TL;DR

The most likely fix is to include the model in the cache key to prevent different model selections from collapsing to the same cache entry.

Guidance

Add a model parameter to _client_cache_key, include it in the returned tuple as shown in the suggested fix, and pass model= at both call sites.
Consider removing the _compat_model guard, which the issue describes as redundant once each cache entry corresponds to a single model.
Verify the fix by checking the logs to ensure that each task is using the correct model.
Test the fix with multiple auxiliary tasks pointing to the same custom provider but different models.
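
One way to do the log check is to compare the "Auxiliary <task>: using <provider> (<model>)" lines against the configured models. The sketch below assumes that log format (taken from the lines quoted in the reproduction) and hard-codes the expected mapping instead of parsing config.yaml; `find_mismatches` and `EXPECTED` are hypothetical names.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Matches e.g. "Auxiliary compression: using myrelay (google/gemini-3-flash-preview)".
LOG_PATTERN = re.compile(
    r"Auxiliary (?P<task>\w+): using (?P<provider>\S+) \((?P<model>[^)]+)\)"
)

# Task -> model as configured under auxiliary.<task>.model.
EXPECTED: Dict[str, str] = {
    "vision": "google/gemini-3.1-flash-image-preview",
    "compression": "google/gemini-3-flash-preview",
    "title_generation": "google/gemini-3.1-flash-lite-preview",
}


def find_mismatches(lines: Iterable[str],
                    expected: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (task, configured_model, logged_model) for every disagreement."""
    mismatches = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if not match:
            continue
        task, actual = match["task"], match["model"]
        wanted = expected.get(task)
        if wanted is not None and wanted != actual:
            mismatches.append((task, wanted, actual))
    return mismatches


log_lines = [
    "INFO agent.auxiliary_client: Auxiliary title_generation: using myrelay (google/gemini-3.1-flash-image-preview)",
    "INFO agent.auxiliary_client: Auxiliary compression: using myrelay (google/gemini-3.1-flash-image-preview)",
]
mismatches = find_mismatches(log_lines, EXPECTED)
for task, wanted, actual in mismatches:
    print(f"{task}: configured {wanted}, logged {actual}")
```

With the two log lines from the reproduction, both tasks are reported; after a working fix the list should be empty.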

Notes

The suggested fix assumes that model is the only request attribute missing from the cache key. If other attributes also change which client should be returned, they need to be added to the key as well.

Recommendation

Apply the suggested fix to include the model in the cache key, as it is a cleaner and more durable solution compared to maintaining an allowlist of "aggregator-style" base URLs.
