hermes - ✅(Solved) Fix [Bug]: Prompt caching silently disabled for MiniMax's own models (MiniMax-M2.7 etc.) on anthropic_messages transport [3 pull requests, 1 comments, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#17332Fetched 2026-04-30 06:48:20
View on GitHub
Comments
1
Participants
1
Timeline
14
Reactions
0
Participants
Timeline (top)
labeled ×4cross-referenced ×3referenced ×3closed ×1

Root Cause

Currently, _anthropic_prompt_cache_policy() returns (False, False) for MiniMax because is_claude is False for model="minimax-m2.7".

Fix Action

Fixed

PR fix notes

PR #17333: fix(agent): enable prompt caching for MiniMax own models on anthropic_messages transport

Description (problem / solution / changelog)

Closes #17332

PR #12846 enabled Anthropic prompt caching for third-party gateways, but gated it on is_claude, which excluded providers like MiniMax that serve their own model families (MiniMax-M2.7, etc.) through the native Anthropic protocol.

MiniMax documents full cache_control support on its /anthropic endpoints (global and China). This patch adds MiniMax detection to _anthropic_prompt_cache_policy() using:

  • Built-in provider id (minimax, minimax-cn), or
  • Known Anthropic-compatible hostname (api.minimax.io, api.minimaxi.com)

Both paths receive the native cache_control layout, consistent with how Hermes already treats native Anthropic and Claude-on-third-party gateways.

Tests: 3 new cases, 19 total in the policy suite.

Refs: #8294 (related, but only covered Claude-named models on third-party gateways).

Changed files

  • .env.example (modified, +2/-4)
  • run_agent.py (modified, +29/-5)
  • tests/agent/test_minimax_provider.py (modified, +97/-1)
  • tests/run_agent/test_anthropic_prompt_cache_policy.py (modified, +63/-4)

PR #17339: fix(agent): capability registry for Anthropic prompt-cache support (#17332)

Description (problem / solution / changelog)

Closes #17332. Alternative to (and supersedes) #17333.

Why a different fix

PR #17333 patches the symptom by hardcoding provider in ("minimax", "minimax-cn") and two specific hostnames in _anthropic_prompt_cache_policy. That fixes MiniMax — but the issue's "Scope beyond MiniMax" section explicitly asks for "a capability-based or provider-allowlist approach" that's future-proof.

This PR delivers that. Net result for users is the same (MiniMax M2.x now caches), but the next entrant — and operators running private gateways — get it for free.

Design

1. Provider capability flag in ProviderConfig.extra

Built-in providers declare cache support next to their existing auth/base_url config — single source of truth, no parallel list:

"minimax": ProviderConfig(
    id="minimax",
    inference_base_url="https://api.minimax.io/anthropic",
    api_key_env_vars=("MINIMAX_API_KEY",),
    extra={
        "anthropic_cache": True,
        "anthropic_cache_hosts": ("api.minimax.io",),
    },
),

2. New helper agent/anthropic_cache_capability.py

provider_supports_anthropic_cache(provider, base_url, user_configured_hosts=…) resolves in order:

  • a. ProviderConfig.extra['anthropic_cache'] for the provider id (case-insensitive).
  • b. Hostname match against registry hosts ∪ user-configured hosts.

(b) catches the very common provider: custom + MiniMax URL setup.

3. Operator escape hatch — agent.anthropic_cache_hosts

New config list. Anyone running a private LiteLLM/vLLM Anthropic-compatible deployment can opt in today without an upstream patch:

agent:
  anthropic_cache_hosts:
    - my-internal.gateway.local

Read once in __init__, cached on self._anthropic_cache_user_hosts_cached, threaded through the policy lookup. Malformed values are ignored — a typo never disables caching, only fails to enable it for a host.

4. Policy gains one additive branch

if is_anthropic_wire:
    if provider_supports_anthropic_cache(eff_provider, eff_base_url,
                                          user_configured_hosts=self._anthropic_cache_user_hosts()):
        return True, True

Pre-existing native Anthropic / OpenRouter / Claude-on-third-party branches are untouched.

Why this is better than #17333

#17333This PR
MiniMax M2.x caches today
Next entrant requires code patch❌ — flip one flag in ProviderConfig.extra
Operator can opt-in private gateway✅ via agent.anthropic_cache_hosts
Source of truth for built-in providersnew code pathreuses ProviderConfig
provider == "minimax" substring brittleness✅ exists❌ — registry-driven
Lines of policy logic per providergrows linearlyconstant

Tests

Existing test was using api.minimax.io as the "unknown" host — under this PR that hostname is deliberately known (registry entry), so the test was updated to use a truly unlisted host. That's the intended behavior change.

New TestCapabilityRegistryAnthropicCache (9 cases):

  • test_minimax_built_in_provider_caches_with_native_layout
  • test_minimax_cn_built_in_provider_caches_with_native_layout
  • test_custom_provider_pointing_at_minimax_host_caches
  • test_custom_provider_pointing_at_minimax_cn_host_caches
  • test_user_configured_host_opts_in_unknown_gateway
  • test_user_config_normalizes_case_and_whitespace
  • test_capability_branch_requires_anthropic_messages_transport
  • test_unlisted_anthropic_protocol_gateway_stays_off
  • test_provider_id_lookup_is_case_insensitive

New TestCapabilityHelperUnit (5 cases) — direct tests of the helper.

$ python -m pytest tests/run_agent/test_anthropic_prompt_cache_policy.py -q
..............................                                          [100%]
30 passed in 0.37s

$ python -m pytest tests/run_agent/ -q --ignore=tests/run_agent/test_run_agent.py
863 passed, 7 skipped in 120.97s

Zero regressions across tests/run_agent/.

Changed files

  • agent/anthropic_cache_capability.py (added, +136/-0)
  • hermes_cli/auth.py (modified, +11/-0)
  • run_agent.py (modified, +65/-5)
  • tests/run_agent/test_anthropic_prompt_cache_policy.py (modified, +162/-4)

PR #17425: fix(minimax): enable Anthropic prompt caching for MiniMax's own models

Description (problem / solution / changelog)

MiniMax on anthropic_messages transport now caches with the native layout — same cost reduction as Claude traffic through the same gateway.

Root cause

PR #12846 gated third-party Anthropic-wire caching on "claude" in model_lower. MiniMax serves its own model family (MiniMax-M2.7, M2.5, M2.1, M2) on api.minimax.io/anthropic — they don't match, fall through to (False, False), re-pay full input tokens every turn.

MiniMax documents cache_control support on this endpoint: https://platform.minimax.io/docs/api-reference/anthropic-api-compatible-cache (0.1× read, 1.25× write, 5-min TTL).

Changes

  • run_agent.py — new branch in _anthropic_prompt_cache_policy() after the Claude gateway check: if transport is anthropic_messages AND provider is minimax/minimax-cn OR host matches api.minimax.io/api.minimaxi.com, return (True, True).
  • tests/run_agent/test_anthropic_prompt_cache_policy.py — flipped the old pinned-negative test for minimax-m2.7 (which was pinning the bug) into a renamed unknown-host test, added a new TestMiniMaxAnthropicWire class with 5 cases: both provider ids, both hosts, and a chat_completions-transport negative.

Validation

BeforeAfter
provider=minimax + minimax-m2.7 on anthropic_messages(False, False)(True, True)
provider=minimax-cn + minimax-m2.5 on anthropic_messages(False, False)(True, True)
custom provider, host api.minimax.io/anthropic(False, False)(True, True)
provider=minimax on chat_completions (not /anthropic)(False, False)(False, False)
Unknown host + non-Claude model on anthropic_messages(False, False)(False, False)

tests/run_agent/test_anthropic_prompt_cache_policy.py — 21 passed (16 existing + 5 new).

Scope

Narrow allowlist mirroring the existing Qwen/Alibaba branch. If a third provider needs this later, a capability flag on ProviderConfig becomes the right factoring — for now the surface is small enough to inline.

Closes #17332

Changed files

  • run_agent.py (modified, +18/-0)
  • tests/run_agent/test_anthropic_prompt_cache_policy.py (modified, +63/-3)
RAW_BUFFERClick to expand / collapse

Bug Description

PR #12846 (which closed #8294) enabled Anthropic prompt caching for third-party gateways using api_mode: anthropic_messages, but only when the model name contains "claude". This leaves out providers that use the native Anthropic protocol for their own model families.

MiniMax is a concrete example. It serves MiniMax-M2.7, MiniMax-M2.5, etc. through https://api.minimax.io/anthropic (and api.minimaxi.com/anthropic for China). MiniMax documents explicit cache_control support on this endpoint:

Currently, _anthropic_prompt_cache_policy() returns (False, False) for MiniMax because is_claude is False for model="minimax-m2.7".

Expected Behavior

MiniMax (both built-in minimax / minimax-cn providers and custom endpoints pointing at MiniMax's Anthropic URLs) should receive cache_control breakpoints with the native layout, just like native Anthropic and Claude-on-third-party gateways.

Actual Behavior

No caching. The system prompt + tool schemas are re-billed in full on every turn.

Why this is a Hermes-side issue

MiniMax's Anthropic-compatible endpoint is already handled correctly for auth (Bearer token, stripped betas) and transport (anthropic_messages). The only missing piece is the cache-policy gate in _anthropic_prompt_cache_policy().

Scope beyond MiniMax

This same pattern likely affects any future provider that:

  1. Speaks the native Anthropic protocol (anthropic_messages)
  2. Documents cache_control support
  3. Serves its own (non-Claude) model family

A capability-based or provider-allowlist approach would be more future-proof than the current is_claude name check for third-party gateways.

Version

Current upstream main (post #12846).

extent analysis

TL;DR

Modify the _anthropic_prompt_cache_policy() function to support cache control for providers like MiniMax that use the native Anthropic protocol and document cache_control support.

Guidance

  • Identify providers that speak the native Anthropic protocol and document cache_control support, such as MiniMax.
  • Update the _anthropic_prompt_cache_policy() function to check for cache_control support instead of relying on the is_claude name check.
  • Consider implementing a capability-based or provider-allowlist approach to make the cache policy more future-proof.
  • Verify the fix by testing cache control with MiniMax and other affected providers.

Example

def _anthropic_prompt_cache_policy(model, provider):
    # Check if the provider documents cache_control support
    if provider.cache_control_supported:
        # Return cache policy based on provider's cache control support
        return (True, True)
    else:
        # Return default cache policy
        return (False, False)

Notes

This fix assumes that the cache_control_supported attribute is available for each provider. If not, an alternative approach would be to maintain a list of supported providers or use a more dynamic method to determine cache control support.

Recommendation

Apply a workaround by modifying the _anthropic_prompt_cache_policy() function to support cache control for providers like MiniMax, as this will allow for more flexibility and future-proofing.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Bug]: Prompt caching silently disabled for MiniMax's own models (MiniMax-M2.7 etc.) on anthropic_messages transport [3 pull requests, 1 comments, 1 participants]