hermes - 💡(How to fix) Fix Prompt caching: add DeepSeek models to cache_control whitelist for OpenCode Go [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

Right now prompt_caching is configured (cache_ttl: 5m) but does not work for deepseek-v4-pro on opencode-go. The model switcher shows cache read $0.01/M pricing, confirming the gateway supports caching — but Hermes never sends cache_control markers because the policy whitelist only includes:

Fix Action

Fixed

Code Example

provider_is_alibaba_family = provider_lower in {
    "opencode", "opencode-zen", "opencode-go", "alibaba",
}

---

# DeepSeek on OpenCode (Zen/Go): also supports cache_control
model_is_deepseek = "deepseek" in model_lower
if provider_is_alibaba_family and model_is_deepseek:
    return True, False

---

model_is_alibaba_family = model_is_qwen or ("deepseek" in model_lower)
RAW_BUFFERClick to expand / collapse

Feature Description

Extend _anthropic_prompt_cache_policy() in run_agent.py to enable Anthropic-style cache_control markers for DeepSeek models on opencode-go / opencode-zen providers — same as currently done for Qwen/Alibaba models.

Motivation

Right now prompt_caching is configured (cache_ttl: 5m) but does not work for deepseek-v4-pro on opencode-go. The model switcher shows cache read $0.01/M pricing, confirming the gateway supports caching — but Hermes never sends cache_control markers because the policy whitelist only includes:

  • Claude (Anthropic/OpenRouter)
  • MiniMax (Anthropic wire)
  • Qwen/Alibaba (OpenCode Go)

DeepSeek on OpenCode Go falls through to return False, False.

Impact

Without cache markers, every turn re-bills the full prompt. With 300K+ token conversations, this is a ~75% cost increase vs cached reads.

Where

run_agent.py:3454-3462 — the model_is_qwen branch. The same provider_is_alibaba_family set already includes opencode-go:

provider_is_alibaba_family = provider_lower in {
    "opencode", "opencode-zen", "opencode-go", "alibaba",
}

Proposed Solution

Add a model_is_deepseek check after the Qwen block, reusing the same envelope layout:

# DeepSeek on OpenCode (Zen/Go): also supports cache_control
model_is_deepseek = "deepseek" in model_lower
if provider_is_alibaba_family and model_is_deepseek:
    return True, False

Or consolidate into the existing branch:

model_is_alibaba_family = model_is_qwen or ("deepseek" in model_lower)

Evidence

  • OpenCode Go is a gateway that implements cache_control at its own level (not model-specific) — the same mechanism works for Qwen
  • Model switcher reports $0.01/M cache read for deepseek-v4-pro on OpenCode Go, indicating gateway-side cache support
  • The chat_completions wire format is used for both Qwen and DeepSeek on OpenCode Go — no api_mode difference

Risk

Zero. If the gateway doesn't honor cache for a specific model, the markers are silently ignored. Nothing breaks.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING