hermes - 💡(How to fix) Fix feat: Enable DashScope explicit context caching for alibaba provider [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#24937Fetched 2026-05-14 03:50:27
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×4
RAW_BUFFERClick to expand / collapse

Problem

The alibaba provider plugin sends system/user content as plain strings, which prevents DashScope from using its explicit context caching feature. This feature reduces input token costs by 90% (hit cost: 10% of standard price).

What is Needed

DashScope explicit cache requires the cache_control marker on content:

The alibaba provider currently sends content as a plain string. It needs a prepare_messages() method that:

  1. Converts string content to array-of-dicts format
  2. Adds cache_control to the last part of the system message

Reference

The qwen-oauth provider already implements this pattern (see plugins/model-providers/qwen-oauth/init.py). The same logic should be applied to the alibaba provider.

Test Results

Verified manually with qwen3.6-plus on International (Singapore) region:

Request 1: cached_tokens=0, cache_creation=1204 (cache created) Request 2: cached_tokens=1204, cache_creation=0 (90% discount) Request 3: cached_tokens=1204, cache_creation=0 (cache hit)

Pricing Impact

For a typical session with 10K input tokens of stable prefix:

  • Without cache: 10,000 x $0.4/M = $0.004 per request
  • With explicit cache: 10,000 x $0.04/M = $0.0004 per request
  • Savings: ~90% on input tokens

Note

Implicit caching is NOT supported for qwen3.6-plus on International region, so explicit caching is the only way to get cache discounts.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix feat: Enable DashScope explicit context caching for alibaba provider [1 participants]