openclaw - 💡(How to fix) Fix [Bug]: Anthropic prompt cache busted on every turn: per-message metadata injected inside system block with cache_control [3 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#75300Fetched 2026-05-01 05:35:34
View on GitHub
Comments
3
Participants
2
Timeline
6
Reactions
2
Author
Timeline (top)
commented ×3labeled ×2closed ×1

Every message causes a full ~9,500-token cache write on the system block, making the system prompt cache ineffective despite cache_control: {"type":"ephemeral","ttl":"1h"}. Cost: ~$0.035/message (Claude Sonnet 4.6), ~$35/month at 30 msgs/day.

Root Cause

The "Conversation info" runtime context is appended inside the system string, which has a single cache_control covering all 33,682 chars: { "type": "text", "text": "...[full system prompt]...\n\nConversation info (untrusted metadata):\n{"message_id":"682","timestamp":"17:41 CDT"}", "cache_control": {"type":"ephemeral","ttl":"1h"} } message_id changes every message. timestamp changes every minute. Anthropic sees a different cache key → full system cache miss → 9,356-token write every turn.

Fix Action

Fix / Workaround

Affects every interactive channel (Telegram DMs, Slack, Discord, etc.) Does NOT affect hook sessions (Gmail hook, webhooks) or isolated cron sessions since those use a different context injection path Applies regardless of cacheRetention setting (short or long) — the cache key changes either way Workaround: None from user config. The cache_control TTL and placement are controlled entirely by OpenClaw internals.

RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Summary

Every message causes a full ~9,500-token cache write on the system block, making the system prompt cache ineffective despite cache_control: {"type":"ephemeral","ttl":"1h"}. Cost: ~$0.035/message (Claude Sonnet 4.6), ~$35/month at 30 msgs/day.

Root cause

The "Conversation info" runtime context is appended inside the system string, which has a single cache_control covering all 33,682 chars: { "type": "text", "text": "...[full system prompt]...\n\nConversation info (untrusted metadata):\n{"message_id":"682","timestamp":"17:41 CDT"}", "cache_control": {"type":"ephemeral","ttl":"1h"} } message_id changes every message. timestamp changes every minute. Anthropic sees a different cache key → full system cache miss → 9,356-token write every turn.

Evidence (mitmproxy intercept, v2026.4.27 and v2026.4.29)

Request structure per turn:

  • Tools: 22 tools, 38,077 chars — all stable (MD5 hashes identical across turns)
  • System block: 33,682 chars, single cache_control on entire string
  • Messages: ~100-500 chars Anthropic console shows: cacheRead ≈ 10,638 tokens (tool definitions — correctly cached) cacheWrite ≈ 9,531 tokens (system block — busted every turn) The 9,531-token write is consistent across all message types, starts from zero on a fresh empty session, and does not shrink regardless of conversation length.

Relationship to prior fix

PR #20597 moved per-message IDs (message_id, sender_id) out of the messages array. The system block runtime context injection path was not covered. The same per-message fields are still appended to the system string before cache_control is applied.

Expected fix

Option A: Move "Conversation info" out of the system block into the messages array as a user-role block (consistent with what #20597 did elsewhere). Option B: Split the system block into two blocks — stable content with cache_control, volatile runtime context without.

Version

Confirmed present: v2026.4.27 and v2026.4.29 (checked changelog, no fix).

Steps to reproduce

Configure OpenClaw with an Anthropic model (e.g. anthropic/claude-sonnet-4-6) and a Telegram channel Set cacheRetention: "long" on the model (or leave unset — default "short" has the same issue) Send any message in Telegram Check the Anthropic console logs for the request — inspect "Cache Write (1h)" token count Send a second message Observe Cache Write token count again

Expected behavior

After the first message, the system prompt is written to Anthropic's cache. Subsequent messages with an identical system prompt hit the cache (Cache Read), with only a small Cache Write for the new conversation delta (~100–200 tokens per turn).

Actual behavior

Every message produces a Cache Write of ~9,500 tokens regardless of conversation history length or session age. The Cache Read stays constant at ~10,600 tokens (tool definitions only). The system prompt cache never achieves a hit.

Root cause: the "Conversation info" runtime context block — containing message_id and timestamp — is appended inside the system string that has the cache_control marker:

{ "type": "text", "text": "...[full system prompt, 33,682 chars]...\n\nConversation info (untrusted metadata):\n{"message_id":"682","timestamp":"17:41 CDT"}", "cache_control": {"type":"ephemeral","ttl":"1h"} } message_id changes on every message. timestamp changes every minute. Anthropic sees a different cache key → system cache miss → 9,356-token write every turn → ~$0.035 wasted per message (at Sonnet 4.6 pricing).

OpenClaw version

2026.4.29

Operating system

WSL2

Install method

npm global

Model

claude-sonnet-4.5

Provider / routing chain

openclaw -> anthropic

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Severity: High — affects all users running Anthropic models on any channel (Telegram, Slack, Discord, etc.) with per-message context injection enabled.

Financial impact:

~$0.035 wasted per message (9,356 tokens × $3.75/MTok, Claude Sonnet 4.6) At 30 messages/day: ~$1.05/day, ~$32/month Scales with model pricing — worse on Claude Opus, same pattern on all Anthropic models Performance impact:

System prompt (~33k chars, ~9,356 tokens) is reprocessed from scratch on every turn instead of being served from cache This adds latency proportional to the uncached system prompt size on every message Scope:

Affects every interactive channel (Telegram DMs, Slack, Discord, etc.) Does NOT affect hook sessions (Gmail hook, webhooks) or isolated cron sessions since those use a different context injection path Applies regardless of cacheRetention setting (short or long) — the cache key changes either way Workaround: None from user config. The cache_control TTL and placement are controlled entirely by OpenClaw internals.

Additional information

No response

extent analysis

TL;DR

Move the "Conversation info" runtime context out of the system block to fix the cache miss issue.

Guidance

  • Identify the "Conversation info" block in the system string and determine how to separate it from the stable content.
  • Consider implementing Option A: move "Conversation info" into the messages array as a user-role block.
  • Alternatively, explore Option B: split the system block into two blocks, one with stable content and cache_control, and another with volatile runtime context.
  • Verify the fix by checking the Anthropic console logs for Cache Write token count after sending multiple messages.

Example

No code snippet is provided as the issue does not include specific code details.

Notes

The fix may require modifications to the OpenClaw internals, and the exact implementation details are not specified in the issue.

Recommendation

Apply workaround by moving the "Conversation info" block out of the system string, as this is the most likely cause of the cache miss issue. This change should help reduce the Cache Write token count and improve performance.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

After the first message, the system prompt is written to Anthropic's cache. Subsequent messages with an identical system prompt hit the cache (Cache Read), with only a small Cache Write for the new conversation delta (~100–200 tokens per turn).

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Anthropic prompt cache busted on every turn: per-message metadata injected inside system block with cache_control [3 comments, 2 participants]