openclaw - 💡(How to fix) Fix Anthropic 1h cache invalidates conversation prefix every turn (two independent mechanisms)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

OpenClaw's Anthropic prompt-cache is invalidated on every turn for the conversation-history layer. The bootstrap cache (system + tools) works correctly, but the second cache_control breakpoint anchored at the latest user message fails because the previously-latest user/tool_result block changes between turns. Two independent mechanisms cause this; each is individually sufficient. Long sessions on Anthropic see 50-80k tokens written to 1h cache every turn instead of being read from cache, with the corresponding cost impact.

Fix Action

Fix / Workaround

A small interceptor patched into the bundled Anthropic payload policy module captures per-block SHA-256-truncated hashes plus head/tail snippets of every outbound payload, in two stages: before (raw input to applyAnthropicPayloadPolicyToParams) and after (post-policy, what is actually shipped to Anthropic).

Why this can't be fixed by a one-line patch

A patch that skips the strip on replay (Mechanism 1) would only fix fingerprint A (and the rare fingerprint D). Fingerprints B and C (cache_control field movement alone, accounting for ~21 of 52 problematic pairs in this sample) would still invalidate the prefix because the field-movement byte-shift is independent of the strip.

RAW_BUFFERClick to expand / collapse

Anthropic prompt-cache invalidated every turn by inbound-metadata strip + cache_control field movement

TL;DR

OpenClaw's Anthropic prompt-cache is invalidated on every turn for the conversation-history layer. The bootstrap cache (system + tools) works correctly, but the second cache_control breakpoint anchored at the latest user message fails because the previously-latest user/tool_result block changes between turns. Two independent mechanisms cause this; each is individually sufficient. Long sessions on Anthropic see 50-80k tokens written to 1h cache every turn instead of being read from cache, with the corresponding cost impact.

Environment

  • OpenClaw v2026.5.x (reproduced on v2026.5.7 and v2026.5.20)
  • Anthropic primary model (claude-opus / claude-sonnet)
  • Pi-AI harness: @earendil-works/pi-ai 0.75.4
  • Channel: external (Telegram), but mechanism is channel-agnostic

Symptom observed from Anthropic Console

Across long consecutive sessions, every turn writes 50-80k tokens to cache_creation_input_tokens (1h TTL) and reads only the bootstrap (~60-70k tokens) from cache_read_input_tokens. In a properly working prompt-cache scenario consecutive turns of the same session should read MOST of the prefix from cache and write only ~1-2k new tokens per turn (the new user message + the assistant's prior reply).

Single isolated turns (e.g., a heartbeat that doesn't follow a recent turn) behave correctly — only the new content is written.

Reproduction & forensics method

A small interceptor patched into the bundled Anthropic payload policy module captures per-block SHA-256-truncated hashes plus head/tail snippets of every outbound payload, in two stages: before (raw input to applyAnthropicPayloadPolicyToParams) and after (post-policy, what is actually shipped to Anthropic).

Captures across two days of normal usage (~125 outbound payloads) were analyzed. For each consecutive turn-pair within the same session, the first message-block index where the hashes diverge was identified, along with the byte-level character of the divergence.

Findings

What is stable (good)

  • The system prompt block is byte-identical across all 66 consecutive main-session turns analyzed (single hash, length 105,314 chars).
  • The tools array is byte-identical across the same window.
  • These two together form the "bootstrap" that does get cached and read successfully every turn (~68k tokens).

What is unstable (the bug)

The message-history portion of the prefix flips at every turn. Across 54 consecutive turn-pairs:

FingerprintPairsDescriptionByte delta
A21user message with Conversation info inbound-meta block was the LAST user message in turn N. In turn N+1 (now no longer last), the same logical message appears stripped: Conversation info block removed, cache_control field removed.-498 bytes
B11tool_result block was the LAST in turn N. In turn N+1, content is byte-identical EXCEPT the cache_control field has been removed.-48 bytes
C10user message with Conversation info block was last in turn N. In turn N+1, the Conversation info block is still present but cache_control has been removed.-48 bytes
D10Diff is further back in history (rel_pos ≤ -2), with info → ctx wrapper changes affecting OLDER messages too. Mostly -450 bytes.-450 bytes
E1Massive content change of -279,449 bytes — likely a compaction/pruning event.(one-off)

Fingerprints A, B, C all share the same root: the boundary between "latest" and "historical" position changes per turn, and the runtime rewrites blocks crossing that boundary.

Mechanism 1: inbound-metadata strip on replay

buildInboundUserContextPrefix() in the reply pipeline prepends a multi-line metadata block (Conversation info (untrusted metadata): + JSON, Sender (untrusted metadata): + JSON, optional reply/forwarded/thread context, etc.) to the incoming user message text BEFORE it is persisted to the session transcript.

sanitizeSessionText() in the engine, called during transcript replay, invokes stripInboundMetadataForUserRole()stripInboundMetadata() to remove those metadata blocks from historical user messages before sending the prompt to the model. The source comment for this function states:

These envelopes must be removed BEFORE normalization, because stripInboundMetadata relies on newline structure and fenced json code fences to locate sentinels.

The strip is correct policy-wise (channel metadata shouldn't accumulate in historical replay), but the side-effect is that every historical user message is byte-shifted by ~450 bytes between (a) the turn where it was the latest message (stored as-decorated) and (b) any future turn (stripped on replay).

Mechanism 2: cache_control field movement

The payload policy attaches cache_control: {type: ephemeral, ttl: 1h} to the LAST user-or-tool-result content block on each request, as the second cache_control breakpoint after the system+tools one. When a new user/tool_result block arrives next turn, the breakpoint moves to the new last block, and is removed from the previously-last.

A removed cache_control field is a 48-byte JSON-serialized difference (,"cache_control":{"type":"ephemeral","ttl":"1h"}). 48 bytes is enough to break Anthropic's prefix-byte-identical cache match.

Critically, Mechanism 2 fires on every single turn, including tool-only turns where Mechanism 1 does not apply (e.g., the previously-last block was a tool_result, which has no inbound-meta to strip — yet the hash still flips).

Why bootstrap caches but conversation history does not

Anthropic supports up to 4 cache_control breakpoints. OpenClaw uses two: one at end-of-system-and-tools (the "bootstrap" anchor), and one at the latest user/tool_result block. The bootstrap anchor sits upstream of both rewrite mechanisms — it never changes between turns of the same session — so it caches and reads correctly. The latest-user anchor's prefix (messages 0..N-1) is invalidated every turn by one or both mechanisms, so it always misses on the lookup and always writes new.

This matches the Console observation exactly:

  • cache_read_input_tokens ≈ bootstrap size, every turn ✅ (system+tools layer caches)
  • cache_creation_input_tokens ≈ message-history size, every turn ❌ (conversation layer never reuses prior cache)

Why this can't be fixed by a one-line patch

A patch that skips the strip on replay (Mechanism 1) would only fix fingerprint A (and the rare fingerprint D). Fingerprints B and C (cache_control field movement alone, accounting for ~21 of 52 problematic pairs in this sample) would still invalidate the prefix because the field-movement byte-shift is independent of the strip.

Conversely, a patch that removes the latest-user cache_control breakpoint entirely would fix fingerprints B and C but would also remove the only intentional cache anchor for conversation history (giving up the feature instead of fixing it). Fingerprint A would still cause prefix shifts on each turn even without an active breakpoint.

Possible directions (not prescribing — leaving to maintainers)

  1. Stable cache_control anchor. Place the second cache_control breakpoint on a synthetic, position-stable marker block (e.g., an empty assistant message or boundary block inserted by the runtime) that does not move between turns. The conversation prefix up to that marker would become cache-stable across turns.
  2. Store-stripped + envelope-sidecar. Persist user messages WITHOUT the inbound-metadata decoration. On the active turn, attach the metadata as a separate ephemeral block (placed after the cache_control breakpoint, so it doesn't poison the cached prefix). Historical replay becomes byte-stable because nothing needs stripping.
  3. Single-anchor mode option. Provide a config flag to use only the bootstrap cache anchor and accept that conversation history is never cached at the 1h tier. Cleaner than the current half-broken state; lets operators choose.

Direction (2) seems closest to the existing design intent (channel metadata as current-turn AI context only) while not poisoning the prefix.

Evidence package available

I can provide:

  • Anonymized aggregate hash-flip statistics across ~125 captured payloads
  • The interceptor patch source (small wrapper around applyAnthropicPayloadPolicyToParams)
  • Example before/after head/tail snippets demonstrating each fingerprint (sanitized of message content)

Happy to share these as a follow-up comment if useful.

Cost impact (order-of-magnitude estimate)

For a session running roughly 50 turns over a few hours with around 70k tokens of conversation history, every turn writes ~70k tokens at the 1h cache_write rate that should instead have been read at the cache_read rate. On Anthropic claude-opus pricing, 1h cache_write costs about 2x base input and cache_read costs about 0.1x base input, so the per-turn overhead is roughly 20x what a working cache would cost for the conversation layer. Multiplied across 50 turns, that is a substantial, repeated overhead on every long session. Operators running OpenClaw against external channels with persistent sessions and Anthropic primary will see this as a multiple-of-bill effect.


Filed by an operator running OpenClaw against Telegram with Anthropic claude-opus primary. Investigation done with a local payload-policy interceptor over a 24h capture window. All personal context, message content, account identifiers, and workspace paths have been excluded from this report.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING