openclaw - ✅(Solved) Fix Bug: dropThinkingBlocks breaks prompt cache on Claude Opus 4.5+ / Sonnet 4.5+ [1 pull requests, 5 comments, 1 participants]

openclaw/openclaw#61793 (fetched 2026-04-08 02:54:26)

OpenClaw unconditionally drops thinking blocks from prior assistant turns for all Claude models (dropThinkingBlocks: true). This was correct for Claude Sonnet 3.7 and earlier, but breaks prompt caching on Claude Opus 4.5+, Opus 4.6, Sonnet 4.5+, and Sonnet 4.6.

Root Cause

In transcript policy resolution:

...isAnthropic && modelId.includes("claude") ? { dropThinkingBlocks: true } : {},

This applies to all Claude models without version discrimination.

The dropThinkingBlocks() function strips thinking blocks from all assistant messages except the latest one. When these blocks are removed, the token sequence changes between turns, breaking Anthropic's prefix-based cache matching.
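
A minimal sketch of why this breaks prefix matching, assuming a simplified message shape. The `Block` and `Message` types and the `dropPriorThinking` and `sharedPrefixLength` helpers are hypothetical stand-ins, not OpenClaw's actual code, and real caching matches on token prefixes rather than whole messages. Comparing two consecutive requests shows that they stop sharing a prefix at the first assistant message whose thinking block was removed:

```typescript
// Hypothetical, simplified transcript types; OpenClaw's real types differ.
type Block = { type: "thinking" | "text"; text: string };
type Message = { role: "user" | "assistant"; content: Block[] };

// Illustrative stand-in for the behavior described above: strip thinking
// blocks from every assistant message except the latest assistant message.
function dropPriorThinking(messages: Message[]): Message[] {
  let lastAssistant = -1;
  messages.forEach((m, i) => {
    if (m.role === "assistant") lastAssistant = i;
  });
  return messages.map((m, i) =>
    m.role === "assistant" && i !== lastAssistant
      ? { ...m, content: m.content.filter((b) => b.type !== "thinking") }
      : m,
  );
}

// Number of leading messages that serialize identically in both requests.
function sharedPrefixLength(a: Message[], b: Message[]): number {
  let n = 0;
  while (n < a.length && n < b.length && JSON.stringify(a[n]) === JSON.stringify(b[n])) {
    n++;
  }
  return n;
}

const u1: Message = { role: "user", content: [{ type: "text", text: "question 1" }] };
const a1: Message = {
  role: "assistant",
  content: [
    { type: "thinking", text: "reasoning 1" },
    { type: "text", text: "answer 1" },
  ],
};
const u2: Message = { role: "user", content: [{ type: "text", text: "question 2" }] };
const a2: Message = {
  role: "assistant",
  content: [
    { type: "thinking", text: "reasoning 2" },
    { type: "text", text: "answer 2" },
  ],
};
const u3: Message = { role: "user", content: [{ type: "text", text: "question 3" }] };

// Request A still contains a1's thinking block (a1 is the latest assistant message).
const requestA = dropPriorThinking([u1, a1, u2]);
// Request B drops a1's thinking block, so it diverges from request A at a1.
const requestB = dropPriorThinking([u1, a1, u2, a2, u3]);

console.log(sharedPrefixLength(requestA, requestB)); // 1: only u1 still matches
console.log(sharedPrefixLength([u1, a1, u2], [u1, a1, u2, a2, u3])); // 3: full prefix preserved
```

With the thinking blocks left in place, request B is a strict extension of request A, which is the shape prefix-based caching needs.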

Fix Action

Fixed

PR fix notes

PR #61797: fix: preserve thinking blocks for Claude Opus 4.5+/Sonnet 4.5+ to fix prompt cache [AI-assisted]

Description (problem / solution / changelog)

Summary

Fixes #61793

Claude Opus 4.5+ and Sonnet 4.5+ preserve thinking blocks in model context by default (docs). The current code unconditionally drops thinking blocks from prior assistant turns for all Claude models, which was correct for Sonnet 3.7 but breaks prompt caching on newer models.

Problem

When thinking blocks are dropped between turns, the token sequence changes, breaking Anthropic's prefix-based cache matching. Combined with the 20-block lookback window, accumulated drops can cause complete cache misses (cacheRead = 0).

This was introduced in #44843 (2026-03-13) which extended dropThinkingBlockModelHints: ["claude"] from only github-copilot to also anthropic and amazon-bedrock. The original fix was for GitHub Copilot's proxy limitations, but was incorrectly generalized to native Anthropic API where newer models natively support thinking block preservation.

Changes

  • src/plugins/provider-replay-helpers.ts: Added exported shouldPreserveThinkingBlocks() that returns true for Opus 4.5+, Sonnet 4+, Haiku 4+, and future models. Updated buildAnthropicReplayPolicyForModel, buildNativeAnthropicReplayPolicyForModel, and buildHybridAnthropicOrOpenAIReplayPolicy to use it.
  • src/agents/transcript-policy.ts: Imports shouldPreserveThinkingBlocks from provider-replay-helpers (no duplication) for the unowned-provider fallback path.
  • Tests: Updated and added tests across 3 test files for all affected functions, covering both legacy (drop) and modern (preserve) models.

Model Version Behavior

| Model | dropThinkingBlocks |
| --- | --- |
| claude-3-7-sonnet | ✅ true (drop) |
| claude-3-5-sonnet | ✅ true (drop) |
| claude-opus-4-5 | ❌ false (preserve) |
| claude-opus-4-6 | ❌ false (preserve) |
| claude-sonnet-4-x | ❌ false (preserve) |
| claude-haiku-4-x | ❌ false (preserve) |
| claude-5-x+ (future) | ❌ false (preserve) |
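
The version rules in the table above could be sketched roughly as follows. This is not the PR's `shouldPreserveThinkingBlocks()` implementation, which is not shown here. The function name `preserveThinkingSketch`, the regular expressions, the treatment of version-first ids with a major version of 4 or higher, and the fallback for unrecognized ids are all assumptions made for illustration:

```typescript
// Hypothetical sketch of the version gate described in the table above.
function preserveThinkingSketch(modelId: string): boolean {
  const id = modelId.toLowerCase();

  // Family-first ids such as "claude-opus-4-5" or "claude-sonnet-4-5-20250929".
  // The minor version is limited to 1-2 digits so a date suffix is not
  // mistaken for a minor version.
  const familyFirst = /claude-(opus|sonnet|haiku)-(\d+)(?:[-.](\d{1,2}))?(?!\d)/.exec(id);
  if (familyFirst) {
    const family = familyFirst[1];
    const major = Number(familyFirst[2]);
    const minor = Number(familyFirst[3] ?? 0);
    // Opus preserves from 4.5 onward; Sonnet and Haiku from 4 onward.
    if (family === "opus") return major > 4 || (major === 4 && minor >= 5);
    return major >= 4;
  }

  // Version-first ids such as "claude-3-7-sonnet" or a future "claude-5-x".
  // Treating major >= 4 as "preserve" is an assumption of this sketch.
  const versionFirst = /claude-(\d+)/.exec(id);
  if (versionFirst) return Number(versionFirst[1]) >= 4;

  // Unrecognized ids keep the legacy drop behavior (assumption).
  return false;
}

// dropThinkingBlocks is the inverse of the preserve decision.
for (const id of ["claude-3-7-sonnet", "claude-opus-4-5", "claude-sonnet-4-x", "claude-5-x"]) {
  console.log(id, { dropThinkingBlocks: !preserveThinkingSketch(id) });
}
```

Parsing numeric versions rather than matching on the substring "claude" is what lets the policy distinguish legacy models from newer ones.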

Testing

All verification gates passed locally (macOS, Node 22, pnpm):

  • pnpm build: Passed (no [INEFFECTIVE_DYNAMIC_IMPORT] warnings)
  • pnpm check: Passed (0 warnings, 0 errors — includes tsgo, oxlint, format, and policy checks)
  • pnpm test (scoped to 3 changed test files): 36 tests passed, 0 failures
  • Tested manually on live Opus 4.6 session — observed cache miss pattern before fix matches documented behavior

Deployment Verification

Successfully deployed and running on a personal VPS (OpenClaw 2026.4.6) with the fix applied. Gateway is operating normally with Claude Opus 4.6 + thinking: high.

Pre-fix cache data (68 turns, OpenClaw 2026.4.5)

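| Turn | input | cacheRead | Δread | cacheWrite | output | thinking | tool | note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 3 | 0 | 0 | 33,721 | 293 | | 🔧 | |
| 1 | 1 | 33,721 | +33,721 | 33,929 | 272 | | | |
| 2 | 3 | 33,721 | 0 | 32,979 | 81 | | | ⚠️ flat after thinking |
| 3 | 3 | 33,721 | 0 | 33,241 | 142 | | | ⚠️ flat after thinking |
| 4 | 3 | 33,721 | 0 | 33,701 | 847 | | | ⚠️ flat after thinking |
| 5 | 3 | 33,721 | 0 | 34,788 | 37 | | 🔧 | ⚠️ flat after thinking |
| 6 | 1 | 33,721 | 0 | 34,438 | 364 | | | |
| 7 | 3 | 68,159 | +34,438 | 737 | 788 | | 🔧 | |
| 17 | 1 | 85,293 | +159 | 1,672 | 308 | | | ✅ (no thinking) |
| 18 | 3 | 86,965 | +1,672 | 658 | 1,325 | | | |
| 19 | 3 | 87,623 | +658 | 1,693 | 34 | | | |
| 20 | 3 | 87,623 | 0 | 1,102 | 178 | | | ⚠️ flat after thinking |
| 21 | 3 | 88,725 | +1,102 | 551 | 507 | | | |
| 22 | 3 | 88,725 | 0 | 1,351 | 676 | | | ⚠️ flat after thinking |
| 23 | 3 | 88,725 | 0 | 2,061 | 3,276 | | | ⚠️ flat after thinking |
| 24 | 3 | 88,725 | 0 | 5,324 | 1,494 | | | ⚠️ flat after thinking |
| 25 | 3 | 88,725 | 0 | 4,270 | 1,612 | | | ⚠️ flat after thinking |
| 26 | 3 | 88,725 | 0 | 5,027 | 503 | | | ⚠️ flat after thinking |
| 27 | 3 | 88,725 | 0 | 4,621 | 721 | | | ⚠️ flat after thinking |
| 28 | 3 | 88,725 | 0 | 5,475 | 1,359 | | | ⚠️ flat after thinking |
| 29 | 3 | 88,725 | 0 | 6,845 | 665 | | | ⚠️ flat after thinking |
| 30 | 3 | 88,725 | 0 | 6,903 | 28 | | | ⚠️ flat after thinking |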

Pre-fix summary: 332,551 tokens of wasted cacheWrite across 68 turns. 10-turn plateau at 88,725 (Δread=0) during consecutive thinking turns.

Post-fix cache data (14 turns, OpenClaw 2026.4.6)

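| Turn | input | cacheRead | Δread | cacheWrite | output | thinking | tool | cr(N) = cr(N-1)+cw(N-1)? |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 75 | 3 | 0 | 0 | 152,888 | 37 | | 🔧 | — (cold start after upgrade) |
| 76 | 1 | 152,888 | +152,888 | 231 | 707 | | | ✅ exact |
| 77 | 3 | 153,119 | +231 | 1,043 | 297 | | | ✅ exact |
| 78 | 3 | 154,162 | +1,043 | 651 | 348 | | | ✅ exact |
| 79 | 3 | 154,813 | +651 | 722 | 745 | | | ✅ exact |
| 80 | 3 | 155,535 | +722 | 1,123 | 854 | | 🔧 | ✅ exact |
| 81 | 1 | 156,658 | +1,123 | 1,357 | 348 | | | ✅ exact |
| 82 | 3 | 158,015 | +1,357 | 720 | 431 | | | ✅ exact |
| 83 | 3 | 158,735 | +720 | 801 | 98 | | 🔧 | ✅ exact |
| 84 | 1 | 159,536 | +801 | 4,176 | 191 | | | ✅ exact |
| 85 | 3 | 163,712 | +4,176 | 581 | 292 | | | ✅ exact |
| 86 | 3 | 164,293 | +581 | 634 | 1,394 | | 🔧 | ✅ exact |
| 87 | 1 | 164,927 | +634 | 2,158 | 72 | | | ✅ exact |
| 88 | 3 | 167,085 | +2,158 | 410 | 1,457 | | 🔧 | ✅ exact |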

Post-fix summary:

  • 13/13 turns show exact cacheRead(N) = cacheRead(N-1) + cacheWrite(N-1)
  • 0 wasted cache writes — every write is read next turn
  • Thinking turns (✅) now grow cache normally instead of causing Δread=0 plateaus

| Metric | Pre-fix (68 turns) | Post-fix (14 turns) |
| --- | --- | --- |
| Thinking turns with Δread=0 | 14 | 0 |
| Wasted cacheWrite | 332,551 tokens | 0 tokens |
| Exact cr(N)=cr(N-1)+cw(N-1) | rare | 13/13 |
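
The invariant used in these tables, cacheRead(N) = cacheRead(N-1) + cacheWrite(N-1), can be checked mechanically against per-turn usage logs. A minimal sketch, assuming a hypothetical `TurnUsage` record shape and a hypothetical `checkCacheGrowth` helper (neither is part of OpenClaw), using rows taken from the tables above:

```typescript
// Hypothetical per-turn usage record; field names mirror the table columns.
type TurnUsage = { turn: number; cacheRead: number; cacheWrite: number };

// For every turn after the first, report whether everything cached so far
// (previous read + previous write) was read back on this turn.
function checkCacheGrowth(turns: TurnUsage[]): { turn: number; exact: boolean }[] {
  const results: { turn: number; exact: boolean }[] = [];
  for (let i = 1; i < turns.length; i++) {
    const prev = turns[i - 1];
    const cur = turns[i];
    results.push({
      turn: cur.turn,
      exact: cur.cacheRead === prev.cacheRead + prev.cacheWrite,
    });
  }
  return results;
}

// Post-fix turns 75-78 from the table above.
const postFix: TurnUsage[] = [
  { turn: 75, cacheRead: 0, cacheWrite: 152_888 },
  { turn: 76, cacheRead: 152_888, cacheWrite: 231 },
  { turn: 77, cacheRead: 153_119, cacheWrite: 1_043 },
  { turn: 78, cacheRead: 154_162, cacheWrite: 651 },
];

// Pre-fix turns 1-2: the write from turn 1 is never read back on turn 2.
const preFix: TurnUsage[] = [
  { turn: 1, cacheRead: 33_721, cacheWrite: 33_929 },
  { turn: 2, cacheRead: 33_721, cacheWrite: 32_979 },
];

console.log(checkCacheGrowth(postFix).every((r) => r.exact)); // true
console.log(checkCacheGrowth(preFix).every((r) => r.exact)); // false
```

A turn that fails this check wrote cache entries that the following request could not reuse, which is the pattern the pre-fix data shows.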

AI Disclosure

  • Mark as AI-assisted in the PR title or description

  • Note the degree of testing (untested / lightly tested / fully tested)

  • Confirm you understand what the code does

  • Resolve or reply to bot review conversations after you address them

  • AI-assisted: PR authored with Claude Opus 4.6 via OpenClaw, reviewed by GPT-5.4 subagent and Greptile bot

  • Testing level: Fully tested — pnpm build, pnpm check, and scoped pnpm test all pass; manual verification on live session

  • Understanding: Fully understood — root cause traced from observed cache metrics through OpenClaw source to Anthropic API docs

Changed files

  • src/agents/transcript-policy.test.ts (modified, +26/-0)
  • src/agents/transcript-policy.ts (modified, +4/-1)
  • src/plugin-sdk/provider-model-shared.test.ts (modified, +11/-3)
  • src/plugins/provider-replay-helpers.test.ts (modified, +55/-3)
  • src/plugins/provider-replay-helpers.ts (modified, +39/-3)

Evidence

Cache read pattern from a real Opus 4.6 session with thinking: high:

| Turn | cacheRead | Δread | Pattern |
| --- | --- | --- | --- |
| Thinking turn | 38,323 | +257 | ✅ normal |
| Next turn | 38,323 | 0 | ⚠️ FLAT — prev thinking was dropped |
| After that | 38,923 | +600 | ✅ recovered |

Every turn following a thinking-containing response shows a cache miss (Δread = 0), because the thinking block was present when cache was written but absent when replayed next turn.

In one case, cacheRead dropped to 0 (complete miss, ~57k tokens re-written) — the entire prefix failed to match.

What Anthropic's Docs Say

From Extended Thinking docs:

Starting with Claude Opus 4.5 (and continuing in Claude Opus 4.6), thinking blocks from previous assistant turns are preserved in model context by default. This differs from earlier models, which remove thinking blocks from prior turns.

Benefits of thinking block preservation:

  • Cache optimization: preserved thinking blocks enable cache hits
  • No intelligence impact

Proposed Fix

Condition dropThinkingBlocks on the model version:

  • Drop for claude-3-7-sonnet and earlier models
  • Preserve for claude-opus-4-5, claude-opus-4-6, claude-sonnet-4-5, claude-sonnet-4-6, claude-haiku-4-5, and later

Impact

  • Cost: Every thinking turn causes a full cache re-write instead of a cache read, multiplying input token costs
  • Latency: Cache misses increase TTFT
  • Severity: High for heavy thinking users (thinking: high on Opus 4.6)
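
To put a rough number on the cost point, the sketch below compares billing the same tokens as a cache write versus a cache read. The multipliers are assumptions supplied for illustration (a 2x write multiplier for a 1h TTL and a 0.1x read multiplier, relative to the base input price); they are not taken from this issue and should be checked against Anthropic's current pricing. The `CacheMultipliers` type and both helper functions are hypothetical. The 57,000-token figure is the approximate size of the complete miss reported above.

```typescript
// Assumed price multipliers relative to the base input-token price.
// Illustrative inputs only; verify against current pricing before relying on them.
type CacheMultipliers = { write: number; read: number };

// Cost expressed in "base input token" units.
function inputCostUnits(tokens: number, multiplier: number): number {
  return tokens * multiplier;
}

// Extra cost of re-writing a prefix instead of reading it from cache.
function rewritePenalty(tokens: number, m: CacheMultipliers): number {
  return inputCostUnits(tokens, m.write) - inputCostUnits(tokens, m.read);
}

const assumed: CacheMultipliers = { write: 2, read: 0.1 };
const penalty = rewritePenalty(57_000, assumed);

console.log(Math.round(penalty)); // 108300 base-token units under the assumed multipliers
```

Under these assumed multipliers, a single full miss on a ~57k-token prefix costs about as much as 108k extra uncached input tokens, and the penalty recurs on every turn that misses.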

Environment

  • OpenClaw 2026.4.5 (3e72c03)
  • Model: anthropic/claude-opus-4-6
  • Direct Anthropic API (api.anthropic.com)
  • 1h cache TTL

extent analysis

TL;DR

Conditionally preserve thinking blocks based on the Claude model version to fix cache misses and optimize performance.

Guidance

  • Update the transcript policy resolution to conditionally set dropThinkingBlocks based on the model version, preserving thinking blocks for Claude Opus 4.5+, Opus 4.6, Sonnet 4.5+, and Sonnet 4.6.
  • Verify the fix by monitoring cache read patterns and token sequence changes between turns to ensure that thinking blocks are preserved correctly.
  • Test the updated implementation with different model versions to confirm that cache optimization and prefix-based cache matching work as expected.
  • Review the Anthropic documentation for any additional guidance on handling thinking blocks and cache optimization for different model versions.

Example

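const modelVersion = getModelVersion(modelId); // hypothetical helper, e.g. returns { major: 3, minor: 7 }
const dropThinkingBlocks = modelVersion.major <= 3; // compare numbers, not model id strings

Note: This example assumes a hypothetical getModelVersion function that parses modelId into numeric major and minor versions. Comparing raw model id strings with <= would order them lexicographically rather than by version.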

Notes

The proposed fix relies on accurately determining the model version and updating the transcript policy resolution accordingly. Ensure that the model version detection is correct and handles different model versions as expected.

Recommendation

Apply the fix by conditionally preserving thinking blocks based on the model version. This should eliminate the cache misses on Claude Opus 4.5+ and Sonnet 4.5+ models, including Opus 4.6 and Sonnet 4.6.
