openclaw - ✅(Solved) Fix Bug: dropThinkingBlocks breaks prompt cache on Claude Opus 4.5+ / Sonnet 4.5+ [1 pull request, 5 comments, 1 participant]

qinyao-he · 2026-04-06T09:40:00Z




GitHub stats

openclaw/openclaw#61793 · Fetched 2026-04-08 02:54:26

Author: qinyao-he

Participants: qinyao-he

Timeline (top): commented ×5, referenced ×4, closed ×1, cross-referenced ×1

OpenClaw unconditionally drops thinking blocks from prior assistant turns for all Claude models (dropThinkingBlocks: true). This was correct for Claude Sonnet 3.7 and earlier, but breaks prompt caching on Claude Opus 4.5+, Opus 4.6, Sonnet 4.5+, and Sonnet 4.6.

Root Cause

In transcript policy resolution:

...isAnthropic && modelId.includes("claude") ? { dropThinkingBlocks: true } : {},

This applies to all Claude models without version discrimination.

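The dropThinkingBlocks() function strips thinking blocks from every assistant message except the latest one. The "latest" message moves forward each turn. As a result, an assistant message that was sent and cached with its thinking on one turn is replayed without it on the next. The token sequence therefore changes partway through the conversation, which breaks Anthropic's prefix-based cache matching.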
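The following minimal TypeScript sketch shows why this breaks the cache. The dropThinkingBlocks() here is a hypothetical re-implementation of the behaviour described above, not OpenClaw's code. ChatMessage and ContentBlock are simplified stand-ins for the real message types. Comparing serialized messages is only a rough proxy for comparing token sequences.

```typescript
// Illustrative types; real OpenClaw / Anthropic message shapes carry more fields.
type ContentBlock =
  | { type: "thinking"; thinking: string }
  | { type: "text"; text: string };
type ChatMessage = { role: "user" | "assistant"; content: ContentBlock[] };

// Hypothetical re-implementation: strip thinking blocks from every
// assistant message except the most recent one.
function dropThinkingBlocks(messages: ChatMessage[]): ChatMessage[] {
  const latest = messages.map((m) => m.role).lastIndexOf("assistant");
  return messages.map((m, i) =>
    m.role === "assistant" && i !== latest
      ? { ...m, content: m.content.filter((b) => b.type !== "thinking") }
      : m,
  );
}

// Number of leading messages that serialize identically in two requests.
// A prefix cache can only reuse work up to the first difference.
function sharedPrefix(a: ChatMessage[], b: ChatMessage[]): number {
  let n = 0;
  while (
    n < a.length &&
    n < b.length &&
    JSON.stringify(a[n]) === JSON.stringify(b[n])
  ) {
    n++;
  }
  return n;
}

const user = (text: string): ChatMessage => ({
  role: "user",
  content: [{ type: "text", text }],
});
const assistant = (thought: string, text: string): ChatMessage => ({
  role: "assistant",
  content: [
    { type: "thinking", thinking: thought },
    { type: "text", text },
  ],
});

// Request for one turn, then the next turn's request that extends it.
const turnA = [user("q1"), assistant("t1", "r1"), user("q2")];
const turnB = [...turnA, assistant("t2", "r2"), user("q3")];

// With dropping: the first assistant message loses its thinking in turnB.
console.log(sharedPrefix(dropThinkingBlocks(turnA), dropThinkingBlocks(turnB))); // 1
// Without dropping: turnA is a strict prefix of turnB.
console.log(sharedPrefix(turnA, turnB)); // 3
```

The first assistant message keeps its thinking while it is the latest one. It loses that thinking once a newer assistant message exists, so the shared prefix stops right before it.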

Fix Action

Fixed

Fixed by PR: fix: preserve thinking blocks for Claude Opus 4.5+/Sonnet 4.5+ to fix prompt cache [AI-assisted] (https://github.com/openclaw/openclaw/pull/61797)

PR fix notes

PR #61797: fix: preserve thinking blocks for Claude Opus 4.5+/Sonnet 4.5+ to fix prompt cache [AI-assisted]

Repository: openclaw/openclaw
Author: qinyao-he
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/61797

Description (problem / solution / changelog)

Summary

Fixes #61793

Claude Opus 4.5+ and Sonnet 4.5+ preserve thinking blocks in model context by default (docs: https://platform.claude.com/docs/en/build-with-claude/extended-thinking#differences-in-thinking-across-model-versions). The current code unconditionally drops thinking blocks from prior assistant turns for all Claude models. That was correct for Sonnet 3.7 but breaks prompt caching on newer models.

Problem

When thinking blocks are dropped between turns, the token sequence changes, breaking Anthropic's prefix-based cache matching. Combined with the 20-block lookback window (https://platform.claude.com/docs/en/build-with-claude/prompt-caching), accumulated drops can cause complete cache misses (cacheRead = 0).

This was introduced in #44843 (2026-03-13), which extended dropThinkingBlockModelHints: ["claude"] from github-copilot alone to anthropic and amazon-bedrock as well. The original fix addressed a limitation of GitHub Copilot's proxy. It was then incorrectly generalized to the native Anthropic API, where newer models support thinking block preservation natively.
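A toy version of the hint mechanism makes the regression easy to see. This sketch is hypothetical. The PR only names dropThinkingBlockModelHints: ["claude"] and the three providers, so the HintTable shape and shouldDropThinking() are assumptions for illustration.

```typescript
// Hypothetical shape: provider id -> model-id substrings that trigger dropping.
// Only mimics the before/after described for #44843; not OpenClaw's real config.
type HintTable = Record<string, string[]>;

const hintsBefore: HintTable = { "github-copilot": ["claude"] };
const hintsAfter: HintTable = {
  "github-copilot": ["claude"],
  anthropic: ["claude"],
  "amazon-bedrock": ["claude"],
};

// A plain substring match: nothing here looks at the model version.
function shouldDropThinking(hints: HintTable, provider: string, modelId: string): boolean {
  return (hints[provider] ?? []).some((hint) => modelId.toLowerCase().includes(hint));
}

console.log(shouldDropThinking(hintsBefore, "anthropic", "claude-opus-4-6")); // false
console.log(shouldDropThinking(hintsAfter, "anthropic", "claude-opus-4-6")); // true
console.log(shouldDropThinking(hintsAfter, "anthropic", "claude-3-7-sonnet")); // true
```

The hint is a bare substring. Widening the provider list therefore makes every Claude model on those providers drop thinking blocks, regardless of version.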

Changes

src/plugins/provider-replay-helpers.ts: Added exported shouldPreserveThinkingBlocks() that returns true for Opus 4.5+, Sonnet 4+, Haiku 4+, and future models. Updated buildAnthropicReplayPolicyForModel, buildNativeAnthropicReplayPolicyForModel, and buildHybridAnthropicOrOpenAIReplayPolicy to use it.
src/agents/transcript-policy.ts: Imports shouldPreserveThinkingBlocks from provider-replay-helpers (no duplication) for the unowned-provider fallback path.
Tests: Updated and added tests across 3 test files for all affected functions, covering both legacy (drop) and modern (preserve) models.

Model Version Behavior

Model	dropThinkingBlocks
claude-3-7-sonnet	✅ true (drop)
claude-3-5-sonnet	✅ true (drop)
claude-opus-4-5	❌ false (preserve)
claude-opus-4-6	❌ false (preserve)
claude-sonnet-4-x	❌ false (preserve)
claude-haiku-4-x	❌ false (preserve)
claude-5-x+ (future)	❌ false (preserve)
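The PR text does not quote the body of shouldPreserveThinkingBlocks(). What follows is only a hypothetical TypeScript sketch of the rule as the Changes list and the table state it. The names parseClaudeModelId(), preservesThinkingBlocks() and replayPolicyFor() are illustrative, not OpenClaw's API. The ID patterns are assumptions based on the IDs listed here.

```typescript
type ClaudeVersion = { family: string; major: number; minor: number };

// Hypothetical parser covering both naming schemes in the table:
// "claude-3-7-sonnet" (version first) and "claude-opus-4-5" (family first).
// Date suffixes such as "-20250514" are not mistaken for a minor version.
function parseClaudeModelId(modelId: string): ClaudeVersion | null {
  const id = modelId.toLowerCase();
  const familyFirst = /claude-(opus|sonnet|haiku)-(\d+)(?:-(\d{1,2}))?(?!\d)/.exec(id);
  if (familyFirst) {
    return {
      family: familyFirst[1],
      major: Number(familyFirst[2]),
      minor: Number(familyFirst[3] ?? 0),
    };
  }
  const versionFirst = /claude-(\d+)(?:-(\d{1,2}))?-(opus|sonnet|haiku)/.exec(id);
  if (versionFirst) {
    return {
      family: versionFirst[3],
      major: Number(versionFirst[1]),
      minor: Number(versionFirst[2] ?? 0),
    };
  }
  return null;
}

// Mirrors the rule stated in the PR description (not its actual code):
// preserve for Opus 4.5+, Sonnet 4+, Haiku 4+ and any Claude 5+; drop otherwise.
function preservesThinkingBlocks(modelId: string): boolean {
  const v = parseClaudeModelId(modelId);
  if (v === null) return false; // unrecognized id: keep the legacy drop behaviour
  if (v.major >= 5) return true;
  if (v.major === 4) return v.family !== "opus" || v.minor >= 5;
  return false;
}

// How a replay-policy builder could consume it (hypothetical name).
function replayPolicyFor(modelId: string): { dropThinkingBlocks?: boolean } {
  const isClaude = modelId.toLowerCase().includes("claude");
  return isClaude && !preservesThinkingBlocks(modelId) ? { dropThinkingBlocks: true } : {};
}

for (const id of ["claude-3-7-sonnet", "claude-opus-4-5", "claude-sonnet-4-6", "claude-haiku-4-5"]) {
  console.log(id, replayPolicyFor(id));
}
```

Under this reading, Opus 4 and Opus 4.1 fall on the drop side, because the rule names only Opus 4.5+. Unrecognized IDs also fall back to dropping in this sketch. That is a conservative choice that differs from the PR's intent to preserve for future models, so a new naming scheme would need the parser updated.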

Testing

All verification gates passed locally (macOS, Node 22, pnpm):

pnpm build: Passed (no [INEFFECTIVE_DYNAMIC_IMPORT] warnings)
pnpm check: Passed (0 warnings, 0 errors — includes tsgo, oxlint, format, and policy checks)
pnpm test (scoped to 3 changed test files): 36 tests passed, 0 failures
Tested manually on live Opus 4.6 session — observed cache miss pattern before fix matches documented behavior

Deployment Verification

Successfully deployed and running on a personal VPS (OpenClaw 2026.4.6) with the fix applied. Gateway is operating normally with Claude Opus 4.6 + thinking: high.

Pre-fix cache data (68 turns, OpenClaw 2026.4.5)

Turn	input	cacheRead	Δread	cacheWrite	output	thinking	tool	note
0	3	0	0	33,721	293	✅	🔧
1	1	33,721	+33,721	33,929	272	✅
2	3	33,721	0	32,979	81	✅		⚠️ flat after thinking
3	3	33,721	0	33,241	142	✅		⚠️ flat after thinking
4	3	33,721	0	33,701	847	✅		⚠️ flat after thinking
5	3	33,721	0	34,788	37		🔧	⚠️ flat after thinking
6	1	33,721	0	34,438	364	✅
7	3	68,159	+34,438	737	788	✅	🔧
17	1	85,293	+159	1,672	308			✅ (no thinking)
18	3	86,965	+1,672	658	1,325	✅		✅
19	3	87,623	+658	1,693	34			✅
20	3	87,623	0	1,102	178	✅		⚠️ flat after thinking
21	3	88,725	+1,102	551	507	✅
22	3	88,725	0	1,351	676	✅		⚠️ flat after thinking
23	3	88,725	0	2,061	3,276	✅		⚠️ flat after thinking
24	3	88,725	0	5,324	1,494	✅		⚠️ flat after thinking
25	3	88,725	0	4,270	1,612	✅		⚠️ flat after thinking
26	3	88,725	0	5,027	503	✅		⚠️ flat after thinking
27	3	88,725	0	4,621	721	✅		⚠️ flat after thinking
28	3	88,725	0	5,475	1,359	✅		⚠️ flat after thinking
29	3	88,725	0	6,845	665	✅		⚠️ flat after thinking
30	3	88,725	0	6,903	28			⚠️ flat after thinking

Pre-fix summary: 332,551 tokens of wasted cacheWrite across 68 turns. 10-turn plateau at 88,725 (Δread=0) during consecutive thinking turns.
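The pre-fix rows can be read with the same bookkeeping as the post-fix invariant. The sketch below treats a write at turn N-1 as wasted to the extent that turn N's cacheRead did not grow by that amount. This is a simplified accounting assumed for illustration and applied only to the turn 20-30 excerpt. It is not necessarily how the 332,551-token figure was computed. CacheRow and wastedWrites() are illustrative names.

```typescript
type CacheRow = { turn: number; cacheRead: number; cacheWrite: number };

// Rows for turns 20-30 from the pre-fix table above.
const preFix: CacheRow[] = [
  { turn: 20, cacheRead: 87_623, cacheWrite: 1_102 },
  { turn: 21, cacheRead: 88_725, cacheWrite: 551 },
  { turn: 22, cacheRead: 88_725, cacheWrite: 1_351 },
  { turn: 23, cacheRead: 88_725, cacheWrite: 2_061 },
  { turn: 24, cacheRead: 88_725, cacheWrite: 5_324 },
  { turn: 25, cacheRead: 88_725, cacheWrite: 4_270 },
  { turn: 26, cacheRead: 88_725, cacheWrite: 5_027 },
  { turn: 27, cacheRead: 88_725, cacheWrite: 4_621 },
  { turn: 28, cacheRead: 88_725, cacheWrite: 5_475 },
  { turn: 29, cacheRead: 88_725, cacheWrite: 6_845 },
  { turn: 30, cacheRead: 88_725, cacheWrite: 6_903 },
];

// Tokens written at turn N-1 that turn N did not read back. Only pairs of
// consecutive turn numbers are compared, since the table skips turns 8-16.
function wastedWrites(rows: CacheRow[]): number[] {
  const wasted: number[] = [];
  for (let i = 1; i < rows.length; i++) {
    const prev = rows[i - 1];
    const cur = rows[i];
    if (cur.turn !== prev.turn + 1) continue;
    const reused = cur.cacheRead - prev.cacheRead;
    wasted.push(Math.max(0, prev.cacheWrite - reused));
  }
  return wasted;
}

const perTurn = wastedWrites(preFix);
console.log(perTurn.reduce((sum, n) => sum + n, 0)); // 35525
```

The first pair (20→21) shows full reuse. Every later pair in the plateau wastes the entire previous write.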

Post-fix cache data (14 turns, OpenClaw 2026.4.6)

Turn	input	cacheRead	Δread	cacheWrite	output	thinking	tool	cr(N) = cr(N-1)+cw(N-1)?
75	3	0	0	152,888	37		🔧	— (cold start after upgrade)
76	1	152,888	+152,888	231	707	✅		✅ exact
77	3	153,119	+231	1,043	297	✅		✅ exact
78	3	154,162	+1,043	651	348	✅		✅ exact
79	3	154,813	+651	722	745	✅		✅ exact
80	3	155,535	+722	1,123	854		🔧	✅ exact
81	1	156,658	+1,123	1,357	348			✅ exact
82	3	158,015	+1,357	720	431	✅		✅ exact
83	3	158,735	+720	801	98		🔧	✅ exact
84	1	159,536	+801	4,176	191			✅ exact
85	3	163,712	+4,176	581	292	✅		✅ exact
86	3	164,293	+581	634	1,394		🔧	✅ exact
87	1	164,927	+634	2,158	72			✅ exact
88	3	167,085	+2,158	410	1,457		🔧	✅ exact

Post-fix summary:

13/13 turns show exact cacheRead(N) = cacheRead(N-1) + cacheWrite(N-1)
0 wasted cache writes — every write is read next turn
Thinking turns (✅) now grow cache normally instead of causing Δread=0 plateaus
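The invariant in the first bullet is easy to check mechanically, for example as a regression check on usage logs. The sketch below uses the post-fix rows from the table above. CacheRow and checkCacheInvariant() are illustrative names. Rows are only compared when their turn numbers are consecutive.

```typescript
type CacheRow = { turn: number; cacheRead: number; cacheWrite: number };

// Post-fix rows (turns 75-88) from the table above.
const postFix: CacheRow[] = [
  { turn: 75, cacheRead: 0, cacheWrite: 152_888 },
  { turn: 76, cacheRead: 152_888, cacheWrite: 231 },
  { turn: 77, cacheRead: 153_119, cacheWrite: 1_043 },
  { turn: 78, cacheRead: 154_162, cacheWrite: 651 },
  { turn: 79, cacheRead: 154_813, cacheWrite: 722 },
  { turn: 80, cacheRead: 155_535, cacheWrite: 1_123 },
  { turn: 81, cacheRead: 156_658, cacheWrite: 1_357 },
  { turn: 82, cacheRead: 158_015, cacheWrite: 720 },
  { turn: 83, cacheRead: 158_735, cacheWrite: 801 },
  { turn: 84, cacheRead: 159_536, cacheWrite: 4_176 },
  { turn: 85, cacheRead: 163_712, cacheWrite: 581 },
  { turn: 86, cacheRead: 164_293, cacheWrite: 634 },
  { turn: 87, cacheRead: 164_927, cacheWrite: 2_158 },
  { turn: 88, cacheRead: 167_085, cacheWrite: 410 },
];

// Count consecutive turn pairs satisfying
// cacheRead(N) = cacheRead(N-1) + cacheWrite(N-1).
function checkCacheInvariant(rows: CacheRow[]): { exact: number; total: number } {
  let exact = 0;
  let total = 0;
  for (let i = 1; i < rows.length; i++) {
    const prev = rows[i - 1];
    const cur = rows[i];
    if (cur.turn !== prev.turn + 1) continue;
    total++;
    if (cur.cacheRead === prev.cacheRead + prev.cacheWrite) exact++;
  }
  return { exact, total };
}

console.log(checkCacheInvariant(postFix)); // { exact: 13, total: 13 }
```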

Metric	Pre-fix (68 turns)	Post-fix (14 turns)
Thinking turns with Δread=0	14	0
Wasted cacheWrite	332,551 tokens	0 tokens
Exact cr(N)=cr(N-1)+cw(N-1)	rare	13/13

AI Disclosure

Mark as AI-assisted in the PR title or description
Note the degree of testing (untested / lightly tested / fully tested)
Confirm you understand what the code does
Resolve or reply to bot review conversations after you address them
AI-assisted: PR authored with Claude Opus 4.6 via OpenClaw, reviewed by GPT-5.4 subagent and Greptile bot
Testing level: Fully tested — pnpm build, pnpm check, and scoped pnpm test all pass; manual verification on live session
Understanding: Fully understood — root cause traced from observed cache metrics through OpenClaw source to Anthropic API docs

Changed files

src/agents/transcript-policy.test.ts (modified, +26/-0)
src/agents/transcript-policy.ts (modified, +4/-1)
src/plugin-sdk/provider-model-shared.test.ts (modified, +11/-3)
src/plugins/provider-replay-helpers.test.ts (modified, +55/-3)
src/plugins/provider-replay-helpers.ts (modified, +39/-3)


Evidence

Cache read pattern from a real Opus 4.6 session with thinking: high:

Turn	cacheRead	Δread	Pattern
Thinking turn	38,323	+257	✅ normal
Next turn	38,323	0	⚠️ FLAT — prev thinking was dropped
After that	38,923	+600	✅ recovered

Every turn following a thinking-containing response shows a cache miss (Δread = 0). The thinking block was present when the cache was written but absent when the conversation was replayed on the next turn.

In one case, cacheRead dropped to 0 (complete miss, ~57k tokens re-written) — the entire prefix failed to match.

What Anthropic's Docs Say

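From Anthropic's Extended Thinking docs (https://platform.claude.com/docs/en/build-with-claude/extended-thinking#differences-in-thinking-across-model-versions):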

Starting with Claude Opus 4.5 (and continuing in Claude Opus 4.6), thinking blocks from previous assistant turns are preserved in model context by default. This differs from earlier models, which remove thinking blocks from prior turns.

Benefits of thinking block preservation:

Cache optimization: preserved thinking blocks enable cache hits

No intelligence impact

Proposed Fix

Condition dropThinkingBlocks on the model version:

Drop for claude-3-7-sonnet and earlier models
Preserve for claude-opus-4-5, claude-opus-4-6, claude-sonnet-4-5, claude-sonnet-4-6, claude-haiku-4-5, and later

Impact

Cost: After every thinking turn, the part of the prompt from the dropped thinking block onward is billed as a fresh cache write instead of a cache read, multiplying input token costs
Latency: Cache misses increase time to first token (TTFT)
Severity: High for heavy thinking users (thinking: high on Opus 4.6)
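The sketch below puts a rough number on "multiplying input token costs" by comparing the cost of a span of prompt tokens billed as a cache read versus a cache write. The multipliers are assumptions based on Anthropic's published prompt-caching pricing: 0.1× base input for reads, 1.25× for 5-minute writes, 2× for 1-hour writes. Check them against the current pricing page. prefixCost() is an illustrative helper.

```typescript
// Price multipliers relative to the base input-token price (assumed values,
// taken from Anthropic's published prompt-caching pricing; verify before use).
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = { "5m": 1.25, "1h": 2.0 } as const;

type CacheTtl = keyof typeof CACHE_WRITE_MULTIPLIER;

// Cost, in units of "base input tokens", of a span of `tokens` prompt tokens
// that is either served from cache (hit) or has to be written again (miss).
function prefixCost(tokens: number, hit: boolean, ttl: CacheTtl): number {
  return tokens * (hit ? CACHE_READ_MULTIPLIER : CACHE_WRITE_MULTIPLIER[ttl]);
}

// Ratio of miss cost to hit cost for the same 1,000 tokens.
console.log(prefixCost(1_000, false, "1h") / prefixCost(1_000, true, "1h")); // 20
console.log(prefixCost(1_000, false, "5m") / prefixCost(1_000, true, "5m")); // 12.5
```

With the 1h TTL listed under Environment, every token that misses the cache and must be written again costs about 20× what a cache hit would.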

Environment

OpenClaw 2026.4.5 (3e72c03)
Model: anthropic/claude-opus-4-6
Direct Anthropic API (api.anthropic.com)
1h cache TTL

extent analysis

TL;DR

Conditionally preserve thinking blocks based on the Claude model version to fix cache misses and optimize performance.

Guidance

Update the transcript policy resolution to conditionally set dropThinkingBlocks based on the model version, preserving thinking blocks for Claude Opus 4.5+, Opus 4.6, Sonnet 4.5+, and Sonnet 4.6.
Verify the fix by monitoring per-turn cache usage. After the change, each turn's cacheRead should equal the previous turn's cacheRead plus cacheWrite, with no Δread = 0 plateaus after thinking turns.
Test the updated implementation with different model versions to confirm that cache optimization and prefix-based cache matching work as expected.
Review the Anthropic documentation for any additional guidance on handling thinking blocks and cache optimization for different model versions.

Example

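const { family, major, minor } = getModelVersion(modelId); // hypothetical: "claude-opus-4-5" -> { family: "opus", major: 4, minor: 5 }
// Rule from PR #61797: preserve for Opus 4.5+, Sonnet 4+, Haiku 4+ and Claude 5+; drop for everything older.
const preserveThinking = major >= 5 || (major === 4 && (family !== "opus" || minor >= 5));
const dropThinkingBlocks = !preserveThinking;

Note: This example assumes a hypothetical getModelVersion function that parses the modelId into a family name (opus, sonnet, haiku) and numeric major/minor versions.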

Notes

The proposed fix relies on accurately determining the model version and updating the transcript policy resolution accordingly. Ensure that the model version detection is correct and handles different model versions as expected.

Recommendation

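Use a build that includes the merged fix from PR #61797, or apply the same change locally: condition dropThinkingBlocks on the model version so that thinking blocks are preserved for Claude Opus 4.5+ (including 4.6) and Sonnet 4.5+ (including 4.6). This should eliminate the cache misses that follow thinking turns, along with the extra cost and latency they cause.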
