claude-code - ✅(Solved) Fix [BUG] Per-turn smoosh pipeline folds dynamic <system-reminder> text into tool_result.content, breaking prompt cache [1 pull request, 5 comments, 4 participants]

GitHub stats
anthropics/claude-code#49585 (fetched 2026-04-17 08:36:59)
Comments: 5 | Participants: 4 | Timeline events: 24 | Reactions: 0
Timeline (top): commented ×5, mentioned ×5, subscribed ×5, labeled ×4

PR fix notes

PR #26: Add smoosh_split — universal un-smoosh complementing smoosh_normalize (#49585)

Description (problem / solution / changelog)

Summary

smoosh_normalize (beta.2 + beta.3) is a great fix for the four enumerated dynamic-reminder patterns — but CC's smooshSystemReminderSiblings (messages.ts:1835) fires on ANY <system-reminder>-prefixed text block adjacent to a tool_result, not just those four. This PR adds smoosh_split, which universally peels any trailing smoosh trailer off tool_result.content back into a standalone text block. Composes cleanly with smoosh_normalize for full coverage.

Why this is complementary, not duplicate

Tested against four real captured miss bodies from anthropics/claude-code#49585:

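Capture | Smooshed trailers present | smoosh_normalize catches | smoosh_split catches
2026-04-16T16-47 | 4 | 0 | 4
2026-04-16T17-12 | 5 | 0 | 5
2026-04-16T17-16 | 8 | 0 | 8
2026-04-16T21-11 | 42043 [three counts fused in the source; column split not recoverable]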

smoosh_normalize catches zero in these cases because the reminder content isn't Token usage: / USD budget: / Output tokens: / TodoWrite — they're hook-injected (thinking-enrichment, action-tracker, MCP deltas). For CC-internal bookkeeping-heavy profiles your normalize is spot-on; for hook-heavy profiles the content needs a content-agnostic peeler.

Composition semantics

The PR places smoosh_split directly after smoosh_normalize:

  1. smoosh_normalize (unchanged) runs first, replacing dynamic values in-place for its 4 known patterns.
  2. smoosh_split peels any remaining trailing \n\n<system-reminder>…\n</system-reminder> off tool_result.content into standalone text blocks.
  3. Reminder content that DID match normalize is now in the split block with stabilized bytes → byte-stable across turns.
  4. Reminder content that did NOT match normalize is in the split block drifting in its own small position → cache break moves from tool_result position (KB-sized) to reminder-block position (~100-500 bytes) — blast radius shrinks 10-100×.

Regex safety

  • Tempered to disallow nested </system-reminder>: stacked trailers peel one at a time instead of greedily capturing both into one block.
  • Anchored to end-of-string (\s*$): user-pasted text containing <system-reminder> mid-string is never mis-split.
  • Pure syntactic reversal — no semantic knowledge of reminder content — so it can't accidentally strip model-relevant data.
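The three properties above can be illustrated with a small Python sketch. This is not the PR's actual regex (the page does not show it); the pattern and the `split_trailers` helper are hypothetical names, written only to show how a tempered body plus an end-of-string anchor peels stacked trailers one at a time while leaving mid-string tags alone.

```python
import re

# Hypothetical pattern, not the one from the PR.
# (?:(?!</system-reminder>).)* is a tempered body: it refuses to cross a
# closing tag, so one match can never swallow two stacked trailers.
# \s*$ anchors the match to the end of the string.
TRAILER = re.compile(
    r"\n\n(<system-reminder>(?:(?!</system-reminder>).)*</system-reminder>)\s*$",
    re.DOTALL,
)


def split_trailers(content: str) -> tuple[str, list[str]]:
    """Peel trailing reminder blocks off a tool_result content string."""
    peeled = []
    while True:
        match = TRAILER.search(content)
        if match is None:
            break
        peeled.append(match.group(1))
        content = content[: match.start()]
    peeled.reverse()  # trailers were peeled last-first; restore source order
    return content, peeled


stacked = (
    "bash output"
    "\n\n<system-reminder>\nfirst\n</system-reminder>"
    "\n\n<system-reminder>\nsecond\n</system-reminder>"
)
body, trailers = split_trailers(stacked)
print(repr(body), len(trailers))
```

Because the anchor requires the reminder to end the string, text such as `"pasted <system-reminder>x</system-reminder> text"` passes through unchanged.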

Gate + defaults

  • shouldApplyFix("smoosh_split") using your existing gate helper.
  • Opt-out via CACHE_FIX_SKIP_SMOOSH_SPLIT=1.
  • Defaults on because it's a zero-semantic-cost transformation — the model sees the reminder as a standalone text block just as it would have pre-smoosh.
  • Counter added to _STATS_SCHEMA for observability.

Validation

Unit tests against the four captured miss trios confirm 100% coverage (55/55 smooshed trailers peeled across all captures). Tests live in my fork of this repo (happy to also upstream them if you want a test/ directory added — didn't want to presume on layout).

Credit

Builds directly on your smoosh_normalize patches (81ff47c, 946bf1b, b52ce48). The source-function analysis was cross-confirmed in anthropics/claude-code#43657 between your binary-review work and my OTEL capture work — same functions (processSessionStartHooks, reorderAttachmentsForAPI, normalizeMessagesForAPI), different entry points.

Changed files

  • insertion.txt (added, +57/-0)
  • preload.mjs (modified, +58/-0)

Summary

normalizeMessagesForAPI runs a smoosh pass every turn that folds <system-reminder>-prefixed text blocks into the preceding tool_result.content string. Because many of CC's own built-in reminders carry dynamic values (token_usage, output_token_usage, budget_usd, deferred_tools_delta, todo_reminder, mcp_instructions_delta, etc.), the smoosh produces different byte output turn-over-turn. The resulting byte drift in historical user messages breaks the prompt cache prefix, causing cache_creation to spike by tens or hundreds of thousands of tokens on turns that should have been clean cache hits.

No user-side hooks required — CC-internal reminders alone are sufficient to trigger this. We ported the relevant functions and reproduced the byte divergence end-to-end (simulation below).

This is distinct from the resume/scatter bugs in #40524 (closed, partial fix in 2.1.90), #44045 (resume skill_listing scatter), #48644 (am7 reminder breakpoint movement), #43657 (resume cache invalidation), #49038 (non-deterministic tool ordering), and our own #48734 / #38542 — all of which describe adjacent but different mechanisms. The smoosh bug fires per turn on non-resume paths and has not been reported specifically.

Affected code

src/utils/messages.ts (function names verified against the leaked TS source):

Function | Line | Role
normalizeMessagesForAPI | ~1989 | Entry point, runs the full pipeline every outgoing request
smooshSystemReminderSiblings | ~1835 | Walks each user message, extracts <system-reminder>-prefixed text blocks, folds them into the last tool_result via smooshIntoToolResult
smooshIntoToolResult | ~2534 | The actual fold: joins tool_result.content + incoming text blocks with \n\n separators
mergeAdjacentUserMessages | called before smoosh | Combines consecutive user messages, creating the adjacencies smoosh then folds
Feature gate tengu_chair_sermon | ~2274, ~2335 | Statsig-cached (_CACHED_MAY_BE_STALE) gate; when ON, runs the merge+smoosh pipeline

The smoosh's own comment claims "Idempotent. Pure function of shape." That claim holds for a single fixed input — but the input is not stable turn-over-turn when:

  1. Reminders carry dynamic values (token_usage, etc.)
  2. New attachments are appended between turns (deferred_tools_delta, todo_reminder after N turns, mcp_instructions_delta)
  3. The Statsig cache refreshes mid-session and toggles the gate
  4. mergeAdjacentUserMessages creates different adjacencies as history grows

Any of the above makes the smoosh output differ from what was previously cached on the server, breaking the prefix match.

Observed fold is distributed by toolUseID, not concentrated on last tool_result

Our captured miss #1 showed msg[30] going from 6 blocks to 3, with each text block folded into a different tool_result (+107, +107, +519 chars matching each reminder's size). A port-faithful simulation of smooshSystemReminderSiblings alone (which folds all text into the LAST tool_result) does NOT reproduce this shape — the mechanism must involve an earlier pass.

Tracing through normalizeMessagesForAPI (messages.ts:1989):

  1. reorderAttachmentsForAPI (messages.ts:1481) — bubbles each hook-attachment user-message up in the list until it lands right after its matching tool_result. Unconditional (no gate).
  2. mergeAdjacentUserMessages (messages.ts:2451) — walks the resulting list and merges consecutive user messages pairwise via mergeUserMessagesAndToolResults (messages.ts:2372).
  3. That merge delegates to mergeUserContentBlocks (messages.ts:2600), which when tengu_chair_sermon is ON takes the universal-smoosh branch (messages.ts:2628-2643) and folds each incoming reminder into the currently-last tool_result of the accumulating message.
  4. Because the pairwise merges happen in order — and each hook attachment is positioned right after its own tool_result thanks to step 1 — each reminder lands on ITS specific tool_result rather than all piling onto the final one.
  5. hoistToolResults (messages.ts:2470) then re-sorts all tool_results to the front of the content array.
  6. Finally smooshSystemReminderSiblings (messages.ts:1835) mops up any remaining <system-reminder>-prefixed siblings that escaped the merge-time fold.

The distributed-fold pattern we observed is an emergent property of the pairwise-merge order combined with the attachment-position reorder — not a single dedicated function.
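A toy Python model may make the difference between the two fold shapes concrete. It is a simplification under stated assumptions, not a port of messages.ts: `merge_pairwise` and `fold_into_last` are hypothetical names, messages are plain lists of dict blocks, and the input is assumed to be already reordered so each reminder sits right after its own tool_result.

```python
def is_reminder(block):
    """True for text blocks that start with a <system-reminder> tag."""
    return block["type"] == "text" and block["text"].startswith("<system-reminder>")


def merge_pairwise(user_messages):
    """Merge consecutive user messages in order, folding reminders at merge time."""
    merged = []  # content blocks of the accumulating message
    for message in user_messages:
        for block in message:
            # The tool_result that is currently last in the accumulating message.
            last_result = next(
                (b for b in reversed(merged) if b["type"] == "tool_result"), None
            )
            if is_reminder(block) and last_result is not None:
                last_result["content"] += "\n\n" + block["text"]
            else:
                merged.append(dict(block))
    return merged


def fold_into_last(blocks):
    """Contrast case: fold every reminder sibling into the LAST tool_result."""
    results = [dict(b) for b in blocks if b["type"] == "tool_result"]
    reminders = [b["text"] for b in blocks if is_reminder(b)]
    if not results:
        return [dict(b) for b in blocks]
    if reminders:
        results[-1]["content"] += "\n\n" + "\n\n".join(reminders)
    return results


history = [
    [{"type": "tool_result", "id": "toolu_A", "content": "out A"}],
    [{"type": "text", "text": "<system-reminder>\nnote for A\n</system-reminder>"}],
    [{"type": "tool_result", "id": "toolu_B", "content": "out B"}],
    [{"type": "text", "text": "<system-reminder>\nnote for B\n</system-reminder>"}],
]
distributed = merge_pairwise(history)
piled = fold_into_last([block for message in history for block in message])
```

In `distributed`, each tool_result carries its own reminder; in `piled`, both reminders land on `toolu_B` and `toolu_A` is untouched. The first shape is the one the captured miss showed.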

tengu_chair_sermon controls the universal-smoosh branch; when OFF, mergeUserContentBlocks takes a narrower legacy path that only smooshes when the tool_result's content is a plain string and all incoming siblings are text. That narrower path still covers the common hook-reminder-after-Bash-result case but misses multi-block cases. Net effect: all users with dynamic hook output paired to tool_results are affected; the gate controls the aggressiveness of the fold, not its presence.

Reproduction (minimal, CC-internal only)

Ported smooshSystemReminderSiblings + smooshIntoToolResult to Python (logic verbatim from the leaked TS), fed two adjacent turns with only built-in CC reminders — no user hooks:

# Input shape (both turns):
user_msg(
    tool_result("toolu_01abc", "bash output: ls -la\ntotal 0\n"),
    token_usage(used, 200000),           # dynamic
    output_token_usage(turn_out, session_out),  # dynamic
    budget_usd(used_usd, 100.00),         # dynamic
)

# Turn N:   used=150_000, turn_out=500, session_out=3_200, used_usd=4.50
# Turn N+1: used=151_200, turn_out=580, session_out=3_780, used_usd=4.82

After smooshSystemReminderSiblings:

Turn N:   tool_result.content = "bash output: ls -la\ntotal 0\n\n<system-reminder>\nToken usage: 150000/200000; 50000 remaining\n</system-reminder>\n\n..."
Turn N+1: tool_result.content = "bash output: ls -la\ntotal 0\n\n<system-reminder>\nToken usage: 151200/200000; 48800 remaining\n</system-reminder>\n\n..."
                                                                                          ^
                                                                              First byte divergence at offset 161

The smoosh absorbed the dynamic token-usage numbers into what was previously a stable tool_result.content string. The cache entry for Turn N's tool_result.content no longer matches Turn N+1's version → cache-prefix break at that position → all bytes after the break point must re-cache on this turn.

Full 130-line port is runnable standalone: no CC install needed.
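Separately from that port, a much smaller Python sketch shows the core effect: once a dynamic reminder is joined into the tool_result string, two adjacent turns stop sharing a byte prefix partway through what used to be stable content. The helper names (`token_usage`, `smoosh`, `first_divergence`) are illustrative, the reminder text follows the output quoted above, and the resulting offset is specific to this toy input.

```python
from typing import Optional


def token_usage(used: int, limit: int) -> str:
    """Render a token_usage reminder in the shape quoted in the report."""
    return (
        "<system-reminder>\n"
        f"Token usage: {used}/{limit}; {limit - used} remaining\n"
        "</system-reminder>"
    )


def smoosh(tool_output: str, reminders: list[str]) -> str:
    """Join the tool output and reminder texts with blank-line separators."""
    return "\n\n".join([tool_output, *reminders])


def first_divergence(a: bytes, b: bytes) -> Optional[int]:
    """Index of the first differing byte, or None if the inputs are equal."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index
    return None if len(a) == len(b) else min(len(a), len(b))


OUTPUT = "bash output: ls -la\ntotal 0\n"
turn_n = smoosh(OUTPUT, [token_usage(150_000, 200_000)]).encode()
turn_n1 = smoosh(OUTPUT, [token_usage(151_200, 200_000)]).encode()

offset = first_divergence(turn_n, turn_n1)
print("first differing byte:", offset)
print("shared prefix ends inside:", turn_n[:offset][-15:])
```

Everything before `offset` is still cacheable; everything from that byte onward, including all later messages, is not.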

Real-session impact (one user, measured)

Methodology: log ctx_cache_w / ctx_cache_r from every statusline poll (14,797 samples over multiple weeks on Opus 4.7 1M-context). Bucket by cache-read signature:

Cause | Turns | cache_creation
Fusion-attributable (cr ≈ 16k-19k, tools-only cached) | ~204 | ~99M
Resume / TTL / server cold-start (cr = 0) | 53 | 32M
Normal hit-bound operation | 14,481 | 39M
Partial/other | 42 | 7.6M

The 204 fusion-attributable turns all show the fingerprint cache_read = tools-prefix-only (no message prefix matched) + cache_creation = full message prefix re-built. Their cr values cluster tightly at ~19,282 (the tools-only token count) and ~16,700 (smaller tools set when some MCP servers or skills were toggled). No fusion-signature cr values below 10k — those would indicate different causes.

~99M cache_creation tokens on fusion-attributable turns — ~56% of this session's total cache_creation spend went to re-caching content that was already present in prior turns. Token-cost at Opus 4.7 1M-context ephemeral rates works out to somewhere in the low-four-figure USD range for this one user across the logged window; I'm leaving the exact dollar figure to Anthropic's own pricing math since the per-tier rates shift by TTL and context band.
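The ~56% figure can be checked against the bucket totals with a few lines of Python. One assumption is baked in: the flattened table rows are read as 32M for the resume/TTL/cold-start bucket and 7.6M for partial/other, which is the only split of the fused digits that reproduces the stated share.

```python
# cache_creation per bucket, in millions of tokens (values from the table above;
# the 32M and 7.6M readings are an assumption about where fused digits split).
cache_creation_m = {
    "fusion_attributable": 99.0,
    "resume_ttl_cold_start": 32.0,
    "normal_hit_bound": 39.0,
    "partial_other": 7.6,
}

total_m = sum(cache_creation_m.values())
fusion_share = cache_creation_m["fusion_attributable"] / total_m

print(f"total cache_creation: {total_m:.1f}M tokens")
print(f"fusion-attributable share: {fusion_share:.1%}")
```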

Consistent with the pattern of "abnormal quota drain" reports

The signature — a trivial user turn triggering cache_creation that bills as though the full message history were new — matches the shape of reports like #42052, #43274, #45756, #41930, #41617, #16157. Users without power hooks still hit this via CC's own dynamic reminders (token_usage/output_token_usage every turn; todo_reminder after N turns; deferred_tools_delta/mcp_instructions_delta on MCP flaps). Worth investigating as a contributing factor to those reports — the mechanism would produce exactly the "I sent 5 prompts and my weekly quota is gone" symptom if any subset of turns hits a smoosh-drift path.

I'm not claiming this is the sole cause of those drain reports — just that the mechanism here is one they'd reasonably produce.

Request

The smoosh is clearly intended (comment at line 2328 in messages.ts explains the tengu_chair_sermon gate's purpose). But for the smoosh to be safe under CC's current architecture, it needs at least one of:

  1. Idempotency over persisted transcripts: if the smoosh output is going to be re-computed from JSONL every turn, the inputs it consumes must be byte-stable between turns. Dynamic reminders (token_usage, etc.) should be injected AFTER the smoosh pass, or excluded from it.
  2. Store smooshed transcripts: persist the post-smoosh message to JSONL, not the pre-smoosh attachment list. Then re-runs produce identical output.
  3. Gate the smoosh off historical messages: only smoosh the most-recent user message (the one being constructed), never re-smoosh history.
  4. Opt-out env var: CLAUDE_CODE_DISABLE_TOOLRESULT_SMOOSH=1 or similar for quota-sensitive power users — similar to the CLAUDE_CODE_ATTRIBUTION_HEADER=false workaround added for the #40524 family.
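As a rough illustration of option 1, the Python sketch below folds only reminders whose text is stable and leaves dynamic ones as standalone text blocks, so the tool_result string stays byte-identical across turns. It is a sketch under assumptions, not a proposed patch: `fold_stable_only` is a hypothetical helper, and the `DYNAMIC` markers are taken from the reminder prefixes named in the PR notes rather than from CC's source.

```python
import re

# Hypothetical markers for reminders whose text changes from turn to turn.
DYNAMIC = re.compile(r"Token usage:|Output tokens:|USD budget:")


def fold_stable_only(tool_output, reminders):
    """Fold stable reminders into the tool_result; keep dynamic ones separate."""
    stable = [r for r in reminders if not DYNAMIC.search(r)]
    dynamic = [r for r in reminders if DYNAMIC.search(r)]
    blocks = [{"type": "tool_result", "content": "\n\n".join([tool_output, *stable])}]
    blocks += [{"type": "text", "text": r} for r in dynamic]
    return blocks


def reminder(body):
    """Wrap a body in <system-reminder> tags."""
    return f"<system-reminder>\n{body}\n</system-reminder>"


static_note = reminder("static hook note")
turn_n = fold_stable_only(
    "total 0", [static_note, reminder("Token usage: 150000/200000; 50000 remaining")]
)
turn_n1 = fold_stable_only(
    "total 0", [static_note, reminder("Token usage: 151200/200000; 48800 remaining")]
)
print(turn_n[0] == turn_n1[0])
```

The dynamic block still differs between turns, so a cache break can still occur at that block; the point of the sketch is only that the break no longer sits inside the tool_result content.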

The simulation file and our captured miss-pair bodies (three trios with full /v1/messages request bodies showing the before/after fusion) are available on request for internal repro.

Cross-references

  • Closed / partially-fixed family: #40524 (sentinel + scatter, 2.1.90/91), community RE by @jmarianski / @whiletrue0x / @VictorSun92 / @FlorianBruniaux / @ArkNill
  • Related open bugs in same class: #48734 #38542 #43657 #44045 #48644 #49038 #42260 #43044 #42052
  • Reporter-side mitigation that would help: same "post-cache-fix" style env var lever used for #40524 bug 3

Leaving our earlier #48734 open as a separate data point covering the same mechanism from a different angle — happy for maintainers to consolidate if they prefer.

extent analysis

TL;DR

To fix the issue, consider modifying the smooshSystemReminderSiblings function to exclude dynamic reminders or inject them after the smoosh pass, ensuring idempotency over persisted transcripts.

Guidance

  • Review the normalizeMessagesForAPI function to understand the pipeline that leads to the smoosh pass and identify potential points of intervention.
  • Investigate modifying the smooshIntoToolResult function to handle dynamic reminders in a way that maintains byte stability between turns.
  • Consider adding an opt-out environment variable, such as CLAUDE_CODE_DISABLE_TOOLRESULT_SMOOSH, to allow quota-sensitive power users to disable the smoosh feature.
  • Evaluate the feasibility of storing smooshed transcripts instead of pre-smoosh attachment lists to ensure consistent output.

Example

No code snippet is provided due to the complexity of the issue and the need for a more comprehensive solution.

Notes

The provided simulation and captured miss-pair bodies may be useful for internal repro and further investigation. The issue is related to other open bugs in the same class, and consolidating these issues may be beneficial.

Recommendation

Inject dynamic reminders after the smoosh pass, or exclude them from it, so that the smoosh output is idempotent over persisted transcripts. This targets the root cause (byte-unstable smoosh inputs) rather than its symptoms.
