[BUG] Hidden `isMeta` system-reminder causes 200k+ cache_creation burst mid-session (Opus 4.6)

anthropics/claude-code#48644 (fetched 2026-04-16 06:54:51)
Comments: 0 · Participants: 1 · Timeline events: 6 (labeled ×5, cross-referenced ×1) · Reactions: 1


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Every Opus 4.6 turn that satisfies a specific internal gate pushes a hidden meta-message onto the messages array. On certain turns this push correlates with the prompt cache being invalidated, so cache_creation_input_tokens spikes from the normal ~1–3k range to 200k–282k per turn. The user prompt that triggers a burst is often a single token ("ok", "1"); the cost is hidden in the transcript and invisible in the UI.

In a 183-turn session on Opus 4.6, 9 such burst turns accounted for ~56% of total cache_creation (1.5M of 2.66M). The reminder content is never shown to the user and the injection site is the model's messages array, not the system-prompt block — so even when the string itself is stable, the breakpoint position moves per turn and invalidates the cached prefix.

Binary evidence

// constant
am7 = "<system-reminder>Respond with just the action or changes and without a thinking block, unless this is a redesign or requires fresh reasoning.</system-reminder>"

// dynamic boundary marker used elsewhere to split system prompt into
// cached (global) vs. non-cached (null) blocks
WMH = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"

// injection site (paraphrased from minified source)
mainLoopModel && $?.some(U => U.type === "assistant") &&
  F.messages.push(n_({ content: am7, isMeta: true }));
return F;

Injection gate (traced from binary)

All of the following must hold for the push to occur:

  • input mode is prompt
  • no external-loading path
  • no customSystemPrompt set
  • thinking is not disabled
  • Pz8(mainLoopModel) returns true
  • at least one prior assistant message exists

Pz8 returns true only when:

  • the model belongs to the opus-4-6 family, and
  • D_().clientDataCache?.loud_sugary_rock === "true" (remote experiment gate)
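
The gate above can be read as a single conjunction. The following is a minimal sketch for illustration only: the `state` object and its field names (`inputMode`, `externalLoading`, `customSystemPrompt`, `thinkingDisabled`) and the helper names `modelGate` and `shouldPushReminder` are hypothetical stand-ins, not identifiers recovered from the binary. Only the `loud_sugary_rock` flag, the opus-4-6 family check, and the prior-assistant-message check come from the trace, and the substring match on the model name is an assumption about how the family check might work.

```javascript
// Hypothetical reconstruction of the injection gate; field names on `state`
// are illustrative assumptions, not names recovered from the binary.

// Stand-in for Pz8: model family check plus the remote experiment gate.
function modelGate(model, clientDataCache) {
  const isOpus46 = typeof model === "string" && model.includes("opus-4-6");
  return isOpus46 && clientDataCache?.loud_sugary_rock === "true";
}

// Returns true only when every traced condition holds.
function shouldPushReminder(state) {
  return (
    state.inputMode === "prompt" &&
    !state.externalLoading &&
    !state.customSystemPrompt &&
    !state.thinkingDisabled &&
    modelGate(state.mainLoopModel, state.clientDataCache) &&
    state.messages.some((m) => m.type === "assistant")
  );
}

// Example state in which every condition is satisfied.
const base = {
  inputMode: "prompt",
  externalLoading: false,
  customSystemPrompt: null,
  thinkingDisabled: false,
  mainLoopModel: "claude-opus-4-6",
  clientDataCache: { loud_sugary_rock: "true" },
  messages: [{ type: "user" }, { type: "assistant" }],
};

console.log(shouldPushReminder(base));
console.log(shouldPushReminder({ ...base, thinkingDisabled: true }));
```

Written this way, each workaround listed later in the report corresponds to making one operand of the conjunction false.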

What Should Happen?

The reminder string is a constant per product version; it should either:

  • Live inside the cacheable global system-prompt block (before __SYSTEM_PROMPT_DYNAMIC_BOUNDARY__), so it cannot invalidate the prefix on subsequent turns, or
  • Be placed after the last cache_control breakpoint so its turn-to-turn presence doesn't move the cached prefix, or
  • Be controllable with an environment variable (e.g., CLAUDE_CODE_DISABLE_ACTION_REMINDER=1) so quota-sensitive users can opt out.

Short trivial user prompts should never cost 200k+ cache_creation tokens.

Error Messages/Logs

There is no error message. The only visible signal is the `usage.cache_creation_input_tokens` field in the JSONL transcript at `~/.claude/projects/<project>/<sid>.jsonl`.

### Observed real session (Opus 4.6, 183 turns)

| Metric | Value |
|---|---|
| Total assistant turns | 183 |
| Normal turn `cache_creation` | 1k – 3k |
| Burst turn `cache_creation` | 200k – 282k |
| Burst turns | 9 |
| Share of total `cache_creation` from burst turns | ~56% |
| Total `cache_creation` | 2.66M |
| Total `cache_read` (normal caching working) | 24.6M |

Bursts were not user-initiated (no `/clear`, no `/compact`, no large tool results). They appeared at internal state transitions.

### Controlled repro (see "Steps to Reproduce")

Switching output style while in an Opus 4.6 session produces a one-shot burst within 2 turns of the switch, matching the pattern observed in normal sessions:

| Step | outputStyle | User prompt | `cache_creation` |
|---|---|---|---|
| baseline (normal work) | Explanatory | | 932 – 1,417 |
| `/config` → default | default | `ok` | 589 – 824 |
| next turn | default | `네` | **335,411** |
| next turn | default | `1` | 278 – 354 |
| `/config` → Explanatory | Explanatory | `2` | 365 |
| next turn | Explanatory | `3` | 432 |

The `네` turn (Korean for "yes") is a single character, three bytes in UTF-8; the 335k spike is consistent with the cached prefix up to the last breakpoint being re-created, not with the size of the prompt.

Steps to Reproduce

  1. Start a fresh session on claude-opus-4-6 with thinking enabled. Do not set --system-prompt.
  2. Run a few normal turns to ensure at least one prior assistant message exists.
  3. Issue /config and switch output style from the current value to default (or from default to any named style). Send a short prompt such as ok.
  4. Send another short prompt such as 1 or 네.
  5. Open the session JSONL at ~/.claude/projects/<project>/<sid>.jsonl. For each type:"assistant" entry, inspect message.usage.cache_creation_input_tokens.
  6. Expect the second turn after the style change to show a cache_creation spike of ~200k+ with cache_read_input_tokens = 0, while surrounding turns stay under 2k.
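
Steps 5 and 6 can be automated with a short script. The sketch below assumes each transcript line is a standalone JSON object and that every `type:"assistant"` entry counts as one turn; the function name `findBurstTurns`, the 200,000-token threshold, and the sample values are illustrative choices, not part of Claude Code.

```javascript
// Flags assistant entries whose cache_creation_input_tokens meets a threshold.
// Takes the transcript as a string so it can be exercised without touching disk.
function findBurstTurns(jsonlText, threshold = 200000) {
  const bursts = [];
  let turn = 0;
  for (const line of jsonlText.split("\n")) {
    if (line.trim() === "") continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed or partially written lines
    }
    if (!entry || entry.type !== "assistant") continue;
    turn += 1;
    const usage = entry.message?.usage ?? {};
    const created = usage.cache_creation_input_tokens ?? 0;
    const read = usage.cache_read_input_tokens ?? 0;
    if (created >= threshold) {
      bursts.push({ turn, created, read });
    }
  }
  return bursts;
}

// Synthetic transcript: one user entry and three assistant entries
// (the usage values are invented for the example).
const sample = [
  { type: "user", message: { content: "ok" } },
  { type: "assistant", message: { usage: { cache_creation_input_tokens: 700, cache_read_input_tokens: 330000 } } },
  { type: "assistant", message: { usage: { cache_creation_input_tokens: 335411, cache_read_input_tokens: 0 } } },
  { type: "assistant", message: { usage: { cache_creation_input_tokens: 300, cache_read_input_tokens: 335000 } } },
].map((e) => JSON.stringify(e)).join("\n");

console.log(findBurstTurns(sample));
```

To run it against a real session, read the file with `require("fs").readFileSync(path, "utf8")` and pass the contents to `findBurstTurns`.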

Alternative (non-deliberate) repro: running any long Opus 4.6 session will exhibit the same pattern at internal state transitions (e.g., when the loud_sugary_rock gate flips or on the first turn that gains a prior assistant message).

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.109

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

Workarounds verified

  • --thinking disabled short-circuits one of the gate conditions. Empirically reduces burst frequency in the same session; needs broader confirmation.
  • Switching the model family away from opus-4-6 appears to fail Pz8, avoiding the push.
  • --system-prompt bypasses the gate but is too heavy-handed for normal use.
  • --append-system-prompt does not help (verified).
  • --exclude-dynamic-system-prompt-sections does not help (verified).
  • No settings.json key appears to control this path.

Suggested fix

Preferred: move the reminder into the static cacheable system-prompt block (the first option under "What Should Happen?"). The reminder content is constant per version and does not need to be a per-turn dynamic isMeta push.

Backup: expose an env-var opt-out. Users on quota-sensitive plans can mitigate immediately while the caching fix is being designed.
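
A sketch of what the backup option could look like, assuming the opt-out is checked right before the push. `CLAUDE_CODE_DISABLE_ACTION_REMINDER` is the name proposed in this report and does not exist today; `reminderOptedOut`, `maybePushReminder`, and the `gateOpen` parameter are hypothetical names used only for illustration.

```javascript
// Sketch of the proposed opt-out. CLAUDE_CODE_DISABLE_ACTION_REMINDER is the
// variable name suggested in this report; it is not an existing setting.
function reminderOptedOut(env) {
  return env.CLAUDE_CODE_DISABLE_ACTION_REMINDER === "1";
}

// Hypothetical wrapper around the push: `gateOpen` stands for the result of
// the existing gate, and `reminder` is the message that would be appended.
function maybePushReminder(messages, reminder, gateOpen, env) {
  if (gateOpen && !reminderOptedOut(env)) {
    messages.push(reminder);
  }
  return messages;
}

const reminder = { content: "<system-reminder>...</system-reminder>", isMeta: true };

// Without the variable the reminder is appended; with it set to "1" it is not.
console.log(maybePushReminder([], reminder, true, {}).length);
console.log(
  maybePushReminder([], reminder, true, { CLAUDE_CODE_DISABLE_ACTION_REMINDER: "1" }).length
);
```

In a real process the `env` argument would be `process.env`; passing it explicitly keeps the sketch easy to test.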

Additional notes

  • The reminder content itself is reasonable guidance; the concern is the insertion mechanism, not the text.
  • The binary already has a Xz8 / WMH two-tier (global vs. null) cache-scope mechanism; routing am7 through the global tier would resolve this cleanly.
  • Happy to provide redacted transcript snippets, binary offsets, and additional JSONL samples if useful — kept out of the public body to avoid leaking private content.

extent analysis

TL;DR

To fix the issue, move the reminder into the static cacheable system-prompt block, or expose an environment variable that lets users opt out of the reminder.

Guidance

  • Identify the conditions under which the reminder is pushed onto the messages array and verify that the push is what invalidates the cache.
  • Consider moving the reminder into the static cacheable system-prompt block so that it cannot invalidate the cached prefix.
  • If moving the reminder is not feasible, expose an environment variable (e.g., CLAUDE_CODE_DISABLE_ACTION_REMINDER) that lets users opt out of the reminder.
  • Test the fix by running a long Opus 4.6 session and verifying that the cache_creation spikes are no longer present.
  • Verify that the reminder still reaches the model after the fix is applied; it is never shown to the user, so this has to be checked in the request payload rather than in the UI.

Example

No code snippet is provided as the issue is more related to the logic and architecture of the system rather than a specific code block.

Notes

The fix may require changes to the binary and/or the system's architecture. It's essential to test the fix thoroughly to ensure that it doesn't introduce any new issues.

Recommendation

As an interim measure, expose an environment variable to opt out of the reminder. This can ship while the caching fix is being designed and lets quota-sensitive users mitigate the issue immediately. Moving the reminder into the cacheable system-prompt block remains the preferred long-term fix.
