claude-code - 💡(How to fix) Fix [BUG] Hidden `isMeta` system-reminder causes 200k+ cache_creation burst mid-session (Opus 4.6) [1 participants]

mqzkim · 2026-04-15T15:15:15Z



GitHub stats

anthropics/claude-code#48644 • Fetched 2026-04-16 06:54:51


Timeline (top)

labeled ×5, cross-referenced ×1



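### Preflight Checklist

- [x] I have searched [existing issues](https://github.com/anthropics/claude-code/issues?q=is%3Aissue%20state%3Aopen%20label%3Abug) and this hasn't been reported yet
- [x] This is a single bug report (please file separate reports for different bugs)
- [x] I am using the latest version of Claude Code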

### What's Wrong?

Every Opus 4.6 turn that satisfies a specific internal gate pushes a hidden meta-message onto the `messages` array. On certain turns this push correlates with the prompt cache being invalidated, so `cache_creation_input_tokens` spikes from the normal ~1–3k range to **200k–282k per turn**. The user prompt that triggers a burst is often a single token (`"ok"`, `"1"`); the cost is hidden in the transcript and invisible in the UI.

In a 183-turn session on Opus 4.6, 9 such burst turns accounted for **~56% of total `cache_creation`** (1.5M of 2.66M). The reminder content is never shown to the user and the injection site is the model's `messages` array, not the system-prompt block, so even when the string itself is stable, the breakpoint position moves per turn and invalidates the cached prefix.
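One way such an invalidation could arise is sketched below. This is a toy model, not the real cache: it assumes a cached prefix is reusable only when the next request begins with exactly the same messages, and it assumes the reminder is re-appended at the tail on every turn. `canReuse`, the sample messages, the `role` field, and the placeholder reminder text are all hypothetical.

```javascript
// Toy model (an assumption, not the real cache implementation): a cached
// prefix can be reused only if the next request starts with the same messages.
function canReuse(cachedPrefix, nextRequest) {
  if (cachedPrefix.length > nextRequest.length) return false;
  return cachedPrefix.every(
    (msg, i) => JSON.stringify(msg) === JSON.stringify(nextRequest[i])
  );
}

// Hypothetical conversation state.
const history = [
  { role: "user", content: "refactor foo()" },
  { role: "assistant", content: "done" },
  { role: "user", content: "ok" },
];
const nextTurn = [
  { role: "assistant", content: "anything else?" },
  { role: "user", content: "1" },
];
// Placeholder standing in for the hidden reminder message.
const reminder = { role: "user", content: "<hidden reminder>", isMeta: true };

// Append-only history: the earlier request is a prefix of the later one.
const appendOnlyNext = [...history, ...nextTurn];
console.log(canReuse(history, appendOnlyNext)); // true

// Reminder re-appended at the tail each turn: index 3 differs between requests.
const movingPrev = [...history, reminder];
const movingNext = [...history, ...nextTurn, reminder];
console.log(canReuse(movingPrev, movingNext)); // false
```

In this model the reminder text never changes, yet the previous request stops being a prefix of the next one, which matches the report's point that a stable string can still move the cached boundary.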

### Binary evidence

```js
// constant
am7 = "<system-reminder>Respond with just the action or changes and without a thinking block, unless this is a redesign or requires fresh reasoning.</system-reminder>"

// dynamic boundary marker used elsewhere to split system prompt into
// cached (global) vs. non-cached (null) blocks
WMH = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__"

// injection site (paraphrased from minified source)
mainLoopModel && $?.some(U => U.type === "assistant") &&
  F.messages.push(n_({ content: am7, isMeta: true }));
return F;
```

### Injection gate (traced from binary)

All of the following must hold for the push to occur:

- input mode is `prompt`
- no external-loading path
- no `customSystemPrompt` set
- thinking is **not** disabled
- `Pz8(mainLoopModel)` returns true
- at least one prior assistant message exists

`Pz8` returns true only when:

- the model belongs to the `opus-4-6` family, **and**
- `D_().clientDataCache?.loud_sugary_rock === "true"` (remote experiment gate)
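The gate conditions above can be restated as a single predicate. This is a readable sketch under stated assumptions, not the binary's code: the `ctx` field names, `modelGate`, `shouldPushReminder`, and the substring test for the model family are all hypothetical; only `loud_sugary_rock`, `customSystemPrompt`, and `mainLoopModel` appear in the report.

```javascript
// Assumption: family membership is approximated with a substring check.
function isOpus46Family(model) {
  return typeof model === "string" && model.includes("opus-4-6");
}

// Mirrors the described Pz8: model family AND remote experiment flag.
function modelGate(model, clientDataCache) {
  return isOpus46Family(model) && clientDataCache?.loud_sugary_rock === "true";
}

// Hypothetical restatement of the full gate; every ctx field name is illustrative.
function shouldPushReminder(ctx) {
  return (
    ctx.inputMode === "prompt" &&
    !ctx.externalLoading &&
    !ctx.customSystemPrompt &&
    !ctx.thinkingDisabled &&
    modelGate(ctx.mainLoopModel, ctx.clientDataCache) &&
    ctx.messages.some((m) => m.type === "assistant")
  );
}

const exampleCtx = {
  inputMode: "prompt",
  externalLoading: false,
  customSystemPrompt: undefined,
  thinkingDisabled: false,
  mainLoopModel: "claude-opus-4-6",
  clientDataCache: { loud_sugary_rock: "true" },
  messages: [{ type: "user" }, { type: "assistant" }],
};

console.log(shouldPushReminder(exampleCtx)); // true
console.log(shouldPushReminder({ ...exampleCtx, thinkingDisabled: true })); // false
```

Written this way, each reported workaround maps to one failing conjunct: disabling thinking, setting a custom system prompt, or leaving the `opus-4-6` family.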

### What Should Happen?

The reminder string is a constant per product version; it should either:

- Live inside the cacheable global system-prompt block (before `__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__`), so it cannot invalidate the prefix on subsequent turns, **or**
- Be placed after the last `cache_control` breakpoint so its turn-to-turn presence doesn't move the cached prefix, **or**
- Be controllable with an environment variable (e.g., `CLAUDE_CODE_DISABLE_ACTION_REMINDER=1`) so quota-sensitive users can opt out.

Short trivial user prompts should never cost 200k+ `cache_creation` tokens.

### Error Messages/Logs

There is no error message. The only visible signal is the `usage.cache_creation_input_tokens` field in the JSONL transcript at `~/.claude/projects/<project>/<sid>.jsonl`.

### Observed real session (Opus 4.6, 183 turns)

| Metric | Value |
|---|---|
| Total assistant turns | 183 |
| Normal turn `cache_creation` | 1k – 3k |
| Burst turn `cache_creation` | 200k – 282k |
| Burst turns | 9 |
| Share of total `cache_creation` from burst turns | ~56% |
| Total `cache_creation` | 2.66M |
| Total `cache_read` (normal caching working) | 24.6M |

Bursts were not user-initiated (no `/clear`, no `/compact`, no large tool results). They appeared at internal state transitions.
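The ~56% share in the table is simply burst tokens divided by total tokens (1.5M / 2.66M ≈ 0.56). A minimal sketch of that calculation follows; `summarizeBursts`, the 100k threshold, and the sample numbers are assumptions for illustration and are not the reporter's data.

```javascript
// Summarize per-turn cache_creation values. The 100k threshold is an
// assumption chosen to separate the 1k-3k normal range from 200k+ bursts.
function summarizeBursts(perTurn, threshold = 100_000) {
  const total = perTurn.reduce((sum, n) => sum + n, 0);
  const bursts = perTurn.filter((n) => n >= threshold);
  const burstTotal = bursts.reduce((sum, n) => sum + n, 0);
  return {
    turns: perTurn.length,
    burstTurns: bursts.length,
    total,
    burstTotal,
    burstShare: total === 0 ? 0 : burstTotal / total,
  };
}

// Synthetic data: 8 normal turns and 2 burst turns.
const sample = [1200, 900, 2500, 210000, 1500, 1100, 282000, 3000, 800, 1000];
const summary = summarizeBursts(sample);
console.log(summary.burstTurns, summary.burstTotal, summary.total);
// prints: 2 492000 504000
```

Because a single burst is two orders of magnitude larger than a normal turn, even a handful of them dominates the total.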

### Controlled repro (see "Steps to Reproduce")

Switching output style while in an Opus 4.6 session produces a one-shot burst within 2 turns of the switch, matching the pattern observed in normal sessions:

| Step | outputStyle | User prompt | `cache_creation` |
|---|---|---|---|
| baseline (normal work) | Explanatory | — | 932 – 1,417 |
| `/config` → default | default | `ok` | 589 – 824 |
| next turn | default | `네` | **335,411** |
| next turn | default | `1` | 278 – 354 |
| `/config` → Explanatory | Explanatory | `2` | 365 |
| next turn | Explanatory | `3` | 432 |

The `"네"` prompt (Korean for "yes") is a single character, three bytes in UTF-8; the 335k spike is consistent with the cached prefix up to the last breakpoint being re-created.

### Steps to Reproduce

1. Start a fresh session on `claude-opus-4-6` with thinking enabled. Do not set `--system-prompt`.
2. Run a few normal turns to ensure at least one prior assistant message exists.
3. Issue `/config` and switch output style from the current value to `default` (or from `default` to any named style). Send a short prompt such as `ok`.
4. Send another short prompt such as `1` or `네`.
5. Open the session JSONL at `~/.claude/projects/<project>/<sid>.jsonl`. For each `type:"assistant"` entry, inspect `message.usage.cache_creation_input_tokens`.
6. Expect the **second turn after the style change** to show a `cache_creation` spike of ~200k+ with `cache_read_input_tokens = 0`, while surrounding turns stay under 2k.

Alternative (non-deliberate) repro: running any long Opus 4.6 session will exhibit the same pattern at internal state transitions (e.g., when the `loud_sugary_rock` gate flips or on the first turn that gains a prior assistant message).
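The transcript inspection in the steps above can be scripted. The sketch below uses only the field names given in this report (`type`, `message.usage.cache_creation_input_tokens`, `message.usage.cache_read_input_tokens`); the helper names, the 100k threshold, and the sample lines are assumptions. If the transcript writes one turn as several entries, values may repeat.

```javascript
// Extract cache usage from JSONL transcript text, one row per assistant entry.
function extractUsage(jsonlText) {
  const rows = [];
  for (const line of jsonlText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip malformed or partially written lines
    }
    const usage = entry?.message?.usage;
    if (entry?.type !== "assistant" || !usage) continue;
    rows.push({
      created: usage.cache_creation_input_tokens ?? 0,
      read: usage.cache_read_input_tokens ?? 0,
    });
  }
  return rows;
}

// Flag entries whose cache_creation meets an assumed burst threshold.
function findBursts(rows, threshold = 100_000) {
  return rows
    .map((row, index) => ({ index, ...row }))
    .filter((row) => row.created >= threshold);
}

// Synthetic transcript lines for illustration only.
const usageLine = (created, read) =>
  JSON.stringify({
    type: "assistant",
    message: {
      usage: {
        cache_creation_input_tokens: created,
        cache_read_input_tokens: read,
      },
    },
  });
const sampleText = [
  JSON.stringify({ type: "user", message: { content: "ok" } }),
  usageLine(700, 300000),
  usageLine(335411, 0),
  usageLine(300, 335000),
].join("\n");

const flagged = findBursts(extractUsage(sampleText));
console.log(flagged);
```

To run it against a real session, read the file with `require("node:fs").readFileSync(path, "utf8")` and pass the resulting text to `extractUsage`.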

### Claude Model

Opus

### Is this a regression?

I don't know

### Last Working Version

No response

### Claude Code Version

2.1.109

### Platform

Anthropic API

### Operating System

macOS

### Terminal/Shell

Terminal.app (macOS)

### Additional Information

#### Workarounds verified

- `--thinking disabled` short-circuits one of the gate conditions. Empirically reduces burst frequency in the same session; needs broader confirmation.
- Switching the model family away from `opus-4-6` appears to fail `Pz8`, avoiding the push.
- `--system-prompt` bypasses the gate but is too heavy-handed for normal use.
- `--append-system-prompt` does not help (verified).
- `--exclude-dynamic-system-prompt-sections` does not help (verified).
- No `settings.json` key appears to control this path.

#### Suggested fix

Preferred: move the reminder into the static cacheable system-prompt block (the first option under "What Should Happen?"). The reminder content is constant per version and does not need to be a per-turn dynamic `isMeta` push.

Backup: expose an env-var opt-out. Users on quota-sensitive plans can mitigate immediately while the caching fix is being designed.
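A minimal sketch of what such an opt-out could look like is shown here. `CLAUDE_CODE_DISABLE_ACTION_REMINDER` is the name proposed in this report and is not an existing Claude Code setting; `reminderDisabled`, `maybePushReminder`, and the `gateOpen` flag are hypothetical.

```javascript
// Hypothetical opt-out check for the env var proposed in this report.
function reminderDisabled(env = process.env) {
  return env.CLAUDE_CODE_DISABLE_ACTION_REMINDER === "1";
}

// Returns a new array with the reminder appended only when the gate is open
// and the user has not opted out; the input array is never mutated.
function maybePushReminder(messages, reminderText, gateOpen, env = process.env) {
  if (!gateOpen || reminderDisabled(env)) return messages;
  return [...messages, { content: reminderText, isMeta: true }];
}

const msgs = [{ type: "user" }, { type: "assistant" }];
console.log(maybePushReminder(msgs, "reminder", true, {}).length); // 3
console.log(
  maybePushReminder(msgs, "reminder", true, {
    CLAUDE_CODE_DISABLE_ACTION_REMINDER: "1",
  }).length
); // 2
```

Passing `env` as a parameter keeps the check testable without touching the real process environment.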

#### Additional notes

- The reminder content itself is reasonable guidance; the concern is the insertion mechanism, not the text.
- The binary already has a `Xz8` / `WMH` two-tier (global vs. null) cache-scope mechanism; routing `am7` through the global tier would resolve this cleanly.
- Happy to provide redacted transcript snippets, binary offsets, and additional JSONL samples if useful; these were kept out of the public body to avoid leaking private content.
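To make the two-tier idea concrete, here is a hedged sketch of splitting a system prompt at the boundary marker. Only the marker string and the reminder text come from this report; `splitSystemPrompt`, the `{ text, cacheScope }` block shape, and the sample prompt are assumptions and do not describe how `Xz8` actually works.

```javascript
const BOUNDARY = "__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__";
const REMINDER =
  "<system-reminder>Respond with just the action or changes and without a thinking block, unless this is a redesign or requires fresh reasoning.</system-reminder>";

// Hypothetical splitter: text before the marker is tagged as globally
// cacheable, text after it as non-cached (null scope).
function splitSystemPrompt(prompt) {
  const at = prompt.indexOf(BOUNDARY);
  if (at === -1) return [{ text: prompt, cacheScope: "global" }];
  return [
    { text: prompt.slice(0, at), cacheScope: "global" },
    { text: prompt.slice(at + BOUNDARY.length), cacheScope: null },
  ];
}

// Placing the constant reminder before the marker keeps it in the static tier.
const prompt = [
  "Static instructions.",
  REMINDER,
  BOUNDARY,
  "Per-session details.",
].join("\n");
const blocks = splitSystemPrompt(prompt);
console.log(blocks.map((b) => b.cacheScope)); // [ 'global', null ]
```

Under this layout the reminder would be part of the static block and would no longer need a per-turn push onto `messages`.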

### Extent analysis

#### TL;DR

To fix the issue, move the reminder into the static cacheable system-prompt block, or expose an environment variable that lets users opt out of the reminder.

#### Guidance

- Identify the conditions under which the reminder is pushed onto the `messages` array and confirm that the push is what invalidates the cache.
- Consider moving the reminder into the static cacheable system-prompt block to prevent it from invalidating the cached prefix.
- If moving the reminder is not feasible, expose an environment variable (e.g., `CLAUDE_CODE_DISABLE_ACTION_REMINDER`) to allow users to opt out of the reminder.
- Test the fix by running a long Opus 4.6 session and verifying that the `cache_creation` spikes no longer appear.
- Verify that the reminder still reaches the model after the fix; it is never displayed to the user.

#### Example

This analysis does not include a code snippet, as the issue concerns the logic and architecture of the system rather than a specific code block.

#### Notes

The fix may require changes to the binary, the system's architecture, or both. The fix should be tested thoroughly to ensure that it doesn't introduce new issues.

#### Recommendation

For maintainers, exposing an environment variable to opt out of the reminder is the more immediate option, since it can ship while the caching fix is being designed. This would allow quota-sensitive users to mitigate the issue right away.
