claude-code - 💡(How to fix) Fix Docs: `/effort` changes documented as "no effect on the cache" but empirically cause full/partial cache misses (Opus 4.7, v2.1.150)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Code Example

Change effort level?
Your next response will be slower and use more tokens

This conversation is cached for the current effort level. Switching to xhigh means the full
history gets re-read on your next message.
RAW_BUFFERClick to expand / collapse

Documentation Type

Unclear/confusing documentation

Documentation Location

https://code.claude.com/docs/en/prompt-caching

Section/Topic

How the cache is organized

Current Documentation

From How Claude Code uses prompt caching:

Effort level: not part of the cache key or the prompt, so changing it mid-session has no effect on the cache.

What's Wrong or Missing?

The prompt-caching docs state that changing the reasoning effort level (/effort) has no effect on the cache. In practice, on Claude Code v2.1.150 with Opus 4.7, changing /effort mid-session causes a full or near-full cache miss in 5 of 6 measured transitions — the next request re-reads / rewrites most or all of the conversation. This also contradicts Claude Code's own in-app confirmation dialog, which explicitly warns the full history will be re-read.

Suggested Improvement

  1. If the cache miss is expected: fix the prompt-caching doc — remove/replace the "no effect on the cache" statement and document that effort changes invalidate the cache (ideally noting the cold/warm + up/down nuance, or at minimum "treat an effort change like a model switch").
  2. If the full READ→0 on a cold level is unintended: investigate why effort (an output_config parameter the docs say is not part of the cache key) busts the entire prefix cache.

Impact

Medium - Makes feature difficult to understand

Additional Context

Summary

The prompt-caching docs state that changing the reasoning effort level (/effort) has no effect on the cache. In practice, on Claude Code v2.1.150 with Opus 4.7, changing /effort mid-session causes a full or near-full cache miss in 5 of 6 measured transitions — the next request re-reads / rewrites most or all of the conversation. This also contradicts Claude Code's own in-app confirmation dialog, which explicitly warns the full history will be re-read.

The documented claim

From How Claude Code uses prompt caching:

Effort level: not part of the cache key or the prompt, so changing it mid-session has no effect on the cache.

The contradicting in-app behavior

Changing /effort mid-session (with prior conversation output) shows this confirmation dialog:

Change effort level?
Your next response will be slower and use more tokens

This conversation is cached for the current effort level. Switching to xhigh means the full
history gets re-read on your next message.

So the CLI itself states the opposite of its own caching docs.

Measurements

Method: read cache_read_input_tokens (READ) and cache_creation_input_tokens (CREATE) per request from the session transcript JSONL (message.usage), which mirrors the statusline context_window.current_usage object. Steady-state (no effort change) turns show READ growing monotonically with small CREATE — a healthy prefix cache. READ only ever dropped at effort-change boundaries.

Environment: Claude Code v2.1.150, claude-opus-4-7, Claude subscription (1h cache TTL — verified: idle gaps of 5+ minutes did not reset the cache, so the misses below are not TTL expiry). Context ~110K–197K tokens over the session.

#switchdirectiontarget levelREADCREATE
1xhigh → mediumdowncold (first use)0117,244
2medium → xhighupwarm117,08113,516
3xhigh → mediumdownwarm14,278154,248
4medium → maxupcold (first use)0177,785
5max → xhighdownwarm14,278176,197
6xhigh → maxupwarm178,55217,922

For reference, a normal same-effort turn at this context size has READ ≈ full context and CREATE ≈ a few hundred to a few thousand tokens.

Observed model

Cache behavior on an effort switch depends on two factors — whether the target level was already used this session ("warm") and the direction of change:

target warmtarget cold
upfull reuse, cheap (#2, #6)full miss, READ 0 (#4)
downonly the ~14.3K fixed prefix reused, body rewritten (#3, #5)full miss, READ 0 (#1)

Notable, and suggestive of the mechanism:

  • Both down-to-warm transitions reused exactly 14,278 tokens — apparently the effort-independent system + tool-definitions + CLAUDE.md/memory prefix; the entire conversation body was rewritten.
  • The one up-to-warm transition (#6) reused exactly the target level's prior cache size (177,785 + 767 = 178,552), writing only the delta accumulated while away from that level.
  • This is consistent with the prompt cache being namespaced per effort level (like it is per model), plus an additional invalidation of the conversation body when moving to a lower effort.

Impact

  • The docs tell users effort changes are free, so they may toggle /effort mid-task expecting zero cost. On long contexts this silently triggers a model-switch-grade reprocess (here up to ~177K cache-creation tokens per switch), inflating latency, cost, and subscription usage.
  • The docs also directly contradict the CLI's own confirmation dialog, which is confusing.

Suggested resolution

  1. If the cache miss is expected: fix the prompt-caching doc — remove/replace the "no effect on the cache" statement and document that effort changes invalidate the cache (ideally noting the cold/warm + up/down nuance, or at minimum "treat an effort change like a model switch").
  2. If the full READ→0 on a cold level is unintended: investigate why effort (an output_config parameter the docs say is not part of the cache key) busts the entire prefix cache.

Repro

  1. Start a session with Opus 4.7 and build up a long context (e.g. 100K+ tokens).
  2. /effort to a level not yet used this session; confirm the dialog.
  3. Observe the next turn: context_window.current_usage.cache_read_input_tokens drops to ~0 and cache_creation_input_tokens spikes to ~the full context size.
  4. Switch back up to a previously-used level and observe it is cheap; switch down and observe only the fixed prefix survives.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Docs: `/effort` changes documented as "no effect on the cache" but empirically cause full/partial cache misses (Opus 4.7, v2.1.150)