claude-code - Docs: `/effort` changes documented as "no effect on the cache" but empirically cause full/partial cache misses (Opus 4.7, v2.1.150)


Documentation Type

Unclear/confusing documentation

Documentation Location

https://code.claude.com/docs/en/prompt-caching

Section/Topic

How the cache is organized

Current Documentation

From How Claude Code uses prompt caching:

Effort level: not part of the cache key or the prompt, so changing it mid-session has no effect on the cache.

What's Wrong or Missing?

The prompt-caching docs state that changing the reasoning effort level (/effort) has no effect on the cache. In practice, on Claude Code v2.1.150 with Opus 4.7, changing /effort mid-session caused a full or near-full cache miss in 4 of the 6 measured transitions (#1, #3, #4 and #5 in the Measurements table below): the next request rewrote most or all of the conversation into the cache instead of reading it. Only the two upward switches to an already-used level (#2, #6) stayed cheap. This also contradicts Claude Code's own in-app confirmation dialog, which explicitly warns that the full history will be re-read.

Suggested Improvement

- If the cache miss is expected: fix the prompt-caching doc. Remove or replace the "no effect on the cache" statement and document that effort changes invalidate the cache (ideally noting the cold/warm + up/down nuance, or at minimum "treat an effort change like a model switch").
- If the full READ→0 on a cold level is unintended: investigate why effort (an output_config parameter the docs say is not part of the cache key) busts the entire prefix cache.

Impact

Medium - Makes feature difficult to understand

Additional Context

Summary

The documented claim

From How Claude Code uses prompt caching:

Effort level: not part of the cache key or the prompt, so changing it mid-session has no effect on the cache.

The contradicting in-app behavior

Changing /effort mid-session (with prior conversation output) shows this confirmation dialog:

Change effort level?
Your next response will be slower and use more tokens

This conversation is cached for the current effort level. Switching to xhigh means the full
history gets re-read on your next message.

So the CLI itself states the opposite of its own caching docs.

Measurements

Method: read cache_read_input_tokens (READ) and cache_creation_input_tokens (CREATE) per request from the session transcript JSONL (message.usage), which mirrors the statusline context_window.current_usage object. Steady-state (no effort change) turns show READ growing monotonically with small CREATE — a healthy prefix cache. READ only ever dropped at effort-change boundaries.
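
The extraction described in the method can be sketched in a few lines of Python. This is a minimal sketch, assuming each transcript line is a JSON object that carries the two counters under `message.usage`, as stated above; the function name `cache_usage_per_turn` is hypothetical, and the real transcript schema may contain other record types or differ between versions.

```python
import json
from typing import Iterable, Iterator, Tuple


def cache_usage_per_turn(lines: Iterable[str]) -> Iterator[Tuple[int, int]]:
    """Yield (READ, CREATE) for every transcript line that carries usage data.

    Assumes one JSON object per line with the counters under message.usage.
    Lines without usage data (or that fail to parse) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate a truncated or partially written line
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        if not isinstance(message, dict):
            continue
        usage = message.get("usage")
        if not isinstance(usage, dict):
            continue
        yield (
            int(usage.get("cache_read_input_tokens", 0) or 0),
            int(usage.get("cache_creation_input_tokens", 0) or 0),
        )
```

Passing an open transcript file to `cache_usage_per_turn` yields one (READ, CREATE) pair per request, which is the raw series the table below was built from.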

Environment: Claude Code v2.1.150, claude-opus-4-7, Claude subscription (1h cache TTL — verified: idle gaps of 5+ minutes did not reset the cache, so the misses below are not TTL expiry). Context ~110K–197K tokens over the session.

| # | switch | direction | target level | READ | CREATE |
|---|---|---|---|---|---|
| 1 | xhigh → medium | down | cold (first use) | 0 | 117,244 |
| 2 | medium → xhigh | up | warm | 117,081 | 13,516 |
| 3 | xhigh → medium | down | warm | 14,278 | 154,248 |
| 4 | medium → max | up | cold (first use) | 0 | 177,785 |
| 5 | max → xhigh | down | warm | 14,278 | 176,197 |
| 6 | xhigh → max | up | warm | 178,552 | 17,922 |

For reference, a normal same-effort turn at this context size has READ ≈ full context and CREATE ≈ a few hundred to a few thousand tokens.
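
One way to compare the six switches on a single scale is the share of cached-prefix tokens that were read rather than rewritten. The sketch below is illustrative only: `reuse_fraction` is a hypothetical helper, and it ignores uncached input tokens because the table does not report them.

```python
# (READ, CREATE) for switches #1..#6, copied from the measurements table.
SWITCHES = [
    (0, 117_244),
    (117_081, 13_516),
    (14_278, 154_248),
    (0, 177_785),
    (14_278, 176_197),
    (178_552, 17_922),
]


def reuse_fraction(read: int, create: int) -> float:
    """Share of cached-prefix tokens served from cache rather than rewritten."""
    total = read + create
    return read / total if total else 0.0


for number, (read, create) in enumerate(SWITCHES, start=1):
    print(f"#{number}: {reuse_fraction(read, create):.1%} of cached tokens reused")
```

Under this metric, switches #2 and #6 come out at roughly 90% reuse, while the other four stay below 10%.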

Observed model

Cache behavior on an effort switch depends on two factors — whether the target level was already used this session ("warm") and the direction of change:

| direction | target warm | target cold |
|---|---|---|
| up | full reuse, cheap (#2, #6) | full miss, READ 0 (#4) |
| down | only the ~14.3K fixed prefix reused, body rewritten (#3, #5) | full miss, READ 0 (#1) |

Notable, and suggestive of the mechanism:

- Both down-to-warm transitions (#3, #5) reused exactly 14,278 tokens, apparently the effort-independent system + tool-definitions + CLAUDE.md/memory prefix; the entire conversation body was rewritten.
- One of the two up-to-warm transitions (#6) reused exactly the target level's prior cache size (177,785 + 767 = 178,552), writing only the delta accumulated while away from that level.

This is consistent with the prompt cache being namespaced per effort level (as it is per model), plus an additional invalidation of the conversation body when moving to a lower effort.
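
The observed model can be written down as a small predictor and replayed against the six switches. This is only a sketch of the hypothesis above, not a description of how the cache is actually implemented: the level ordering in `LEVELS` is inferred from the up/down labels in the measurements, the starting level `xhigh` is taken from switch #1, and the names `predict` and `simulate` are hypothetical.

```python
from typing import List, Set

# Ordering inferred from the "up"/"down" labels in the measurements (assumption).
LEVELS = ["medium", "xhigh", "max"]


def predict(previous: str, target: str, used: Set[str]) -> str:
    """Predict the cache outcome of one effort switch under the observed model."""
    if target not in used:
        return "full miss"  # cold level: nothing cached under it yet
    moving_up = LEVELS.index(target) > LEVELS.index(previous)
    return "full reuse" if moving_up else "prefix only"


def simulate(start: str, switches: List[str]) -> List[str]:
    """Replay a sequence of switches, tracking which levels are already warm."""
    used = {start}
    current = start
    outcomes = []
    for target in switches:
        outcomes.append(predict(current, target, used))
        used.add(target)
        current = target
    return outcomes


# The six measured switches, starting from xhigh.
for number, outcome in enumerate(
    simulate("xhigh", ["medium", "xhigh", "medium", "max", "xhigh", "max"]), start=1
):
    print(f"#{number}: {outcome}")
```

Replaying the measured sequence reproduces the pattern in the measurements: full misses for #1 and #4, prefix-only reuse for #3 and #5, and full reuse for #2 and #6.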

Impact

The docs tell users effort changes are free, so they may toggle /effort mid-task expecting zero cost. On long contexts this silently triggers a model-switch-grade reprocess (here up to ~177K cache-creation tokens per switch), inflating latency, cost, and subscription usage.
The docs also directly contradict the CLI's own confirmation dialog, which is confusing.

Suggested resolution

- If the cache miss is expected: fix the prompt-caching doc. Remove or replace the "no effect on the cache" statement and document that effort changes invalidate the cache (ideally noting the cold/warm + up/down nuance, or at minimum "treat an effort change like a model switch").
- If the full READ→0 on a cold level is unintended: investigate why effort (an output_config parameter the docs say is not part of the cache key) busts the entire prefix cache.

Repro

1. Start a session with Opus 4.7 and build up a long context (e.g. 100K+ tokens).
2. Run /effort to switch to a level not yet used this session, and confirm the dialog.
3. Observe the next turn: context_window.current_usage.cache_read_input_tokens drops to ~0 and cache_creation_input_tokens spikes to roughly the full context size.
4. Switch back up to a previously used level and observe that it is cheap; then switch down to a previously used level and observe that only the fixed prefix survives.
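
When checking a repro run, it can help to flag the turns where READ fell relative to the previous turn, since a healthy same-effort session shows READ growing from turn to turn. The helper below is a minimal sketch; `read_drops` is a hypothetical name and the sample values are illustrative, not measured data.

```python
from typing import List, Sequence


def read_drops(reads: Sequence[int]) -> List[int]:
    """Return the indices of turns whose READ is lower than the previous turn's."""
    return [i for i in range(1, len(reads)) if reads[i] < reads[i - 1]]


# Illustrative per-turn READ values (made up for the example): steady growth,
# then a drop to 0 on the turn after an effort switch.
sample_reads = [110_000, 112_500, 0, 117_000]
print(read_drops(sample_reads))
```

Each reported index can then be matched against the turns where /effort was changed.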
