claude-code - ✅(Solved) Fix CC v2.1.100+ inflates cache_creation by ~20K tokens vs v2.1.98, same payload, server-side [1 pull request, 2 comments, 3 participants]

GitHub stats
anthropics/claude-code#46917 (fetched 2026-04-12 13:29:44)
Comments: 2 · Participants: 3 · Timeline events: 13 · Reactions: 21
Timeline (top): labeled ×5, subscribed ×5, commented ×2, cross-referenced ×1


PR fix notes

PR #206: feat: shared overlay, claude-code tuning, formatter & flake update

Description (problem / solution / changelog)

Summary

  • Shared overlay: Add overlays/shared.nix applied to all hosts via baseOverlays. Refactor Nexus, CauldronLake, NightSprings, WorkLaptop to use baseOverlays ++ [host-overlay] (same pattern as mkBrightFalls), fixing NUR ordering inconsistency
  • Claude Code tuning: Override claude-code wrapper with env vars for max effort, disabled adaptive thinking, fixed 64K thinking budget, 400K auto-compact window, and custom User-Agent header (refs: anthropics/claude-code#46917, anthropics/claude-code#42796)
  • Formatter: Add nixfmt as flake formatter output, run on all .nix files
  • Flake update: All inputs updated

🤖 Generated with Claude Code

Changed files

  • flake.lock (modified, +15/-15)
  • flake.nix (modified, +11/-16)
  • hosts/Brightfalls/hardware.nix (modified, +4/-1)
  • hosts/CauldronLake/users/debora/gnome.nix (modified, +6/-3)
  • hosts/Nexus/gpu.nix (modified, +4/-4)
  • hosts/Nexus/services/n8n/docker-compose.nix (modified, +13/-6)
  • hosts/Nexus/services/polkit.nix (modified, +0/-1)
  • hosts/Nexus/services/radarr.nix (modified, +8/-2)
  • hosts/Nexus/services/sonarr.nix (modified, +8/-2)
  • hosts/Nexus/snapraid.nix (modified, +5/-1)
  • hosts/WorkLaptop/users/matteo.pacini/browser.nix (modified, +0/-1)
  • modules/home-manager/atuin.nix (modified, +3/-1)
  • modules/home-manager/git.nix (modified, +41/-39)
  • modules/home-manager/shell-tools.nix (modified, +28/-26)
  • modules/home-manager/ssh.nix (modified, +87/-85)
  • modules/home-manager/zsh.nix (modified, +35/-32)
  • modules/nixos/nix-core.nix (modified, +25/-23)
  • overlays/shared.nix (added, +17/-0)


Summary

Claude Code versions 2.1.100 and 2.1.101 consume ~20,000 more cache_creation_input_tokens per request than v2.1.98, even though the request payloads are nearly the same size (v2.1.100 actually sends 978 fewer bytes). The inflation is server-side and version-specific (likely User-Agent routing).

This is not only a billing issue: these tokens enter the model's context window and may affect output quality.

Reproduction

# 1. Start a proxy that captures full API request/response bodies
mkdir -p /tmp/cc-test && cd /tmp/cc-test
npx -y claude-code-logger@1.0.2 start --port 8000 --log-body --merge-sse

# 2. In another terminal, route Claude Code through the proxy and test the older version
export ANTHROPIC_BASE_URL="http://localhost:8000"
claude-2.1.98 --print "1+1"
# Note cache_creation_input_tokens in the logged response

# 3. Same setup, newer version
claude-2.1.100 --print "1+1"
# Note cache_creation_input_tokens in the logged response

Each test is a single API call against a cold cache with no session state (--print mode), run on the same machine, project, and account, minutes apart.
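Reading the usage numbers out of the capture by hand is tedious, so a small helper can pull them out with grep. This is a minimal sketch, assuming the logging proxy from step 1 writes raw JSON response bodies (including the usage object the Messages API returns) to a text file; the logger's actual file name and layout are not described in the issue, so proxy.log in the usage line is hypothetical.

```bash
#!/usr/bin/env bash
# Print every cache_creation_input_tokens / cache_read_input_tokens value
# found in a captured log, as "field value" pairs, one per line.
# ASSUMPTION: the log contains raw JSON bodies such as
#   "usage":{"input_tokens":4,"cache_creation_input_tokens":49726,...}
extract_cache_usage() {
  local log="$1"
  grep -oE '"cache_(creation|read)_input_tokens": *[0-9]+' "$log" |
    sed -E 's/"([a-z_]+)": */\1 /'
}
```

Usage: extract_cache_usage proxy.log prints lines such as cache_creation_input_tokens 49726, which can be compared directly between the two runs.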

Evidence

| Version | Content-Length (bytes) | cache_creation_input_tokens | cache_read | Total |
|---|---|---|---|---|
| v2.1.98 | 169,514 | 49,726 | 0 | 49,726 |
| v2.1.100 | 168,536 (-978 B) | 69,922 | 0 | 69,922 |
| v2.1.101 | 171,903 (+2,389 B) | ~72,000 | 0 | ~72,000 |

v2.1.100 sends 978 fewer bytes than v2.1.98 but is billed 20,196 more tokens (69,922 vs 49,726). A smaller request cannot explain a larger token count, which rules out a client-side payload difference as the cause: the inflation happens server-side, after the request is received.
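The arithmetic behind this comparison can be checked directly from the table. The sketch below also estimates how many request bytes 20,196 tokens would normally correspond to, using v2.1.98's own bytes-per-token ratio as an assumed conversion factor; that ratio is only a rough proxy, since Content-Length includes JSON framing and tokenization varies with content.

```bash
#!/usr/bin/env bash
# Figures from the Evidence table (cold cache, single --print call).
old_bytes=169514; old_tokens=49726   # v2.1.98
new_bytes=168536; new_tokens=69922   # v2.1.100

byte_delta=$(( new_bytes - old_bytes ))     # -978
token_delta=$(( new_tokens - old_tokens ))  # 20196

# Extra tokens relative to the v2.1.98 baseline, in percent.
overhead_pct=$(awk -v d="$token_delta" -v b="$old_tokens" \
  'BEGIN { printf "%.1f", 100 * d / b }')

# ASSUMPTION: the extra tokens would have v2.1.98's bytes-per-token ratio
# (~3.4) if they came from the client; result truncated to whole bytes.
implied_bytes=$(awk -v d="$token_delta" -v B="$old_bytes" -v t="$old_tokens" \
  'BEGIN { printf "%d", d * B / t }')

echo "bytes: $byte_delta  tokens: +$token_delta  overhead: ${overhead_pct}%  implied extra bytes: ~$implied_bytes"
```

On these numbers, the extra tokens would correspond to roughly 69 KB of additional request text if they originated on the client, while the measured request actually shrank by 978 bytes.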

Cross-account test (v2.1.98 on two different Max accounts): the delta was under 500 tokens, which is noise, so the inflation is not account-specific.

Interactive mode confirms

/tmp/claude-cache-*.json files from 40+ sessions show a bimodal distribution:

  • Group A: ~50K (cache_read=0, cache_create=~50K) — matches v2.1.98 --print
  • Group B: ~71K (cache_read=0, cache_create=~71K) — matches v2.1.100+ --print

Some sessions start cold at 71K with cache_read=0, confirming that the extra tokens are not accumulated cache but part of the baseline.
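The two groups can be separated mechanically once each session's cold-start numbers are extracted. This sketch assumes the values have already been reduced to one "cache_read cache_create" pair per line (the layout of /tmp/claude-cache-*.json is not documented in the issue), and it uses a hypothetical 60K cut-off between the ~50K and ~71K modes.

```bash
#!/usr/bin/env bash
# Count cold-start sessions in each mode.
# Input (stdin): one "cache_read cache_create" pair per line.
# THRESHOLD is a hypothetical cut-off between the ~50K and ~71K groups.
classify_sessions() {
  awk -v t="${THRESHOLD:-60000}" '
    NF < 2  { next }            # skip blank or malformed lines
    $1 != 0 { warm++; next }    # cache already populated: not a cold start
    $2 < t  { a++; next }       # Group A (~50K): matches v2.1.98 --print
            { b++ }             # Group B (~71K): matches v2.1.100+ --print
    END { printf "A=%d B=%d warm=%d\n", a, b, warm }
  '
}
```

For example, printf '0 49726\n0 71000\n' | classify_sessions prints A=1 B=1 warm=0. Excluding warm sessions matters because only cold starts (cache_read=0) show the baseline cleanly.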

Quality concern

The 20K extra tokens are reported as cache_creation_input_tokens, which means they are part of the input the model processes (its context window), not just an entry in the billing ledger. If the server injects additional content that is invisible to the user:

  • Instruction dilution: Hidden system content may compete with user-provided CLAUDE.md rules, leading to inconsistent agent behavior.
  • Reduced effective context: 20K fewer tokens are available for actual conversation history, and in long sessions the overhead is paid again on every turn.
  • Unverifiable behavior: Users cannot audit what the model actually "sees" versus what they sent, which makes debugging agent misbehavior significantly harder.

We observed qualitative differences between v2.1.98 and v2.1.100+ sessions (instruction adherence, tool selection accuracy), but cannot isolate whether this is caused by the extra hidden tokens or other version changes. The lack of transparency makes it impossible to tell.

Additional findings

  • After switching accounts with /login, the statusline can jump by ±20K. This is cache invalidation (a new account_uuid means a new cache key), not a billing difference between accounts.
  • A burst of 56 count_tokens calls was observed immediately after the first interactive prompt; it inflates the apparent "first visible" context number in the statusline.
  • The current v2.1.104 is untested: the investigation covered v2.1.98, v2.1.100, and v2.1.101, so we cannot confirm whether v2.1.104 carries the same inflation.
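The count_tokens burst can be counted from the same proxy capture. A minimal sketch, assuming the log records each request's path; /v1/messages/count_tokens is the Messages API token-counting endpoint, but the log layout itself is an assumption.

```bash
#!/usr/bin/env bash
# Count requests to the token-counting endpoint in a captured log.
# Counts occurrences rather than lines, in case several requests share a line.
count_token_calls() {
  grep -o '/v1/messages/count_tokens' "$1" | wc -l | tr -d ' '
}
```

Usage: count_token_calls proxy.log (hypothetical file name) prints a single number.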

Impact

~20K extra tokens per session amounts to ~40% overhead on a clean project (20,196 extra tokens on a 49,726-token baseline). On a Max plan with usage limits, this means hitting the 5-hour cap significantly sooner on newer Claude Code versions.

Combined with the quality concern: users are paying more for potentially degraded output, with no visibility into why.

Workaround

Downgrade to v2.1.98 or earlier. Check ~/.local/share/claude/versions/ for available binaries, or use npx [email protected].

Note: Auto-updates may overwrite older versions. v2.1.98 was no longer retained in the local versions directory after v2.1.104 was installed.
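Because auto-updates can prune the versions directory, it may help to copy the older binary somewhere the updater does not manage. This is a hypothetical helper, assuming each entry in ~/.local/share/claude/versions/ is an executable named after its version (e.g. 2.1.98). The destination name claude-2.1.98 matches the command used in the reproduction steps, and the PIN_VERSIONS_DIR / PIN_DEST_DIR overrides are illustrative variables, not Claude Code settings.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a retained Claude Code version out of the
# versions directory so a later auto-update cannot remove it.
# ASSUMPTION: entries in the versions directory are executables named by version.
pin_claude_version() {
  local version="$1"
  local versions_dir="${PIN_VERSIONS_DIR:-$HOME/.local/share/claude/versions}"
  local dest_dir="${PIN_DEST_DIR:-$HOME/.local/bin}"
  local src="$versions_dir/$version"

  if [ ! -f "$src" ]; then
    echo "version $version not found; available:" >&2
    ls -1 "$versions_dir" >&2
    return 1
  fi

  mkdir -p "$dest_dir"
  cp "$src" "$dest_dir/claude-$version"
  chmod +x "$dest_dir/claude-$version"
  echo "$dest_dir/claude-$version"   # print the pinned path
}
```

Usage: pin_claude_version 2.1.98, after which claude-2.1.98 --print "1+1" works as in the reproduction, provided ~/.local/bin is on PATH.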

Related

  • #45515: Original phantom token report (account-specific delta, now understood to be a cache invalidation artifact)
  • Reddit investigation with full proxy data: https://www.reddit.com/r/ClaudeCode/comments/1sj10ou/
  • Community reports of rapid limit exhaustion correlate with the v2.1.100+ rollout timeline

Environment

  • Claude Code: v2.1.98 / v2.1.100 / v2.1.101 (tested all three)
  • OS: Linux (WSL2), Windows 11
  • Plan: Max (5x)
  • Install method: native
  • Measurement: HTTP proxy capturing full request/response bodies

extent analysis

TL;DR

Downgrade to Claude Code version 2.1.98 to avoid the ~20,000 extra cache_creation_input_tokens per request.

Guidance

  • Verify the issue by running the reproduction steps provided, comparing the cache_creation_input_tokens between versions 2.1.98 and 2.1.100/2.1.101.
  • Check the ~/.local/share/claude/versions/ directory for available older binaries to downgrade to version 2.1.98.
  • Be aware that auto-updates may overwrite older versions, and consider using npx [email protected] as a workaround.
  • Monitor the issue for updates on newer versions, such as 2.1.104, which has not been tested.

Example

No code snippet is provided, since the issue concerns version-specific server behavior rather than a code-level bug.

Notes

The root cause is likely server-side handling that differs for requests from versions 2.1.100 and 2.1.101, possibly User-Agent-based routing. The exact cause has not been confirmed, but downgrading to version 2.1.98 avoids the inflation.

Recommendation

Apply the workaround by downgrading to version 2.1.98, as it is the only known way to avoid the extra cache_creation_input_tokens and potential quality concerns.
