claude-code - 💡(How to fix) Fix [BUG] Token estimation vs API-reported tokens diverge significantly for cached context [1 comment, 2 participants]

claude-code · 2026-04-10 21:22:56


GitHub stats

anthropics/claude-code#46422 · Fetched 2026-04-11 06:20:43


Author

Fearvox

Participants

Fearvox

github-actions[bot]

Timeline (top)

labeled ×2, commented ×1


Preflight Checklist

I have searched existing issues and this has not been reported yet
This is a single bug report
I am using the latest version of Claude Code

What's Wrong?

Claude Code's local token estimation (used for AutoCompact triggering, context bar display, and status line) diverges significantly from the API-reported token counts in the response usage field, particularly for sessions with high cache_read_input_tokens and cache_creation_input_tokens.

This causes:

AutoCompact fires at the wrong threshold: local estimation shows, e.g., "80% full" while the API has already consumed much more of the context window
Context bar misreads: the user sees "X% used" but the real percentage is different
Silent context overruns: the session appears fine locally but hits API limits unexpectedly
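The threshold problem can be illustrated with a hypothetical AutoCompact-style check that is fed the same session's usage twice: once as a local estimate that misses cached tokens, and once as the API-reported total. The 200,000-token window, the 0.8 threshold, and the names usedFraction and shouldAutoCompact are assumptions for this sketch; the issue does not show Claude Code's actual threshold logic.

```javascript
// Hypothetical AutoCompact-style check. The window size and threshold are
// illustrative assumptions, not Claude Code's real values.
const CONTEXT_WINDOW = 200000;
const AUTOCOMPACT_THRESHOLD = 0.8;

// Fraction of the context window occupied by usedTokens.
function usedFraction(usedTokens, contextWindow = CONTEXT_WINDOW) {
  return usedTokens / contextWindow;
}

// True once usage reaches the threshold fraction of the window.
function shouldAutoCompact(usedTokens, contextWindow = CONTEXT_WINDOW) {
  return usedFraction(usedTokens, contextWindow) >= AUTOCOMPACT_THRESHOLD;
}

// Same session, two counts: a local estimate that misses cached tokens,
// and the sum of the API-reported prompt-side usage fields.
const localEstimate = 120000;
const apiReported = 170000;

console.log(usedFraction(localEstimate), shouldAutoCompact(localEstimate)); // 0.6 false
console.log(usedFraction(apiReported), shouldAutoCompact(apiReported)); // 0.85 true
```

With these assumed numbers, the local count (60%) never triggers compaction, while the API-reported count (85%) is already past the threshold.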

Reproduction

Start a long session with multiple rounds of tool use
Note the local context bar percentage vs the API-reported usage in responses (input_tokens plus cache_read_input_tokens and cache_creation_input_tokens)
Compare: the local estimate may show 60% while the API-reported usage amounts to 85%+ of the effective window

Technical Detail

The divergence appears in how Claude Code estimates tokens vs how the API counts them:

Local estimation (from tokenCountWithEstimation / context bar):

Counts raw token content (messages + tools + system prompt)
Does not always correctly account for cache hit savings
Shows a number that may not reflect actual API context consumption

API-reported (from response usage field):

input_tokens: uncached tokens processed this request (this count excludes both cache fields)
cache_read_input_tokens: tokens served from cache (5-minute ephemeral cache)
cache_creation_input_tokens: tokens written to cache this request
Neither cache field is subtracted from the effective context window in local estimation

Result: A session with 100K cache_read_input_tokens might show as "60% full" locally while actually having consumed far more of the API's context budget.
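The accounting in the Technical Detail section can be sketched as a small helper. It assumes the Anthropic Messages API usage semantics, where input_tokens excludes both cache fields, so the prompt that occupied the context window is the sum of all three. The helper name totalPromptTokens and the sample numbers are hypothetical; only the field names come from the API.

```javascript
// Sum of prompt-side usage fields from one Messages API response.
// input_tokens excludes cached tokens, so all three fields must be added
// to get the number of prompt tokens that occupied the context window.
function totalPromptTokens(usage) {
  return (
    (usage.input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0)
  );
}

// Hypothetical usage from a late turn in a long, heavily cached session.
const usage = {
  input_tokens: 2000,
  cache_read_input_tokens: 100000,
  cache_creation_input_tokens: 5000,
};

// Counting only input_tokens badly undercounts context consumption.
console.log(usage.input_tokens); // 2000
console.log(totalPromptTokens(usage)); // 107000
```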

Root Cause Hypothesis

The gap is in how cache_read_input_tokens are treated in getEffectiveContextWindowSize(). The API counts cache reads against the context window differently than Claude Code's local estimation does.

Impact

AutoCompact fires at the wrong time: too late or too early, depending on the cache ratio
Context bar is misleading: users cannot trust the displayed percentage
Budget surprises: sessions hit limits unexpectedly when local estimation said there was headroom

Suggested Labels

area:core, area:cost, bug

extent analysis

TL;DR

Adjust the local token estimation so that cache_read_input_tokens and cache_creation_input_tokens count toward context usage alongside input_tokens. Caching reduces cost, but cached tokens still occupy the context window.

Guidance

Review the tokenCountWithEstimation function to ensure it accurately reflects the API's context consumption, including cache hits and misses.
Modify the getEffectiveContextWindowSize() function so that the remaining headroom is the effective context window minus input_tokens, cache_read_input_tokens, and cache_creation_input_tokens, matching how the API counts prompt tokens.
Verify the changes by comparing local estimations with API-reported token counts in various scenarios, including high cache usage.
Consider adding logging or debugging statements to monitor the accuracy of local token estimations and identify potential discrepancies.
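A verification or logging step like the one suggested above could compare each local estimate with the API-reported total and warn when the two drift apart. This is a minimal sketch: the function name checkEstimateDivergence, the returned record shape, and the 10% tolerance are hypothetical choices, and only the three usage field names come from the API.

```javascript
// Hypothetical divergence check for verifying a local estimator against
// the API. The 10% tolerance is an arbitrary illustrative choice.
function checkEstimateDivergence(localEstimate, usage, tolerance = 0.1) {
  // Prompt-side tokens the API actually processed for this request.
  const apiTotal =
    (usage.input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0);
  // Relative error of the local estimate against the API total;
  // guarded so a response with no reported usage does not divide by zero.
  const relativeError = apiTotal === 0 ? 0 : (localEstimate - apiTotal) / apiTotal;
  const diverged = Math.abs(relativeError) > tolerance;
  if (diverged) {
    console.warn(
      `token estimate diverged: local=${localEstimate} api=${apiTotal} ` +
        `error=${(relativeError * 100).toFixed(1)}%`
    );
  }
  return { localEstimate, apiTotal, relativeError, diverged };
}

// Example: a local estimate of 60,000 tokens vs. a cache-heavy response.
const record = checkEstimateDivergence(60000, {
  input_tokens: 2000,
  cache_read_input_tokens: 100000,
  cache_creation_input_tokens: 5000,
});
console.log(record.apiTotal, record.diverged); // 107000 true
```

Measuring the error relative to the API total treats the API as the authoritative count, which is the same assumption the Notes section makes.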

Example

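// Sketch, not Claude Code's actual code: the real signature of
// getEffectiveContextWindowSize() is not shown in the issue, so the
// contextWindow parameter here is an assumption.
function getEffectiveContextWindowSize(contextWindow, apiResponse) {
  const usage = apiResponse.usage ?? {};
  // Cached tokens still occupy the window; caching lowers cost, not size.
  const usedTokens =
    (usage.input_tokens ?? 0) +
    (usage.cache_read_input_tokens ?? 0) +
    (usage.cache_creation_input_tokens ?? 0);
  // Remaining headroom after everything the API actually processed.
  return contextWindow - usedTokens;
}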

Notes

The provided solution assumes that the API's counting method is the authoritative source for token consumption. However, the actual implementation may require additional considerations, such as handling edge cases or optimizing performance.

Recommendation

Adjust the local token estimation to match the API's counting method. This gives a more accurate picture of context consumption and helps prevent budget surprises.


