claude-code - 💡(How to fix) Fix [Critical] advisor() double-counts its sub-inference into top-level context usage → silent mid-task context destruction + uncapped uncached re-billing; spec violation; advisor double-count open & unfixed for 5+ weeks (#53065) in a class reported since Feb (#24976)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

The auto-compaction-misaccounting class has been reported since February. The advisor-specific double-count (#53065) has sat open with 8 comments and no fix since April 25. This issue re-files it with a complete evidence table (main + subagents), the spec contradiction, and hard billing numbers, because the existing report has not moved.

Fix Action

Fix / Workaround

Calling advisor() on an extended-context ([1m]) session does two damaging things at once, with no working user-side mitigation:

  1. Destroys working context mid-task. The advisor turn reports ≈2× the executor's real context (it double-counts its own sub-inference into the top-level usage). The auto-compaction threshold reads that phantom number and force-compacts — wiping hundreds of thousands of tokens of live working state in the middle of a task. One reporter saw 513K → wiped to 36K (#53065). With auto-compaction turned off as a workaround, the session instead hard-wedges at "Context limit reached" on every prompt (including /exit), which is worse.
  2. Bills for it, twice. Each advisor call re-forwards the full transcript uncached (advisor_message.cache_read_input_tokens = 0), and Claude Code does not expose the advisor caching option, so there is no way to amortize it. The phantom compaction it triggers then forces a context rebuild — paying for the same tokens again.

Code Example

top-level: input_tokens=4  cache_read_input_tokens=1,044,130  cache_creation=10,031
iterations:
  [0] type=message          input=2        cache_read=519,618    out=3,681
  [1] type=advisor_message  input=529,512  cache_read=0          out=11,450   (model=claude-opus-4-8)
  [2] type=message          input=2        cache_read=524,512    out=3,626
RAW_BUFFERClick to expand / collapse

Severity: critical — silent data destruction + silent overbilling, on the advisor tool's own advertised use case

Calling advisor() on an extended-context ([1m]) session does two damaging things at once, with no working user-side mitigation:

  1. Destroys working context mid-task. The advisor turn reports ≈2× the executor's real context (it double-counts its own sub-inference into the top-level usage). The auto-compaction threshold reads that phantom number and force-compacts — wiping hundreds of thousands of tokens of live working state in the middle of a task. One reporter saw 513K → wiped to 36K (#53065). With auto-compaction turned off as a workaround, the session instead hard-wedges at "Context limit reached" on every prompt (including /exit), which is worse.
  2. Bills for it, twice. Each advisor call re-forwards the full transcript uncached (advisor_message.cache_read_input_tokens = 0), and Claude Code does not expose the advisor caching option, so there is no way to amortize it. The phantom compaction it triggers then forces a context rebuild — paying for the same tokens again.

This is squarely the advisor tool's advertised use case — "long-horizon agentic workloads (coding agents, computer use, multi-step research)" — i.e. exactly the long sessions where executor context is large and this bug is guaranteed to fire. The tool sabotages the workload it was built for.

It is also deterministic (not a race), affects both the main session and subagents, and contradicts the documented spec.

This bug class has been open and unfixed for months

IssueFiledStateTopic
#249762026-02-11closed"Context Limit Reached" / auto-compact misaccounting (~3.5 months ago)
#502042026-04-17closedpremature auto-compact on extended-context models
#530652026-04-25OPEN, 8 commentsexact same advisor double-count — unfixed 5+ weeks
#596562026-05-16OPEN1M context, repeated auto-compaction, "unusable"

The auto-compaction-misaccounting class has been reported since February. The advisor-specific double-count (#53065) has sat open with 8 comments and no fix since April 25. This issue re-files it with a complete evidence table (main + subagents), the spec contradiction, and hard billing numbers, because the existing report has not moved.

Environment

  • Claude Code 2.1.156
  • Executor claude-opus-4-8, advisor claude-opus-4-8, 1M context (context-1m-2025-08-07)
  • advisorModel: "opus", auto-compaction at the 1M window

Contradicts the documented spec

The advisor tool docs state, verbatim:

Top-level usage fields reflect executor tokens only. Advisor tokens are not rolled into the top-level totals because they are billed at a different rate. Top-level input_tokens and cache_read_input_tokens reflect the first executor iteration only; subsequent executor iterations' inputs are not re-summed because they include prior output tokens.

Per spec, top-level cache_read_input_tokens must equal the first executor iteration (≈ base context). Observed: it equals the sum of all executor message iterations (≈ 2× base). Whatever feeds the auto-compaction threshold is summing iterations the docs explicitly say are not summed.

Evidence — main session (6 advisor calls, one session)

timeexecutor base (prior turn)advisor turn reported ctxratioresult
03:57286,927578,6572.02×survived
04:23350,947721,4762.06×survived
10:16519,6181,054,1652.03×force auto-compacted
12:02518,4571,066,7132.06×force auto-compacted

The turn immediately after each advisor call drops back to base (~294K / ~364K), proving the spike is a transient accounting artifact — the executor's real context never doubled.

Iteration breakdown (10:16 turn)

top-level: input_tokens=4  cache_read_input_tokens=1,044,130  cache_creation=10,031
iterations:
  [0] type=message          input=2        cache_read=519,618    out=3,681
  [1] type=advisor_message  input=529,512  cache_read=0          out=11,450   (model=claude-opus-4-8)
  [2] type=message          input=2        cache_read=524,512    out=3,626

1,044,130 == iter0.cache_read (519,618) + iter2.cache_read (524,512). The two executor message iterations are summed into top-level cache_read_input_tokens. The executor's actual context is ~520K (iter0 / iter2 individually) — comfortably within the 1M budget. It is force-compacted anyway.

Evidence — subagents reproduce it identically

Scanned 1,977 subagent transcripts: 59 advisor calls, 54 double (≈2.0×, median 2.013×). Example: base 157,459 → advisor turn 321,357 (2.04×); iter0 160,159 + iter2 161,198 = 321,357. The 5 non-doubling cases are advisor errors (single executor iteration, no second read). Not main-session-specific — every context that calls advisor is affected.

Billing numbers

In the single session above, 6 advisor calls = 2,125,599 advisor input tokens, all uncached (cache_read=0), billed at the advisor model's rate: 212,407 / 289,299 / 360,921 / 529,512 / 198,524 / 534,936. The advisor caching parameter that would amortize this is not exposed by Claude Code. Each forced compaction then rebuilds context = the same tokens paid again. Users are silently overbilled with no lever to stop it.

Suggested fix

Compute the auto-compaction figure from executor type: "message" iterations using the documented rule (first executor iteration's input_tokens + cache_read_input_tokens + cache_creation_input_tokens) and exclude type: "advisor_message" iterations, matching the documented statement that advisor tokens do not roll into top-level totals and do not draw from the executor's budget. Until fixed, please also expose the advisor caching option so the uncached re-forwarding can be amortized.

Reproduction

  1. [1m] Opus session, advisorModel set; grow executor context past ~500K.
  2. Trigger an advisor() call.
  3. In the session JSONL, compare the advisor turn's top-level cache_read_input_tokens against the prior turn — it ≈ doubles, and auto-compaction fires while the executor's real context is ~50%.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [Critical] advisor() double-counts its sub-inference into top-level context usage → silent mid-task context destruction + uncapped uncached re-billing; spec violation; advisor double-count open & unfixed for 5+ weeks (#53065) in a class reported since Feb (#24976)