claude-code: [Critical] advisor() double-counts its sub-inference into top-level context usage → silent mid-task context destruction + uncapped uncached re-billing; spec violation; advisor double-count open and unfixed for 5+ weeks (#53065), in a class reported since February (#24976)

Root Cause

The auto-compaction-misaccounting class has been reported since February. The advisor-specific double-count (#53065) has sat open with 8 comments and no fix since April 25. This issue re-files it with a complete evidence table (main + subagents), the spec contradiction, and hard billing numbers, because the existing report has not moved.

Severity: critical — silent data destruction + silent overbilling, on the advisor tool's own advertised use case

Calling advisor() on an extended-context ([1m]) session does two damaging things at once, with no working user-side mitigation:

1. Destroys working context mid-task. The advisor turn reports ≈2× the executor's real context (it double-counts its own sub-inference into the top-level usage). The auto-compaction threshold reads that phantom number and force-compacts, wiping hundreds of thousands of tokens of live working state in the middle of a task. One reporter saw 513K wiped to 36K (#53065). With auto-compaction turned off as a workaround, the session instead hard-wedges at "Context limit reached" on every prompt (including /exit), which is worse.
2. Bills for it, twice. Each advisor call re-forwards the full transcript uncached (advisor_message.cache_read_input_tokens = 0), and Claude Code does not expose the advisor caching option, so there is no way to amortize it. The phantom compaction it triggers then forces a context rebuild, paying for the same tokens again.

This is squarely the advisor tool's advertised use case — "long-horizon agentic workloads (coding agents, computer use, multi-step research)" — i.e. exactly the long sessions where executor context is large and this bug is guaranteed to fire. The tool sabotages the workload it was built for.

It is also deterministic (not a race), affects both the main session and subagents, and contradicts the documented spec.

This bug class has been open and unfixed for months

Issue	Filed	State	Topic
#24976	2026-02-11	closed	"Context Limit Reached" / auto-compact misaccounting (~3.5 months ago)
#50204	2026-04-17	closed	premature auto-compact on extended-context models
#53065	2026-04-25	OPEN, 8 comments	exact same advisor double-count — unfixed 5+ weeks
#59656	2026-05-16	OPEN	1M context, repeated auto-compaction, "unusable"

Environment

Claude Code 2.1.156
Executor claude-opus-4-8, advisor claude-opus-4-8, 1M context (context-1m-2025-08-07)
advisorModel: "opus", auto-compaction at the 1M window

Contradicts the documented spec

The advisor tool docs state, verbatim:

Top-level usage fields reflect executor tokens only. Advisor tokens are not rolled into the top-level totals because they are billed at a different rate. Top-level input_tokens and cache_read_input_tokens reflect the first executor iteration only; subsequent executor iterations' inputs are not re-summed because they include prior output tokens.

Per spec, top-level cache_read_input_tokens must equal the first executor iteration (≈ base context). Observed: it equals the sum of all executor message iterations (≈ 2× base). Whatever feeds the auto-compaction threshold is summing iterations the docs explicitly say are not summed.

Evidence — main session (6 advisor calls, one session)

time	executor base (prior turn)	advisor turn reported ctx	ratio	result
03:57	286,927	578,657	2.02×	survived
04:23	350,947	721,476	2.06×	survived
10:16	519,618	1,054,165	2.03×	force auto-compacted
12:02	518,457	1,066,713	2.06×	force auto-compacted

The turn immediately after each advisor call drops back to base (~294K / ~364K), proving the spike is a transient accounting artifact — the executor's real context never doubled.
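The ratios and outcomes in the table can be rechecked with a few lines of Python. This is only a sketch over the four rows shown above. The `WINDOW` constant is an assumption (the nominal 1M window); the report does not state the exact threshold Claude Code compares against.

```python
# Rows copied from the main-session table:
# (time, executor base from the prior turn, context reported on the advisor turn).
ROWS = [
    ("03:57", 286_927, 578_657),
    ("04:23", 350_947, 721_476),
    ("10:16", 519_618, 1_054_165),
    ("12:02", 518_457, 1_066_713),
]

# Assumption: the compaction trigger is treated as the nominal 1M window.
WINDOW = 1_000_000


def summarize(rows, window=WINDOW):
    """Return, per row, the reported/base ratio and which figure crosses the window."""
    summary = []
    for time, base, reported in rows:
        summary.append(
            {
                "time": time,
                "ratio": round(reported / base, 2),
                "base_over_window": base > window,
                "reported_over_window": reported > window,
            }
        )
    return summary


for row in summarize(ROWS):
    print(row)
```

Under that assumption, only the reported figure ever crosses the window, and it does so exactly on the two turns that were force-compacted.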

Iteration breakdown (10:16 turn)

top-level: input_tokens=4  cache_read_input_tokens=1,044,130  cache_creation=10,031
iterations:
  [0] type=message          input=2        cache_read=519,618    out=3,681
  [1] type=advisor_message  input=529,512  cache_read=0          out=11,450   (model=claude-opus-4-8)
  [2] type=message          input=2        cache_read=524,512    out=3,626

1,044,130 == iter0.cache_read (519,618) + iter2.cache_read (524,512). The two executor message iterations are summed into top-level cache_read_input_tokens. The executor's actual context is ~520K (iter0 / iter2 individually) — comfortably within the 1M budget. It is force-compacted anyway.
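The identity above can be checked mechanically. The sketch below re-encodes the 10:16 breakdown as a Python dict. The long field names (`cache_read_input_tokens`, `output_tokens`, and so on) are an assumption that expands the abbreviated labels in the dump; the numbers themselves are copied from it.

```python
# The 10:16 turn, transcribed from the iteration breakdown.
USAGE_1016 = {
    "input_tokens": 4,
    "cache_read_input_tokens": 1_044_130,
    "cache_creation_input_tokens": 10_031,
    "iterations": [
        {"type": "message", "input_tokens": 2,
         "cache_read_input_tokens": 519_618, "output_tokens": 3_681},
        {"type": "advisor_message", "input_tokens": 529_512,
         "cache_read_input_tokens": 0, "output_tokens": 11_450},
        {"type": "message", "input_tokens": 2,
         "cache_read_input_tokens": 524_512, "output_tokens": 3_626},
    ],
}


def executor_iterations(usage):
    """Keep only executor iterations (type == "message"), dropping advisor ones."""
    return [it for it in usage.get("iterations", []) if it.get("type") == "message"]


executor = executor_iterations(USAGE_1016)

# What the top level actually reports versus what the documented rule implies.
observed = USAGE_1016["cache_read_input_tokens"]
summed_cache_read = sum(it["cache_read_input_tokens"] for it in executor)
summed_input = sum(it["input_tokens"] for it in executor)
documented = executor[0]["cache_read_input_tokens"]

print(f"observed top-level cache_read: {observed:,}")
print(f"sum over executor iterations:  {summed_cache_read:,}")
print(f"first executor iteration only: {documented:,}")
```

The same pattern holds for `input_tokens`: the top-level value of 4 equals 2 + 2 across the two executor iterations.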

Evidence — subagents reproduce it identically

Scanned 1,977 subagent transcripts: 59 advisor calls, of which 54 show the doubling (≈2.0×, median 2.013×). Example: base 157,459 → advisor turn 321,357 (2.04×); iter0 160,159 + iter2 161,198 = 321,357. The 5 non-doubling cases are advisor errors (single executor iteration, no second read). This is not specific to the main session: every context that calls advisor is affected.

Billing numbers

In the single session above, the 6 advisor calls consumed 2,125,599 advisor input tokens (212,407 / 289,299 / 360,921 / 529,512 / 198,524 / 534,936), all uncached (cache_read=0) and billed at the advisor model's rate. The advisor caching parameter that would amortize this is not exposed by Claude Code. Each forced compaction then rebuilds context, so the same tokens are paid for again. Users are silently overbilled with no lever to stop it.
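The total can be reproduced from the six per-call figures. The sketch below only sums token counts; it deliberately applies no price, since the report gives no per-token rate.

```python
# Advisor input tokens for the six advisor calls in the session, as listed above.
ADVISOR_INPUT_TOKENS = [212_407, 289_299, 360_921, 529_512, 198_524, 534_936]

total = sum(ADVISOR_INPUT_TOKENS)
largest = max(ADVISOR_INPUT_TOKENS)

print(f"{len(ADVISOR_INPUT_TOKENS)} calls, {total:,} uncached advisor input tokens")
print(f"largest single call: {largest:,}")
```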

Suggested fix

Compute the auto-compaction figure from executor type: "message" iterations using the documented rule (first executor iteration's input_tokens + cache_read_input_tokens + cache_creation_input_tokens) and exclude type: "advisor_message" iterations, matching the documented statement that advisor tokens do not roll into top-level totals and do not draw from the executor's budget. Until fixed, please also expose the advisor caching option so the uncached re-forwarding can be amortized.
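A minimal sketch of the suggested rule follows. It is not Claude Code's implementation. The function name `compaction_context` is hypothetical, and so is the fallback to top-level fields when no iterations are present. The sample dict reuses the 10:16 numbers; the dump does not show a per-iteration cache-creation count, so that field is absent here and treated as 0.

```python
def compaction_context(usage):
    """Context size to compare against the compaction threshold (sketch).

    Uses the first executor ("message") iteration only and ignores
    "advisor_message" iterations. Falls back to the top-level fields when
    the turn has no iteration breakdown (an assumption, not documented).
    """
    messages = [it for it in usage.get("iterations", []) if it.get("type") == "message"]
    source = messages[0] if messages else usage
    return (
        source.get("input_tokens", 0)
        + source.get("cache_read_input_tokens", 0)
        + source.get("cache_creation_input_tokens", 0)
    )


def naive_context(usage):
    """The figure that appears to be used today: top-level fields summed as-is."""
    return (
        usage.get("input_tokens", 0)
        + usage.get("cache_read_input_tokens", 0)
        + usage.get("cache_creation_input_tokens", 0)
    )


# The 10:16 turn from the iteration breakdown (field names expanded by assumption).
sample = {
    "input_tokens": 4,
    "cache_read_input_tokens": 1_044_130,
    "cache_creation_input_tokens": 10_031,
    "iterations": [
        {"type": "message", "input_tokens": 2, "cache_read_input_tokens": 519_618},
        {"type": "advisor_message", "input_tokens": 529_512, "cache_read_input_tokens": 0},
        {"type": "message", "input_tokens": 2, "cache_read_input_tokens": 524_512},
    ],
}

print(f"naive:    {naive_context(sample):,}")
print(f"proposed: {compaction_context(sample):,}")
```

On this turn the naive figure is 1,054,165, matching the reported context in the evidence table, while the proposed figure stays near the executor's real ~520K.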

Reproduction

1. Start a [1m] Opus session with advisorModel set, and grow the executor context past ~500K.
2. Trigger an advisor() call.
3. In the session JSONL, compare the advisor turn's top-level cache_read_input_tokens against the prior turn's. It roughly doubles, and auto-compaction fires while the executor's real context is at about 50% of the window.
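The comparison in the last step can be scripted. The sketch below rests on assumptions about the transcript layout: it expects one JSON object per line with the usage block at `message.usage`, and an `iterations` list inside it. The helper name `find_advisor_spikes` and the 1.8× cutoff are hypothetical choices, and the sample lines are illustrative values shaped like the 10:16 turn rather than a real transcript.

```python
import json


def find_advisor_spikes(lines, min_ratio=1.8):
    """Return (line_number, prior_cache_read, cache_read, ratio) for advisor
    turns whose top-level cache_read jumps by at least min_ratio."""
    previous = None
    spikes = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if not isinstance(usage, dict):
            continue
        cache_read = usage.get("cache_read_input_tokens", 0)
        iterations = usage.get("iterations") or []
        has_advisor = any(it.get("type") == "advisor_message" for it in iterations)
        if has_advisor and previous and cache_read / previous >= min_ratio:
            spikes.append((number, previous, cache_read, round(cache_read / previous, 2)))
        previous = cache_read
    return spikes


# Illustrative three-turn transcript: prior turn, advisor turn, following turn.
sample_lines = [
    json.dumps({"message": {"usage": {"cache_read_input_tokens": 519_618}}}),
    json.dumps({"message": {"usage": {
        "cache_read_input_tokens": 1_044_130,
        "iterations": [
            {"type": "message", "cache_read_input_tokens": 519_618},
            {"type": "advisor_message", "cache_read_input_tokens": 0},
            {"type": "message", "cache_read_input_tokens": 524_512},
        ],
    }}}),
    json.dumps({"message": {"usage": {"cache_read_input_tokens": 524_512}}}),
]

spikes = find_advisor_spikes(sample_lines)
print(spikes)
```

To run it against a real session, pass an open file handle for the transcript in place of `sample_lines`, after checking that the field paths match what the file actually contains.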
