claude-code: advisor() tool inflates reported input tokens by forwarding full transcript, triggering premature auto-compaction on extended context models

GitHub stats
anthropics/claude-code#53065 (fetched 2026-04-25 06:13:17)
Comments: 0
Participants: 1
Timeline events: 5 (labeled ×5)
Reactions: 0

When advisor() is called, the full conversation transcript is forwarded to a second model (currently claude-opus-4-7). The token usage from the main executor and the advisor sub-inference is summed in the top-level usage fields. If Claude Code's auto-compaction logic uses these summed totals, the advisor call effectively doubles the apparent context usage, triggering compaction when the main model's actual context is only ~50% full.

Root Cause

  • Main executor context: ~513K tokens (well within 1M window)
  • Advisor receives full transcript: ~701K tokens (separate model, separate inference)
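  • Reported total: ~1,028K tokens (top-level sum across the turn's iterations; in the sample usage data this equals the two executor iterations, ~513.7K + ~515.1K)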
  • Auto-compaction threshold: likely ~95% of 1M ≈ 950K
  • Result: compaction fires at 513K actual context because 1,028K > threshold

Fix Action

Workaround

Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=98 and/or use a PreCompact hook that blocks auto-compaction:

{
  "hooks": {
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "bash -c 'INPUT=$(cat); TRIGGER=$(echo \"$INPUT\" | python3 -c \"import sys,json; print(json.load(sys.stdin).get(\\\"trigger\\\",\\\"unknown\\\"))\" 2>/dev/null); [ \"$TRIGGER\" = \"auto\" ] && echo \"{\\\"decision\\\":\\\"block\\\"}\" && exit 2; exit 0'",
        "timeout": 3
      }]
    }]
  }
}

This blocks all auto-compaction and relies on manual /compact. It is not ideal, but it prevents the advisor from triggering premature compaction.


Summary

When advisor() is called, the full conversation transcript is forwarded to a second model (currently claude-opus-4-7). The token usage from the main executor and the advisor sub-inference is summed in the top-level usage fields. If Claude Code's auto-compaction logic uses these summed totals, the advisor call effectively doubles the apparent context usage, triggering compaction when the main model's actual context is only ~50% full.

Reproduction

  1. Start a Claude Code session with claude-opus-4-6[1m] (1M context)
  2. Work normally until the main context reaches ~400K-500K input tokens
  3. Call advisor()
  4. Observe: auto-compaction fires immediately after the advisor call returns

Evidence from session JSONL

Session ID: c3a29188-290f-4af2-be48-8f6fb6929111
Model: claude-opus-4-6[1m]
Project: CriticalSkip

Token progression at the compaction boundary

JSONL line   Total reported input   Notes
1972         513,354                Normal turn, main context only
1979         1,028,789              advisor() called; reported total doubled
1994         (none)                 "Conversation compacted" system message
2012         35,693                 Post-compaction context (wiped to ~36K)
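
A progression like the one in the table can be extracted from a session file with a short script. The following is a minimal sketch, not a description of Claude Code's own tooling: it assumes each JSONL record that carries token data stores it under record["message"]["usage"], which is an assumption about the session file layout, and the helper names are hypothetical.

```python
import json

def reported_input(usage):
    """Sum the three top-level input-side counters of a usage object."""
    return (usage.get("input_tokens", 0)
            + usage.get("cache_read_input_tokens", 0)
            + usage.get("cache_creation_input_tokens", 0))

def scan_jsonl(lines):
    """Yield (line_number, total_reported_input) for records that carry usage data.

    Assumes usage lives at record["message"]["usage"]; other records are skipped.
    """
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore malformed lines instead of aborting the scan
        message = record.get("message") if isinstance(record, dict) else None
        usage = message.get("usage") if isinstance(message, dict) else None
        if isinstance(usage, dict):
            yield number, reported_input(usage)

def find_jumps(totals, factor=1.5):
    """Return line numbers whose total exceeds the previous total by more than `factor`."""
    jumps, previous = [], None
    for number, total in totals:
        if previous and total > previous * factor:
            jumps.append(number)
        previous = total
    return jumps
```

Calling find_jumps(scan_jsonl(open(path))) on a session file (path being wherever the JSONL is stored) would list the lines where the reported input jumped sharply, such as line 1979 in the table.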

Iteration breakdown for the advisor turn (line 1979)

{
  "input_tokens": 4,
  "cache_creation_input_tokens": 1731,
  "cache_read_input_tokens": 1027054,
  "iterations": [
    {
      "type": "message",
      "input_tokens": 3,
      "cache_read_input_tokens": 513353,
      "cache_creation_input_tokens": 348,
      "output_tokens": 35
    },
    {
      "type": "advisor_message",
      "model": "claude-opus-4-7",
      "input_tokens": 701354,
      "cache_read_input_tokens": 0,
      "cache_creation_input_tokens": 0,
      "output_tokens": 5672
    },
    {
      "type": "message",
      "input_tokens": 1,
      "cache_read_input_tokens": 513701,
      "cache_creation_input_tokens": 1383,
      "output_tokens": 544
    }
  ]
}

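Key observation: The main model's actual context was 513-515K tokens (iterations 1 and 3). The advisor sub-inference consumed 701K input tokens (iteration 2: the full transcript, forwarded uncached). The top-level cache_read_input_tokens reports 1,027,054, the sum across all iterations, making it appear that the context is at 1M when only about half is actually used by the executor. Note that in this sample the advisor iteration reports 0 cache reads, so 1,027,054 = 513,353 + 513,701, i.e. the two executor iterations of the same turn added together. Likewise, the top-level input_tokens of 4 equals 3 + 1 and does not include the advisor's 701,354.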
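
The breakdown can be checked by recomputing the input-side total of each iteration from the usage payload above. This is a small sketch for verifying the arithmetic only; the helper names are hypothetical, and the observation that the top-level total matches the executor iterations comes from this one sample, so it may not hold in general.

```python
# Usage payload from the advisor turn (JSONL line 1979), output_tokens omitted.
USAGE = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 1731,
    "cache_read_input_tokens": 1027054,
    "iterations": [
        {"type": "message", "input_tokens": 3,
         "cache_read_input_tokens": 513353, "cache_creation_input_tokens": 348},
        {"type": "advisor_message", "input_tokens": 701354,
         "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0},
        {"type": "message", "input_tokens": 1,
         "cache_read_input_tokens": 513701, "cache_creation_input_tokens": 1383},
    ],
}

def input_side(entry):
    """Uncached input + cache reads + cache writes for one usage entry."""
    return (entry["input_tokens"]
            + entry["cache_read_input_tokens"]
            + entry["cache_creation_input_tokens"])

def summarize(usage):
    """Return the top-level total and per-type lists of iteration totals."""
    by_type = {}
    for iteration in usage["iterations"]:
        by_type.setdefault(iteration["type"], []).append(input_side(iteration))
    return input_side(usage), by_type

top_level, by_type = summarize(USAGE)
print("top-level reported input:", top_level)
print("executor iterations:", by_type["message"], "sum:", sum(by_type["message"]))
print("advisor iterations:", by_type["advisor_message"])
```

For this sample the two executor iterations come to 513,704 and 515,085 tokens, which together equal the top-level total of 1,028,789, while the advisor's 701,354 tokens appear only in its own iteration entry.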

The math

  • Main executor context: ~513K tokens (well within 1M window)
  • Advisor receives full transcript: ~701K tokens (separate model, separate inference)
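  • Reported total: ~1,028K tokens (top-level sum across the turn's iterations; in the sample usage data this equals the two executor iterations, ~513.7K + ~515.1K)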
  • Auto-compaction threshold: likely ~95% of 1M ≈ 950K
  • Result: compaction fires at 513K actual context because 1,028K > threshold
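
The threshold comparison in the list above can be written out as a short check. This is only an illustration of the arithmetic: the 95% figure is the issue's estimate rather than a documented value, and would_auto_compact is a hypothetical helper, not Claude Code's actual logic.

```python
CONTEXT_WINDOW = 1_000_000    # 1M window of claude-opus-4-6[1m]
ASSUMED_THRESHOLD_PCT = 95    # the issue's estimate, not a documented value

def would_auto_compact(reported_tokens, pct=ASSUMED_THRESHOLD_PCT,
                       window=CONTEXT_WINDOW):
    """True if the reported token count reaches pct% of the context window."""
    # Integer comparison avoids floating-point rounding at the boundary.
    return reported_tokens * 100 >= window * pct

observations = {
    "line 1972 (normal turn)": 513_354,
    "line 1979 (advisor turn)": 1_028_789,
}
for label, tokens in observations.items():
    print(f"{label}: {tokens:,} tokens, compacts: {would_auto_compact(tokens)}")
```

Under the same assumption, a 98% threshold (980,000 tokens) would still be exceeded by the 1,028,789 reported for the advisor turn, which suggests that raising the percentage alone might not be enough for a turn like this one.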

Expected behavior

Advisor sub-inference tokens should not count toward the auto-compaction threshold. The advisor is a separate model call with its own context window. The executor's context at 513K is well within the 1M budget and should not trigger compaction.

Actual behavior

The summed total (executor + advisor) exceeds the compaction threshold, and auto-compaction fires immediately. The session loses ~500K tokens of working context unnecessarily.

Impact

  • On extended context models (1M): Any advisor call past ~400K main context will trigger compaction, since 400K main + ~550K advisor = ~950K ≈ threshold
  • Effectively halves usable context: The 1M context window becomes ~400-450K when advisor is used, because the remaining ~550K is reserved for the advisor's copy of the transcript
  • Inconsistent: Early-session advisor calls work fine. Late-session calls compact. Users experience this as random/intermittent compaction
  • Counterproductive: The advisor is most valuable late in a session (complex decisions with full context), but that's exactly when it triggers compaction and destroys the context it was supposed to help with

Workaround

Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=98 and/or use a PreCompact hook that blocks auto-compaction:

{
  "hooks": {
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "bash -c 'INPUT=$(cat); TRIGGER=$(echo \"$INPUT\" | python3 -c \"import sys,json; print(json.load(sys.stdin).get(\\\"trigger\\\",\\\"unknown\\\"))\" 2>/dev/null); [ \"$TRIGGER\" = \"auto\" ] && echo \"{\\\"decision\\\":\\\"block\\\"}\" && exit 2; exit 0'",
        "timeout": 3
      }]
    }]
  }
}

This blocks all auto-compaction and relies on manual /compact. It is not ideal, but it prevents the advisor from triggering premature compaction.
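
The one-line bash command in the hook is hard to read and to maintain, so the same decision can be moved into a small Python script. The sketch below mirrors the one-liner's behavior as written in the issue (emit {"decision":"block"} and exit with code 2 when the trigger is "auto", otherwise exit 0); it makes no additional claim about how Claude Code interprets that output. The function names and the script name are hypothetical.

```python
import json
import sys

def decide(raw_input):
    """Return (stdout_text, exit_code) for a PreCompact hook payload."""
    try:
        trigger = json.loads(raw_input).get("trigger", "unknown")
    except (json.JSONDecodeError, AttributeError):
        # Unparseable or non-object input: treat like the one-liner does
        # when its python3 call fails, i.e. do not block.
        trigger = "unknown"
    if trigger == "auto":
        return json.dumps({"decision": "block"}), 2
    return "", 0

def main():
    """Read the hook payload from stdin, print the decision, exit accordingly."""
    out, code = decide(sys.stdin.read())
    if out:
        print(out)
    sys.exit(code)
```

To use it as a hook, save it as a script (for example block_auto_compact.py, a hypothetical name), add the usual `if __name__ == "__main__": main()` guard at the bottom, and point the hook's "command" at it with python3 and the script's path.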

Related issues

  • #34332 — Opus 4.6 (1M context): autocompact triggers at ~76K tokens
  • #50204 — Auto-compact triggers prematurely with extended context models
  • #15377 — Conversations compacting prematurely at ~65% token capacity
  • #49994 — Sessions using advisor() become permanently unrecoverable (expired encrypted payloads)
  • #42647 — High token burn due to redundant context resubmission in compaction pipeline

Suggested fix

When calculating context usage for the auto-compaction threshold, use only the executor's input_tokens + cache_read_input_tokens + cache_creation_input_tokens from type: "message" iterations. Exclude type: "advisor_message" iterations from the calculation. The advisor tool documentation already states that "top-level max_tokens applies to executor output only" and that "the advisor's tokens do not draw from any task budget applied to the executor" — the compaction logic should follow the same principle.
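
One way the suggested calculation could look is sketched below. It is an illustration under stated assumptions, not Claude Code's implementation: executor_context_tokens is a hypothetical helper, and it reads the executor's context from the most recent type: "message" iteration rather than adding all of them, because in the sample usage data from line 1979 the two executor iterations together already add up to the inflated 1,028,789.

```python
def input_side(entry):
    """Uncached input + cache reads + cache writes for one usage entry."""
    return (entry.get("input_tokens", 0)
            + entry.get("cache_read_input_tokens", 0)
            + entry.get("cache_creation_input_tokens", 0))

def executor_context_tokens(usage):
    """Estimate the executor's current context size from a usage object.

    Ignores `advisor_message` iterations and uses the latest `message`
    iteration; falls back to the top-level counters when no iteration
    breakdown is present.
    """
    iterations = usage.get("iterations") or []
    executor = [it for it in iterations if it.get("type") == "message"]
    if not executor:
        return input_side(usage)
    return input_side(executor[-1])
```

Applied to the advisor turn in this issue, the function returns 515,085 tokens, about half of the 1M window, so a check based on it would not have triggered compaction at a ~95% threshold.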

Environment

  • Claude Code version: 2.1.111
  • Model: claude-opus-4-6[1m]
  • Advisor model: claude-opus-4-7
  • Platform: Windows 10 Pro
  • Settings: advisorModel: "opus", effortLevel: "medium"

extent analysis

TL;DR

Exclude advisor sub-inference tokens from the auto-compaction threshold calculation to prevent premature compaction.

Guidance

  • Review the calculation of context usage for auto-compaction and ensure it only includes the executor's tokens, excluding advisor sub-inference tokens.
  • Verify that the type: "advisor_message" iterations are not included in the compaction threshold calculation.
  • Consider implementing a fix similar to the suggested one, where only the executor's input_tokens + cache_read_input_tokens + cache_creation_input_tokens from type: "message" iterations are used for the auto-compaction threshold.
  • Test the fix with different scenarios, including various advisor calls and context sizes, to ensure it resolves the issue.

Example

No code snippet is provided as the issue is more related to the logic of the auto-compaction threshold calculation rather than a specific code implementation.

Notes

The provided workaround using CLAUDE_AUTOCOMPACT_PCT_OVERRIDE and a PreCompact hook can help mitigate the issue but may not be ideal as it blocks all auto-compaction. A proper fix should address the root cause of the problem.

Recommendation

Apply the suggested fix to exclude advisor sub-inference tokens from the auto-compaction threshold calculation, as it directly addresses the root cause of the issue and aligns with the principle that the advisor's tokens do not draw from the executor's task budget.
