claude-code: advisor() tool inflates reported input tokens by forwarding full transcript, triggering premature auto-compaction on extended context models

claude-code2026-04-25 00:10:44

anthropics/claude-code#53065•Fetched 2026-04-25 06:13:17

Author

AttacktheDPoint-com

When advisor() is called, the full conversation transcript is forwarded to a second model (currently claude-opus-4-7). The token usage from the main executor and the advisor sub-inference is summed in the top-level usage fields. If Claude Code's auto-compaction logic uses these summed totals, the advisor call effectively doubles the apparent context usage, triggering compaction when the main model's actual context is only ~50% full.

Reproduction

Start a Claude Code session with claude-opus-4-6[1m] (1M context)
Work normally until the main context reaches ~400K-500K input tokens
Call advisor()
Observe: auto-compaction fires immediately after the advisor call returns

Evidence from session JSONL

Session ID: c3a29188-290f-4af2-be48-8f6fb6929111
Model: claude-opus-4-6[1m]
Project: CriticalSkip

Token progression at the compaction boundary

| JSONL line | Total reported input | Notes |
|---|---|---|
| 1972 | 513,354 | Normal turn, main context only |
| 1979 | 1,028,789 | advisor() called; reported total doubled |
| 1994 | n/a | `"Conversation compacted"` system message |
| 2012 | 35,693 | Post-compaction context (wiped to ~36K) |

Iteration breakdown for the advisor turn (line 1979)

{
  "input_tokens": 4,
  "cache_creation_input_tokens": 1731,
  "cache_read_input_tokens": 1027054,
  "iterations": [
    {
      "type": "message",
      "input_tokens": 3,
      "cache_read_input_tokens": 513353,
      "cache_creation_input_tokens": 348,
      "output_tokens": 35
    },
    {
      "type": "advisor_message",
      "model": "claude-opus-4-7",
      "input_tokens": 701354,
      "cache_read_input_tokens": 0,
      "cache_creation_input_tokens": 0,
      "output_tokens": 5672
    },
    {
      "type": "message",
      "input_tokens": 1,
      "cache_read_input_tokens": 513701,
      "cache_creation_input_tokens": 1383,
      "output_tokens": 544
    }
  ]
}

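Key observation: The main model's actual context was 513-515K tokens (iterations 1 and 3). The advisor sub-inference consumed 701K tokens (iteration 2, the full transcript forwarded uncached). The top-level cache_read_input_tokens reports 1,027,054, the sum across all iterations (513,353 + 0 + 513,701), making it appear the context is at 1M when only half is actually used by the executor. Note that in this sample the advisor's 701,354 uncached input tokens are not reflected in the top-level input_tokens (4 = 3 + 1); the reported total of 1,028,789 is the executor's context counted twice, once for the iteration before the advisor call and once for the iteration after it.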
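The per-iteration figures can be reconciled against the top-level fields with a few lines of Python. This is a minimal sketch that uses only the numbers from the usage record above; the helper names `input_total` and `per_iteration_totals` are illustrative and are not part of Claude Code.

```python
# Usage record from the advisor turn (JSONL line 1979), output_tokens omitted.
usage = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 1731,
    "cache_read_input_tokens": 1027054,
    "iterations": [
        {"type": "message", "input_tokens": 3,
         "cache_read_input_tokens": 513353, "cache_creation_input_tokens": 348},
        {"type": "advisor_message", "input_tokens": 701354,
         "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0},
        {"type": "message", "input_tokens": 1,
         "cache_read_input_tokens": 513701, "cache_creation_input_tokens": 1383},
    ],
}

# The three fields that together make up "input" for one inference.
INPUT_FIELDS = ("input_tokens", "cache_read_input_tokens",
                "cache_creation_input_tokens")


def input_total(record):
    """Sum the three input-side token fields of a usage record or iteration."""
    return sum(record.get(field, 0) for field in INPUT_FIELDS)


def per_iteration_totals(usage_record):
    """Return (type, input total) for each iteration, in order."""
    return [(it["type"], input_total(it)) for it in usage_record["iterations"]]


top_level = input_total(usage)
per_iter = per_iteration_totals(usage)
executor_sum = sum(total for kind, total in per_iter if kind == "message")

print(top_level)     # 1028789
print(per_iter)      # [('message', 513704), ('advisor_message', 701354), ('message', 515085)]
print(executor_sum)  # 1028789
```

For this record the top-level total equals the sum of the two `message` iterations, so the figure that crosses the threshold is roughly twice the executor's real context.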

The math

Main executor context: ~513K tokens (well within 1M window)
Advisor receives full transcript: ~701K tokens (separate model, separate inference)
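Reported total: ~1,028K tokens (top-level sum across iterations; in this sample 513,704 + 515,085 from the two executor iterations, not 513K + 701K)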
Auto-compaction threshold: likely ~95% of 1M ≈ 950K
Result: compaction fires at 513K actual context because 1,028K > threshold
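The threshold comparison can be written out as a small check. This sketch assumes the threshold is a plain percentage of the context window (the issue only guesses "likely ~95%"), and `compaction_threshold` and `would_autocompact` are hypothetical names, not Claude Code functions.

```python
def compaction_threshold(window_tokens, pct=95.0):
    """Token count above which auto-compaction is assumed to fire."""
    return int(window_tokens * pct / 100)


def would_autocompact(reported_tokens, window_tokens, pct=95.0):
    """True if the reported input total exceeds the assumed threshold."""
    return reported_tokens > compaction_threshold(window_tokens, pct)


WINDOW = 1_000_000  # 1M context window

print(compaction_threshold(WINDOW))               # 950000
print(would_autocompact(1_028_789, WINDOW))       # True  (reported total on the advisor turn)
print(would_autocompact(515_085, WINDOW))         # False (executor's own context)
print(would_autocompact(1_028_789, WINDOW, 98.0)) # True  (still above a 98% threshold)
```

Under these assumptions, raising the percentage to 98 would not have been enough on its own for this particular turn, since 1,028,789 is also above 980,000.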

Expected behavior

Advisor sub-inference tokens should not count toward the auto-compaction threshold. The advisor is a separate model call with its own context window. The executor's context at 513K is well within the 1M budget and should not trigger compaction.

Actual behavior

The summed total (executor + advisor) exceeds the compaction threshold, and auto-compaction fires immediately. The session loses ~500K tokens of working context unnecessarily.

Impact

On extended context models (1M): Any advisor call past ~400K main context will trigger compaction, since 400K main + ~550K advisor = ~950K ≈ threshold
Effectively halves usable context: The 1M context window becomes ~400-450K when advisor is used, because the remaining ~550K is reserved for the advisor's copy of the transcript
Inconsistent: Early-session advisor calls work fine. Late-session calls compact. Users experience this as random/intermittent compaction
Counterproductive: The advisor is most valuable late in a session (complex decisions with full context), but that's exactly when it triggers compaction and destroys the context it was supposed to help with

Workaround

Set CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=98 and/or use a PreCompact hook that blocks auto-compaction:

{
  "hooks": {
    "PreCompact": [{
      "hooks": [{
        "type": "command",
        "command": "bash -c 'INPUT=$(cat); TRIGGER=$(echo \"$INPUT\" | python3 -c \"import sys,json; print(json.load(sys.stdin).get(\\\"trigger\\\",\\\"unknown\\\"))\" 2>/dev/null); [ \"$TRIGGER\" = \"auto\" ] && echo \"{\\\"decision\\\":\\\"block\\\"}\" && exit 2; exit 0'",
        "timeout": 3
      }]
    }]
  }
}

This blocks all auto-compaction and relies on manual /compact. Not ideal but prevents the advisor from triggering premature compaction.
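The inline bash command is hard to read once it is escaped inside JSON. The sketch below restates the same decision logic as a standalone Python function so that it can be tested in isolation. It mirrors what the one-liner does (read the hook payload, look at `trigger`, emit a block decision with exit code 2 for `auto`); whether Claude Code honours that response is as reported in this issue and is not verified here. The names `decide` and `main` are illustrative.

```python
import json
import sys


def decide(raw_input):
    """Return (stdout_text, exit_code) for a PreCompact hook payload.

    Mirrors the inline command: block only when trigger == "auto",
    and fall through with exit code 0 on anything else, including
    unparseable input.
    """
    try:
        trigger = json.loads(raw_input).get("trigger", "unknown")
    except (json.JSONDecodeError, AttributeError):
        trigger = "unknown"
    if trigger == "auto":
        # Same payload the bash version echoes.
        return json.dumps({"decision": "block"}, separators=(",", ":")), 2
    return "", 0


def main():
    """Entry point for use as a hook command: stdin in, exit code out."""
    out, code = decide(sys.stdin.read())
    if out:
        print(out)
    return code


# Quick demonstration with hand-written payloads.
print(decide('{"trigger": "auto"}'))    # ('{"decision":"block"}', 2)
print(decide('{"trigger": "manual"}'))  # ('', 0)
print(decide("not json"))               # ('', 0)
```

To use it as a hook, the demonstration lines would be replaced with `raise SystemExit(main())` and the hook's `command` pointed at the saved file with `python3`; the file name and location are up to the user.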

Related issues

#34332 — Opus 4.6 (1M context): autocompact triggers at ~76K tokens
#50204 — Auto-compact triggers prematurely with extended context models
#15377 — Conversations compacting prematurely at ~65% token capacity
#49994 — Sessions using advisor() become permanently unrecoverable (expired encrypted payloads)
#42647 — High token burn due to redundant context resubmission in compaction pipeline

Suggested fix

When calculating context usage for the auto-compaction threshold, use only the executor's input_tokens + cache_read_input_tokens + cache_creation_input_tokens from type: "message" iterations. Exclude type: "advisor_message" iterations from the calculation. The advisor tool documentation already states that "top-level max_tokens applies to executor output only" and that "the advisor's tokens do not draw from any task budget applied to the executor" — the compaction logic should follow the same principle.
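A minimal sketch of the suggested calculation follows. It is not Claude Code's implementation; `executor_context_tokens` is a hypothetical helper. One assumption goes beyond the wording above: the function reads the most recent `message` iteration instead of summing all of them, because summing the two `message` iterations in the sample record still yields 1,028,789.

```python
INPUT_FIELDS = ("input_tokens", "cache_read_input_tokens",
                "cache_creation_input_tokens")


def executor_context_tokens(usage):
    """Estimate the executor's current context size from a usage record.

    Ignores "advisor_message" iterations and uses the latest "message"
    iteration. Falls back to the top-level fields when the record has
    no executor iterations (for example, a turn without advisor calls).
    """
    iterations = usage.get("iterations") or []
    executor = [it for it in iterations if it.get("type") == "message"]
    source = executor[-1] if executor else usage
    return sum(source.get(field, 0) for field in INPUT_FIELDS)


# Sample record from the advisor turn (JSONL line 1979).
sample = {
    "input_tokens": 4,
    "cache_creation_input_tokens": 1731,
    "cache_read_input_tokens": 1027054,
    "iterations": [
        {"type": "message", "input_tokens": 3,
         "cache_read_input_tokens": 513353, "cache_creation_input_tokens": 348},
        {"type": "advisor_message", "input_tokens": 701354,
         "cache_read_input_tokens": 0, "cache_creation_input_tokens": 0},
        {"type": "message", "input_tokens": 1,
         "cache_read_input_tokens": 513701, "cache_creation_input_tokens": 1383},
    ],
}

print(executor_context_tokens(sample))  # 515085
```

Compared against a 950K threshold, 515,085 would leave the session well short of compaction, which matches the expected behavior described above.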

Environment

Claude Code version: 2.1.111
Model: claude-opus-4-6[1m]
Advisor model: claude-opus-4-7
Platform: Windows 10 Pro
Settings: advisorModel: "opus", effortLevel: "medium"

extent analysis

TL;DR

Exclude advisor sub-inference tokens from the auto-compaction threshold calculation to prevent premature compaction.

Guidance

Review the calculation of context usage for auto-compaction and ensure it only includes the executor's tokens, excluding advisor sub-inference tokens.
Verify that the type: "advisor_message" iterations are not included in the compaction threshold calculation.
Consider implementing a fix similar to the suggested one, where only the executor's input_tokens + cache_read_input_tokens + cache_creation_input_tokens from type: "message" iterations are used for the auto-compaction threshold.
Test the fix with different scenarios, including various advisor calls and context sizes, to ensure it resolves the issue.

Example

No code snippet is provided as the issue is more related to the logic of the auto-compaction threshold calculation rather than a specific code implementation.

Notes

The provided workaround using CLAUDE_AUTOCOMPACT_PCT_OVERRIDE and a PreCompact hook can help mitigate the issue but may not be ideal as it blocks all auto-compaction. A proper fix should address the root cause of the problem.

Recommendation

Apply the suggested fix to exclude advisor sub-inference tokens from the auto-compaction threshold calculation, as it directly addresses the root cause of the issue and aligns with the principle that the advisor's tokens do not draw from the executor's task budget.
