claude-code: System prompt size grew ~70K tokens between v2.1.89 and v2.1.96, making sessions unusable without frequent manual /compact [4 comments, 3 participants]

GitHub stats: anthropics/claude-code#45188, fetched 2026-04-09 08:11:14
Comments: 4 · Participants: 3 · Timeline events: 9 · Reactions: 1
Timeline (top): labeled ×5, commented ×4


Summary

Between Claude Code 2.1.89 and 2.1.96 (approximately 5 days, early April 2026), the initial system prompt size grew by approximately 70K tokens. This is large enough to make sessions effectively unusable: with auto-compact disabled (as I had it configured at the time), the context would fill before I could complete any non-trivial task, and Claude Code stopped accepting input entirely. I had to introduce a manual /compact workflow as an emergency measure just to keep working.

Environment

  • Claude Code versions affected: 2.1.89 → 2.1.91 → 2.1.92 → 2.1.96
  • OS: Windows 11 (Git Bash shell)
  • Models: claude-opus-4-6 / claude-sonnet-4-6
  • Setup: standard user install + a few user plugins (LSPs, code-review, feature-dev, hookify)

Measurement Method

I correlated the usage.cache_creation_input_tokens field of the first assistant message in each session JSONL against the version field, across all sessions under ~/.claude/projects/<id>/*.jsonl.

The first assistant message has no cache hit yet, so its cache_creation_input_tokens reflects the full initial system prompt assembled by the binary — independent of any additionalContext injected by UserPromptSubmit hooks.

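import json
from pathlib import Path
from collections import defaultdict

# Replace <your-project-id> with a directory name under ~/.claude/projects.
proj = Path.home() / ".claude" / "projects" / "<your-project-id>"
by_version = defaultdict(list)  # version string -> first-cache sizes, one per session

for jsonl in proj.glob("*.jsonl"):
    first_cache = None
    version = None
    with jsonl.open(encoding="utf-8") as f:
        for line in f:
            try:
                d = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            # Remember the first version seen in this session file.
            if version is None and d.get("version"):
                version = d["version"]
            usage = (d.get("message") or {}).get("usage") or {}
            cct = usage.get("cache_creation_input_tokens")
            # Stop at the first non-zero value: the initial, uncached prompt.
            if cct and first_cache is None:
                first_cache = cct
                break
    if first_cache and version:
        by_version[version].append(first_cache)

def version_key(v):
    # Compare versions numerically so that e.g. 2.1.100 sorts after 2.1.96.
    return [int(p) if p.isdigit() else 0 for p in v.split(".")]

for v in sorted(by_version, key=version_key):
    vals = sorted(by_version[v])
    # vals[len(vals)//2] is the upper median when n is even.
    print(f"{v}: n={len(vals):3d}  median={vals[len(vals)//2]:6d}  max={max(vals):6d}")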

Observed Data

Date          | CC Version       | Median first-cache | Δ vs baseline
04-01..04-02  | 2.1.89 / 2.1.90  | ~38–52K            | baseline
04-03         | 2.1.91           | ~80–88K            | +40K
04-04..04-07  | 2.1.92           | ~85–90K            | +40K (stable)
04-08         | 2.1.96           | ~112–119K          | +70K total

Two distinct step changes: +40K at 2.1.91, +30K at 2.1.96. Total growth: ~70K in 5 days.
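
As a rough cross-check of those step sizes, the deltas can be recomputed from the table by taking the midpoint of each reported range. This is only a sketch: using the midpoint as a point estimate is an assumption (the issue reports ranges, not exact medians), and the variable names are illustrative.

```python
# Ranges of the median first-cache size, in thousands of tokens,
# copied from the Observed Data table.
ranges_k = {
    "2.1.89 / 2.1.90": (38, 52),
    "2.1.91": (80, 88),
    "2.1.92": (85, 90),
    "2.1.96": (112, 119),
}

# Assumption: the midpoint of each range stands in for the true median.
midpoints = {v: (lo + hi) / 2 for v, (lo, hi) in ranges_k.items()}
baseline = midpoints["2.1.89 / 2.1.90"]

# Growth of each version relative to the 2.1.89 / 2.1.90 baseline.
deltas = {v: m - baseline for v, m in midpoints.items()}

for v, d in deltas.items():
    print(f"{v:16s} midpoint={midpoints[v]:6.1f}K  delta={d:+6.1f}K")
```

The midpoints give about +39K at 2.1.91 and +70.5K at 2.1.96, in line with the rounded figures in the table.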

Impact

I had auto-compact disabled during this period (I was testing with it off). As the system prompt grew past ~100K+, sessions would fill up before completing any multi-step task. Claude Code stopped accepting new input. I had to run manual /compact continuously as an emergency workaround just to keep the tool usable. This persisted until I re-enabled auto-compact and tuned the threshold.

Even with auto-compact enabled at the default threshold, the effective usable context has shrunk substantially: a baseline that is 70K tokens larger leaves 70K fewer tokens for actual conversation and tool output.
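
The budget arithmetic can be sketched as follows. The 200K-token window is an assumption for illustration (substitute the limit of the model actually in use), the function name is hypothetical, and the two baselines are rounded from the Observed Data table.

```python
# Assumption: a 200K-token context window; replace with the real limit.
CONTEXT_WINDOW = 200_000


def usable_context(baseline_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for conversation and tool output after the initial prompt."""
    return max(window - baseline_tokens, 0)


before = usable_context(45_000)   # roughly the 2.1.89 baseline
after = usable_context(115_000)   # roughly the 2.1.96 baseline

print(f"usable before: {before:,}  after: {after:,}  lost: {before - after:,}")
```

Whatever the real window size is, the lost headroom equals the growth in the baseline as long as the baseline still fits inside the window.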

What I Already Ruled Out

  • UserPromptSubmit hooks (additionalContext): affects later cache blocks, not the first cache_creation_input_tokens. Verified by disabling my hook chain — first-cache size unchanged.
  • MCP servers: removed candidate MCPs, re-measured — first-cache size unchanged.
  • Plugin count: removed 5 stale LSP plugins, re-measured — first-cache size unchanged.
  • Session state / bistability: sessions started with /clear show ~50–60K smaller first-cache than sessions resuming work. Both classes show the same version-correlated step changes, so the jumps are not session-state artifacts.

The growth correlates exactly and exclusively with binary version bumps.
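
Each of the checks above is an A/B comparison: measure first-cache size with a component enabled, disable it, and measure again. A minimal sketch of that comparison follows. The function names, the 2K-token tolerance, and the sample numbers are all hypothetical; they are not measurements from this issue.

```python
from statistics import median


def median_shift(before: list[int], after: list[int]) -> float:
    """Difference between the medians of two sets of first-cache sizes."""
    return median(after) - median(before)


def unchanged(before: list[int], after: list[int], tolerance: int = 2_000) -> bool:
    """True when the median moved by no more than `tolerance` tokens."""
    return abs(median_shift(before, after)) <= tolerance


# Hypothetical first-cache sizes (tokens) from sessions with and without a hook chain.
with_hooks = [86_200, 87_900, 85_400]
without_hooks = [86_000, 88_100, 85_700]

print(median_shift(with_hooks, without_hooks), unchanged(with_hooks, without_hooks))
```

A shift well inside the tolerance supports "unchanged"; a shift on the order of the 40K steps in the table would not.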

Suspected Sources of Growth

Based on inspection of what appears to have changed in the system prompt area between these versions:

  • Skill listing attachment (skill_listing type) — my session shows ~7KB for 47 skills injected on every turn
  • New or expanded injection-defense / safety layers
  • New tool descriptions (Skill tool, browser-automation tool suite, Plugin subagent descriptions)
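
To put the skill-listing figure in token terms, a byte count can be converted with a rough heuristic. The sketch below assumes about 4 bytes per token, which is only a rule of thumb for English text and not a property of any specific tokenizer; the function name is illustrative.

```python
# Assumption: ~4 bytes per token, a rough heuristic; real tokenizers vary.
BYTES_PER_TOKEN = 4


def estimate_tokens(size_bytes: int, bytes_per_token: float = BYTES_PER_TOKEN) -> int:
    """Very rough token estimate for a block of text of the given size."""
    return round(size_bytes / bytes_per_token)


skill_listing_bytes = 7 * 1024   # "~7KB for 47 skills" from the list above
per_injection = estimate_tokens(skill_listing_bytes)
per_skill = per_injection / 47

print(f"~{per_injection} tokens per injection, ~{per_skill:.0f} tokens per skill")
```

Under that heuristic the listing is on the order of 1.8K tokens per injection. The estimate says nothing about how the cost accumulates across turns or about the size of the other suspected sources.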

Asks

  1. Confirm whether the ~70K growth between 2.1.89 and 2.1.96 is intentional and which components account for it.
  2. Consider lazy-loading or deduplicating large fixed blocks (skill listings, agent descriptions, plugin metadata) so they only appear when the relevant capability is actually needed.
  3. Expose first-cache / baseline system-prompt size in /status or a debug flag so users can monitor it without parsing session JSONLs.
  4. Document the expected baseline range per version in the CHANGELOG so users can anticipate and plan around context budget changes.
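
Until something like ask 3 exists, the baseline of the most recent session can be read with a small helper. This is a sketch that reuses the JSONL fields from the measurement script; the function names are hypothetical and nothing here is part of Claude Code itself.

```python
import json
from pathlib import Path
from typing import Optional


def first_cache_tokens(session: Path) -> Optional[int]:
    """Return the first non-zero cache_creation_input_tokens in a session JSONL."""
    with session.open(encoding="utf-8") as f:
        for line in f:
            try:
                d = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip malformed lines
            # Walk message -> usage defensively; either level may be missing.
            message = d.get("message") if isinstance(d, dict) else None
            usage = message.get("usage") if isinstance(message, dict) else None
            cct = usage.get("cache_creation_input_tokens") if isinstance(usage, dict) else None
            if cct:
                return cct
    return None


def latest_baseline(project_dir: Path) -> Optional[int]:
    """First-cache size of the most recently modified session, if any."""
    sessions = sorted(project_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime)
    return first_cache_tokens(sessions[-1]) if sessions else None
```

Calling latest_baseline(Path.home() / ".claude" / "projects" / "<your-project-id>") after starting a session gives a quick reading to compare against the table.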

Reproducibility

Anyone with multiple session JSONLs spanning versions 2.1.89–2.1.96 can run the script above and reproduce the version-vs-size correlation in a few seconds.

extent analysis

TL;DR

The initial system prompt grew by approximately 70K tokens between Claude Code 2.1.89 and 2.1.96. The workaround available to users is to re-enable auto-compact and tune its threshold. A lasting fix would require Claude Code itself to lazy-load or deduplicate large fixed blocks, as the issue requests.

Guidance

  • Re-enable auto-compact and tune the threshold so the context does not fill before a task completes.
  • Lazy-loading or deduplicating large fixed blocks (skill listings, agent descriptions, plugin metadata) is a change requested of the Claude Code maintainers, not something users can configure.
  • Monitor first-cache size by parsing session JSONLs with the script in the issue body. A /status field or debug flag for this is requested in the issue, not reported as available.
  • Per-version baseline ranges in the CHANGELOG are likewise a request in the issue, not something it reports as available.

Example

The measurement script in the issue body reproduces the version-vs-size correlation. The workaround itself is a configuration change and needs no code.

Notes

The cause of the growth has not been confirmed. The reporter suspects changes in the system prompt area: the skill listing attachment, new or expanded injection-defense and safety layers, and new tool descriptions. The reporter's script can be used to reproduce the version-vs-size correlation.

Recommendation

Re-enable auto-compact and tune the threshold; the reporter found this sufficient to keep the tool usable. Reducing the initial system prompt size through lazy-loading or deduplication of large fixed blocks remains a request to the maintainers.
