claude-code - 💡(How to fix) Fix [BUG] 2x - 3x increase in output tokens starting with version 2.1.96 and increasing in 2.1.101 [3 comments, 2 participants]

GitHub stats
anthropics/claude-code#48808 (fetched 2026-04-16 06:50:23)
Comments: 3
Participants: 2
Timeline events: 10
Reactions: 1
Timeline (top): labeled ×6, commented ×3, cross-referenced ×1


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

While there are many other issues here, the one I want to address is the increase in output tokens: roughly 2× starting with version 2.1.98, growing to 3×+ from 2.1.101 onward.

Environment

  • Claude Code versions tested (pinned via ~/.local/share/claude/versions/<ver>, DISABLE_AUTOUPDATER=1): 2.1.96, 2.1.98, 2.1.101, 2.1.107
  • Model: claude-sonnet-4-6 (same in all arms)
  • Mode: --print --permission-mode acceptEdits (non-interactive)
  • OS: macOS 15 (Darwin 25.3.0)
  • Installer: native (version file is the executable)

TL;DR

Running identical prompts against identical tasks on identical models, Claude Code's per-assistant-turn output token usage jumps in two discrete steps inside the v2.1.96 → v2.1.101 range, then plateaus:

| Version | Output / asst msg | Δ vs 2.1.96 | Step change |
|---|---|---|---|
| 2.1.96 | 791 | 1.00× | baseline |
| 2.1.98 | 1,622 | 2.05× | +105% (first jump) |
| 2.1.101 | 2,221 | 2.81× | +37% on top of 2.1.98 (second jump) |
| 2.1.107 | 2,077 | 2.62× | flat vs 2.1.101 |
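
The step changes in this table are plain ratios. As a minimal sketch (the helper name `step_changes` is hypothetical and not part of the attached harness), the following recomputes them from the rounded per-message figures above:

```python
# Output tokens per assistant message, copied from the table above.
per_msg = {"2.1.96": 791, "2.1.98": 1622, "2.1.101": 2221, "2.1.107": 2077}


def step_changes(series):
    """Return (version, ratio vs first entry, percent change vs previous entry) rows."""
    versions = list(series)
    baseline = series[versions[0]]
    rows = []
    previous = None
    for version in versions:
        value = series[version]
        vs_baseline = value / baseline
        # The first entry has no predecessor, so its step change is None.
        vs_previous = None if previous is None else (value / previous - 1) * 100
        rows.append((version, vs_baseline, vs_previous))
        previous = value
    return rows


for version, vs_baseline, vs_previous in step_changes(per_msg):
    step = "baseline" if vs_previous is None else f"{vs_previous:+.0f}% vs previous"
    print(f"{version:>8}  {vs_baseline:.2f}x  {step}")
```

Because the inputs here are already rounded, the last row comes out as 2.63× rather than the 2.62× shown in the table, which was presumably computed from unrounded per-message values.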

Each arm reproduces in ~60 seconds in an empty directory with public data. This is not a session-length, tooling, or workflow artefact: the prompt is byte-identical and the task is a one-shot CSV-to-HTML report.

Reproducer

Minimal harness attached in cc-ab/ — empty working directory, one public CSV (Palmer Penguins, CC0), one prompt that asks for a self-contained HTML report with three charts. Four run directories, each pinning one Claude Code version:

# from the harness root; each arm runs in a subshell so the working
# directory returns to the root before the next arm
(cd run-2.1.96 && bash run.sh)     # ~60s
# wait ≥ 5 min for prompt cache TTL to expire
(cd run-2.1.98 && bash run.sh)     # ~60s
# wait ≥ 5 min
(cd run-2.1.101 && bash run.sh)    # ~60s
# wait ≥ 5 min
(cd run-2.1.107 && bash run.sh)    # ~60s

# compare (still in the harness root)
python3 analyze.py

Each run.sh invokes:

"$HOME/.local/share/claude/versions/$CC_VERSION" \
  --model claude-sonnet-4-6 \
  --print \
  --permission-mode acceptEdits \
  "$(cat ../prompt.md)"
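
For readers who prefer to drive all four arms from one place, here is a hedged Python sketch of the same sequence. The binary path, flags, version list, and `DISABLE_AUTOUPDATER` variable are taken from the issue; the function names `arm_command` and `run_all` are hypothetical, and the sketch assumes the `run-<version>` directories and `prompt.md` are laid out as described above. It is not the attached harness.

```python
import os
import subprocess
import time
from pathlib import Path

VERSIONS = ["2.1.96", "2.1.98", "2.1.101", "2.1.107"]
CACHE_TTL_SECONDS = 5 * 60  # the 5-minute prompt cache TTL mentioned in the issue


def arm_command(version, prompt, home):
    """Build the argv for one arm, mirroring the flags run.sh passes."""
    binary = Path(home) / ".local" / "share" / "claude" / "versions" / version
    return [
        str(binary),
        "--model", "claude-sonnet-4-6",
        "--print",
        "--permission-mode", "acceptEdits",
        prompt,
    ]


def run_all(root, wait=CACHE_TTL_SECONDS):
    """Run every arm from its own run-<version> directory, pausing between arms."""
    root = Path(root)
    prompt = (root / "prompt.md").read_text(encoding="utf-8")
    env = dict(os.environ, DISABLE_AUTOUPDATER="1")
    for index, version in enumerate(VERSIONS):
        if index:
            time.sleep(wait)  # let the prompt cache expire before the next arm
        subprocess.run(
            arm_command(version, prompt, Path.home()),
            cwd=root / f"run-{version}",
            env=env,
            check=True,
        )


# Example (not executed here): run_all(Path.cwd()) from the harness root.
```

Building the argv as a list avoids shell quoting problems when the prompt contains quotes or newlines.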

Full metrics

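| Metric | v2.1.96 | v2.1.98 | Δ | v2.1.101 | Δ | v2.1.107 | Δ |
|---|---|---|---|---|---|---|---|
| User messages | 3 | 3 | 1.00× | 3 | 1.00× | 4 | 1.33× |
| Assistant messages | 5 | 5 | 1.00× | 6 | 1.20× | 7 | 1.40× |
| Tool uses | 2 | 2 | 1.00× | 2 | 1.00× | 3 | 1.50× |
| Output tokens | 3,956 | 8,112 | 2.05× | 13,327 | 3.37× | 14,536 | 3.67× |
| Output / asst msg | 791 | 1,622 | 2.05× | 2,221 | 2.81× | 2,077 | 2.62× |
| Cache-write tokens | 45,779 | 45,925 | 1.00× | 48,666 | 1.06× | 51,676 | 1.13× |
| Cache-read tokens | 61,942 | 61,742 | 1.00× | 85,164 | 1.37× | 110,251 | 1.78× |
| Input tokens | 9 | 9 | 1.00× | 10 | 1.11× | 11 | 1.22× |

All Δ columns are relative to v2.1.96.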

Output-per-assistant-message is the cleanest signal — it normalises for session length and prompt count. It doubled at 2.1.98 on the same workload.

Cache-write is nearly flat across arms (1.00× / 1.06× / 1.13×) because all runs are back-to-back with no idle gaps that would expire the 5-minute prompt cache TTL. Earlier real-workload runs over a 6-hour session showed a 2.04× cache-write delta that vanished here, so cache-writes are partly an idle-gap artefact — output tokens are the load-bearing signal.
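
The output-per-assistant-message metric can be recomputed from a session JSONL along the following lines. This is a sketch under stated assumptions, not the attached analyze.py: it assumes each line is a JSON object, that assistant records carry `"type": "assistant"` with token counts under `message.usage.output_tokens`, and that records sharing a `message.id` belong to the same assistant message. The function name is hypothetical.

```python
import json


def output_per_assistant_message(jsonl_path):
    """Return (assistant_messages, output_tokens, tokens_per_message) for one session file."""
    tokens_by_message = {}
    with open(jsonl_path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle):
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            if record.get("type") != "assistant":
                continue
            message = record.get("message") or {}
            usage = message.get("usage") or {}
            # Assumption: lines sharing a message id are one assistant message,
            # so count it once and keep the largest reported output_tokens.
            key = message.get("id") or f"line-{line_no}"
            tokens = usage.get("output_tokens", 0)
            tokens_by_message[key] = max(tokens_by_message.get(key, 0), tokens)
    count = len(tokens_by_message)
    total = sum(tokens_by_message.values())
    return count, total, (total / count if count else 0.0)
```

Calling it on each arm's `session.sanitized.jsonl` and dividing by the 2.1.96 result gives a per-version ratio; if the message counts differ from the table above, the deduplication assumption is the first thing to check.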

Session JSONLs (attached, email-redacted, otherwise intact)

| Version | Session ID | File |
|---|---|---|
| 2.1.96 | f80549ad-7497-4074-8142-2cee21e89b3d | run-2.1.96/session.sanitized.jsonl |
| 2.1.98 | e032b896-1a58-4c67-9696-3dd5d748197d | run-2.1.98/session.sanitized.jsonl |
| 2.1.101 | b0e92cab-1ac5-4c2a-9fa3-296a3e2aa5bc | run-2.1.101/session.sanitized.jsonl |
| 2.1.107 | 4d29fec2-3f98-4429-b457-ea79b8bc7ad9 | run-2.1.107/session.sanitized.jsonl |

Each JSONL contains: all per-message usage metrics, version, model, all tool-use blocks, MCP tool advertisements (deferred_tools_delta), MCP instruction blocks, skill listings, and SessionStart hook output. Only the reporter's email address is redacted — everything else is intact to help triage.

Matched pairs of report.html (the actual deliverables each run produced) are also attached under run-*/report.html and are substantially similar in output — i.e. v2.1.107 is not producing a meaningfully larger artefact to justify 3.7× the output tokens.

What this rules out

  • Prompt / work-mix — prompts are byte-identical across arms.
  • Model — all arms run claude-sonnet-4-6.
  • User behaviour / machine — same user, same machine, back-to-back runs.
  • Project size / complexity — minimal reproducer runs in an empty directory with a 15 KB CSV. The regression still shows up.
  • Workflow shape — no sub-agents invoked, no orchestration, no resume, no compaction. Single turn with two file writes.
  • More actual work done — v2.1.98 produces the same session shape as v2.1.96 (5 asst messages, 2 tool uses, identical user turns) yet doubles output tokens. v2.1.107 produces one extra Read call, same final artefact.

What's left as the variable

Claude Code itself, between 2.1.96 and 2.1.101 — and specifically two separate changes in that range, each amplifying per-turn output.

Changelog context (from Anthropic's public changelog):

  • 2.1.98: sub-agent partial-progress now reports up to the parent session (attributed to parent output tokens); live /agents layout; Monitor tool added. This is the first-jump inflection in the harness. But the harness invokes zero sub-agents, which suggests either (a) the sub-agent attribution path also runs in no-sub-agent sessions, or (b) another 2.1.98 change is responsible.
  • 2.1.99 / 2.1.100: would need Anthropic to identify which behaviour-affecting change in this range caused the second step.
  • 2.1.101: --resume preserves more context on large sessions; sub-agents inherit MCP tools. By this version, per-turn cost has reached the full ~2.7× plateau.

2.1.105 (skill description cap 250 → 1,536 chars, 6×) is a separate known cost amplifier but can't explain this harness — full regression is visible at 2.1.101, which predates 2.1.105.

Ask

  1. Confirm whether the per-turn output increase at 2.1.98 and the further increase by 2.1.101 are intentional at this cost profile. User-visible result: sessions that used to run cheaply now use ~2.7× the output tokens per assistant message for the same work.
  2. Identify the changelog entries responsible for each of the two step changes (2.1.96 → 2.1.98 and 2.1.98 → 2.1.101). If it's sub-agent partial-progress attribution, please clarify why it affects sessions that invoke zero sub-agents.
  3. Expose knobs for the highest-impact drivers:
    • An opt-out (or cap) on sub-agent output-token attribution when the sub-agent output isn't consumed by the parent.
    • An opt-out (or cap) on the 2.1.105 skill description expansion for users with many installed skills.
    • A higher or user-configurable context threshold so mid-session compactions (each ~5–15 K output tokens) are less frequent.

What Should Happen?

Different versions running the same task should have, roughly, the same token usage.

Error Messages/Logs

Steps to Reproduce

  1. Install multiple versions of Claude Code using curl -fsSL https://claude.ai/install.sh | bash -s 2.1.98, where "2.1.98" is the version number to install. Recommended versions: 2.1.96 (last good version), 2.1.98 (first broken version), and 2.1.101 (most broken version).
  2. Create run.sh with the following:
#!/usr/bin/env bash
# A/B harness — arm A: Claude Code v2.1.96
# Runs the penguins-report prompt non-interactively against the pinned version.
# Session JSONL lands at ~/.claude/projects/<cwd-slug>/<uuid>.jsonl
# (Claude Code replaces '/' and '.' in the cwd with '-' to form the slug.)
set -euo pipefail

# Change this to the version you want to run; that version must already be installed.
CC_VERSION="2.1.96"
CC_BIN="$HOME/.local/share/claude/versions/$CC_VERSION"

if [[ ! -x "$CC_BIN" ]]; then
  echo "ERROR: Claude Code $CC_VERSION not found at $CC_BIN" >&2
  # Native installer, matching step 1 (an npm install would not populate $CC_BIN).
  echo "Install with: curl -fsSL https://claude.ai/install.sh | bash -s $CC_VERSION" >&2
  exit 1
fi

HERE="$(cd "$(dirname "$0")" && pwd)"
cd "$HERE"

# Ensure dataset is present (self-contained run dir)
if [[ ! -f data/penguins.csv ]]; then
  mkdir -p data
  cp ../data/penguins.csv data/penguins.csv
fi

# Prompt lives one level up, shared across arms
PROMPT_FILE="../prompt.md"

export DISABLE_AUTOUPDATER=1
export CLAUDE_CODE_VERSION_PINNED="$CC_VERSION"

echo "=== Claude Code $CC_VERSION — penguins report ==="
echo "Working dir: $HERE"
echo "Prompt:      $PROMPT_FILE"
echo "Started:     $(date -Iseconds)"
echo

"$CC_BIN" \
  --model claude-sonnet-4-6 \
  --print \
  --permission-mode acceptEdits \
  "$(cat "$PROMPT_FILE")"

echo
echo "Finished: $(date -Iseconds)"
# The slug replaces both '/' and '.', per the header comment above.
echo "Session JSONL: ~/.claude/projects/$(pwd | sed 's|[/.]|-|g')/*.jsonl"

Next to that file you will need a data directory with the attached penguins.csv file.

  3. Run bash run.sh
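
To locate the session file after a run, the slug rule stated in the run.sh header comment ('/' and '.' in the working directory both become '-') can be sketched as below. The rule itself comes from that comment and is not independently verified here; the function names and the example path are hypothetical.

```python
from pathlib import Path


def project_slug(cwd):
    """Apply the rule from the run.sh header: '/' and '.' both become '-'."""
    return cwd.replace("/", "-").replace(".", "-")


def session_dir(cwd, home):
    """Directory where the session JSONL files for this cwd are expected."""
    return Path(home) / ".claude" / "projects" / project_slug(cwd)


# Hypothetical run directory; the dots in the version number matter.
print(project_slug("/Users/me/cc-ab/run-2.1.96"))  # -Users-me-cc-ab-run-2-1-96
```

Listing `session_dir(os.getcwd(), Path.home()).glob("*.jsonl")` sorted by modification time would then pick out the newest session.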

Claude Model

Sonnet (default)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

2.1.96

Claude Code Version

2.1.107

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Other

Additional Information

Full A/B test harness attached.

cc-ab-clean.zip - ready to test - no results yet
cc-ab-with-results.zip - with tests and analysis.

extent analysis

TL;DR

The issue can be addressed by identifying and potentially reverting the changes introduced in Claude Code versions 2.1.98 and 2.1.101 that caused the increase in output tokens.

Guidance

  1. Review changelog: Examine the changelog entries for versions 2.1.98 through 2.1.101 (including 2.1.99 and 2.1.100, where the second step may have landed) to understand the changes that might have caused the output token increase.
  2. Investigate sub-agent attribution: Since the issue occurs even without sub-agents, investigate if the sub-agent partial-progress attribution change in 2.1.98 affects sessions without sub-agents.
  3. Analyze skill description expansion: Consider the impact of the skill description expansion in 2.1.105, although it's noted that the full regression is visible at 2.1.101, which predates 2.1.105.
  4. Test with different versions: Use the provided A/B test harness to test different versions of Claude Code and verify the output token usage.

Example

No specific code snippet is provided as the issue is related to the behavior of Claude Code versions. However, the run.sh script in the issue body can be used as a starting point to test different versions.

Notes

The issue seems to be related to changes in Claude Code versions 2.1.98 and 2.1.101. The provided A/B test harness can be used to reproduce the issue and test different versions. It's essential to review the changelog entries and investigate the changes that might have caused the output token increase.

Recommendation

Apply a workaround by using an earlier version of Claude Code, such as 2.1.96, which is reported to work correctly, until the issue is resolved in a future version. This is because the issue is identified as a regression, and using an earlier version can mitigate the problem.
