openclaw - ✅(Solved) Fix Bug: claude-cli usage accumulator double-counts cache_read_input_tokens across tool-loop iterations [1 pull requests, 1 participants]

openclaw · 2026-04-23 16:23:06


GitHub stats

openclaw/openclaw#70679 · Fetched 2026-04-24 05:54:48


Author: yongsinzhang-commits

Participants: yongsinzhang-commits

When a claude-cli provider session does multiple tool-use iterations inside a single user turn (e.g. Read, Edit, web fetches), openclaw's usage accumulator sums cacheRead across every iteration. Because Anthropic's cache_read_input_tokens already represents the cumulative context at that call (it is a snapshot, not a delta), summing it inflates the reported total by roughly Nx, where N is the number of tool-loop iterations.

Symptoms:

TUI status line (tokens X/200k) shows values 2–5x higher than the real Anthropic context.
Token count visibly jumps and drops between turns (e.g. 24k → 132k → 24k) instead of growing monotonically.
Sessions intermittently become unresponsive once the inflated estimate crosses the preemptive-overflow threshold, even though the real context is well under the 200k window.

Sessions using OpenAI-style providers (tested with openai-codex/gpt-5.4) are not affected — the TUI rises smoothly from ~17k and grows slowly as expected.

Error Message

throw new Error(PREEMPTIVE_CONTEXT_OVERFLOW_MESSAGE);

Root Cause

Source: src/agents/pi-embedded-runner/usage-accumulator.ts (bundled at dist/pi-embedded-runner-*.js). It is called from the tool-loop retry/attempt site in the same file, where each attempt's attemptUsage (falling back to normalizeUsage(sessionLastAssistant?.usage)) is merged in sequence:

// pi-embedded-runner, tool-loop attempt handler (simplified)
const lastAssistantUsage = normalizeUsage(sessionLastAssistant?.usage);
const attemptUsage = attempt.attemptUsage ?? lastAssistantUsage;
mergeUsageIntoAccumulator(usageAccumulator, attemptUsage);

The accumulator:

const mergeUsageIntoAccumulator = (target, usage) => {
  if (!hasUsageValues(usage)) return;
  const callTotal = usage.total ?? (usage.input ?? 0) + (usage.output ?? 0)
                    + (usage.cacheRead ?? 0) + (usage.cacheWrite ?? 0);
  target.input      += usage.input ?? 0;
  target.output     += usage.output ?? 0;
  target.cacheRead  += usage.cacheRead ?? 0;   // <-- incorrect for Anthropic
  target.cacheWrite += usage.cacheWrite ?? 0;  // <-- same
  target.total      += callTotal;
  target.lastInput = usage.input ?? 0;
  // ... lastX fields correctly track per-call values
};

The accumulated usage (not lastCallUsage) is what ultimately flows through deriveSessionTotalTokens → persistSessionUsageUpdate → session store totalTokens → buildSessionUsageSnapshot → ACP usage_update → TUI footer.

Anthropic's streaming API returns, for every tool-loop iteration, a usage object where cache_read_input_tokens is the full prompt cache hit for that call, not the delta since the previous call. For a turn with 5 API iterations each hitting ~25k cached tokens, the accumulator records cacheRead = 125k while the actual Anthropic context is still ~27k.
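The following minimal sketch shows why summing snapshots inflates the figure. It uses the hypothetical 5 × 25k turn from the paragraph above. The CallUsage shape and the two fold functions are illustrative stand-ins, not openclaw's actual types:

```typescript
// Illustrative per-call usage; field names mirror the accumulator's fields.
interface CallUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Additive fold, as mergeUsageIntoAccumulator does today for cacheRead.
function summedCacheRead(calls: CallUsage[]): number {
  return calls.reduce((acc, call) => acc + call.cacheRead, 0);
}

// Snapshot fold: the final call's cacheRead already covers the whole cached
// prefix, so it alone describes the context at the end of the turn.
function snapshotCacheRead(calls: CallUsage[]): number {
  return calls.length === 0 ? 0 : calls[calls.length - 1].cacheRead;
}

// Hypothetical turn: 5 tool-loop iterations, each reading ~25k cached tokens.
const turn: CallUsage[] = Array.from({ length: 5 }, () => ({
  input: 1,
  output: 200,
  cacheRead: 25000,
  cacheWrite: 0,
}));

console.log(summedCacheRead(turn)); // 125000, grows with the iteration count
console.log(snapshotCacheRead(turn)); // 25000
```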

OpenAI-style providers avoid this because normalizeUsage in usage-CDsCClku.js normalizes cached_tokens by subtracting it from input, so accumulation doesn't double-count. Anthropic-style triples (input / cache_read_input_tokens / cache_creation_input_tokens) have no equivalent normalization step.
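The sketch below shows roughly what the two normalizations look like. The OpenAI Chat Completions and Anthropic Messages usage field names are the public API fields. The NormalizedUsage shape and the fromOpenAI / fromAnthropic helpers are hypothetical and do not reproduce normalizeUsage in usage-CDsCClku.js. On the Anthropic side the three input counters arrive already disjoint, so there is nothing to subtract:

```typescript
// Common shape an accumulator could consume (names follow the issue's fields).
interface NormalizedUsage {
  input: number; // uncached prompt tokens
  output: number;
  cacheRead: number; // prompt tokens served from the cache on this call
  cacheWrite: number; // prompt tokens written to the cache on this call
}

// OpenAI Chat Completions usage: prompt_tokens INCLUDES the cached tokens.
interface OpenAIUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Anthropic Messages usage: input_tokens EXCLUDES cache reads and writes,
// which are reported in their own fields.
interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_creation_input_tokens?: number | null;
  cache_read_input_tokens?: number | null;
}

function fromOpenAI(u: OpenAIUsage): NormalizedUsage {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return {
    input: u.prompt_tokens - cached, // subtract cached tokens from input
    output: u.completion_tokens,
    cacheRead: cached,
    cacheWrite: 0, // this usage object has no separate cache-write count
  };
}

function fromAnthropic(u: AnthropicUsage): NormalizedUsage {
  // Direct mapping: the counters are already disjoint.
  return {
    input: u.input_tokens,
    output: u.output_tokens,
    cacheRead: u.cache_read_input_tokens ?? 0,
    cacheWrite: u.cache_creation_input_tokens ?? 0,
  };
}

// Iteration 3 from the issue's evidence, as Anthropic reports it.
const lastCall = fromAnthropic({
  input_tokens: 1,
  output_tokens: 190,
  cache_creation_input_tokens: 1086,
  cache_read_input_tokens: 22595,
});
console.log(lastCall); // { input: 1, output: 190, cacheRead: 22595, cacheWrite: 1086 }
```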

Fix Action

Workaround

Until fixed, a local patch that changes the additive lines to assignments gives correct-ish numbers for single-provider Anthropic sessions:

// pi-embedded-runner-*.js, mergeUsageIntoAccumulator
target.cacheRead  = usage.cacheRead  ?? target.cacheRead;
target.cacheWrite = usage.cacheWrite ?? target.cacheWrite;
target.total      = (usage.input ?? 0) + (usage.output ?? 0)
                    + target.cacheRead + target.cacheWrite;

This trades correct per-turn cost accounting for correct context-size display, so it's not a general-purpose fix — just a mitigation to confirm the diagnosis.
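The following minimal sketch shows that trade-off by running the workaround merge over the three iterations from the issue's numerical evidence. The Accumulator type and the zero-initialized starting state are assumptions. Input and output keep summing because the workaround only touches the cache and total lines:

```typescript
interface Accumulator {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
  total: number;
}

interface CallUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// Workaround merge: input/output still sum as before, while the cache fields
// and total are overwritten with the latest call's snapshot.
function mergeWithWorkaround(target: Accumulator, usage: CallUsage): void {
  target.input += usage.input ?? 0;
  target.output += usage.output ?? 0;
  target.cacheRead = usage.cacheRead ?? target.cacheRead;
  target.cacheWrite = usage.cacheWrite ?? target.cacheWrite;
  target.total =
    (usage.input ?? 0) + (usage.output ?? 0) + target.cacheRead + target.cacheWrite;
}

// The three iterations from the issue's numerical evidence.
const calls: CallUsage[] = [
  { input: 3, cacheWrite: 151, cacheRead: 19372, output: 248 },
  { input: 1, cacheWrite: 3072, cacheRead: 19523, output: 132 },
  { input: 1, cacheWrite: 1086, cacheRead: 22595, output: 190 },
];

const acc: Accumulator = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0, total: 0 };
for (const call of calls) mergeWithWorkaround(acc, call);

// total now tracks the last call only (1 + 190 + 22595 + 1086) ...
console.log(acc.total); // 23872
// ... but the turn's summed cache reads (61490) are no longer recorded.
console.log(acc.cacheRead); // 22595
```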

PR fix notes

PR #70987: test(agents): cover cache snapshot usage reporting

Repository: openclaw/openclaw
Author: hyspacex
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/70987

Description (problem / solution / changelog)

Summary

add regression coverage for cache-snapshot usage reporting in the embedded runner
verify the reported current-turn total comes from the final call snapshot instead of inflated accumulated tool-loop totals
assert lastCallUsage preserves the final cache snapshot used for session accounting

Refs #70679

Testing

pnpm exec vitest run src/agents/pi-embedded-runner/usage-reporting.test.ts
pnpm exec vitest run src/agents/usage.test.ts
pnpm exec oxfmt --check src/agents/pi-embedded-runner/usage-reporting.test.ts

Changed files

src/agents/pi-embedded-runner/usage-reporting.test.ts (modified, +45/-0)


Bug: claude-cli usage accumulator double-counts `cache_read_input_tokens` across tool-loop iterations, inflating TUI token display and triggering spurious context overflow


Environment

openclaw: 2026.4.15 (npm global install)
Provider: claude-cli/claude-sonnet-4-6 (also reproducible with Opus 4.7, which goes through the same code path)
Surface: openclaw-tui, direct webchat channel
OS: macOS (Darwin 25.4.0)

Reproduction

1. Start a TUI session against any agent that uses the claude-cli provider.
2. Send a prompt that makes the agent chain tool calls, e.g. "read all workspace files and give me a summary" (this triggers 4–5 Read tool calls in one turn).
3. Observe the tokens X/200k footer jump to a value substantially higher than the real Anthropic context. The inflation scales with the number of tool iterations in the turn.
4. Send a short follow-up ("hi"). The displayed count drops sharply because that turn has only one iteration.

Real Anthropic usage (from ~/.claude/projects/<project>/<session>.jsonl) for the same turns stays within ~25-40k throughout.
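The sketch below pulls per-call usage out of one turn's transcript lines for the comparison against ~/.claude/projects/<project>/<session>.jsonl. It makes several assumptions about the transcript, all of which should be checked against your own file:

- Each assistant line carries type: "assistant", a message.id, and a message.usage object with Anthropic's usage fields.
- Repeated lines for the same message.id are streaming snapshots of one call.

The helper names are hypothetical. File reading is left out: read the file and split it on newlines.

```typescript
// Assumed subset of one transcript line; real entries carry more fields.
interface TranscriptLine {
  type?: string;
  message?: {
    id?: string;
    usage?: {
      input_tokens?: number;
      output_tokens?: number;
      cache_creation_input_tokens?: number;
      cache_read_input_tokens?: number;
    };
  };
}

interface Iteration {
  id: string;
  input: number;
  cacheCreation: number;
  cacheRead: number;
  output: number;
}

// One entry per API call. A later line with the same message id replaces the
// earlier one (assumed to be a streaming snapshot of the same call).
function extractIterations(lines: string[]): Iteration[] {
  const byId = new Map<string, Iteration>();
  for (const raw of lines) {
    if (raw.trim() === "") continue;
    const entry = JSON.parse(raw) as TranscriptLine;
    const id = entry.message?.id;
    const usage = entry.message?.usage;
    if (entry.type !== "assistant" || id === undefined || usage === undefined) continue;
    byId.set(id, {
      id,
      input: usage.input_tokens ?? 0,
      cacheCreation: usage.cache_creation_input_tokens ?? 0,
      cacheRead: usage.cache_read_input_tokens ?? 0,
      output: usage.output_tokens ?? 0,
    });
  }
  return Array.from(byId.values());
}

// Real context at the end of the turn: the last call's prompt size.
function contextAtEndOfTurn(iterations: Iteration[]): number {
  if (iterations.length === 0) return 0;
  const last = iterations[iterations.length - 1];
  return last.input + last.cacheCreation + last.cacheRead;
}

// Example: two snapshot lines for msg_1 collapse into a single iteration.
const lines = [
  JSON.stringify({ type: "user", message: { content: "hi" } }),
  JSON.stringify({
    type: "assistant",
    message: { id: "msg_1", usage: { input_tokens: 3, cache_read_input_tokens: 19372, output_tokens: 1 } },
  }),
  JSON.stringify({
    type: "assistant",
    message: {
      id: "msg_1",
      usage: { input_tokens: 3, cache_creation_input_tokens: 151, cache_read_input_tokens: 19372, output_tokens: 248 },
    },
  }),
  JSON.stringify({
    type: "assistant",
    message: {
      id: "msg_2",
      usage: { input_tokens: 1, cache_creation_input_tokens: 1086, cache_read_input_tokens: 22595, output_tokens: 190 },
    },
  }),
];
const iterations = extractIterations(lines);
console.log(iterations.length); // 2
console.log(contextAtEndOfTurn(iterations)); // 23682
```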


Concrete numerical evidence

From a real session at ~/.claude/projects/<project>/<uuid>.jsonl, a single user turn ("What tasks do you have?", asked in Chinese) issued 3 API iterations because the agent chained tool calls (Read IDENTITY.md / AGENTS.md / HEARTBEAT.md / RADAR.md, then a final text answer). Raw assistant.message.usage per iteration (stream-snapshot duplicates collapsed; cc = cache_creation_input_tokens, cr = cache_read_input_tokens):

ts=15:33:59.019  input=3  cc=151   cr=19372  out=248   # text + Read IDENTITY.md + Read AGENTS.md
ts=15:34:05.811  input=1  cc=3072  cr=19523  out=132   # Read HEARTBEAT.md + Read RADAR.md
ts=15:34:11.796  input=1  cc=1086  cr=22595  out=190   # final text reply

Iteration	input	cache_creation	cache_read	output
1	3	151	19372	248
2	1	3072	19523	132
3	1	1086	22595	190
Accumulated (sums via `mergeUsageIntoAccumulator`)	5	4309	61490	570

deriveSessionTotalTokens persists totalTokens = input + cacheRead + cacheWrite = 5 + 61490 + 4309 = 65,804, which surfaces as tokens 66k/200k (33%) in the TUI footer.

Real Anthropic context at end of that turn (iteration 3 alone): 1 + 22595 + 1086 = 23,682 tokens (~24k). The displayed value is ~2.8× inflated for a 3-iteration turn.
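A short sketch can check the arithmetic in the two paragraphs above. promptTokens mirrors the formula the issue attributes to deriveSessionTotalTokens (input + cacheRead + cacheWrite). It is a stand-in for illustration, not the real function:

```typescript
interface IterationUsage {
  input: number;
  cacheWrite: number; // cache_creation_input_tokens
  cacheRead: number; // cache_read_input_tokens
  output: number;
}

// The three iterations from the table above.
const iterations: IterationUsage[] = [
  { input: 3, cacheWrite: 151, cacheRead: 19372, output: 248 },
  { input: 1, cacheWrite: 3072, cacheRead: 19523, output: 132 },
  { input: 1, cacheWrite: 1086, cacheRead: 22595, output: 190 },
];

// Token count as the issue describes deriveSessionTotalTokens:
// input + cacheRead + cacheWrite (output is excluded).
function promptTokens(u: IterationUsage): number {
  return u.input + u.cacheRead + u.cacheWrite;
}

// Field-by-field sums, as the additive accumulator produces them.
const summed = iterations.reduce<IterationUsage>(
  (acc, u) => ({
    input: acc.input + u.input,
    cacheWrite: acc.cacheWrite + u.cacheWrite,
    cacheRead: acc.cacheRead + u.cacheRead,
    output: acc.output + u.output,
  }),
  { input: 0, cacheWrite: 0, cacheRead: 0, output: 0 },
);

const displayed = promptTokens(summed); // shown as 66k in the footer
const real = promptTokens(iterations[iterations.length - 1]);
console.log(displayed, real, (displayed / real).toFixed(2)); // 65804 23682 2.78
```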

A later turn on the same session with 5 Edit iterations reached tokens 132k/200k (66%) against a real context of ~28k. The inflation grows roughly linearly with iteration count, consistent with the accumulation bug.

The wild swings across turns (tokens ... 31k → 19k → 66k → 24k → 132k → 24k ...) are not a UI glitch. They track how many tool-loop iterations each turn happened to have: single-call turns read correctly, multi-call turns don't.

Secondary impact: spurious `PREEMPTIVE_CONTEXT_OVERFLOW`

In src/agents/pi-embedded-runner/tool-result-context-guard.ts:

const PREEMPTIVE_OVERFLOW_RATIO = 0.9;
const TOOL_RESULT_ESTIMATE_TO_TEXT_RATIO = 4 / 2;  // 2 chars per token
// ...
if (exceedsPreemptiveOverflowThreshold({ messages, maxContextChars }))
  throw new Error(PREEMPTIVE_CONTEXT_OVERFLOW_MESSAGE);

The char-based estimator (chars/2) over-estimates real token cost even further (Anthropic tokenization is closer to chars/3.5–chars/4 for English and CJK). Combined with SAFETY_MARGIN = 1.2, a claude-cli session doing multi-step tool work can trip the preemptive overflow guard while the real context is well under the window. The throw aborts the tool loop and shows up as "the agent stopped responding". This matches a long-standing problem we see locally: claude-cli sessions freeze after sustained tool usage, while Codex sessions on the same agent don't.
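The following hypothetical estimator shows how much the chars-per-token constant alone moves the guard. Only PREEMPTIVE_OVERFLOW_RATIO = 0.9, SAFETY_MARGIN = 1.2 and the 200k window come from the issue. The formula is an assumption: chars ÷ chars-per-token × margin, compared against 90% of the window. It may not match how exceedsPreemptiveOverflowThreshold actually combines these values (the real guard works in characters via maxContextChars).

```typescript
const CONTEXT_WINDOW_TOKENS = 200000;
const PREEMPTIVE_OVERFLOW_RATIO = 0.9; // value quoted in the issue
const SAFETY_MARGIN = 1.2; // value quoted in the issue

// Hypothetical estimator: characters -> tokens via a fixed ratio, padded by
// the safety margin. Not openclaw's actual implementation.
function estimateTokens(totalChars: number, charsPerToken: number): number {
  return (totalChars / charsPerToken) * SAFETY_MARGIN;
}

function wouldTripGuard(totalChars: number, charsPerToken: number): boolean {
  return (
    estimateTokens(totalChars, charsPerToken) >
    PREEMPTIVE_OVERFLOW_RATIO * CONTEXT_WINDOW_TOKENS
  );
}

const transcriptChars = 320000; // hypothetical transcript size
console.log(wouldTripGuard(transcriptChars, 2)); // true: ~192k estimated tokens
console.log(wouldTripGuard(transcriptChars, 4)); // false: ~96k estimated tokens
```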

Suggested fix

For Anthropic-style usage, cacheRead and cacheWrite are state snapshots, not increments. The accumulator should either:

Use last value (preferred): track target.cacheRead = usage.cacheRead and target.cacheWrite = usage.cacheWrite for Anthropic-style usage (keep additive behavior for OpenAI-style where it's correct), and derive target.total from the last-call snapshot for context size purposes.
Take max: target.cacheRead = Math.max(target.cacheRead, usage.cacheRead ?? 0), which stays robust if iterations aren't strictly ordered.
Normalize at the source: in resolveClaudeCliUsage (chat-CMSNlsvD.js), emit a usage shape that the accumulator can sum without double-counting (e.g. report input = 0 on cache hits and express the cache fields as per-iteration deltas, so the accumulator's sums stay correct).
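The first two options differ only in how the cache fields are folded. The minimal sketch below assumes a hypothetical per-provider CacheMergeMode switch, which is not an existing openclaw setting. Applying max to cacheWrite as well as cacheRead goes beyond what option 2 states. Deriving total is left out; under option 1 it would come from the last-call snapshot.

```typescript
interface UsageTotals {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

interface CallUsage {
  input?: number;
  output?: number;
  cacheRead?: number;
  cacheWrite?: number;
}

// Hypothetical switch selecting how cache fields are folded per provider.
type CacheMergeMode = "sum" | "last" | "max";

function mergeUsage(target: UsageTotals, usage: CallUsage, mode: CacheMergeMode): void {
  // Fresh input and output are per-call increments in every mode.
  target.input += usage.input ?? 0;
  target.output += usage.output ?? 0;
  switch (mode) {
    case "sum": // current behaviour, kept for OpenAI-style usage
      target.cacheRead += usage.cacheRead ?? 0;
      target.cacheWrite += usage.cacheWrite ?? 0;
      break;
    case "last": // option 1: cache fields are a snapshot
      target.cacheRead = usage.cacheRead ?? target.cacheRead;
      target.cacheWrite = usage.cacheWrite ?? target.cacheWrite;
      break;
    case "max": // option 2: tolerant of out-of-order iterations
      target.cacheRead = Math.max(target.cacheRead, usage.cacheRead ?? 0);
      target.cacheWrite = Math.max(target.cacheWrite, usage.cacheWrite ?? 0);
      break;
  }
}

function foldTurn(calls: CallUsage[], mode: CacheMergeMode): UsageTotals {
  const totals: UsageTotals = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
  for (const call of calls) mergeUsage(totals, call, mode);
  return totals;
}

// The three iterations from the issue's numerical evidence.
const calls: CallUsage[] = [
  { input: 3, cacheWrite: 151, cacheRead: 19372, output: 248 },
  { input: 1, cacheWrite: 3072, cacheRead: 19523, output: 132 },
  { input: 1, cacheWrite: 1086, cacheRead: 22595, output: 190 },
];

for (const mode of ["sum", "last", "max"] as const) {
  const t = foldTurn(calls, mode);
  console.log(mode, t.cacheRead, t.cacheWrite);
}
// sum 61490 4309
// last 22595 1086
// max 22595 3072
```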

Separately, the TOOL_RESULT_ESTIMATE_TO_TEXT_RATIO = 4 / 2 constant deserves a look: 4 / 3.5 or 4 / 4 would be closer to reality for both English and CJK content and would reduce false preemptive overflows.


Additional notes

The lastInput / lastCacheRead / lastTotal fields in the same accumulator already track per-call values correctly; toLastCallUsage returns a sane last-call snapshot. The bug is that toNormalizedUsage (the cumulative form) is the one that flows into totalTokens persistence.
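One way to keep both views is sketched below with a hypothetical TurnUsage class. The real accumulator exposes toLastCallUsage and toNormalizedUsage, which this class does not reproduce. Cumulative sums stay available for cost accounting, while the value that would be persisted as totalTokens is taken from the last call:

```typescript
interface CallUsage {
  input: number;
  output: number;
  cacheRead: number;
  cacheWrite: number;
}

// Hypothetical accumulator keeping two views of the same turn.
class TurnUsage {
  private sums: CallUsage = { input: 0, output: 0, cacheRead: 0, cacheWrite: 0 };
  private last: CallUsage | null = null;

  add(call: CallUsage): void {
    this.sums = {
      input: this.sums.input + call.input,
      output: this.sums.output + call.output,
      cacheRead: this.sums.cacheRead + call.cacheRead,
      cacheWrite: this.sums.cacheWrite + call.cacheWrite,
    };
    this.last = { ...call };
  }

  // Cumulative view: every call's tokens, suitable for cost accounting.
  cumulative(): CallUsage {
    return { ...this.sums };
  }

  // Context view: prompt size of the most recent call, suitable for the
  // tokens X/200k display.
  contextTokens(): number {
    const last = this.last;
    if (last === null) return 0;
    return last.input + last.cacheRead + last.cacheWrite;
  }
}

const turn = new TurnUsage();
turn.add({ input: 3, output: 248, cacheRead: 19372, cacheWrite: 151 });
turn.add({ input: 1, output: 132, cacheRead: 19523, cacheWrite: 3072 });
turn.add({ input: 1, output: 190, cacheRead: 22595, cacheWrite: 1086 });

console.log(turn.cumulative().cacheRead); // 61490
console.log(turn.contextTokens()); // 23682
```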
Happy to supply the raw .jsonl usage extracts if helpful.

extent analysis

TL;DR

The most likely fix is to update the mergeUsageIntoAccumulator function to handle Anthropic-style usage by tracking the last value of cacheRead and cacheWrite instead of summing them.

Guidance

Identify the mergeUsageIntoAccumulator function in src/agents/pi-embedded-runner/usage-accumulator.ts and update it to handle Anthropic-style usage.
Consider one of the suggested fixes: using the last value, taking the max, or normalizing at the source.
Verify the fix by checking the tokens X/200k display in the TUI and ensuring it no longer jumps or drops unexpectedly.
Additionally, review the TOOL_RESULT_ESTIMATE_TO_TEXT_RATIO constant and consider updating it to a more accurate value to reduce false preemptive overflows.


Notes

The provided workaround can be used as a temporary mitigation to confirm the diagnosis, but it is not a general-purpose fix.
The lastInput / lastCacheRead / lastTotal fields in the accumulator already track per-call values correctly, so the bug is specific to the cumulative form.

Recommendation

Apply the preferred fix in mergeUsageIntoAccumulator: keep the last cacheRead / cacheWrite values for Anthropic-style usage instead of summing them. It is the smallest change that targets the root cause directly.
