In long sessions the tool-execution layer intermittently returns stale / replayed output, results bound to the wrong call, and "Cancelled: parallel tool call X errored" on calls that did not themselves fail. Once it starts, in-harness verification (Read, Grep, build exit codes, git show) can no longer be trusted, which leads an agent to make and commit edits it cannot verify. The behavior is intermittent — it does not reproduce on a clean minimal case — so the exact trigger is not yet isolated.

Honesty note: some "facts" I first gathered while diagnosing were themselves corrupted results (a phantom 5.77 MB ~/.claude.json, a phantom version string, an invariant "41388 tokens" message). The HARD EVIDENCE below is separated from HYPOTHESIS for that reason.

Bug report: intermittent tool-result corruption (stale/replayed output + sibling cancellation)

Summary

Honesty note: some "facts" I first gathered while diagnosing were themselves corrupted results (a phantom 5.77 MB ~/.claude.json, a phantom version string, an invariant "41388 tokens" message). The HARD EVIDENCE below is separated from HYPOTHESIS for that reason.

Environment

Claude Code version: 2.1.143 (@anthropic-ai/[email protected], confirmed via npm ls -g)
Surface: VS Code native extension (Claude Agent SDK runtime)
Model: Claude Opus 4.8 (1M context); OS: Linux 6.12 (Debian 13), bash
User reports symptoms began after the most recent update.

Hard evidence (clean, repeated within the session)

Invariant "output limit" string contaminating unrelated results. A constant message — "...exceeded the max output limit of 30000 tokens. The result was over the limit by 41388 tokens" — was appended to several small, unrelated tool results:
- appeared on a cat file | head -c 400 call (output physically capped at 400 bytes — cannot exceed any token limit), and on a sibling npm ls;
- appeared glued to the real ~30-byte output of ls -la ~/.claude.json | awk '{print $5,$9}'. The 41388 figure is identical across three unrelated calls, which is only possible if it is a cached/replayed fragment, not a fresh measurement.
Sibling cancellation. A Write call returned "Cancelled: parallel tool call Bash(...) errored" — it was aborted because a different call in the same assistant turn exited non-zero (a find that returned exit 1). The Write itself was valid.
Phantom file-content corruption. A Read once showed a Svelte file's script block garbled (dead code after return, duplicated const declarations); a git show HEAD:<file> | grep claimed thousands of duplicate lines. But wc -l reported a stable 1404 lines for working tree, HEAD, and HEAD~1, and a later clean Read showed the file intact. So the "corruption" was largely in the RESULTS layer, not on disk.
Replayed reads / phantom sentinels. Read with offset/limit sometimes returned a different offset's content or replayed an earlier read; "Wasted call — file unchanged" sentinels appeared for files not recently read.

What did NOT reproduce

A minimal 2-call batch — false (exit 1) + echo SIBLING_SURVIVED — ran cleanly: the erroring call did not cancel its sibling. So sibling cancellation is not a deterministic "any non-zero exit kills the batch" rule; it appears timing/concurrency-dependent.

Hypothesis (NOT proven — for the Claude Code team to verify)

The symptoms are consistent with a per-session tool-result buffer that is not strictly keyed by tool_use_id, combined with a race in how parallel calls are scheduled/cancelled and how the oversized-output replacement is applied:

one call's result (including the "output too large" replacement) bleeds into sibling slots;
a call that errors or a batch that collectively exceeds the output cap can cancel in-flight siblings;
the corrupted buffer state then persists and pollutes later calls.

Correlates with large parallel tool-call batches (10–14 calls/turn) and the presence of at least one very large result (full-file reads of >1000-line files, broad grep -rn). Likely introduced/exposed by a recent release (2.1.x) given the user's "started after the update" report.

Impact

High. Reads, greps, build output, and git show all become unreliable, so an agent cannot verify its own edits — risking garbled in-place edits and commits/pushes of unverifiable state. It defeats the standard "verify before done" safety gate.

Workarounds (until fixed)

Restart the session to clear the corrupted buffer (it does not self-recover).
Prefer small batches (1–3 tool calls/turn); avoid 10+ parallel calls.
Avoid huge tool outputs: bounded-range Reads, pipe grep through head, never cat large files.
Once the invariant <N> tokens string or a phantom "Wasted call" appears, treat ALL in-harness verification as suspect and re-verify in a real terminal or a fresh session.

How to file

Run /bug inside Claude Code to submit with session context, or open an issue at https://github.com/anthropics/claude-code/issues and paste this file. Include version 2.1.143 and the "Hard evidence" section above — especially the invariant 41388 tokens contamination, which is the clearest proof of result-buffer bleed-through.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Intermittent tool-result corruption: stale/replayed output + sibling cancellation (v2.1.143)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Workarounds (until fixed)

Bug report: intermittent tool-result corruption (stale/replayed output + sibling cancellation)

Summary

Environment

Hard evidence (clean, repeated within the session)

What did NOT reproduce

Hypothesis (NOT proven — for the Claude Code team to verify)

Impact

Workarounds (until fixed)

How to file

Still need to ship something?

TRENDING