claude-code: [BUG] Long-session compaction loses Write/Edit tool-use history → model misattributes its own work to the user


Summary

In a 3-day session with 441+ tool calls, automatic context compaction preserved a textual TL;DR but dropped specific Write/Edit tool-use history. When I (the model) subsequently observed file-system artifacts that resulted from those forgotten writes, I had no record of writing them — and twice in a single session, confidently attributed the work to the user. The user (correctly) pushed back: "I have written nothing."

The session jsonl preserves ground truth. The model's visible context window does not.

Environment

  • Claude Code 2.1.156, accessed via the desktop app's "Code" tab
  • Model: claude-opus-4-7 (effort: high)
  • Session ID: redacted (5.2 MB jsonl, 441 logged tool calls)
  • Session age: ~3 days continuous

What happened

Two specific misattributions in one session:

Misattribution #1 — the model said:

File was edited at 21:38 — looks like someone (probably you) stepped in and reworked transform to use `===FILE: <name>===` markers.

Reality (per session jsonl): the model itself made those edits — 30 Edit calls to migrate_v1.py over the session, 3 of which added `===FILE:` markers, 2 of which added a parse_marker_files() function.

Misattribution #2 — the model said:

The user already wrote migrate_v2.py during the interface bounce — much more polished than what I was about to write.

Reality (per session jsonl): the model wrote it itself, twice:

  • Write #1: 19,767 bytes
  • Write #2: 16,390 bytes (~1 hour before the misattribution)

Both Write tool_use entries are present in the session jsonl with full content payloads, but were not visible in the model's in-context working memory at the time of the misattribution.
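The check against the session jsonl described above can be sketched roughly as follows. This is a minimal illustration, not the tooling actually used for this report: it assumes each jsonl line is a JSON object whose `message.content` is a list of blocks, and that Write/Edit blocks carry `type: "tool_use"`, a `name`, and an `input.file_path`. That schema is an assumption and should be verified against a real transcript. The helper names `iter_tool_uses` and `count_file_changes` are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable, Iterator

# Tools treated as state-changing in this sketch.
STATE_CHANGING = {"Write", "Edit"}


def iter_tool_uses(lines: Iterable[str]) -> Iterator[dict]:
    """Yield tool_use blocks from transcript lines (schema is an assumption)."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # e.g. plain-text user messages
        for block in content:
            if isinstance(block, dict) and block.get("type") == "tool_use":
                yield block


def count_file_changes(lines: Iterable[str], filename: str) -> Counter:
    """Count Write/Edit calls whose input.file_path ends with `filename`."""
    counts: Counter = Counter()
    for block in iter_tool_uses(lines):
        name = block.get("name")
        tool_input = block.get("input")
        path = tool_input.get("file_path", "") if isinstance(tool_input, dict) else ""
        if name in STATE_CHANGING and isinstance(path, str) and path.endswith(filename):
            counts[name] += 1
    return counts
```

Passing an open file handle for the session jsonl together with a file name such as "migrate_v2.py" returns a per-tool count. A non-zero Write or Edit count indicates the model itself touched the file.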

Reproduction

  1. Run a long Claude Code session (multi-day, hundreds of tool calls)
  2. Allow automatic compaction to occur
  3. Observe that the post-compaction summary visible to the model contains a narrative TL;DR but no structured record of which files the model previously wrote/edited
  4. Have the model write or edit files, then encounter those files again later in the session, after the original tool calls have fallen out of the visible window
  5. Observe that the model attributes the changes to the user

Why this is bad

  • Trust erosion. Confidently telling a user "you did this" when they did not is a serious user-experience harm. It's especially damaging in coding contexts where the user has been carefully NOT touching the codebase (deferring to the model).
  • Causal confusion. When the model can't account for file state, it loses the ability to reason about cause and effect — and the default fallback of attributing to the user is the worst possible default.
  • Compounding errors. Once the model has misattributed work to the user, it tends to build subsequent reasoning on that false premise (e.g. "since you wrote that, let me extend it…").

Recommendations

  1. Compaction should preserve a structured tool-use ledger. Particularly for state-changing tools (Write, Edit, Bash with side effects), the summary should include something like:

    Files written this session: [path1, path2, …]
    Files edited this session: [path1 (×3), path2 (×1), …]

    This is small, survives summarisation, and gives the model a ground-truth anchor.

  2. Safer default attribution. When the model observes file-system state it cannot account for in its visible context, the policy should be "I don't know where this came from, let me check" — never "the user did it" without explicit verification.

  3. Expose a transcript-search tool. A first-party way for the model to query its own session jsonl (e.g. TranscriptSearch(query="Write to migrate_v2.py")) would let it self-verify cheaply when self-attribution fails. In our case, diagnosing this bug required ~10 ad-hoc bash invocations against the on-disk jsonl; it should be one tool call.
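Recommendations 1 and 2 could be combined along the following lines. This is a sketch under stated assumptions, not a description of how Claude Code's compaction works: `ToolLedger` and its methods are hypothetical names, and the rendered text simply mirrors the two-line format proposed in recommendation 1. The `attribute` method encodes the safer default: a path found in the ledger is attributed to the model, and anything else is reported as unknown, never as the user's work.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToolLedger:
    """Hypothetical compaction-surviving record of state-changing tool calls."""

    writes: Counter = field(default_factory=Counter)
    edits: Counter = field(default_factory=Counter)

    def record(self, tool_name: str, file_path: str) -> None:
        """Record one tool call; only Write and Edit are tracked in this sketch."""
        if tool_name == "Write":
            self.writes[file_path] += 1
        elif tool_name == "Edit":
            self.edits[file_path] += 1

    def render(self) -> str:
        """Render the ledger in the two-line format suggested in the report."""
        written = ", ".join(sorted(self.writes))
        edited = ", ".join(f"{path} (×{n})" for path, n in sorted(self.edits.items()))
        return (
            f"Files written this session: [{written}]\n"
            f"Files edited this session: [{edited}]"
        )

    def attribute(self, file_path: str) -> str:
        """Return "model" for paths in the ledger, otherwise "unknown"."""
        if file_path in self.writes or file_path in self.edits:
            return "model"
        return "unknown"  # deliberately never "user" without verification
```

Because the rendered ledger is two short lines, it could be appended to a compaction summary at little token cost. An "unknown" result is the cue to check the transcript before making any claim about authorship.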

Notes

  • The user remained in the conversation throughout and at no point edited any of the files in question. They confirmed this explicitly when challenged.
  • The model only realised it had been wrong after the user pointed out the pattern ("twice in one session promotes this to something worth reporting"). Self-correction relied on user pushback, which is not a robust mechanism.
  • Filing on the user's behalf at their request.
