claude-code - [BUG] Cowork: kanban-state.json silently truncated by 2 bytes after successful atomic_write + post-write validate (Windows, local disk, not OneDrive)

GitHub stats
anthropics/claude-code#56105 (fetched 2026-05-05 05:58:05)
Comments: 0 · Participants: 1 · Timeline events: 3 (labeled ×3) · Reactions: 0

In a long-running Cowork session on Windows 11 with a local-disk project folder (NOT OneDrive), a Python HTTP server's POST handler that uses temp-file + fsync + os.replace + post-write json.loads validate ends with a torn file on disk: the trailing }\n (2 bytes) of the JSON document is lost after the handler returns 204. At least three observed events on the same project folder over 10 days, all identical shape.

Related but distinct from:

  • #41702 (OneDrive-backed, write-side truncation) — this report is on local disk, no OneDrive involved
  • #41710 (sandbox read-side truncation) — this report is write-side: the file ends up torn on disk, not just read torn

Error Message

Silent failure — no error at the moment of corruption. The handler returns 204 cleanly. Truncation surfaces only when a subsequent reader opens the file:

  $ python -c "import json; json.load(open('kanban-state.json'))"
  json.decoder.JSONDecodeError: Expecting ',' delimiter: line N column 1

Byte-level shape of the torn vs intact files (both observed events show the same shape):

  intact end:  ...      }\n  }\n}\n   (root object's "}\n" present)
  torn end:    ...      }\n  }\n      (root object's "}\n" missing, 2 bytes)

Both torn files attached to this issue.
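The torn shape described above can be checked mechanically. The following is a minimal sketch, not part of the original report: `classify_state_file` is a hypothetical helper name, and it assumes the file uses LF line endings (as the report's 2-byte count implies). Note that the torn file still ends in `}\n` from the nested object, so checking the last two bytes alone is not enough; the sketch instead tests whether appending exactly `}\n` makes the document parse.

```python
import json
from pathlib import Path


def _parses(raw: bytes) -> bool:
    """Return True if raw decodes as UTF-8 and parses as JSON."""
    try:
        json.loads(raw.decode("utf-8"))
        return True
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False


def classify_state_file(path) -> str:
    """Classify a JSON file as intact, torn in the reported shape, or otherwise corrupt.

    Hypothetical diagnostic: "torn" here means the file fails to parse as-is
    but parses once the root object's closing "}\\n" is appended.
    """
    raw = Path(path).read_bytes()  # binary read avoids newline translation on Windows
    if _parses(raw):
        return "intact"
    if _parses(raw + b"}\n"):
        return "torn: missing root close"
    return "corrupt: other"
```

Running this against a suspect copy of `kanban-state.json` would distinguish the 2-byte tear from other kinds of damage, which may help confirm that a new event matches the earlier ones.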

Fix Action

Fix / Workaround

Project-internal mitigation (working defense)

This catches the mount-race window with one quarter-second of added latency per save. Not a fix — just a workaround. The underlying mount-layer bug is what this issue is asking Anthropic to investigate.

Working project-internal mitigation in our serve.py (added 2026-05-04): delay 250ms after the immediate post-write validate, re-validate, and auto-heal from the JS twin if torn. Catches every observed instance. Adds 250ms latency to every successful save. Available on request if useful as a reference workaround.


Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Summary

In a long-running Cowork session on Windows 11 with a local-disk project folder (NOT OneDrive), a Python HTTP server's POST handler that uses temp-file + fsync + os.replace + post-write json.loads validate ends with a torn file on disk: the trailing }\n (2 bytes) of the JSON document is lost after the handler returns 204. At least three observed events on the same project folder over 10 days, all identical shape.

Related but distinct from:

  • #41702 (OneDrive-backed, write-side truncation) — this report is on local disk, no OneDrive involved
  • #41710 (sandbox read-side truncation) — this report is write-side: the file ends up torn on disk, not just read torn

Environment

  • OS: Windows 11
  • Cowork: Claude desktop app (research preview)
  • Project folder: C:\Users\MikeMoss\Operate\2. Products\Projects\course-session-processor-v1.5\
  • Filesystem: local NTFS, NOT OneDrive (verified — folder is not under any OneDrive path; OneDrive references in files are content references)
  • Writer: a Python http.server subclass running outside Cowork (serve.py), reachable from the Cowork session over http://127.0.0.1
  • File size at truncation: ~27,825 bytes (5/4 event), ~23,500 bytes (5/3 event) — both >5 KB

Symptom

The handler does:

  1. Receive POST body, parse JSON, validate shape
  2. atomic_write(path, content) — temp file in same dir, f.write + f.flush + os.fsync(f.fileno()), then os.replace(tmp, path)
  3. validate_state_file(path) — open, read, json.loads — succeeds
  4. Return 204
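For reference, steps 2 and 3 can be sketched as follows. This is a reconstruction from the description above, not the reporter's actual serve.py: the function names `atomic_write` and `validate_state_file` come from the report, but the bodies, the temp-file naming, and the explicit `newline="\n"` are assumptions.

```python
import json
import os
import tempfile


def atomic_write(path, content):
    """Write text to path via a same-directory temp file, fsync, then os.replace."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        # newline="\n" keeps LF endings on Windows so byte counts stay predictable
        # (an assumption; the report only says the trailing "}\n" is 2 bytes).
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # push the temp file's data to disk before renaming
        os.replace(tmp, path)  # swap the temp file into place in one rename
    except BaseException:
        # Remove the temp file if anything failed before the rename completed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def validate_state_file(path):
    """Open, read, and json.loads the file; raises if it does not parse."""
    with open(path, "r", encoding="utf-8") as f:
        return json.loads(f.read())
```

The temp file is created in the destination's own directory so that the rename stays on one volume.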

After the response goes out, the file on disk loses its trailing }\n (exactly 2 bytes — the root JSON object's closing brace and newline). Subsequent readers (other Cowork sessions, the board's auto-refresh, manual inspection) see the torn version.

Reproducibility

3 observed events on this project folder over ~10 days:

  • kanban-state.js.corrupt-2026-04-24 (older, JS-twin variant)
  • kanban-state.json.corrupted-2026-05-03-19-38-drag (5/3 event, on disk)
  • kanban-state.json.backup-2026-05-04-pre-heal-1 (5/4 event, on disk)

Plus seven prior events documented in the same project's operating-rules log that prompted a project-internal "atomic-write for any file >5 KB during a Cowork session" rule. All same shape, all on files >5 KB, all inside Cowork sessions.

Discriminator vs prior hypotheses

  • Not a writer bug: atomic_write uses textbook temp+fsync+os.replace.
  • Not a missed-validate: post-write json.loads of the bytes read back from disk SUCCEEDED at response time.
  • Not a double-write race: only one POST /state call site in the client, verified.
  • Not a missing fsync: fsync is on the temp file before rename.
  • Not OneDrive: project folder is not under any OneDrive path.

The only remaining explanation: the bytes serve.py reads back at validate time differ from what later readers see through the Cowork mount. The truncation surfaces in a window of ~tens to a few hundred ms after the 204 response.
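One way to test this hypothesis would be to record a fingerprint of the file at validate time and compare it against later reads. The sketch below is illustrative only: `fingerprint` and `watch_for_divergence` are hypothetical names, the delay schedule is an assumption based on the reported window, and any legitimate write that lands during the watch would also show up as a divergence.

```python
import hashlib
import time


def fingerprint(path):
    """Return (size, sha256 hex digest) of the file's current bytes."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data), hashlib.sha256(data).hexdigest()


def watch_for_divergence(path, baseline, delays=(0.05, 0.1, 0.25, 0.5)):
    """Re-fingerprint after each delay; return the first differing observation.

    Returns None if every observation matches the baseline.
    """
    elapsed = 0.0
    for delay in delays:
        time.sleep(delay)
        elapsed += delay
        current = fingerprint(path)
        if current != baseline:
            return {"elapsed_s": elapsed, "baseline": baseline, "observed": current}
    return None
```

Taking the baseline immediately after the post-write validate and logging any non-None result would show when the bytes changed and whether the size dropped by exactly 2.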

Evidence files (available on request)

  • kanban-state.json.corrupted-2026-05-03-19-38-drag (~23 KB) — torn JSON ending at "history": { ... } with no root closing brace
  • kanban-state.json.backup-2026-05-04-pre-heal-1 (~27.8 KB) — same shape

Both files have an intact JS twin (kanban-state.js) that was written by the same handler in the same atomic_write pair, immediately after the JSON, and parses cleanly — which is why heal-from-JS is the working project-internal recovery strategy.

Project-internal mitigation (working defense)

Added a delay-and-revalidate guard to the POST handler:

  1. Existing immediate post-write validate
  2. Sleep ~250ms
  3. Re-validate; if torn, heal from JS twin via existing helper
  4. Re-validate the heal; only then return 204

This catches the mount-race window with one quarter-second of added latency per save. Not a fix — just a workaround. The underlying mount-layer bug is what this issue is asking Anthropic to investigate.
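The four steps above could look roughly like the sketch below. It is not the reporter's code: `guarded_save_check` and `is_valid_json_file` are hypothetical names, and `heal` stands in for the existing heal-from-JS-twin helper, whose implementation is not shown in the report. The `sleep` parameter exists only so the delay can be replaced in tests.

```python
import json
import time


def is_valid_json_file(path):
    """Return True if the file can be read and parsed as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.loads(f.read())
        return True
    except (OSError, ValueError):  # JSONDecodeError and UnicodeDecodeError are ValueErrors
        return False


def guarded_save_check(path, heal, delay_s=0.25, sleep=time.sleep):
    """Return "ok" or "healed"; raise if the file cannot be made valid.

    heal is a caller-supplied callable (hypothetical) that rewrites path
    from a known-good source such as the JS twin.
    """
    if not is_valid_json_file(path):   # step 1: immediate post-write validate
        raise RuntimeError("immediate post-write validate failed")
    sleep(delay_s)                     # step 2: wait out the observed window
    if is_valid_json_file(path):       # step 3: re-validate
        return "ok"
    heal(path)                         # step 3: file is torn, heal it
    if not is_valid_json_file(path):   # step 4: re-validate the heal
        raise RuntimeError("heal did not produce a valid file")
    return "healed"
```

In a handler, the 204 would be sent only after this function returns, which is where the extra quarter second per save comes from.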

Ask

Investigate the Cowork mount layer's flush/visibility semantics for files written by a host-process atomic_write+rename, observed by Cowork-side readers immediately afterward. Specifically: is there a path by which the final ~2 bytes of a freshly-replaced inode can be lost or invisible from the Cowork mount perspective even after os.replace completes and a host-side read-back parse succeeds?

What Should Happen?

After atomic_write + os.replace + an immediate post-write json.loads validate that succeeds, the file on disk should remain intact for all subsequent readers. The trailing "}\n" (2 bytes) of the JSON document should not silently disappear after the handler returns 204.

Equivalently: the bytes serve.py reads back via json.loads at validate time should be the bytes that any later reader sees through the Cowork mount. Today, those two views diverge in a small fraction of writes, silently, with no error surfaced anywhere in the stack.

Error Messages/Logs

Silent failure — no error at the moment of corruption. The handler
returns 204 cleanly. Truncation surfaces only when a subsequent reader
opens the file:

  $ python -c "import json; json.load(open('kanban-state.json'))"
  json.decoder.JSONDecodeError: Expecting ',' delimiter: line N column 1

Byte-level shape of the torn vs intact files (both observed events
show the same shape):

  intact end:  ...      }\n  }\n}\n   (root object's "}\n" present)
  torn end:    ...      }\n  }\n      (root object's "}\n" missing — 2 bytes)

Both torn files attached to this issue.

Steps to Reproduce

No minimal repro. The failure is intermittent, roughly 3 events per 10 days of active use on the same project folder. Empirical pattern:

  1. Run a long-lived Python http.server (serve.py) outside the Cowork sandbox, bound to 127.0.0.1.
  2. Open a Cowork session with the same project folder selected.
  3. From a browser tab pointed at the server, POST a JSON body in the 25–30 KB range to a route whose handler does:
       atomic_write(path, content)      # temp + fsync + os.replace
       validate_state_file(path)        # open + read + json.loads
       return 204
  4. Repeat across a long Cowork session (multiple hours, dozens to hundreds of POSTs, intermixed with Cowork-side reads of the same file).
  5. On a small fraction of POSTs (~1–2 in 100 over the observed window), the file ends up with its final "}\n" missing on disk despite the handler returning 204 and its post-write validate succeeding.

The companion JS-twin file (kanban-state.js, written second by the same handler in the same atomic_write pair) is intact in every observed event. That suggests the race is not at fsync time but in a later visibility step specific to one of the two files.

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

N/A — Cowork in Claude desktop app (research preview), not Claude Code CLI

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

PowerShell

Additional Information

Eight prior occurrences in the same project's operating-rules log prompted a project-internal "atomic-write for any file >5 KB during a Cowork session" rule (Rule 15.5), which is a working defense but not a fix. This issue is the upstream bug that rule defends against.

Working project-internal mitigation in our serve.py (added 2026-05-04): delay 250ms after the immediate post-write validate, re-validate, and auto-heal from the JS twin if torn. Catches every observed instance. Adds 250ms latency to every successful save. Available on request if useful as a reference workaround.

extent analysis

TL;DR

Investigate the Cowork mount layer's flush and visibility semantics to ensure that files written by a host-process atomic_write+rename are immediately visible to Cowork-side readers.

Guidance

  • Review the Cowork mount layer's implementation to identify potential issues with file visibility after os.replace completes.
  • Verify that the os.replace operation is properly synchronized with the Cowork mount layer to ensure that the replaced file is immediately visible.
  • Consider adding additional logging or debugging statements to the Cowork mount layer to understand the timing and ordering of file operations.
  • Evaluate the project-internal mitigation (delay-and-revalidate guard) as a potential workaround, but prioritize identifying and fixing the underlying issue.

Example

No code example is provided, as the issue is related to the interaction between the Cowork mount layer and the host-process atomic_write+rename operation.

Notes

The issue is specific to the Cowork mount layer and the Windows 11 environment, and may not be reproducible in other setups. The project-internal mitigation has been effective in catching the issue, but it adds latency to every successful save.

Recommendation

Apply the project-internal mitigation (delay-and-revalidate guard) as a temporary workaround while investigating the underlying issue with the Cowork mount layer. This will ensure that the file corruption is caught and healed, but it will add 250ms latency to every successful save.
