claude-code: [BUG] Cowork: kanban-state.json silently truncated by 2 bytes after successful atomic_write + post-write validate (Windows, local disk, not OneDrive)

mrmoss1 · 2026-05-04T19:23:40Z



anthropics/claude-code#56105 (fetched 2026-05-05 05:58:05)


In a long-running Cowork session on Windows 11 with a local-disk project folder (NOT OneDrive), a Python HTTP server's POST handler that uses temp-file + fsync + os.replace + a post-write json.loads validate ends up with a torn file on disk: the trailing }\n (2 bytes) of the JSON document is lost after the handler returns 204. At least three events have been observed on the same project folder over 10 days, all with the same shape.

Related but distinct from:

#41702 (OneDrive-backed, write-side truncation) — this report is on local disk, no OneDrive involved
#41710 (sandbox read-side truncation) — this report is write-side: the file ends up torn on disk, not just read torn

Error Message

Silent failure — no error at the moment of corruption. The handler returns 204 cleanly. Truncation surfaces only when a subsequent reader opens the file:

  $ python -c "import json; json.load(open('kanban-state.json'))"
  json.decoder.JSONDecodeError: Expecting ',' delimiter: line N column 1

Byte-level shape of the torn vs intact files (both observed events show the same shape):

  intact end:  ...      }\n  }\n}\n   (root object's "}\n" present)
  torn end:    ...      }\n  }\n      (root object's "}\n" missing, 2 bytes)

Both torn files attached to this issue.
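
For anyone who wants to scan their own backups for the same shape, here is a minimal sketch of a tail check. It assumes the file is pretty-printed the way the byte shapes above suggest, with the root closing brace at column 0 followed by a single LF. Note that a plain "ends with }\n" test is not enough, because the torn file also ends in an (indented) brace and a newline. The function names are hypothetical and not taken from serve.py.

```python
import os


def tail_bytes(path: str, n: int = 16) -> bytes:
    """Return the last n bytes of the file (or the whole file if it is shorter)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        f.seek(max(0, size - n))
        return f.read()


def root_brace_present(path: str) -> bool:
    """True if the file ends with a column-0 closing brace and a newline.

    Assumption: indented, LF-terminated JSON, so an intact file ends in
    b"\n}\n" while the torn shape ends in b" }\n".
    """
    return tail_bytes(path, 3) == b"\n}\n"
```

A check like this is cheaper than a full json.loads and also tells you which shape of corruption you are looking at, since a torn file of this kind fails the tail check while everything before the tail is still well formed.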


Fix Action

Fix / Workaround

Project-internal mitigation (working defense)

Our serve.py has carried a working project-internal mitigation since 2026-05-04: after the immediate post-write validate, wait 250 ms, re-validate, and auto-heal from the JS twin if the file is torn. This catches every observed instance of the mount-race window, at the cost of 250 ms of added latency on every successful save. It is a workaround, not a fix; the underlying mount-layer bug is what this issue asks Anthropic to investigate. The code is available on request as a reference workaround.


Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

Summary

Related but distinct from:

#41702 (OneDrive-backed, write-side truncation) — this report is on local disk, no OneDrive involved
#41710 (sandbox read-side truncation) — this report is write-side: the file ends up torn on disk, not just read torn

Environment

OS: Windows 11
Cowork: Claude desktop app (research preview)
Project folder: C:\Users\MikeMoss\Operate\2. Products\Projects\course-session-processor-v1.5\
Filesystem: local NTFS, NOT OneDrive (verified — folder is not under any OneDrive path; OneDrive references in files are content references)
Writer: a Python http.server subclass running outside Cowork (serve.py), reachable from the Cowork session over http://127.0.0.1
File size at truncation: ~27,825 bytes (5/4 event), ~23,500 bytes (5/3 event) — both >5 KB

Symptom

The handler does:

1. Receive POST body, parse JSON, validate shape
2. atomic_write(path, content): temp file in same dir, f.write + f.flush + os.fsync(f.fileno()), then os.replace(tmp, path)
3. validate_state_file(path): open, read, json.loads (succeeds)
4. Return 204

After the response goes out, the file on disk loses its trailing }\n (exactly 2 bytes — the root JSON object's closing brace and newline). Subsequent readers (other Cowork sessions, the board's auto-refresh, manual inspection) see the torn version.
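
For reference, the write and validate steps above can be sketched roughly as follows. This is a reconstruction from the description in this report, not the actual serve.py code: the function names match the ones mentioned here, but the bodies, the temp-file naming, and the encoding choices are assumptions.

```python
import json
import os
import tempfile


def atomic_write(path: str, content: str) -> None:
    """Write content to a temp file in the same directory, fsync it, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target so os.replace stays on one volume.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # push the temp file's bytes to disk before the rename
        os.replace(tmp, path)  # replace the destination in a single step
    except BaseException:
        # If anything failed before the rename, do not leave the temp file behind.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def validate_state_file(path: str):
    """Re-open the file, read it back, and parse it; raises ValueError on torn JSON."""
    with open(path, "r", encoding="utf-8") as f:
        return json.loads(f.read())
```

The point of the sketch is that nothing in this sequence leaves a window in which a partially written file sits under the final name, which is why the report looks past the writer for the cause.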

Reproducibility

3 observed events on this project folder over ~10 days:

kanban-state.js.corrupt-2026-04-24 (older, JS-twin variant)
kanban-state.json.corrupted-2026-05-03-19-38-drag (5/3 event, on disk)
kanban-state.json.backup-2026-05-04-pre-heal-1 (5/4 event, on disk)

Plus seven prior events documented in the same project's operating-rules log that prompted a project-internal "atomic-write for any file >5 KB during a Cowork session" rule. All same shape, all on files >5 KB, all inside Cowork sessions.

Discriminator vs prior hypotheses

Not a writer bug: atomic_write uses textbook temp+fsync+os.replace.
Not a missed-validate: post-write json.loads of the bytes read back from disk SUCCEEDED at response time.
Not a double-write race: only one POST /state call site in the client, verified.
Not a missing fsync: fsync is on the temp file before rename.
Not OneDrive: project folder is not under any OneDrive path.

The only remaining explanation: the bytes serve.py reads back at validate time differ from what later readers see through the Cowork mount. The truncation surfaces in a window of ~tens to a few hundred ms after the 204 response.
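
One way to narrow down that window would be to sample the file's size and hash for a short period after each write and log every change. The sketch below is hypothetical and was not part of the original investigation. It only shows what the process running it can see, so to test the "two views diverge" hypothesis it would need to run both host-side and from inside the Cowork session.

```python
import hashlib
import time


def fingerprint(path: str):
    """Return (size, sha256 hex digest) of the file, or (-1, "") if it cannot be read."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except OSError:
        return (-1, "")
    return (len(data), hashlib.sha256(data).hexdigest())


def watch_after_write(path: str, duration_s: float = 1.0, interval_s: float = 0.02):
    """Sample the file for duration_s seconds and record every fingerprint change.

    Returns a list of (elapsed_seconds, size, digest); the first entry is the
    state at the start of the watch.
    """
    start = time.monotonic()
    last = fingerprint(path)
    changes = [(0.0, *last)]
    while True:
        elapsed = time.monotonic() - start
        if elapsed >= duration_s:
            break
        current = fingerprint(path)
        if current != last:
            changes.append((elapsed, *current))
            last = current
        time.sleep(interval_s)
    return changes
```

If a torn event is caught this way, a second entry whose size is exactly 2 bytes smaller than the first would pin down how long after the write the change appears in that view.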

Evidence files (available on request)

kanban-state.json.corrupted-2026-05-03-19-38-drag (~23 KB) — torn JSON ending at "history": { ... } with no root closing brace
kanban-state.json.backup-2026-05-04-pre-heal-1 (~27.8 KB) — same shape

Both files have an intact JS twin (kanban-state.js) that was written by the same handler in the same atomic_write pair, immediately after the JSON, and parses cleanly — which is why heal-from-JS is the working project-internal recovery strategy.

Project-internal mitigation (working defense)

Added a delay-and-revalidate guard to the POST handler:

1. Existing immediate post-write validate
2. Sleep ~250 ms
3. Re-validate; if torn, heal from JS twin via existing helper
4. Re-validate the heal; only then return 204
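
A minimal sketch of that guard, under stated assumptions: the heal step is passed in as a callable because the JS twin's format and the existing heal helper are not shown in this report, and the function names here are made up for illustration.

```python
import json
import time
from typing import Callable


def parses_cleanly(path: str) -> bool:
    """True if the file can be opened and parsed as JSON."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            json.loads(f.read())
        return True
    except (OSError, ValueError):  # JSONDecodeError is a ValueError subclass
        return False


def revalidate_after_delay(path: str, heal: Callable[[str], None],
                           delay_s: float = 0.25) -> str:
    """Run the delay-and-revalidate guard; return "ok" or "healed", raise otherwise."""
    if not parses_cleanly(path):
        raise ValueError("immediate post-write validate failed")
    time.sleep(delay_s)  # wait out the window in which the tail has been seen to vanish
    if parses_cleanly(path):
        return "ok"
    heal(path)  # hypothetical hook: restore the JSON from its intact twin
    if not parses_cleanly(path):
        raise ValueError("file still torn after heal")
    return "healed"
```

In a handler, the 204 would only be sent after this returns. The 0.25 s default mirrors the ~250 ms figure above; whether a shorter delay would still catch every event is not known from the data in this report.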

Ask

Investigate the Cowork mount layer's flush/visibility semantics for files written by a host-process atomic_write+rename, observed by Cowork-side readers immediately afterward. Specifically: is there a path by which the final ~2 bytes of a freshly-replaced inode can be lost or invisible from the Cowork mount perspective even after os.replace completes and a host-side read-back parse succeeds?

What Should Happen?

After atomic_write + os.replace + an immediate post-write json.loads validate that succeeds, the file on disk should remain intact for all subsequent readers. The trailing "}\n" (2 bytes) of the JSON document should not silently disappear after the handler returns 204.

Equivalently: the bytes serve.py reads back via json.loads at validate time should be the bytes that any later reader sees through the Cowork mount. Today, those two views diverge in a small fraction of writes, silently, with no error surfaced anywhere in the stack.

Error Messages/Logs

Silent failure — no error at the moment of corruption. The handler
returns 204 cleanly. Truncation surfaces only when a subsequent reader
opens the file:

  $ python -c "import json; json.load(open('kanban-state.json'))"
  json.decoder.JSONDecodeError: Expecting ',' delimiter: line N column 1

Byte-level shape of the torn vs intact files (both observed events
show the same shape):

  intact end:  ...      }\n  }\n}\n   (root object's "}\n" present)
  torn end:    ...      }\n  }\n      (root object's "}\n" missing — 2 bytes)

Both torn files attached to this issue.

Steps to Reproduce

No minimal repro: the failure is intermittent, roughly 3 events per 10 days of active use on the same project folder. Empirical pattern:

1. Run a long-lived Python http.server (serve.py) outside the Cowork sandbox, bound to 127.0.0.1.
2. Open a Cowork session with the same project folder selected.
3. From a browser tab pointed at the server, POST a JSON body in the 25–30 KB range to a route whose handler does:

     atomic_write(path, content)      # temp + fsync + os.replace
     validate_state_file(path)        # open + read + json.loads
     return 204

4. Repeat across a long Cowork session (multiple hours, dozens to hundreds of POSTs, intermixed with Cowork-side reads of the same file).
5. On a small fraction of POSTs (~1–2 in 100 over the observed window), the file ends up with its final "}\n" missing on disk despite the handler returning 204 and its post-write validate succeeding.

The companion JS-twin file (kanban-state.js, written second by the same handler in the same atomic_write pair) is intact in every observed event. That suggests the race is not at fsync time but in a later visibility step specific to one of the two files.
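
To drive the POST side of this pattern without a browser, something like the following could work. It is a sketch only: the payload fields other than "history" are invented, the real state schema is not shown in this report, and the server's port and full URL are left to the caller.

```python
import json
import urllib.request


def make_payload(target_bytes: int = 27_000) -> bytes:
    """Build pretty-printed JSON of at least target_bytes, ending in a column-0 brace and LF."""
    cards = []
    body = b""
    while len(body) < target_bytes:
        # Hypothetical card shape, used only to pad the document to size.
        cards.append({"id": len(cards), "title": "card " + "x" * 40})
        doc = {"cards": cards, "history": {}}
        body = (json.dumps(doc, indent=2) + "\n").encode("utf-8")
    return body


def post_state(url: str, body: bytes) -> int:
    """POST the body as JSON and return the HTTP status code (204 on success here)."""
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.status
```

Calling post_state in a loop against the server's /state route for a few hours, while a Cowork session reads the same file, approximates steps 3 and 4. Given the observed rate, hundreds of POSTs may be needed before a torn file shows up, if it shows up at all.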

Claude Model

Opus

Is this a regression?

No, this never worked

Last Working Version

No response

Claude Code Version

N/A — Cowork in Claude desktop app (research preview), not Claude Code CLI

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

PowerShell

Additional Information

Eight prior occurrences in the same project's operating-rules log prompted a project-internal "atomic-write for any file >5 KB during a Cowork session" rule (Rule 15.5), which is a working defense but not a fix. This issue is the upstream bug that rule defends against.

extent analysis

TL;DR

Investigate the Cowork mount layer's flush and visibility semantics to ensure that files written by a host-process atomic_write+rename are immediately visible to Cowork-side readers.

Guidance

Review the Cowork mount layer's implementation to identify potential issues with file visibility after os.replace completes.
Verify that the os.replace operation is properly synchronized with the Cowork mount layer to ensure that the replaced file is immediately visible.
Consider adding additional logging or debugging statements to the Cowork mount layer to understand the timing and ordering of file operations.
Evaluate the project-internal mitigation (delay-and-revalidate guard) as a potential workaround, but prioritize identifying and fixing the underlying issue.

Example

No code example is provided, as the issue is related to the interaction between the Cowork mount layer and the host-process atomic_write+rename operation.

Notes

The issue has so far only been observed through the Cowork mount layer on Windows 11 and may not be reproducible in other setups. The project-internal mitigation has caught every observed instance, but it adds latency to every successful save.

Recommendation

Apply the project-internal mitigation (delay-and-revalidate guard) as a temporary workaround while investigating the underlying issue with the Cowork mount layer. This will ensure that the file corruption is caught and healed, but it will add 250ms latency to every successful save.
