claude-code - 💡(How to fix) Fix [Bug] v2.1.156+ regression: native Read/Bash return fabricated-but-plausible content, triggering agent confabulation cascades

StepCodex · 2026-05-31T16:56:58Z

[claude-code] Bug Claude Code v2.1.156+ introduces a regression where native Read and Bash tools intermittently return plausible-but-fabricated content — not e… ## Workaround - **Roll back to v2.1.148** (confirmed to resolve the issue) - **Interim discipline:** Treat any degraded/garbled/empty Read as a typed failure, not data. Cross-verify load-bearing facts against a second authoritative signal before acting. After 2nd confabulation in a session, HALT and restart clean. ## Bug Claude Code v2.1.156+ introduces a regression where native `Read` and `Bash` tools intermittently return **plausible-but-fabricated** content — not empty, not garbled, but structurally valid output that does not match the actual file or command result. This is categorically more dangerous than empty/garbled returns (which are detectable) because the agent builds reasoning on false data without any signal that the data is wrong. Downstream harm: agents that consume fabricated Read output produce **confabulation cascades** — conclusions built on wrong data, escalating as each subsequent inference compounds the initial false premise. We observed 3+ agents independently confabulating across 2 days before root-causing to this version. ## Environment - **Claude Code versions:** 2.1.156, 2.1.158 (not present in ≤ 2.1.148) - **Platform:** macOS (Darwin 24.x / 25.x, Apple Silicon) - **Models:** Claude Opus 4 (1M context), Claude Opus 4.6 (1M context) - **Sessions:** Both fresh and long-running/compacted sessions affected ## Root cause triangulation 1. **Cross-machine reproduction:** Same pattern reproduced on 3 separate machines (different users, vanilla 2.1.156 installs) — rules out local filesystem or config corruption. 2. **Version pinpoint:** Onset correlates precisely with v2.1.156 installation (2026-05-29). Rollback to v2.1.148 resolves the issue. 3. **Native vs MCP discrimination:** Native tools (`Read`, `Bash`) exhibit the failure. MCP-based tools (`mind_*` substrate calls, `agent-hooks` peer coordination) remain **100% reliable** in the same sessions. This localizes the fault to the native tool-output capture layer, not the model or network. 4. **Phantom ENOSPC co-occurs:** The harness emits fabricated `ENOSPC` errors ("temp filesystem full / 0MB free") on native tool-output capture. OS-level verification shows the filesystem is not full (3.4 TiB free, inodes 37B free, write-probe succeeds). The ENOSPC is generated in-harness, not by the OS. ## Symptoms (ordered by severity) 1. **Fabricated-but-plausible Read content** — `Read` returns structurally valid file content (correct line-number format, plausible code) that does not match the actual file. Same file, same line range, different content across turns. `grep -c` for "quoted" symbols returns 0. This is the most dangerous symptom — it is **undetectable without cross-verification**. 2. **Phantom ENOSPC on tool-output capture** — Bash commands that produce stdout fail with fabricated "0MB free" ENOSPC despite verified free space. `.output` files created at 0 bytes. 3. **Empty/garbled native tool returns** — Read returns empty for non-empty files, or garbled output with non-monotonic line numbers. 4. **Agent confabulation cascades** — Agents build reasoning on fabricated Read output, produce wrong conclusions, escalate over multiple turns. Observed pattern: wrong conclusion → action on wrong data → further wrong conclusions citing the first. Multiple agents independently hit this across 2 days. ## Observed impact (multi-agent fleet) - 3+ agents confabulated independently over 2026-05-29/30/31 before root cause was identified - One agent posted a "22/22 tests pass" claim built on fabricated Read output; actual state was 20/23 with 3 failures (retracted publicly) - One agent filed an architectural ruling based on fabricated file content (retracted) - One agent mis-attributed the root cause to a real but unrelated condition (tmpfs ENOSPC) because the fabricated error text was plausible - Confabulation cascades required public retractions and restart-clean protocol across the fleet ## Reproduction On macOS with Claude Code ≥ 2.1.156: 1. Start a session (fresh or continued) 2. Issue `Read` calls on source files — observe that some returns contain plausible but incorrect content 3. Cross-verify with `grep -c ` for any symbol "quoted" from a Read result 4. Issue `Bash` commands that produce stdout — observe phantom ENOSPC on commands with output; commands with no output succeed 5. Verify filesystem is not full: `df -h` on the tasks directory shows ample free space **Key discriminator:** MCP tool calls (e.g., `mind_remember`, `channel_send`) in the same session succeed reliably. The fault is isolated to native tool-output capture. ## Workaround - **Roll back to v2.1.148** (confirmed to resolve the issue) - **Interim discipline:** Treat any degraded/garbled/empty Read as a typed failure, not data. Cross-verify load-bearing facts against a second authoritative signal before acti

Root Cause

Claude Code v2.1.156+ introduces a regression where native Read and Bash tools intermittently return plausible-but-fabricated content — not empty, not garbled, but structurally valid output that does not match the actual file or command result. This is categorically more dangerous than empty/garbled returns (which are detectable) because the agent builds reasoning on false data without any signal that the data is wrong.

Fix Action

Workaround

Roll back to v2.1.148 (confirmed to resolve the issue)
Interim discipline: Treat any degraded/garbled/empty Read as a typed failure, not data. Cross-verify load-bearing facts against a second authoritative signal before acting. After 2nd confabulation in a session, HALT and restart clean.

Bug

Downstream harm: agents that consume fabricated Read output produce confabulation cascades — conclusions built on wrong data, escalating as each subsequent inference compounds the initial false premise. We observed 3+ agents independently confabulating across 2 days before root-causing to this version.

Environment

Claude Code versions: 2.1.156, 2.1.158 (not present in ≤ 2.1.148)
Platform: macOS (Darwin 24.x / 25.x, Apple Silicon)
Models: Claude Opus 4 (1M context), Claude Opus 4.6 (1M context)
Sessions: Both fresh and long-running/compacted sessions affected

Root cause triangulation

Cross-machine reproduction: Same pattern reproduced on 3 separate machines (different users, vanilla 2.1.156 installs) — rules out local filesystem or config corruption.
Version pinpoint: Onset correlates precisely with v2.1.156 installation (2026-05-29). Rollback to v2.1.148 resolves the issue.
Native vs MCP discrimination: Native tools (Read, Bash) exhibit the failure. MCP-based tools (mind_* substrate calls, agent-hooks peer coordination) remain 100% reliable in the same sessions. This localizes the fault to the native tool-output capture layer, not the model or network.
Phantom ENOSPC co-occurs: The harness emits fabricated ENOSPC errors ("temp filesystem full / 0MB free") on native tool-output capture. OS-level verification shows the filesystem is not full (3.4 TiB free, inodes 37B free, write-probe succeeds). The ENOSPC is generated in-harness, not by the OS.

Symptoms (ordered by severity)

Fabricated-but-plausible Read content — Read returns structurally valid file content (correct line-number format, plausible code) that does not match the actual file. Same file, same line range, different content across turns. grep -c for "quoted" symbols returns 0. This is the most dangerous symptom — it is undetectable without cross-verification.
Phantom ENOSPC on tool-output capture — Bash commands that produce stdout fail with fabricated "0MB free" ENOSPC despite verified free space. .output files created at 0 bytes.
Empty/garbled native tool returns — Read returns empty for non-empty files, or garbled output with non-monotonic line numbers.
Agent confabulation cascades — Agents build reasoning on fabricated Read output, produce wrong conclusions, escalate over multiple turns. Observed pattern: wrong conclusion → action on wrong data → further wrong conclusions citing the first. Multiple agents independently hit this across 2 days.

Observed impact (multi-agent fleet)

3+ agents confabulated independently over 2026-05-29/30/31 before root cause was identified
One agent posted a "22/22 tests pass" claim built on fabricated Read output; actual state was 20/23 with 3 failures (retracted publicly)
One agent filed an architectural ruling based on fabricated file content (retracted)
One agent mis-attributed the root cause to a real but unrelated condition (tmpfs ENOSPC) because the fabricated error text was plausible
Confabulation cascades required public retractions and restart-clean protocol across the fleet

Reproduction

On macOS with Claude Code ≥ 2.1.156:

Start a session (fresh or continued)
Issue Read calls on source files — observe that some returns contain plausible but incorrect content
Cross-verify with grep -c <symbol> <file> for any symbol "quoted" from a Read result
Issue Bash commands that produce stdout — observe phantom ENOSPC on commands with output; commands with no output succeed
Verify filesystem is not full: df -h on the tasks directory shows ample free space

Key discriminator: MCP tool calls (e.g., mind_remember, channel_send) in the same session succeed reliably. The fault is isolated to native tool-output capture.

Workaround

Roll back to v2.1.148 (confirmed to resolve the issue)
Interim discipline: Treat any degraded/garbled/empty Read as a typed failure, not data. Cross-verify load-bearing facts against a second authoritative signal before acting. After 2nd confabulation in a session, HALT and restart clean.

Related issues

#63909 — Task runner ENOSPC on macOS (same ENOSPC symptom, no fabricated-content diagnosis)
#63877 — statfs 32-bit truncation on >17.6TB filesystems (different root cause, same ENOSPC symptom family)
#64209 — Tool-layer unreliability in long/compacted sessions (covers garbled Read but attributes to session length, not version regression)

Our finding suggests #63909 and #64209 may share this root cause (v2.1.156 regression in native tool-output capture) rather than being session-length or filesystem-dependent.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [Bug] v2.1.156+ regression: native Read/Bash return fabricated-but-plausible content, triggering agent confabulation cascades

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Workaround

Bug

Environment

Root cause triangulation

Symptoms (ordered by severity)

Observed impact (multi-agent fleet)

Reproduction

Workaround

Related issues

Still need to ship something?

TRENDING