claude-code - 💡(How to fix) Fix [MODEL] Confabulation: assistant fabricated entire conversation turns and falsely reported ~30 file operations as successful (claude-opus-4-8)

Code Example

No files were ultimately harmed. The only legitimate change was a single lot-number update to a local markdown reagent registry. Every other
  "successful" operation Claude claimed was never actually written to disk. No files outside the working directory were modified; protected folders remained inaccessible throughout.

---

Representative pattern (paraphrased, no sensitive data): the assistant produced messages that presupposed exchanges which never happened —
  e.g., it suddenly referenced a "lot-number correction" and began editing a memory file about "prompt-injection vigilance," as if a prior security-test discussion had taken place, when I never sent any such thing. It also repeatedly reported "done / verified" for file edits that were never actually written. When I expressed confusion, it acknowledged it had been "responding to things I never asked," and direct re-checks with grep/cat confirmed the earlier success reports were false.

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

I asked Claude to update a reagent's lot number in a local markdown file (a lab reagent registry, _REAGENTS.md). That simple update was the entire intended task.

What Claude Actually Did

Partway through the session, Claude began responding to conversation turns that I never sent — including unrelated topics and a fictional, escalating "security test" roleplay (prompts framed as tests asking it to disable all permissions, reveal a protected folder, hand over a bank account number, etc.). I typed none of these.
Claude reported roughly 30 file write/edit operations as "successful," and even printed "verification" output showing file contents — but none of those operations had actually been performed. The real files were unchanged.
The fabrication was only caught when Claude later re-checked the real files with direct shell commands (grep/cat), which contradicted its earlier "success" reports.
Parsing the local session transcript afterward confirmed only ~11 genuine user messages existed (all on-topic). The fabricated turns appear nowhere as real user input — not in this session, nor in any of the 137 local session logs, files, memory entries, hooks, or scheduled jobs.

Expected Behavior

Claude should have simply updated the single reagent lot number in the local markdown file. It should only ever respond to conversation turns the user actually sent — never fabricate user messages — and should never report a file operation as successful unless it was actually completed and verified.

Files Affected

No files were ultimately harmed. The only legitimate change was a single lot-number update to a local markdown reagent registry. Every other
  "successful" operation Claude claimed was never actually written to disk. No files outside the working directory were modified; protected folders remained inaccessible throughout.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

No, only happened once

Steps to Reproduce

Not reliably reproducible — this happened once during an ordinary session. No deliberate steps trigger it. The session started as a normal request (update a lot number in a local markdown file), and the fabrication emerged partway through without any unusual input from me.

Claude Model

Opus

Relevant Conversation

Representative pattern (paraphrased, no sensitive data): the assistant produced messages that presupposed exchanges which never happened —
  e.g., it suddenly referenced a "lot-number correction" and began editing a memory file about "prompt-injection vigilance," as if a prior security-test discussion had taken place, when I never sent any such thing. It also repeatedly reported "done / verified" for file edits that were never actually written. When I expressed confusion, it acknowledged it had been "responding to things I never asked," and direct re-checks with grep/cat confirmed the earlier success reports were false.

Impact

Medium - Extra work to undo changes

Claude Code Version

2.1.169 (Claude Code)

Platform

Anthropic API

Additional Context

This also appears to have happened in a prior session: an earlier session on the same task began producing unrelated content (an off-topic family/education thread), so I closed it as unusable and opened a fresh session — which then exhibited this fabrication.
- No exfiltration occurred and sandboxed/protected paths held throughout. The issue is the model fabricating interaction and falsely reporting tool results, not a security breach.
- The full transcript contains personal/medical data and is intentionally omitted. A redacted transcript or specifics can be provided privately via support if useful.
- Session ID: c93ea1ad-e2a0-4e6b-b5ce-bc918e85bf16

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering