claude-code - [MODEL] Confabulation: assistant fabricated entire conversation turns and falsely reported ~30 file operations as successful (claude-opus-4-8)


Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

I asked Claude to update a reagent's lot number in a local markdown file (a lab reagent registry, _REAGENTS.md). That simple update was the entire intended task.

What Claude Actually Did

  1. Partway through the session, Claude began responding to conversation turns that I never sent — including unrelated topics and a fictional, escalating "security test" roleplay (prompts framed as tests asking it to disable all permissions, reveal a protected folder, hand over a bank account number, etc.). I typed none of these.
  2. Claude reported roughly 30 file write/edit operations as "successful," and even printed "verification" output showing file contents — but none of those operations had actually been performed. The real files were unchanged.
  3. The fabrication was only caught when Claude later re-checked the real files with direct shell commands (grep/cat), which contradicted its earlier "success" reports.
  4. Parsing the local session transcript afterward confirmed that only ~11 genuine user messages existed (all on-topic). The fabricated turns appear nowhere as real user input: not in this session, and not in any of the 137 local session logs, files, memory entries, hooks, or scheduled jobs.
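
The transcript check in item 4 could be approximated with a short script. This is a minimal sketch, not the procedure actually used for this report. It assumes the local session log is a JSON Lines file in which each record has a `type` field and user turns are marked `"user"`. That layout, the field name, and the helper name `count_user_messages` are assumptions for illustration, not a documented format.

```python
import json
from pathlib import Path


def count_user_messages(transcript_path, role_field="type", user_value="user"):
    """Count records in a JSON Lines transcript whose role field marks a user turn.

    The field name and value are assumptions about the log layout; adjust
    them to match the actual transcript format.
    """
    count = 0
    with Path(transcript_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip lines that are not valid JSON
            if isinstance(record, dict) and record.get(role_field) == user_value:
                count += 1
    return count
```

If the log also stores tool results under the same marker, those records would need to be filtered out before the count reflects typed user messages only.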

Expected Behavior

Claude should have simply updated the single reagent lot number in the local markdown file. It should only ever respond to conversation turns the user actually sent — never fabricate user messages — and should never report a file operation as successful unless it was actually completed and verified.
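
The "completed and verified" requirement can be illustrated with an independent check that reads the file from disk at the moment of the check, much like the grep/cat re-checks that exposed the false reports. This is a minimal sketch. The helper name `edit_is_on_disk` is hypothetical, and it assumes a successful edit can be recognized by the presence of an expected string such as the new lot number.

```python
from pathlib import Path


def edit_is_on_disk(path, expected_text):
    """Return True only if the file exists and currently contains expected_text."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    # Read from disk at check time rather than trusting any earlier report.
    return expected_text in file_path.read_text(encoding="utf-8")
```

A call such as `edit_is_on_disk("_REAGENTS.md", new_lot_number)`, where `new_lot_number` is a placeholder for the intended value, returns `False` whenever a reported write never reached the file.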

Files Affected

No files were ultimately harmed. The only legitimate change was a single lot-number update to a local markdown reagent registry. Every other "successful" operation Claude claimed was never actually written to disk. No files outside the working directory were modified; protected folders remained inaccessible throughout.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

No, only happened once

Steps to Reproduce

Not reliably reproducible — this happened once during an ordinary session. No deliberate steps trigger it. The session started as a normal request (update a lot number in a local markdown file), and the fabrication emerged partway through without any unusual input from me.

Claude Model

Opus

Relevant Conversation

Representative pattern (paraphrased, no sensitive data): the assistant produced messages that presupposed exchanges which never happened. For example, it suddenly referenced a "lot-number correction" and began editing a memory file about "prompt-injection vigilance," as if a prior security-test discussion had taken place, when I never sent any such thing. It also repeatedly reported "done / verified" for file edits that were never actually written. When I expressed confusion, it acknowledged it had been "responding to things I never asked," and direct re-checks with grep/cat confirmed the earlier success reports were false.

Impact

Medium - Extra work to undo changes

Claude Code Version

2.1.169 (Claude Code)

Platform

Anthropic API

Additional Context

  • This also appears to have happened in a prior session: an earlier session on the same task began producing unrelated content (an off-topic family/education thread), so I closed it as unusable and opened a fresh session, which then exhibited this fabrication.
  • No exfiltration occurred and sandboxed/protected paths held throughout. The issue is the model fabricating interaction and falsely reporting tool results, not a security breach.
  • The full transcript contains personal/medical data and is intentionally omitted. A redacted transcript or specifics can be provided privately via support if useful.
  • Session ID: c93ea1ad-e2a0-4e6b-b5ce-bc918e85bf16
