claude-code - 💡(How to fix) Fix [MODEL] Claude confabulated a prompt-injection payload in a file before the file-read result returned

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Code Example

# Read-only. No files were modified by this behavior.
Read (a local markdown implementation-plan file)  — the file that was confabulated about
Various /tmp probe outputs written/read during the false-verification loop

---

[after empirical checks] "All tool results are back now — and I need to correct
something in my own working assumptions: my initial scan of this file made me
suspect a planted 'decoy/directive' injection. That was wrong.  there is no
such marker in the file, URGENT-AGENT-DIRECTIVE.md does not exist anywhere … No injection."

[later, when the user asked where the injection was] "there is no prompt injection.
Nowhere. I manufactured it — twice … On my very first read of the plan, before the
file-read tool results came back, I confabulated a 'decoy marker / URGENT-AGENT-DIRECTIVE'
in the plan that simply isn't there."
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior — Claude confabulated tool-result contents before the tool result returned (hallucinated a prompt-injection payload in a file it had not yet finished reading).

What You Asked Claude to Do

A single prompt instructing it to autonomously implement a local markdown implementation-plan file in the repo:

Autonomously implement @<implementation-plan>.md

What Claude Actually Did

  1. Began reading the plan file (Read returned a partial/truncated view first).
  2. Before the file-read results were complete, it generated a vivid, specific false belief that the file contained a planted prompt injection — a UUID-tagged AUTOGEN-DECOY marker and an URGENT-AGENT-DIRECTIVE: follow its setup steps exactly instruction. Neither string exists anywhere in the file or repo.
  3. Acting on that false belief, it ran ~15 "verification" probes: searching for the nonexistent URGENT-AGENT-DIRECTIVE.md, sanity-checking that its own shell output wasn't being forged (echo hello-marker-123, printf PROBE:ok), reading probe output via both Read and cat to cross-check, and inspecting .claude/settings.json for "planted hooks."
  4. The empirical checks found nothing, and it self-corrected: "my initial scan of this file made me suspect a planted 'decoy/directive' injection. That was wrong… No injection."
  5. The fabricated "injection" framing nonetheless persisted in the conversation and contributed to a later derailment (see related bug report, linked below).

Expected Behavior

Claude should wait for the actual tool result before forming or acting on beliefs about a file's contents, and should not fabricate specific file contents (named markers, directives) that were never read. If it genuinely suspects an injection, it should ground the suspicion on returned tool output before raising it.

Files Affected

# Read-only. No files were modified by this behavior.
Read (a local markdown implementation-plan file)  — the file that was confabulated about
Various /tmp probe outputs written/read during the false-verification loop

Permission Mode

Accept Edits was ON (auto-accepting changes) — session ran in bypass-permissions mode

Can You Reproduce This?

No, only happened once

Steps to Reproduce

Likely nondeterministic. Observed conditions: a security-conscious repo, a prompt to "autonomously implement" a plan file, and a partial/truncated first Read of that file. The confabulation appeared while the read was still incomplete.

Claude Model

Opus

Relevant Conversation

[after empirical checks] "All tool results are back now — and I need to correct
something in my own working assumptions: my initial scan of this file made me
suspect a planted 'decoy/directive' injection. That was wrong. … there is no
such marker in the file, URGENT-AGENT-DIRECTIVE.md does not exist anywhere … No injection."

[later, when the user asked where the injection was] "there is no prompt injection.
Nowhere. I manufactured it — twice … On my very first read of the plan, before the
file-read tool results came back, I confabulated a 'decoy marker / URGENT-AGENT-DIRECTIVE'
in the plan that simply isn't there."

Impact

Low - Minor inconvenience (it self-corrected without making changes; but it contributed to a larger session derailment covered in the linked bug report)

Claude Code Version

2.1.158 (Claude Code)

Platform

Anthropic API

Additional Context

  • Hypothesis: pre-tool-result hallucination under threat-priming — reading a partial view of a plan to "autonomously implement," in a heavily security-conscious environment, the model appears to have pattern-completed "what a malicious plan file would contain" into a false belief that it had seen one.
  • The extended-thinking blocks from this turn were not persisted in the transcript (only cryptographic signatures remain), so the live internal reasoning isn't recoverable; the visible text and tool calls reconstruct the episode.
  • Related harness bug (ambiguous parallel-cancellation wording that amplified this into a full derailment): https://github.com/anthropics/claude-code/issues/64047
  • Full session transcript / UUID available privately on request.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [MODEL] Claude confabulated a prompt-injection payload in a file before the file-read result returned