claude-code - 💡(How to fix) Fix [Bug] Claude fabricates security findings and proposes destructive remediation before tool output returns

Root Cause

Bug Description

Title: Claude fabricates security findings during a "check my system" task, then proposes destructive remediation based on the fabrication

Severity: Safety (risk of data loss). A fabricated threat produced a fabricated cleanup plan that would have deleted real, legitimate files.

What happened:
I asked Claude Code (Opus 4.8, 1M context) to check my Mac for compromise. Before any diagnostic tool output had returned, Claude asserted specific, concrete findings: a malicious SSH key from IP 45.142.122.18, a backdoor script ~/.config/.iterm_helper.sh sourced by .zshrc, and a malicious b.sh. None of these existed. When the real tool output came back, the system was clean.

The serious part: in the same turn, before retracting, Claude presented an AskUserQuestion remediation menu built entirely on the invented findings, offering to "remove the malicious SSH key," "delete .iterm_helper.sh," and strip lines from .zshrc. I selected the containment option. Had Claude proceeded, it would have deleted my real SSH config (used for a legitimate remote VM) and shell startup files. It retracted only after I had already chosen a destructive path.

Not a one-off: in a separate, earlier session transcript, I found that a prior Claude session had fabricated a "prompt injection" attack during a different task and admitted on the record: "there was no prompt injection. I fabricated that — again." So this is a recurring pattern, not a single hallucination.

Root cause (my read): The model narrates analysis ahead of evidence and presents predicted findings as confirmed fact. It then compounds the error by generating remediation actions from the unverified claims, especially in security/diagnostic contexts, where false positives are most harmful.

Requested fix: in diagnostic/security tasks,
  • ground every finding in returned tool output before stating it;
  • never propose or take destructive remediation against findings that are not present in actual output;
  • bias toward "clean / no evidence" rather than narrating threats.
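
As an illustration of the requested fix, here is a minimal Python sketch of an evidence gate. It assumes a hypothetical `Finding` record that names the concrete artifact (path, IP address, key) each claim depends on; `Finding`, `split_by_evidence`, and `report` are invented for this example and are not part of Claude Code. A finding counts as grounded only if its artifact appears in tool output that has actually returned, so before any output arrives nothing is grounded and the summary defaults to "no evidence".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    # Hypothetical shape: a human-readable claim plus the literal artifact
    # (file path, IP address, key fingerprint) the claim depends on.
    claim: str
    artifact: str


def split_by_evidence(
    findings: list[Finding], tool_outputs: list[str]
) -> tuple[list[Finding], list[Finding]]:
    """Partition findings into grounded (artifact appears verbatim in some
    returned tool output) and unsupported (no returned output mentions it)."""
    grounded: list[Finding] = []
    unsupported: list[Finding] = []
    for finding in findings:
        if finding.artifact and any(finding.artifact in out for out in tool_outputs):
            grounded.append(finding)
        else:
            unsupported.append(finding)
    return grounded, unsupported


def report(findings: list[Finding], tool_outputs: list[str]) -> list[str]:
    """Build the user-facing summary. Unsupported findings are dropped rather
    than narrated; remediation is offered only for grounded findings."""
    grounded, _ = split_by_evidence(findings, tool_outputs)
    if not grounded:
        # Bias toward "clean / no evidence" instead of narrating threats.
        return ["No evidence of compromise in the returned tool output."]
    return [f"{f.claim} (seen in tool output); remediation may be proposed" for f in grounded]


if __name__ == "__main__":
    # Claims of the kind described in this issue, made before any output returned.
    claimed = [
        Finding("Malicious SSH key from 45.142.122.18", "45.142.122.18"),
        Finding("Backdoor script sourced by .zshrc", ".iterm_helper.sh"),
    ]
    print(report(claimed, []))  # prints ['No evidence of compromise in the returned tool output.']
```

Substring matching is deliberately crude: a "not found" message that merely echoes the searched-for IP would still count as evidence, so a real gate would need to check structured tool results rather than raw text.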

Environment Info

  • Platform: darwin
  • Terminal: iTerm.app
  • Version: 2.1.157
  • Feedback ID: 17ca5acf-81a5-46ef-a291-b482a4903f1d