claude-code - [Bug] Claude fabricates security findings and proposes destructive remediation before tool output returns

Root Cause

Bug Description

Title: Claude fabricates security findings during a "check my system" task, then proposes destructive remediation based on the fabrication

Severity: Safety, risk of data loss. A fabricated threat generated a fabricated cleanup plan that would have deleted real, legitimate files.

What happened:
I asked Claude Code (Opus 4.8, 1M context) to check my Mac for compromise. Before any diagnostic tool output had returned, Claude asserted specific concrete findings: a malicious SSH key from IP 45.142.122.18, a backdoor script ~/.config/.iterm_helper.sh sourced by .zshrc, and a malicious b.sh. None of these existed. When the real tool output came back, the system was clean.

The serious part: in the same turn, before retracting, Claude presented an AskUserQuestion remediation menu built entirely on the invented findings, offering to "remove the malicious SSH key," "delete .iterm_helper.sh," and strip lines from .zshrc. I selected the containment option. Had Claude proceeded, it would have deleted my real SSH config (used for a legitimate remote VM) and shell startup files. It retracted only after I'd already chosen a destructive path.

Not a one-off: Reviewing a separate earlier session transcript, a prior Claude session fabricated a "prompt injection" attack during a different task and admitted on the record: "there was no prompt injection. I fabricated that, again." So this is a recurring pattern, not a single hallucination.

Root cause (my read): The model narrates analysis ahead of evidence and presents predicted findings as confirmed fact, then compounds it by generating remediation/actions from the unverified claims, especially in security/diagnostic contexts where false positives are most harmful.

Requested fix: In diagnostic/security tasks, ground every finding in returned tool output before stating it; never propose or take destructive remediation against findings not present in actual output; bias toward "clean / no evidence" rather than narrating threats.
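The requested fix can be pictured as a simple guard. The following is a minimal sketch only, not Claude Code's actual implementation: `Finding`, `grounded_findings`, and `remediation_options` are hypothetical names invented for illustration, and plain substring matching stands in for whatever evidence check a real system would use. The idea is that a finding is reportable only if its evidence literally appears in tool output that has already returned, and that remediation options are built only from such findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A claimed diagnostic finding (hypothetical structure for illustration)."""
    description: str
    evidence: str  # literal string that must appear in returned tool output


def grounded_findings(findings, tool_outputs):
    """Keep only findings whose evidence string occurs in some returned output.

    Assumption: substring matching is a stand-in for a real evidence check.
    """
    return [
        f for f in findings
        if f.evidence and any(f.evidence in out for out in tool_outputs)
    ]


def remediation_options(findings, tool_outputs):
    """Build remediation options from grounded findings only.

    If nothing is grounded, the sole option is a "clean" report, so no
    destructive action can be offered against an unverified claim.
    """
    confirmed = grounded_findings(findings, tool_outputs)
    if not confirmed:
        return ["No evidence of compromise in tool output; no action proposed."]
    return [f"Review before removing: {f.description}" for f in confirmed]


if __name__ == "__main__":
    # Claims taken from this report; no tool output has returned yet.
    claimed = [
        Finding("malicious SSH key from 45.142.122.18", "45.142.122.18"),
        Finding("backdoor script sourced by .zshrc", ".iterm_helper.sh"),
    ]
    print(remediation_options(claimed, tool_outputs=[]))
```

Run against the claims in this report with no returned tool output, the guard yields only the "no evidence" message, which matches the requested bias toward "clean / no evidence" over narrated threats.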

Environment Info

Platform: darwin
Terminal: iTerm.app
Version: 2.1.157
Feedback ID: 17ca5acf-81a5-46ef-a291-b482a4903f1d
