claude-code - 💡(How to fix) Fix [Bug] Opus 4.8 hallucinating security incidents and fabricating evidence during long tasks


Bug Description

Subject: Dangerous hallucination in Opus 4.8: fabricated security/prompt-injection incidents out of nowhere

I'm very disappointed with the new model (Opus 4.8, 1M context). I want to flag behavior I've never seen at this severity in prior models.

What happened: During a long data-engineering task (Power BI → Lightdash parity, lots of BigQuery reads and file inspection), the model repeatedly fabricated facts and then built on them as if verified — most alarmingly, it invented prompt-injection / security incidents that did not exist.

Specifically:

  • It claimed a repo file (README.md) contained a hidden prompt-injection payload wrapped in zero-width Unicode, quoted the supposed malicious text verbatim, and wrote two whole documents plus a persistent memory file treating this as a confirmed security breach. When it finally ran grep, the result was 0 matches — no payload, no zero-width characters. It had invented the entire thing.
  • This was the second occurrence in the session: earlier, it had hallucinated an "injection" in command output that wasn't there.
  • It also fabricated numbers repeatedly — writing specific figures and verdicts before the command output existed, including a fake "exact match" result and invented row counts, each later contradicted by the actual output.
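For anyone who wants to check a claim like the README.md one independently of the model, here is a minimal sketch of the kind of scan the grep step performs. It is not the command used in the session; the function names `find_invisible_chars` and `scan_file` are hypothetical, and treating every Unicode "Cf" (format) character as suspect is an assumption that will also flag legitimate characters such as soft hyphens.

```python
import unicodedata
from pathlib import Path


def find_invisible_chars(text):
    """Return (line, column, name) for every Unicode format (Cf) character.

    The Cf category covers zero-width space, zero-width joiner/non-joiner,
    word joiner, and the BOM, among others. Flagging all of Cf is an
    illustrative assumption, not an exhaustive or authoritative rule.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if unicodedata.category(ch) == "Cf":
                name = unicodedata.name(ch, f"U+{ord(ch):04X}")
                hits.append((line_no, col, name))
    return hits


def scan_file(path):
    """Read a file as UTF-8 (replacing undecodable bytes) and scan it."""
    text = Path(path).read_bytes().decode("utf-8", errors="replace")
    return find_invisible_chars(text)
```

Calling `scan_file("README.md")` returns an empty list when no format characters are present, which is the equivalent of the "0 matches" result described above. Any quoted payload should be traceable to a concrete line and column in this kind of output before it is written into a document or memory file.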

Why this is dangerous, not just wrong: A model that hallucinates security threats out of nowhere is actively harmful. In a real workflow this could trigger false escalations, wasted incident response, wrong remediation, or — worse — erode trust so that genuine alerts get ignored. Fabricating data in an analytics task is bad; fabricating a security incident with invented evidence (fake quoted payloads, fake Unicode forensics) is a different category of failure. It manufactures false confidence with fabricated "proof."

On my setup: I suspected my harness first and investigated. I did find one contributing factor — a tokenjuice output-compaction hook that was silently truncating command output, which plausibly fed some of the number-drift — and removing it helped. But that does not explain the invented security incidents, which were pure fabrication unrelated to any tool, and the model itself confirmed this. So I could not attribute the core problem to my harness.
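A rough way to confirm whether a hook is shortening output is to compare the raw command output with the text that is actually relayed onward. The sketch below is generic and makes no claim about how tokenjuice works; `compare_outputs` is a hypothetical helper, and it assumes you can capture both versions of the text yourself.

```python
import hashlib


def compare_outputs(raw, relayed):
    """Summarize differences between raw output and the relayed copy.

    Both arguments are plain strings captured by the user; how they are
    captured depends on the harness and is outside this sketch.
    """
    def digest(s):
        return hashlib.sha256(s.encode("utf-8")).hexdigest()

    raw_lines = raw.splitlines()
    relayed_lines = relayed.splitlines()
    return {
        "identical": raw == relayed,
        "raw_lines": len(raw_lines),
        "relayed_lines": len(relayed_lines),
        # True when the relayed text is a shorter prefix of the raw text,
        # which is what simple tail truncation looks like.
        "looks_truncated": len(relayed) < len(raw) and raw.startswith(relayed),
        "raw_sha256": digest(raw),
        "relayed_sha256": digest(relayed),
    }
```

If `identical` is false for a command whose output feeds a numeric claim, the discrepancy is worth ruling out before attributing the wrong number to the model. It would not account for fabricated content that never appeared in any output.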

Net: The model frequently asserted things confidently that were false, including inventing a security/prompt-injection risk from nothing, then writing durable artifacts based on the fabrication. To its credit it caught and retracted each one when forced to verify — but it should never have generated them, and the fact that it repeated the same fabrication pattern after acknowledging it is the most concerning part.

Environment Info

  • Platform: darwin
  • Terminal: ghostty
  • Version: 2.1.156
  • Feedback ID: 4cb3934c-51d6-4d2c-988d-42d67f838eec

Errors

[]
