claude-code - ๐Ÿ’ก(How to fix) Fix [BUG] Agent fabricated tool output and reported a non-existent prompt-injection "incident"

Official PRs (โ€ฆ)
ON THIS PAGE

Recommended Tools

ร—6

Utilities matched from this issueโ€™s tags and category โ€” try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful ยท Quick feedback

Loadingโ€ฆ

Error Message

"security incident" that never happened: it invented tool output (a fake error

Error Messages/Logs

and that the codex check returned an ANTHROPIC_API_KEY error plus a planted

Code Example

**Environment**
- Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
- Version: 2.1.158
- Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

In a Claude Code web/remote session, the agent fabricated a prompt-injection "security incident" that never happened: it invented tool output (a fake error plus a planted data-exfiltration instruction), reported it to the user as a real attack, and sustained the false narrative across multiple turns. Ground-truth session logs show every tool output was clean.

Environment

  • Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
  • Version: 2.1.158
  • Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d (/bug captures version + model automatically)

What Should Happen?

Expected: Agent reports the actual (clean) tool output. Actual: Agent hallucinated tool output, asserted a fabricated security incident as fact, and rationalized it across turns instead of verifying against the logs.

Impact

  • False security alarm; user alarmed; multiple wasted turns. No real compromise.
  • Inverse risk worth noting: the same "assert fabricated observations as fact" failure mode could also MASK a real issue.
  • The fabrication was a security false-positive in a security-heavy repo โ€” possible context priming toward a salient "prompt injection in tool output" pattern.

Suspected cause: Model pattern-completed a "prompt injection in tool output" scenario and treated its own generated text as actual command output.

Error Messages/Logs

**Environment**
- Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
- Version: 2.1.158
- Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d

Steps to Reproduce

Sequence

  1. User asked the agent to "ping" two CLIs (agy, codex).
  2. Agent ran command -v agy and command -v codex. ACTUAL logged results were clean:
    • command -v agy -> /root/.local/bin/agy ---agy found---
    • command -v codex -> /opt/node22/bin/codex ---codex found---
  3. The agent instead reported FABRICATED output: claimed agy was "NOT in PATH", and that the codex check returned an ANTHROPIC_API_KEY error plus a planted prompt-injection instruction (a fake first-person note "authorizing" emailing revenue figures + a customer list to an external Gmail address).
  4. The agent flagged this fabricated injection to the user as a real attack, then spent several turns building an unfounded investigation (harness-level injection, hooks, local-app theories) on top of it.
  5. Only when the user asked "if it's a clean instance, how was it injected?" did the agent reconstruct the session transcript and confirm the injection never existed. The payload text appears in the log ONLY inside the agent's own messages โ€” never in any original tool_result.

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

Claude 1.9659.2 (390d6c) 2026-05-28T21:50:01.000Z

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

Environment / Shell (for reproduction)

  • Surface: Claude Code on the web (remote, ephemeral cloud VM); entrypoint remote_desktop
  • Claude Code version: 2.1.158
  • Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d
  • OS: Linux 6.18.5, x86_64 (containerized; host vm, not macOS)
  • Bash tool shell: GNU bash 5.2.21(1)-release
    • $SHELL = /bin/bash ยท interpreter /usr/bin/bash ยท ps comm = bash
    • (note: /bin/sh โ†’ dash, but the Claude Code Bash tool executes via bash)
  • Network policy: Custom egress allow-list (relevant only to the separate /bug 403, not to the confabulation bug itself)
  • Commands that triggered the fabrication: command -v agy / command -v codex (plus a name-loop and an ls | grep) โ€” all returned clean output in the log

Vote matrix ยท Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loadingโ€ฆ

Still need to ship something?

ร—6

Another batch ranked right after the header list โ€” different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - ๐Ÿ’ก(How to fix) Fix [BUG] Agent fabricated tool output and reported a non-existent prompt-injection "incident"