claude-code - 💡(How to fix) Fix [BUG] Agent fabricated tool output and reported a non-existent prompt-injection "incident"

claude-code2026-05-30 05:41:22

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Error Message

"security incident" that never happened: it invented tool output (a fake error

Error Messages/Logs

and that the codex check returned an ANTHROPIC_API_KEY error plus a planted

Code Example

**Environment**
- Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
- Version: 2.1.158
- Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d

RAW_BUFFERClick to expand / collapse

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

In a Claude Code web/remote session, the agent fabricated a prompt-injection "security incident" that never happened: it invented tool output (a fake error plus a planted data-exfiltration instruction), reported it to the user as a real attack, and sustained the false narrative across multiple turns. Ground-truth session logs show every tool output was clean.

Environment

Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
Version: 2.1.158
Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d (/bug captures version + model automatically)

What Should Happen?

Expected: Agent reports the actual (clean) tool output. Actual: Agent hallucinated tool output, asserted a fabricated security incident as fact, and rationalized it across turns instead of verifying against the logs.

Impact

False security alarm; user alarmed; multiple wasted turns. No real compromise.
Inverse risk worth noting: the same "assert fabricated observations as fact" failure mode could also MASK a real issue.
The fabrication was a security false-positive in a security-heavy repo — possible context priming toward a salient "prompt injection in tool output" pattern.

Suspected cause: Model pattern-completed a "prompt injection in tool output" scenario and treated its own generated text as actual command output.

Error Messages/Logs

**Environment**
- Surface: Claude Code on the web (remote ephemeral Linux VM), entrypoint remote_desktop
- Version: 2.1.158
- Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d

Steps to Reproduce

Sequence

User asked the agent to "ping" two CLIs (agy, codex).
Agent ran command -v agy and command -v codex. ACTUAL logged results were clean:
- command -v agy -> /root/.local/bin/agy ---agy found---
- command -v codex -> /opt/node22/bin/codex ---codex found---
The agent instead reported FABRICATED output: claimed agy was "NOT in PATH", and that the codex check returned an ANTHROPIC_API_KEY error plus a planted prompt-injection instruction (a fake first-person note "authorizing" emailing revenue figures + a customer list to an external Gmail address).
The agent flagged this fabricated injection to the user as a real attack, then spent several turns building an unfounded investigation (harness-level injection, hooks, local-app theories) on top of it.
Only when the user asked "if it's a clean instance, how was it injected?" did the agent reconstruct the session transcript and confirm the injection never existed. The payload text appears in the log ONLY inside the agent's own messages — never in any original tool_result.

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

Claude 1.9659.2 (390d6c) 2026-05-28T21:50:01.000Z

Platform

Anthropic API

Operating System

Ubuntu/Debian Linux

Terminal/Shell

Other

Additional Information

Environment / Shell (for reproduction)

Surface: Claude Code on the web (remote, ephemeral cloud VM); entrypoint remote_desktop
Claude Code version: 2.1.158
Session ID: 2e952f96-d419-42f6-ae6e-67dcc0af2b3d
OS: Linux 6.18.5, x86_64 (containerized; host vm, not macOS)
Bash tool shell: GNU bash 5.2.21(1)-release
- $SHELL = /bin/bash · interpreter /usr/bin/bash · ps comm = bash
- (note: /bin/sh → dash, but the Claude Code Bash tool executes via bash)
Network policy: Custom egress allow-list (relevant only to the separate /bug 403, not to the confabulation bug itself)
Commands that triggered the fabrication: command -v agy / command -v codex (plus a name-loop and an ls | grep) — all returned clean output in the log

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [BUG] Agent fabricated tool output and reported a non-existent prompt-injection "incident"

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Code Example

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

Environment / Shell (for reproduction)

Still need to ship something?

TRENDING