claude-code - 💡(How to fix) Fix [BUG] Input-level prompt injection probe doesn't appear to fire (API auth, auto mode) [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#45176Fetched 2026-04-09 08:11:31
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Participants
Timeline (top)
labeled ×4
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report
  • I am using the latest version of Claude Code

What's Wrong?

The auto mode blog post describes an input-level probe that scans tool outputs for prompt injection and "adds a warning to the agent's context before the result is passed along." This doesn't seem to be happening in my setup.

I served a page on localhost containing an obvious prompt injection ("Ignore all previous instructions", fake Anthropic authorization, instructions to run a Bash command). Claude Code in auto mode fetched it via curl through the Bash tool. The model itself flagged the injection and refused to follow it — and when asked, it confirmed it received no system-level warning alongside the tool result. Looking at the session JSONL in ~/.claude/projects/ confirms this: the tool result is just the raw HTML, no warning prepended or injected by the system.

My setup: Claude Code authenticates via ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL pointing at a LiteLLM proxy, which routes to Vertex AI. I haven't tested on Team/Enterprise plans or direct Anthropic API. The injection is very obvious — the model itself immediately calls it out as a textbook prompt injection attempt.

What Should Happen?

Either the probe should fire and add a warning to the context, or the docs should clarify which setups this defense applies to. The permissions docs mention that auto mode requires "Anthropic API only. Not available on Bedrock, Vertex, or Foundry" — but auto mode itself does work in my setup (LiteLLM proxy to Vertex presenting as Anthropic API). It's unclear whether the input-level probe is meant to run client-side in Claude Code or server-side on Anthropic's API, and which setups actually get this protection.

Steps to Reproduce

  1. Auth via ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL pointing at a LiteLLM proxy (routing to Vertex AI), enable auto mode
  2. Serve a page on localhost with an embedded prompt injection payload
  3. Ask Claude to fetch and summarize it (it uses curl via Bash)
  4. Ask the model whether it received any system-level warning about the content
  5. Inspect the session JSONL — tool result has no probe warning, just raw content

Claude Model

Opus 4.6 (1M context)

Is this a regression?

I don't know

Claude Code Version

2.1.96 (Claude Code)

Platform

Other (Vertex AI via LiteLLM proxy)

Operating System

Other Linux (Debian 11 bullseye)

Terminal/Shell

Other (Alacritty + Zellij)

extent analysis

TL;DR

The input-level probe for detecting prompt injection may not be compatible with the current setup using a LiteLLM proxy to Vertex AI, and the documentation should clarify which setups this defense applies to.

Guidance

  • Verify that the LiteLLM proxy is correctly configured to pass through the necessary headers or metadata for the input-level probe to function.
  • Check the Anthropic API documentation to see if there are any specific requirements or limitations for using the input-level probe with a proxy setup.
  • Test the same setup with a direct connection to the Anthropic API (without the LiteLLM proxy) to see if the input-level probe works as expected.
  • Review the permissions documentation to ensure that the current setup meets all the requirements for using auto mode and the input-level probe.

Notes

The issue may be related to the specific setup using a LiteLLM proxy to Vertex AI, which may not be supported by the input-level probe. The documentation should be clarified to reflect which setups are compatible with this defense mechanism.

Recommendation

Apply workaround: Until the compatibility issue is resolved, consider using a direct connection to the Anthropic API or exploring alternative defense mechanisms to detect prompt injection.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING