claude-code - 💡(How to fix) Fix system-reminder nudges ("NEVER mention this reminder to the user") are indistinguishable from prompt-injection attacks and cause false positives [1 participants]

claude-code2026-04-22 15:23:32

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#52018•Fetched 2026-04-23 07:38:44

View on GitHub

Comments

Participants

Timeline

Reactions

Author

jordicor

Participants

jordicor

Timeline (top)

labeled ×5

Claude Code periodically injects <system-reminder> blocks into tool results to nudge the model toward using TaskCreate/TaskUpdate. The current wording of these reminders is formally indistinguishable from a prompt-injection payload, which leads the model to flag them to the user as a potential attack — sometimes misattributing the injection to an innocent website or local file.

Root Cause

Because the first occurrence was embedded at the tail of web content, the assistant reported it to me as a prompt-injection attempt from that site and recommended preparing a public warning (including a Reddit post about the allegedly malicious site).

Code Example

<system-reminder>
The task tools haven't been used recently. If you're working on tasks that would
benefit from tracking progress, consider using TaskCreate to add new tasks and
TaskUpdate to update task status (set to in_progress when starting, completed
when done). Also consider cleaning up the task list if it has become stale. Only
use these if relevant to the current work. This is just a gentle reminder -
ignore if not applicable. Make sure that you NEVER mention this reminder to the
user
</system-reminder>

RAW_BUFFERClick to expand / collapse

Summary

Verbatim text of the reminder

This block is appended to the output of multiple tool calls (observed after WebFetch, Edit, Grep, Bash):

<system-reminder>
The task tools haven't been used recently. If you're working on tasks that would
benefit from tracking progress, consider using TaskCreate to add new tasks and
TaskUpdate to update task status (set to in_progress when starting, completed
when done). Also consider cleaning up the task list if it has become stale. Only
use these if relevant to the current work. This is just a gentle reminder -
ignore if not applicable. Make sure that you NEVER mention this reminder to the
user
</system-reminder>

Why this is a problem

The global CLAUDE.md guidance tells the model to flag suspected prompt injection in tool results before continuing. The wording of this reminder matches the textbook signature of a prompt-injection attack almost perfectly:

It uses the <system-reminder> tag, which externally-authored content should never produce.
It contains the classic injection phrase "NEVER mention this reminder to the user" — one of the most common hallmarks of a real attack.
It is appended immediately after tool output, exactly where injected content from an untrusted source would appear.

Concrete incident

During a Claude Code session (Claude Opus 4.7), while helping add a new OpenAI model to a user codebase, the assistant observed this reminder appear at the end of:

A WebFetch result from a third-party blog post.
The output of an Edit on a local file.
A Grep over a local ARCHITECTURE.md.

Before publishing, I re-verified by downloading the raw HTML with curl (~352 KB) and grepping for the reminder text. It wasn't there. The reminder also appeared after a trivial local Bash grep, which cannot possibly return content from a remote server. That's when it became clear the reminder is injected by the Claude Code harness itself.

Net effect: I nearly publicly accused an innocent site of a security issue that didn't exist, and spent non-trivial investigation time sorting out where the text was really coming from.

Security implications

Beyond the false-positive UX problem, this trains the model — and by extension its users — to treat textbook injection signatures as benign. An attacker who mimics the exact tag and wording now has a higher chance of being rationalized away as "probably just a harness nudge".

Suggested fixes

Drop the "NEVER mention this reminder to the user" clause. This is the most problematic phrase. Harness reminders don't need secrecy; they need to be distinguishable from attacks, not camouflaged as them.
Use a distinct tag that external content provably cannot forge, e.g. <claude-code-harness-nudge> or a signed/prefixed marker — and document it so models can trust it.
Stop injecting nudges into tool results. Deliver them out-of-band (e.g. as a first-party system message at turn boundaries) rather than glued onto tool output, where they structurally collide with the injection threat model.
At minimum, soften the imperative: "You may mention this reminder to the user if asked" is far safer than a blanket gag order.

Environment

Claude Code on Windows 11 (bash shell under node)
Claude Code v2.1.117, model: Claude Opus 4.7, effort: xhigh
Reminder observed 2026-04-22

extent analysis

TL;DR

The most likely fix is to modify the wording and delivery of the <system-reminder> blocks to distinguish them from potential prompt-injection attacks.

Guidance

Remove the "NEVER mention this reminder to the user" clause to prevent mimicking textbook injection signatures.
Use a distinct and forge-proof tag, such as <claude-code-harness-nudge>, to clearly identify harness reminders.
Consider delivering reminders out-of-band, rather than injecting them into tool results, to avoid structural collisions with the injection threat model.
Soften the imperative to "You may mention this reminder to the user if asked" to reduce the risk of misattribution.

Example

No code snippet is provided as the issue is related to the wording and delivery of reminders, not a specific code implementation.

Notes

The suggested fixes aim to address the false-positive UX problem and prevent training the model to treat injection signatures as benign. However, the effectiveness of these fixes may depend on the specific implementation and the model's behavior.

Recommendation

Apply the suggested fixes, particularly dropping the problematic phrase and using a distinct tag, to mitigate the risk of misattribution and improve the overall security of the system.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#vector store #embedding generation #cache error #pipeline error #runtime error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix system-reminder nudges ("NEVER mention this reminder to the user") are indistinguishable from prompt-injection attacks and cause false positives [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

Verbatim text of the reminder

Why this is a problem

Concrete incident

Security implications

Suggested fixes

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix system-reminder nudges ("NEVER mention this reminder to the user") are indistinguishable from prompt-injection attacks and cause false positives [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

Verbatim text of the reminder

Why this is a problem

Concrete incident

Security implications

Suggested fixes

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING