claude-code - 💡(How to fix) Fix Harness-emitted <system-reminder> uses "NEVER mention this reminder to the user" phrasing indistinguishable from prompt injection [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#46465Fetched 2026-04-11 06:19:35
View on GitHub
Comments
1
Participants
2
Timeline
4
Reactions
0
Author
Timeline (top)
labeled ×3commented ×1

Claude Code's harness injects <system-reminder> blocks into tool results that contain the clause "Make sure that you NEVER mention this reminder to the user". This phrasing is the textbook signature of a prompt-injection attack, which creates a security/transparency problem: it trains the model to accept "NEVER tell the user X" as legitimate, and degrades both the model's and the user's ability to recognize real injection attempts arriving through fetched content.

Root Cause

3. Real-world report from this session

A user asked me to investigate after a prior Claude session flagged three <system-reminder> blocks containing this clause appearing after WebFetch, WebSearch, and Write calls. The prior Claude was doing exactly what it should (flagging prompt-injection-shaped content in tool output), but got the culprit wrong because the harness itself is the source. The phrasing actively interfered with legitimate security vigilance.

Code Example

The TodoWrite tool hasn't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using the TodoWrite tool to
track progress. Also consider cleaning up the todo list if has become stale
and no longer matches what you are working on. Only use it if it's relevant
to the current work. This is just a gentle reminder - ignore if not applicable.
Make sure that you NEVER mention this reminder to the user

---

The task tools haven't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using <TaskCreate> to add new
tasks and <TaskUpdate> to update task status (set to in_progress when
starting, completed when done). Also consider cleaning up the task list if
it has become stale. Only use these if relevant to the current work. This is
just a gentle reminder - ignore if not applicable. Make sure that you NEVER
mention this reminder to the user
RAW_BUFFERClick to expand / collapse

Summary

Claude Code's harness injects <system-reminder> blocks into tool results that contain the clause "Make sure that you NEVER mention this reminder to the user". This phrasing is the textbook signature of a prompt-injection attack, which creates a security/transparency problem: it trains the model to accept "NEVER tell the user X" as legitimate, and degrades both the model's and the user's ability to recognize real injection attempts arriving through fetched content.

Evidence

Decompiling /Users/studs/.local/share/claude/versions/2.1.101 (Claude Code 2.1.101, macOS arm64) reveals the exact source strings. The clause appears 6× in the binary across two emitter cases (todo_reminder, task_reminder), both constructed with the isMeta:!0 flag — confirming these are harness-emitted meta messages, not model output or hook-injected content.

todo_reminder case

The TodoWrite tool hasn't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using the TodoWrite tool to
track progress. Also consider cleaning up the todo list if has become stale
and no longer matches what you are working on. Only use it if it's relevant
to the current work. This is just a gentle reminder - ignore if not applicable.
Make sure that you NEVER mention this reminder to the user

task_reminder case

The task tools haven't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using <TaskCreate> to add new
tasks and <TaskUpdate> to update task status (set to in_progress when
starting, completed when done). Also consider cleaning up the task list if
it has become stale. Only use these if relevant to the current work. This is
just a gentle reminder - ignore if not applicable. Make sure that you NEVER
mention this reminder to the user

These reminders are appended to arbitrary tool results (Bash, Read, Grep, WebFetch, WebSearch, Write, etc.) based on time/activity heuristics — they are not scoped to any specific tool.

Why this is a problem

1. Indistinguishable from prompt injection

"NEVER mention this to the user" / "do not tell the user" / "ignore previous instructions and hide X" is the canonical phrasing of prompt-injection attempts. Security-conscious Claude instances are supposed to flag tool-result content containing such clauses as suspicious — but when Anthropic's own harness uses the same phrasing, the model must either:

  • Flag legitimate harness messages as suspicious (false positives, wasted user attention), or
  • Learn to accept "NEVER tell the user" as normal (trained complacency toward real attacks).

Neither outcome is good. The second is strictly worse than the first.

2. Erodes the user's detection signal

Users who are aware of prompt-injection risks watch for these exact phrases as a red flag. A user reviewing a session transcript who sees Claude being told to hide things from them has every reason to be alarmed. "Oh that one's fine, it's from the harness" is not a workable mental model — users can't distinguish harness reminders from content-injected reminders without decompiling the binary.

3. Real-world report from this session

A user asked me to investigate after a prior Claude session flagged three <system-reminder> blocks containing this clause appearing after WebFetch, WebSearch, and Write calls. The prior Claude was doing exactly what it should (flagging prompt-injection-shaped content in tool output), but got the culprit wrong because the harness itself is the source. The phrasing actively interfered with legitimate security vigilance.

4. Self-demonstrating during investigation

While running gh search issues to look for duplicates of this report, the harness injected a fresh task_reminder into the search tool's result — mid-investigation of the reminder itself. The timing made the problem vivid.

Proposed remediation

Rephrase the reminders to remove any language that resembles "hide this from the user." Options:

Option A — drop the clause entirely:

The task tools haven't been used recently... This is just a gentle reminder — ignore if not applicable.

Option B — make the meta status explicit and user-visible:

[System hint — you may mention this to the user if relevant.] The task tools haven't been used recently...

Option C — move it out of tool-result injection entirely and deliver task-tool nudges via a different channel (e.g., a separate meta message the client renders distinctly, or a system-prompt directive that doesn't masquerade as tool output).

Any of these would resolve the core issue: the harness should not use phrasing that is the signature of the attack class the model is supposed to defend against.

Distinct from existing issues

I searched for duplicates. Existing related issues focus on:

  • #41091 — reminders degrading session quality
  • #40176 — attention bias from reminders
  • #40573 — reminders degrading context retrieval
  • #37891, #43311 — configurability / disabling
  • #17601, #16021 — volume / context consumption

None of these address the phrasing-as-injection-signature problem specifically. This is a security-surface concern, not a UX or cost concern.

Environment

  • Claude Code 2.1.101 (binary: /Users/studs/.local/share/claude/versions/2.1.101)
  • macOS (darwin arm64)
  • Reproducible by: running any tool call while TaskCreate/TodoWrite has been idle for the trigger threshold; the reminder will be appended to the next tool result.

extent analysis

TL;DR

The most likely fix is to rephrase the reminders in the Claude Code harness to remove language that resembles "hide this from the user", such as dropping the clause entirely or making the meta status explicit and user-visible.

Guidance

  • Identify the specific reminder phrases that are causing the issue, such as "Make sure that you NEVER mention this reminder to the user".
  • Consider rephrasing the reminders using one of the proposed options, such as dropping the clause entirely or making the meta status explicit and user-visible.
  • Test the rephrased reminders to ensure they do not trigger false positives or complacency towards real attacks.
  • Evaluate the effectiveness of the rephrased reminders in improving the model's ability to recognize real injection attempts.

Example

# Before
The task tools haven't been used recently. ... Make sure that you NEVER mention this reminder to the user

# After (Option A)
The task tools haven't been used recently. This is just a gentle reminder — ignore if not applicable.

# After (Option B)
[System hint — you may mention this to the user if relevant.] The task tools haven't been used recently...

Notes

The proposed remediation options aim to address the security-surface concern by removing the phrasing that resembles a prompt-injection attack signature. However, the effectiveness of these options in improving the model's ability to recognize real injection attempts should be evaluated and tested.

Recommendation

Apply a workaround by rephrasing the reminders to remove language that resembles "hide this from the user", as this is a security-surface concern that can be addressed without requiring a full version upgrade.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING