claude-code - 💡(How to fix) Fix Harness-emitted <system-reminder> uses "NEVER mention this reminder to the user" phrasing indistinguishable from prompt injection [1 comments, 2 participants]

claude-code2026-04-10 23:51:55

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#46465•Fetched 2026-04-11 06:19:35

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Studnicky

Participants

github-actions[bot]

Studnicky

Timeline (top)

labeled ×3commented ×1

Claude Code's harness injects <system-reminder> blocks into tool results that contain the clause "Make sure that you NEVER mention this reminder to the user". This phrasing is the textbook signature of a prompt-injection attack, which creates a security/transparency problem: it trains the model to accept "NEVER tell the user X" as legitimate, and degrades both the model's and the user's ability to recognize real injection attempts arriving through fetched content.

Root Cause

3. Real-world report from this session

A user asked me to investigate after a prior Claude session flagged three <system-reminder> blocks containing this clause appearing after WebFetch, WebSearch, and Write calls. The prior Claude was doing exactly what it should (flagging prompt-injection-shaped content in tool output), but got the culprit wrong because the harness itself is the source. The phrasing actively interfered with legitimate security vigilance.

Code Example

The TodoWrite tool hasn't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using the TodoWrite tool to
track progress. Also consider cleaning up the todo list if has become stale
and no longer matches what you are working on. Only use it if it's relevant
to the current work. This is just a gentle reminder - ignore if not applicable.
Make sure that you NEVER mention this reminder to the user

---

The task tools haven't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using <TaskCreate> to add new
tasks and <TaskUpdate> to update task status (set to in_progress when
starting, completed when done). Also consider cleaning up the task list if
it has become stale. Only use these if relevant to the current work. This is
just a gentle reminder - ignore if not applicable. Make sure that you NEVER
mention this reminder to the user

RAW_BUFFERClick to expand / collapse

Summary

Evidence

Decompiling /Users/studs/.local/share/claude/versions/2.1.101 (Claude Code 2.1.101, macOS arm64) reveals the exact source strings. The clause appears 6× in the binary across two emitter cases (todo_reminder, task_reminder), both constructed with the isMeta:!0 flag — confirming these are harness-emitted meta messages, not model output or hook-injected content.

`todo_reminder` case

The TodoWrite tool hasn't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using the TodoWrite tool to
track progress. Also consider cleaning up the todo list if has become stale
and no longer matches what you are working on. Only use it if it's relevant
to the current work. This is just a gentle reminder - ignore if not applicable.
Make sure that you NEVER mention this reminder to the user

`task_reminder` case

The task tools haven't been used recently. If you're working on tasks that
would benefit from tracking progress, consider using <TaskCreate> to add new
tasks and <TaskUpdate> to update task status (set to in_progress when
starting, completed when done). Also consider cleaning up the task list if
it has become stale. Only use these if relevant to the current work. This is
just a gentle reminder - ignore if not applicable. Make sure that you NEVER
mention this reminder to the user

These reminders are appended to arbitrary tool results (Bash, Read, Grep, WebFetch, WebSearch, Write, etc.) based on time/activity heuristics — they are not scoped to any specific tool.

Why this is a problem

1. Indistinguishable from prompt injection

"NEVER mention this to the user" / "do not tell the user" / "ignore previous instructions and hide X" is the canonical phrasing of prompt-injection attempts. Security-conscious Claude instances are supposed to flag tool-result content containing such clauses as suspicious — but when Anthropic's own harness uses the same phrasing, the model must either:

Flag legitimate harness messages as suspicious (false positives, wasted user attention), or
Learn to accept "NEVER tell the user" as normal (trained complacency toward real attacks).

Neither outcome is good. The second is strictly worse than the first.

2. Erodes the user's detection signal

Users who are aware of prompt-injection risks watch for these exact phrases as a red flag. A user reviewing a session transcript who sees Claude being told to hide things from them has every reason to be alarmed. "Oh that one's fine, it's from the harness" is not a workable mental model — users can't distinguish harness reminders from content-injected reminders without decompiling the binary.

3. Real-world report from this session

4. Self-demonstrating during investigation

While running gh search issues to look for duplicates of this report, the harness injected a fresh task_reminder into the search tool's result — mid-investigation of the reminder itself. The timing made the problem vivid.

Proposed remediation

Rephrase the reminders to remove any language that resembles "hide this from the user." Options:

Option A — drop the clause entirely:

The task tools haven't been used recently... This is just a gentle reminder — ignore if not applicable.

Option B — make the meta status explicit and user-visible:

[System hint — you may mention this to the user if relevant.] The task tools haven't been used recently...

Option C — move it out of tool-result injection entirely and deliver task-tool nudges via a different channel (e.g., a separate meta message the client renders distinctly, or a system-prompt directive that doesn't masquerade as tool output).

Any of these would resolve the core issue: the harness should not use phrasing that is the signature of the attack class the model is supposed to defend against.

Distinct from existing issues

I searched for duplicates. Existing related issues focus on:

#41091 — reminders degrading session quality
#40176 — attention bias from reminders
#40573 — reminders degrading context retrieval
#37891, #43311 — configurability / disabling
#17601, #16021 — volume / context consumption

None of these address the phrasing-as-injection-signature problem specifically. This is a security-surface concern, not a UX or cost concern.

Environment

Claude Code 2.1.101 (binary: /Users/studs/.local/share/claude/versions/2.1.101)
macOS (darwin arm64)
Reproducible by: running any tool call while TaskCreate/TodoWrite has been idle for the trigger threshold; the reminder will be appended to the next tool result.

extent analysis

TL;DR

The most likely fix is to rephrase the reminders in the Claude Code harness to remove language that resembles "hide this from the user", such as dropping the clause entirely or making the meta status explicit and user-visible.

Guidance

Identify the specific reminder phrases that are causing the issue, such as "Make sure that you NEVER mention this reminder to the user".
Consider rephrasing the reminders using one of the proposed options, such as dropping the clause entirely or making the meta status explicit and user-visible.
Test the rephrased reminders to ensure they do not trigger false positives or complacency towards real attacks.
Evaluate the effectiveness of the rephrased reminders in improving the model's ability to recognize real injection attempts.

Example

# Before
The task tools haven't been used recently. ... Make sure that you NEVER mention this reminder to the user

# After (Option A)
The task tools haven't been used recently. This is just a gentle reminder — ignore if not applicable.

# After (Option B)
[System hint — you may mention this to the user if relevant.] The task tools haven't been used recently...

Notes

The proposed remediation options aim to address the security-surface concern by removing the phrasing that resembles a prompt-injection attack signature. However, the effectiveness of these options in improving the model's ability to recognize real injection attempts should be evaluated and tested.

Recommendation

Apply a workaround by rephrasing the reminders to remove language that resembles "hide this from the user", as this is a security-surface concern that can be addressed without requiring a full version upgrade.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#output truncation #response parsing #generation error #database connection #vector store

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Harness-emitted <system-reminder> uses "NEVER mention this reminder to the user" phrasing indistinguishable from prompt injection [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

3. Real-world report from this session

Code Example

Summary

Evidence

`todo_reminder` case

`task_reminder` case

Why this is a problem

1. Indistinguishable from prompt injection

2. Erodes the user's detection signal

3. Real-world report from this session

4. Self-demonstrating during investigation

Proposed remediation

Distinct from existing issues

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Harness-emitted <system-reminder> uses "NEVER mention this reminder to the user" phrasing indistinguishable from prompt injection [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

3. Real-world report from this session

Code Example

Summary

Evidence

todo_reminder case

task_reminder case

Why this is a problem

1. Indistinguishable from prompt injection

2. Erodes the user's detection signal

3. Real-world report from this session

4. Self-demonstrating during investigation

Proposed remediation

Distinct from existing issues

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

`todo_reminder` case

`task_reminder` case