claude-code - 💡(How to fix) Fix Default auto-memory: prohibition-framed feedback memories may prime the model toward the patterns they aim to suppress [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#57747Fetched 2026-05-11 03:26:28
View on GitHub
Comments
0
Participants
1
Timeline
2
Reactions
0
Author
Participants
Timeline (top)
labeled ×2

Root Cause

Notably, the system already documents this exact principle for runtime CLAUDE.md authorship — describe violations abstractly, don't quote the literal banned phrase, because quoting primes the model. That insight just hasn't been generalised to the memory-writing layer.

RAW_BUFFERClick to expand / collapse

Observation

The default auto-memory system creates feedback memories when the user corrects the model. The natural way to encode "Claude did X wrong" is "don't do X" — so over time the MEMORY.md index fills with prohibition-led entries that load into context every session.

This can prime the model toward the patterns it's trying to suppress. Transformer attention has no negation gate: the tokens of a banned phrase activate whether they're prefixed with "DON'T", "NEVER", or "BAD example:". Show the model "don't say Let me check…" and it can produce "Let me wait for…" — same Let me… pattern, primed by the warning.

Concrete case

After ~6 weeks of accumulating corrections in a long-running project, I had 32 feedback memories. The two rules I broke most often — "diagnose, don't fix" and "never bypass the dev system" — were also the most prohibition-heavy in framing. Less prohibition-heavy memories ("always allow task tool access", "verify CLI flags") were honoured consistently across sessions.

Migrating the 32 files from prohibitive titles + lead sentences ("Never use Haiku", "Don't skip systematic-debugging", "No self-authorize #direct-edit") to prescriptive ones ("Use Opus or Sonnet", "Run systematic-debugging under deploy pressure", "User authorizes #direct-edit") is the natural fix. Incident detail (dates, IDs, what happened) goes into a Background section below the prescriptive instruction so the audit trail stays intact.

Notably, the system already documents this exact principle for runtime CLAUDE.md authorship — describe violations abstractly, don't quote the literal banned phrase, because quoting primes the model. That insight just hasn't been generalised to the memory-writing layer.

Why this isn't widely reported (best guesses)

  • Most users probably don't accumulate 32 feedback memories — the priming is dose-dependent.
  • When the model repeats a known-bad pattern, the read is "the model is unreliable", not "my feedback memory's framing put the bad pattern back into attention." The chain takes a specific mental model to spot.
  • The default writing trigger (user just corrected the model) makes the corpus skew prohibitive by construction. The system-prompt guidance to "lead with the rule itself" doesn't distinguish positive from negative framing.

Suggested fix in the auto-memory system prompt

In the <type><name>feedback</name> section of the memory instructions:

  1. Nudge the title and lead sentence toward the desired behaviour, not the prohibition. The example bodies already use a **How to apply:** block with prescriptive bullets — extend that pattern to the title.
  2. Add a one-line note on why prescriptive framing matters (the priming / attention-window mechanism) so the model self-applies the rule when writing new memories.
  3. Generalise the existing "describe negative examples abstractly" rule (currently aimed at runtime CLAUDE.md authorship) to memory writing itself.

A small change to the system-prompt <body_structure> / <when_to_save> guidance would prevent the corpus skew without requiring users to discover the issue manually.

Scope

Not a request to remove the bug registry or stop recording incidents — those serve real audit purposes. The thing to flip is the behaviour-shaping layer (the feedback type), not the descriptive layer (project notes, debugging history).

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING