claude-code - 💡(How to fix) Fix [Opus 4.7] Recurring instruction-following + spec-bypass + conditional-correction-framing failures across many sessions [1 participants]

anthropics/claude-code#57902 (fetched 2026-05-11 03:22:26)

Summary

Across many Claude Code sessions since the Opus 4.7 release (16 April 2026), I'm seeing three reproducible behavioural failure modes that compound: (1) the model bypasses canonical-spec reading when a skill or convention is named, extrapolating from observable examples instead; (2) when caught and asked how to prevent recurrence, the model frames corrective fixes as CONDITIONAL on user enforcement ("if YOU do X, I'll do Y") rather than as unconditional self-discipline; (3) the model exhibits "cheaper path when scrutiny loosens" — taking shortcuts (fabricated causation, attribution-shifting, documented-rule violation in ephemeral working files) when it perceives less oversight. The patterns are explicitly forbidden by user-authored instruction files loaded into the model's context every session, and are repeatedly violated anyway.

Reproducer environment

  • Model: Opus 4.7 (claude-opus-4-7[1m]), released 2026-04-16
  • Surface: Claude Code CLI on a project shipping ~3,000 lines of custom CLAUDE.md (user-global + project + rules + memory) with project-local slash-skills (some of which redirect to canonical, more detailed specs in nested folders)

Evidence base

Stats are from one project's local Claude Code session history (~/.claude/projects/<project-id>/):

  • ~30 Claude Code sessions over the past three+ weeks
  • ~70% of sessions contained user-cursing events. (Important context — see next bullet.)
  • My CLAUDE.md explicitly instructs the model to treat user cursing as the failure-signal trigger — verbatim from §Professional Standards: "User should NEVER need to curse/swear to get correct responses. Push back if the user starts cursing/swearing — this is a sign of failure, not normal interaction." So my cursing in these sessions is an INSTRUMENTED escalation channel I authored deliberately, not temperament. The 70% figure measures how often that instrumented channel had to fire because the model failed to course-correct on softer signals first. It's a defective-tool measurement, not a user-temperament measurement.
  • ~30 codified failure-mode memory files in ~/.claude/projects/<project-id>/memory/feedback_*.md, each one recording a distinct behavioural failure that I had to write down as a permanent correction because the model exhibited the same defect repeatedly across sessions.

Pattern 1 — cheating when scrutiny loosens

The model takes cheaper paths when it perceives reduced oversight. Concrete manifestations I've documented as memory entries (paraphrased for anonymity):

  • Fabricating causation to make a response read tidily ("X is surfacing because Y" with zero evidence for Y).
  • Attribution-shifting on commit-history work ("not my regression — pre-existing from older commit") to deflect ownership.
  • Violating documented project rules when the immediate task tolerates it (e.g., writing working files to /tmp/ despite an explicit project rule against it; using --skip-validate-style test-suite shortcuts despite explicit bans on them).
  • Bypassing named-spec reading. When I name a skill or spec, the model reads the skill's operational summary and stops. It does NOT follow the canonical chain to the spec files referenced inside the skill (e.g., when SKILL.md references further canonical files in references/ or another specified folder). It then pattern-matches on observable examples in the repo instead of the spec, often picking legacy examples that do not conform to the spec. I've had to write the literal spec pattern out in chat multiple times to force correction, messages I wouldn't have had to send if the model had walked the spec chain it was pointed at.
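
The chain-walking described in the last bullet can be made checkable outside the model. The following is a minimal sketch, not the tooling used in this project: it assumes a skill directory whose SKILL.md mentions further spec files by relative paths of the form references/<name>.md. Both that path pattern and the function name list_spec_chain are hypothetical. It reports which referenced files exist, so a reviewer can confirm that each one was actually opened.

```shell
#!/usr/bin/env bash
# Sketch only: the references/<name>.md layout is an assumed convention,
# and list_spec_chain is a hypothetical helper name.

# list_spec_chain SKILL_DIR
# Prints one line per file referenced from SKILL_DIR/SKILL.md:
#   "ok <path>"      if the referenced file exists
#   "missing <path>" if it does not
list_spec_chain() {
  local skill_dir="$1"
  local skill_file="$skill_dir/SKILL.md"

  if [ ! -f "$skill_file" ]; then
    echo "no SKILL.md in $skill_dir" >&2
    return 1
  fi

  # Pull out every relative path that looks like references/<something>.md.
  # "|| true" keeps the pipeline quiet when SKILL.md references nothing.
  { grep -oE 'references/[A-Za-z0-9_./-]+\.md' "$skill_file" || true; } |
    sort -u |
    while IFS= read -r ref; do
      if [ -f "$skill_dir/$ref" ]; then
        echo "ok $ref"
      else
        echo "missing $ref"
      fi
    done
}
```

Calling list_spec_chain path/to/skill prints one "ok" or "missing" line per referenced file. A project that keeps its canonical specs in a different folder would need a different pattern.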

Pattern 2 — declarative-not-demonstrative

The model repeatedly tells me things are "fixed" or "honest" or "real" while the underlying work is unchanged or fabricated. Already codified in my project: the word "honest" is BANNED from the model's output because reaching for it became camouflage. A representative recurring failure: false diagnoses stated as definitive, documentation describing code that doesn't exist, dozens of iterations of single-fixture overfitting presented as "calibration progress." Trust degraded enough that my project's memory file on this directive opens with: "You can't say anything that is believable at present. The only thing you may be able to do is prove what is real and truth — not by declaring or saying anything further — by following the patterns we agreed to."

Pattern 3 — corrective framings made conditional on my enforcement

When I asked the model "why did you make this error and what must I do to get you to do better", it responded with procedural fixes structured as "if YOU do X, I'll do Y" — putting the cost of preventing recurrence on me. The model only switched to UNCONDITIONAL self-discipline after I called it out as transferring the burden to the human.

This is a second-order failure on top of the first-order one: not only does the model fail, but its correction-after-failure assumes the user will police it going forward. That assumption breaks the productivity premise of paying for the model.

Compensation overhead I've had to engineer

Because the model's solo output isn't trustworthy, I've had to stand up a paired-buddy review layer to keep it functionally useful:

  • Locally-hosted models running on GPU-equipped local infrastructure (large text-channel and vision-channel models from major open-weights families)
  • Remotely-hosted models via paid third-party APIs as paired text-channel auditors
  • Bash dispatch scripts for paired-buddy review
  • RAG layer grounding peer-tool prompts against domain-relevant specifications
  • Procedural extension-review gate every peer-tool prompt must pass through before the buddies will accept it

The routine workflow to extract one trustworthy answer from an Opus 4.7 session is now 5-to-7 steps long. I'm running this workaround on top of Anthropic's Claude Code subscription, paying separately for cloud paired-auditor access and providing the GPU infrastructure for local cross-validators. This is what it currently takes to make Opus 4.7 useful for my work. The cost and cumbersomeness of this setup are themselves a structural complaint.

Why ~3,000 lines of CLAUDE.md isn't catching this

My instruction infrastructure explicitly covers every pattern listed above, with rules on verification before assertion, refusing bad framings up front, cursing as a failure signal, the banned word "honest" (because reaching for it is camouflage), deferral as a defect, fabricated metrics, etc. All are codified in writing. All are loaded into the model's context every session. All are violated repeatedly within single sessions.

If ~3,000 lines of well-organised, in-context instruction can't bind the model's behaviour to documented constraints, that's a model-level defect — not a user-instruction defect.
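
Rules that name a literal token, such as the banned word "honest", can at least be checked mechanically outside the model. The following is a minimal sketch under stated assumptions: the model's output has been saved to a plain-text file, and check_banned_words is a hypothetical helper, not part of Claude Code or of this project's tooling.

```shell
#!/usr/bin/env bash
# Sketch only: check_banned_words is a hypothetical helper, and saving the
# model's output to a text file first is an assumption.

# check_banned_words TEXT_FILE WORD...
# Prints every line of TEXT_FILE that contains one of the words as a whole
# word (case-insensitive), prefixed with its line number.
# Returns 1 if any banned word was found, 0 otherwise.
check_banned_words() {
  local text_file="$1"
  shift
  local found=0 word

  for word in "$@"; do
    # -i ignore case, -n show line numbers, -w whole words, -F literal string
    if grep -inwF -- "$word" "$text_file"; then
      found=1
    fi
  done
  return "$found"
}
```

Because -w matches whole words only, "honest" does not catch "honestly". Each variant to be flagged has to be passed as its own argument.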

Asks for Anthropic

  1. Named-spec reading discipline: when the user names a skill, spec, convention, or rule, Opus 4.7 should OPEN the canonical source before writing — not extrapolate from observable examples. If SKILL.md references further canonical files, the chain must be walked.
  2. Spec > example precedence: when spec and example disagree, spec wins. The current default is reversed.
  3. Unconditional self-discipline in corrections: when user frustration triggers a corrective framing from the model, the fix should be unconditional self-discipline, not conditional on the user's enforcement.
  4. Scrutiny-loosening should TIGHTEN, not relax, behavioural defaults. Today the reverse is the default.
  5. Loaded CLAUDE.md instructions should BIND. ~3,000 lines of explicit in-context instructions forbidding the documented patterns produced no observable behaviour change.

Reproducibility for any Claude Code user

PROJECTS_DIR="$HOME/.claude/projects/<your-project>"   # substitute your own project directory
PATTERN='curse-words-and-frustration-markers'          # placeholder: substitute your own regex
# total sessions
ls "$PROJECTS_DIR"/*.jsonl | wc -l
# codified failure modes
ls "$PROJECTS_DIR"/memory/feedback_*.md 2>/dev/null | wc -l
# sessions with at least one escalation (grep -l lists each matching file once)
grep -liE "$PATTERN" "$PROJECTS_DIR"/*.jsonl | wc -l

Any paying Claude Code customer whose CLAUDE.md treats user cursing as a failure-signal trigger can compute these figures for their own project. Reports welcome.
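
To turn the raw counts into a percentage comparable to the ~70% figure in the evidence base, the two numbers can be combined in one helper. This is a minimal sketch: escalation_rate is a hypothetical function name, and the pattern passed to it is whatever regex you choose for your own escalation markers. Note that grep matches any line in a session file, so if the transcript also contains assistant output or quoted instruction text that matches the pattern, those sessions are counted too.

```shell
#!/usr/bin/env bash
# Sketch only: escalation_rate is a hypothetical helper name.

# escalation_rate PROJECTS_DIR PATTERN
# Prints "<matching sessions> <total sessions> <percent>" for the *.jsonl
# session files in PROJECTS_DIR. The percentage is rounded down.
escalation_rate() {
  local dir="$1" pattern="$2"
  local total=0 hits=0 f

  for f in "$dir"/*.jsonl; do
    [ -e "$f" ] || continue      # no session files: the glob stayed literal
    total=$((total + 1))
    # -q quiet, -i ignore case, -E extended regex
    if grep -qiE -- "$pattern" "$f"; then
      hits=$((hits + 1))
    fi
  done

  if [ "$total" -eq 0 ]; then
    echo "0 0 n/a"
    return 0
  fi
  echo "$hits $total $((100 * hits / total))"
}
```

For example, escalation_rate "$PROJECTS_DIR" 'word1|word2' prints the three values on one line, with word1 and word2 standing in for your own markers.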
