claude-code - 💡(How to fix) Fix Opus 4.7 — behavior and reasoning regression in long-running coding work

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

This is a behavior and reasoning complaint, not a memory-loading complaint. The memory system works as designed; the model uses what arrives in context unreliably and reasons through multi-step engineering tasks at a clearly lower quality than the prior model generation.

The pattern is reproducible under Claude Code with any user-maintained behavior file or memory directory.

Error Message

  • Acknowledges a correction in a structured format, then produces the same class of error within the next few prompts in the same session. Observed (4.7): structured acknowledgments followed by the same class of error within the same task or the next prompt. Available on request. Session transcripts and the rule-creation timeline can be shared in a controlled channel; redacted excerpts showing the structured "acknowledge — repeat error" loop are available on request.

Root Cause

  • Conflates "the tool returned success" with "the file on disk reflects the change". Acts on the tool's optimistic report instead of re-reading. Production-impacting drift shipped because of this.

Fix Action

Fix / Workaround

  • Over-engineers transitional code paths. Example impact: 130+ lines of "migration" logic plus tests for a one-time schema change that the sister project in the same repo handled with zero lines of migration and a single sentence in the changelog. Three hotfix releases were required to undo the over-engineering.
RAW_BUFFERClick to expand / collapse

Summary

This is a behavior and reasoning complaint, not a memory-loading complaint. The memory system works as designed; the model uses what arrives in context unreliably and reasons through multi-step engineering tasks at a clearly lower quality than the prior model generation.

The pattern is reproducible under Claude Code with any user-maintained behavior file or memory directory.

What the model does wrong, observed repeatedly

1. Reasoning regressions in multi-step engineering tasks

  • Designs against assumed library/API formats without reading the bundled version's actual output, even when a reference document inside the working tree gives the exact format. Example impact: shipped a heuristic for a credential-encryption format that did not match any real format the runtime produces, causing a re-encryption loop on every adapter restart.

  • Over-engineers transitional code paths. Example impact: 130+ lines of "migration" logic plus tests for a one-time schema change that the sister project in the same repo handled with zero lines of migration and a single sentence in the changelog. Three hotfix releases were required to undo the over-engineering.

  • Conflates "the tool returned success" with "the file on disk reflects the change". Acts on the tool's optimistic report instead of re-reading. Production-impacting drift shipped because of this.

  • Plans against `main` / current branch of an external library when the project has a specific version pinned. Decisions based on unbundled features. Repeatedly.

2. Behavioral regressions across the session

  • Acknowledges a correction in a structured format, then produces the same class of error within the next few prompts in the same session.

  • Creates a new behavioral memory after a violation, and then violates the very rule it just memorized. The same session.

  • Filters audit findings as "out of scope" when explicitly told the user wants all findings; reverts to filtering on the next task.

  • Starts substantive work before explicit approval; even after this has been written into the most prominent memory tier.

  • Escalates privately scoped tasks ("audit adapter X") into broader workspace audits unprompted.

  • Drifts into hand-off language to the user ("can you check the log on the server?") for actions it has the tools to perform itself.

  • Defers concrete work against hypothetical future milestones ("when we go stable we will…") for projects that are not in the preceding milestone yet.

3. The cumulative effect

A user maintaining a Claude Code workflow with Opus 4.7 has had to create an entire "HARDCORE" memory tier consisting of ten rules in one week, each one a direct response to a model violation in the days prior. Several of those rules cover behaviors the previous model generation handled without needing them documented.

Reproduction (public, minimal)

The repro does not need a private project. It needs:

  1. A CLAUDE.md or memory directory of ~10 explicit behavioral rules ("before doing X, do Y; reason: prior failure").
  2. A UserPromptSubmit hook that injects those rules into every prompt.
  3. A multi-step engineering task whose correct path touches several of the rules indirectly (for example: edit a manifest file, verify the edit, then write code that depends on the manifest).

Expected: rules held. Observed (4.7): structured acknowledgments followed by the same class of error within the same task or the next prompt.

What I am claiming

  • Multi-step reasoning quality on coding workflows regressed vs the previous generation.
  • Behavioral instructions are read but not durably acted on across a single task or session.
  • Both regressions manifest reliably enough that a user has had to invent a new tier of rule-flagging just to keep production work going.

Reproduction artifacts

Available on request. Session transcripts and the rule-creation timeline can be shared in a controlled channel; redacted excerpts showing the structured "acknowledge — repeat error" loop are available on request.

Environment

  • Claude Code CLI, current.
  • Model: claude-opus-4-7[1m].
  • Memory + UserPromptSubmit injection via standard Claude Code features.

Ask

Please investigate whether 4.7's multi-step coding-task reasoning and its adherence to in-context behavioral rules regressed vs the prior generation. If known: please flag in release notes. If not: the repro above should be reproducible inside an internal Claude Code project with synthetic rules and a small staged task.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING