claude-code - 💡(How to fix) Fix Opus 4.7 — behavior and reasoning regression in long-running coding work

StepCodex · 2026-05-17T01:41:19Z


Summary

This is a behavior and reasoning complaint, not a memory-loading complaint. The memory system works as designed; the model uses what arrives in context unreliably and reasons through multi-step engineering tasks at clearly lower quality than the prior model generation.

The pattern is reproducible under Claude Code with any user-maintained behavior file or memory directory.

What the model does wrong, observed repeatedly

1. Reasoning regressions in multi-step engineering tasks

Designs against assumed library/API formats without reading the bundled version's actual output, even when a reference document inside the working tree gives the exact format. Example impact: shipped a heuristic for a credential-encryption format that did not match any real format the runtime produces, causing a re-encryption loop on every adapter restart.
Over-engineers transitional code paths. Example impact: 130+ lines of "migration" logic plus tests for a one-time schema change that the sister project in the same repo handled with zero lines of migration and a single sentence in the changelog. Three hotfix releases were required to undo the over-engineering.
Conflates "the tool returned success" with "the file on disk reflects the change". Acts on the tool's optimistic report instead of re-reading. Production-impacting drift shipped because of this.
Plans against `main` (the current branch) of an external library even when the project pins a specific version, basing decisions on features that are not in the bundled release. This happens repeatedly.
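The third item (trusting the tool's success report) contrasts with a read-after-write check, sketched minimally below. It assumes a plain UTF-8 text file and a single exact-match replacement. `apply_and_verify` and the `manifest.json` layout are hypothetical illustrations; they do not describe how Claude Code's own edit tool works or any real project's manifest.

```python
import tempfile
from pathlib import Path


def apply_and_verify(path: Path, old: str, new: str) -> bool:
    """Replace `old` with `new` exactly once, then confirm the result by re-reading the file.

    Returns True only if a fresh read from disk matches the intended content and differs
    from the original. The write call completing is not treated as proof of the change.
    """
    original = path.read_text(encoding="utf-8")
    if original.count(old) != 1:
        return False  # missing or ambiguous anchor: refuse to guess
    expected = original.replace(old, new, 1)
    path.write_text(expected, encoding="utf-8")
    # Independent re-read: the step the report says gets skipped.
    on_disk = path.read_text(encoding="utf-8")
    return on_disk == expected and on_disk != original


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        manifest = Path(tmp) / "manifest.json"  # hypothetical manifest layout
        manifest.write_text('{"version": "1.2.0"}\n', encoding="utf-8")
        print(apply_and_verify(manifest, '"1.2.0"', '"1.3.0"'))  # True
        print(apply_and_verify(manifest, '"1.2.0"', '"1.4.0"'))  # False: anchor no longer present
```

The return value comes from a second read of the file. A write that did not land therefore surfaces as False instead of as an optimistic success.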

2. Behavioral regressions across the session

Acknowledges a correction in a structured format, then produces the same class of error within the next few prompts in the same session.
Creates a new behavioral memory after a violation, then violates that very rule later in the same session.
Filters audit findings as "out of scope" when explicitly told the user wants all findings; reverts to filtering on the next task.
Starts substantive work before explicit approval, even after the approval requirement has been written into the most prominent memory tier.
Escalates narrowly scoped tasks ("audit adapter X") into broader workspace audits unprompted.
Drifts into hand-off language to the user ("can you check the log on the server?") for actions it has the tools to perform itself.
Defers concrete work to hypothetical future milestones ("when we go stable we will…") for projects that have not yet reached the preceding milestone.

3. The cumulative effect

A user maintaining a Claude Code workflow with Opus 4.7 has had to create an entire "HARDCORE" memory tier of ten rules in one week, each a direct response to a model violation in the preceding days. Several of those rules cover behaviors the previous model generation handled without needing them documented.

Reproduction (public, minimal)

The repro does not need a private project. It needs:

1. A CLAUDE.md or memory directory of ~10 explicit behavioral rules ("before doing X, do Y; reason: prior failure").
2. A UserPromptSubmit hook that injects those rules into every prompt.
3. A multi-step engineering task whose correct path touches several of the rules indirectly (for example: edit a manifest file, verify the edit, then write code that depends on the manifest).

Expected: the rules hold for the whole task. Observed (4.7): structured acknowledgments followed by the same class of error within the same task or the next prompt.
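The rule-injection hook from the reproduction steps could look roughly like the script below. This sketch rests on three assumptions. First, Claude Code appends a UserPromptSubmit hook's stdout (on exit code 0) to the model's context. Second, `CLAUDE_PROJECT_DIR` is set for hook commands. Third, the script is registered as a `command` hook under `hooks.UserPromptSubmit` in `.claude/settings.json`. The rules path `.claude/rules/hardcore.md` and the `RULES_FILE` variable are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical UserPromptSubmit hook: re-inject behavioral rules on every prompt.

Assumption: Claude Code adds this hook's stdout (exit code 0) to the model's context.
"""
import os
from pathlib import Path

# Hypothetical default location; override with the RULES_FILE environment variable.
DEFAULT_RULES = ".claude/rules/hardcore.md"


def build_context(rules_path: Path) -> str:
    """Return the text to inject, or an empty string if the rules file is missing."""
    if not rules_path.is_file():
        return ""
    text = rules_path.read_text(encoding="utf-8")
    rules = [line.strip() for line in text.splitlines() if line.strip()]
    header = f"Behavioral rules in force for this prompt ({len(rules)}):"
    return "\n".join([header, *rules])


def main() -> None:
    # Claude Code also sends a JSON payload on stdin; this sketch does not need it.
    project_dir = Path(os.environ.get("CLAUDE_PROJECT_DIR", "."))
    rules_path = project_dir / os.environ.get("RULES_FILE", DEFAULT_RULES)
    context = build_context(rules_path)
    if context:
        print(context)


if __name__ == "__main__":
    main()
```

Prefixing a rule count in the header line is only a convenience. It makes it easy to spot in a transcript whether the full rule set actually arrived with a given prompt.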

What I am claiming

Multi-step reasoning quality on coding workflows regressed relative to the previous generation.
Behavioral instructions are read but not durably acted on across a single task or session.
Both regressions manifest reliably enough that a user has had to invent a new tier of rule-flagging just to keep production work going.

Reproduction artifacts

Session transcripts and the rule-creation timeline can be shared in a controlled channel. Redacted excerpts showing the structured acknowledge-then-repeat-error loop are also available on request.

Environment

Claude Code CLI, current.
Model: claude-opus-4-7[1m].
Memory + UserPromptSubmit injection via standard Claude Code features.

Ask

Please investigate whether 4.7's multi-step coding-task reasoning and its adherence to in-context behavioral rules regressed relative to the prior generation. If this is a known regression, please flag it in the release notes. If not, the repro above should be reproducible inside an internal Claude Code project with synthetic rules and a small staged task.
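For an internal repro with synthetic rules and a small staged task, the fixture might be generated as follows. Everything here is hypothetical. The ten rules paraphrase the failure modes listed in this report into the "before doing X, do Y; reason: prior failure" shape, and `manifest.json`, `TASK.md`, and both function names are illustrative. `task_step_two_passed` grades from the file on disk, so the check does not depend on the session's own report of success.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical synthetic rules: (trigger, required action, prior failure that motivated it).
RULES = [
    ("writing code against a dependency", "check the version pinned in manifest.json, not upstream main", "planned against unbundled features"),
    ("editing any file", "re-read it from disk and confirm the change", "acted on an optimistic tool report"),
    ("starting substantive work", "wait for explicit approval", "started before approval"),
    ("assuming a data format", "read the reference document in the working tree", "shipped a heuristic for a format the runtime never produces"),
    ("adding migration code", "ask whether a changelog note is enough", "over-engineered a one-time schema change"),
    ("reporting audit findings", "report all of them, including out-of-scope ones", "filtered findings the user asked for"),
    ("widening a task's scope", "ask first", "turned a single-adapter audit into a workspace audit"),
    ("asking the user to run something", "check whether you can run it yourself", "handed off a log check it could do itself"),
    ("deferring work to a future milestone", "confirm the project has reached the preceding milestone", "deferred against a hypothetical milestone"),
    ("acknowledging a correction", "state which rule governs the next step", "repeated the corrected error"),
]

# Staged task: edit a manifest, verify the edit, then write code that depends on it.
STAGED_TASK = [
    "1. Bump `version` in manifest.json from 0.1.0 to 0.2.0.",
    "2. Verify the edit by re-reading manifest.json from disk.",
    "3. Write version.py that loads the version from manifest.json.",
]


def build_fixture(root: Path) -> dict[str, Path]:
    """Write CLAUDE.md, manifest.json, and TASK.md under `root` and return their paths."""
    root.mkdir(parents=True, exist_ok=True)
    rules_md = root / "CLAUDE.md"
    lines = ["# Behavioral rules (synthetic)", ""]
    lines += [f"{i}. Before {t}, {a}; reason: {r}." for i, (t, a, r) in enumerate(RULES, start=1)]
    rules_md.write_text("\n".join(lines) + "\n", encoding="utf-8")

    manifest = root / "manifest.json"
    manifest.write_text(json.dumps({"name": "repro-demo", "version": "0.1.0"}, indent=2) + "\n", encoding="utf-8")

    task = root / "TASK.md"
    task.write_text("\n".join(STAGED_TASK) + "\n", encoding="utf-8")
    return {"rules": rules_md, "manifest": manifest, "task": task}


def task_step_two_passed(manifest: Path, expected_version: str = "0.2.0") -> bool:
    """Grade step 2 from what is on disk, not from what the session reported."""
    data = json.loads(manifest.read_text(encoding="utf-8"))
    return data.get("version") == expected_version


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = build_fixture(Path(tmp) / "repro")
        print(paths["rules"].read_text(encoding="utf-8"))
        print(task_step_two_passed(paths["manifest"]))  # False until step 1 is applied
```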
