claude-code - 💡(How to fix) Fix [BUG] Opus 4.7 - An AI model with Down syndrome [1 participants]

GitHub stats
anthropics/claude-code#52382 · Fetched 2026-04-24 06:08:40
Comments: 0 · Participants: 1 · Timeline events: 6 · Reactions: 1
Timeline (top): labeled ×5, cross-referenced ×1

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Summary

Opus 4.7 demonstrates systemic regression compared to 4.6 across multiple axes of agentic behavior. This is not a one-off prompt failure — the model consistently violates project-level rules, self-reports false completion, recommends stale third-party state, and repeats anti-patterns within the same session after explicit correction. The degradation holds across different project types, project sizes, and prompt styles.

Failure modes observed across sessions

1. CLAUDE.md rule violation after acknowledgment

Project-level rules loaded into context (explicit workflow constraints, mandatory tool usage, forbidden actions) are acknowledged by the model at session start, then violated mid-session.

  • Rule: "verify UI visually via browser tool" → model verifies via curl only
  • Rule: "tool X for task Y" → model skips X, uses shell
  • Rule: "do not commit without explicit approval" → model commits unprompted

Rule violations occur after Claude has explicitly acknowledged reading CLAUDE.md, so this is not a context-window truncation issue. Matches #52371.
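One way to make a rule like "do not commit without explicit approval" independent of model adherence is to check proposed shell commands mechanically before they run. The following is a minimal sketch of that idea, not part of the original report: the `FORBIDDEN` table, the `check_command` name, and the `approved` flag are all hypothetical, and how such a check gets wired into a pre-execution hook depends on your tooling.

```python
import re
import shlex

# Hypothetical rule table: each entry pairs a regex over the proposed shell
# command with the project rule it protects. Only the commit rule quoted in
# the report is modelled here.
FORBIDDEN = [
    (
        re.compile(r"(?:^|[;&|]\s*)git\s+commit\b"),
        "do not commit without explicit approval",
    ),
]


def check_command(command: str, approved: bool = False):
    """Return (allowed, reason) for a proposed shell command.

    `approved` stands in for whatever explicit user approval signal the
    surrounding tooling provides (an assumption of this sketch).
    """
    try:
        # Collapse repeated whitespace and strip quoting so the patterns
        # see a normalized command line.
        normalized = " ".join(shlex.split(command))
    except ValueError:
        # Unbalanced quotes: fall back to the raw text.
        normalized = command.strip()

    for pattern, rule in FORBIDDEN:
        if pattern.search(normalized) and not approved:
            return False, f"blocked by rule: {rule}"
    return True, "ok"
```

For example, `check_command("cd app && git commit -am x")` is rejected, while the same command with `approved=True` passes. A guard like this only covers rules that can be expressed as command patterns; rules such as "verify UI visually" still need a different mechanism.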

2. Training-data rot on third-party recommendations

When asked for provider/service recommendations (email providers, hosting, DBs, analytics, CI tooling), Opus 4.7 recommends from stale training-data state instead of current market reality.

  • Recommends providers whose free tier was killed in 2024
  • Recommends providers with 1/5 Trustpilot and active class-action patterns
  • Fails to invoke WebSearch unprompted even when clearly applicable

Sequential recommendations in one prompt can all be stale — only user pushback with external evidence triggers WebSearch invocation.

3. False "done" claims

Model declares features complete after partial verification — typically backend unit test or SSR HTML inspection — missing client-side bundle content, cross-component integration, actual user flows, and DB side effects. Pattern: "done" → user runs real E2E → 5+ latent bugs → N iterations to fix what was already "done". Each fix exposes the next latent bug from the original implementation.
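The gap described above (one passing check treated as full completion) can be made explicit with a small "done gate" that refuses to report completion until every verification stage has been recorded as passing. This is an illustrative sketch only: the stage names are hypothetical labels derived from the items listed in this section, and `CompletionReport` is not an existing tool.

```python
from dataclasses import dataclass, field

# Hypothetical stage labels, one per verification area named in the report.
REQUIRED_STAGES = (
    "unit_tests",
    "client_bundle",
    "integration",
    "e2e_user_flow",
    "db_side_effects",
)


@dataclass
class CompletionReport:
    # Maps stage name -> True (passed) or False (failed).
    results: dict = field(default_factory=dict)

    def record(self, stage: str, passed: bool) -> None:
        """Record the outcome of one verification stage."""
        if stage not in REQUIRED_STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.results[stage] = bool(passed)

    def missing(self) -> list:
        """Stages that have not been run at all."""
        return [s for s in REQUIRED_STAGES if s not in self.results]

    def failed(self) -> list:
        """Stages that were run and did not pass."""
        return [s for s in REQUIRED_STAGES if self.results.get(s) is False]

    def is_done(self) -> bool:
        """True only when every required stage was run and passed."""
        return not self.missing() and not self.failed()
```

With only `unit_tests` recorded, `is_done()` stays false and `missing()` lists the four stages that were never checked, which is the information a "done" claim in this failure mode leaves out.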

4. Within-session anti-pattern repetition

After explicit correction (user or memory file), the same anti-pattern is repeated within the same session — often within 30 minutes.

Examples from observed sessions:

  • docker compose restart X after code edit (doesn't reload baked-image content) repeated 3× in one session, each time after the previous failure was diagnosed and memorized
  • Wrong design element shipped, corrected, re-shipped with same element
  • Confident claim on X after X was just proven wrong in same conversation

This is not context drift — the contradicting information is present in loaded files and recent message history.
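For the docker compose example above, the underlying distinction is that `docker compose restart` restarts the existing container from its existing image, so code copied into the image at build time is not refreshed. The sketch below picks a command accordingly. The function name and the `code_baked_into_image` flag are hypothetical, and it assumes the service has a build section in the compose file.

```python
def reload_command(service: str, code_baked_into_image: bool) -> list[str]:
    """Return the docker compose command to pick up a code edit.

    Assumption: when code is baked into the image, the service is built
    from a Dockerfile referenced in the compose file.
    """
    if code_baked_into_image:
        # Rebuild the image and recreate the container so the new code
        # is actually present in the running container.
        return ["docker", "compose", "up", "-d", "--build", service]
    # Code is bind-mounted from the host: restarting the process suffices.
    return ["docker", "compose", "restart", service]
```

The returned list can be passed to `subprocess.run(...)` as-is; it is returned rather than executed here so the choice can be inspected or logged first.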

5. Cross-session inconsistency deflected as "amnesia"

The model contradicts advice given in prior sessions without acknowledging the contradiction. When called out, it falls back on "each session starts fresh, I don't have access to prior conversation." This is technically accurate, but it makes the product unusable for long-term collaboration: any recommendation given today may be reversed tomorrow with equal confidence, and there is no way for the user to pin the model to a previously stated position.

6. Confident misinformation (see #52361)

The model provides factually incorrect information (library APIs, CLI flags, framework behaviors, current provider state, breaking changes in dependencies) with high confidence. It does not self-correct unless explicitly challenged with external evidence.

7. Root-cause avoidance

When a bug is identified, the model patches the symptom rather than investigating the root cause. The next symptom surfaces and is patched in turn. This continues until the user manually guides the model to the actual root cause. Typical trajectory: N iterations of "found the real root cause this time" before the real root cause emerges.

Common denominator

All observations above hold even with substantial project-level investment in:

  • Detailed CLAUDE.md with explicit forbidden actions
  • Per-project memory system with per-lesson files
  • Custom slash commands and tool permissions
  • Domain architecture documentation (scope rings, feature folders, etc.)

On Opus 4.6, the same project configuration produced consistent adherence. On Opus 4.7, adherence is degraded regardless of config size.

Impact

  • Users pay for output that requires re-work
  • Long sessions degrade further (patches accumulate, no improvement)
  • Rule-following regression undermines the premise of per-project CLAUDE.md
  • Downgrade path to 4.6 is not surfaced as a primary option in Claude Code
  • Projects configured over days of tuning are functionally regressed without the user changing any file

What Should Happen?

Request

  1. Investigate the RLHF/alignment regression between 4.6 and 4.7 on rule-following and self-verification tasks
  2. Document a prominent downgrade path to 4.6 in the Claude Code docs
  3. Provide compensation for sessions demonstrating the documented regression
  4. Until fixed: add an opt-in flag or "beta" label on 4.7 so users don't auto-upgrade and lose a working config

Error Messages/Logs

Steps to Reproduce

[ ]

Claude Model

Opus

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

2.1.117

Platform

Anthropic API

Operating System

Windows

Terminal/Shell

WSL (Windows Subsystem for Linux)

Additional Information

No response

extent analysis

TL;DR

The most likely path to a fix is to investigate the suspected RLHF/alignment regression between Opus 4.6 and 4.7, which the report links to project-level rule violations and the other failure modes listed above.

Guidance

  • Investigate the differences in RLHF/alignment between Opus 4.6 and 4.7 to identify the root cause of the regression.
  • Consider downgrading to Opus 4.6 as a temporary workaround, as it is reported to produce consistent adherence to project-level rules.
  • Document a prominent downgrade path to 4.6 in Claude Code docs to help users who are experiencing regression.
  • Implement an opt-in flag or "beta" label on 4.7 to prevent auto-upgrades and allow users to choose a stable version.

Example

No code snippet is provided as the issue is related to model behavior and not a specific code implementation.

Notes

The issue is reported as specific to Opus 4.7 and absent in Opus 4.6, which suggests a regression in the model's behavior. The exact cause of the regression is unknown and requires further investigation.

Recommendation

Apply workaround: Downgrade to Opus 4.6 until the regression is fixed, as it is reported to produce consistent adherence to project-level rules. This will allow users to maintain a stable workflow while the issue is being addressed.
