[MODEL] Quantified evidence: Sonnet 4.6 quality regression since March 9 (1400+ frustration events across 50 sessions)

anthropics/claude-code#46935, fetched 2026-04-12 13:29:17
Comments: 3 · Participants: 2 · Timeline events: 6 (commented ×3, labeled ×3) · Reactions: 0

Summary

I have quantified data showing a dramatic quality regression starting the week of March 9, 2026. This is not a vibe check — it's measured from 60 days of conversation logs across 50 sessions.

Data

I track how often I need to repeat instructions or correct Claude (my "WTF frequency"). Here's the weekly distribution:

60-day data across 50 sessions:

W09 (Feb 23)    26  ██                                ← baseline
W10 (Mar 02)    23  ██                                ← baseline
W11 (Mar 09)   221  ██████████████████████            ← started (8.5x baseline)
W12 (Mar 16)   484  ████████████████████████████████  ← peak (outage week)
W13 (Mar 23)   479  ████████████████████████████████  ← still bad
W14 (Mar 30)    65  ██████                            ← brief respite
W15 (Apr 06)   294  █████████████████████████████     ← ongoing
  • Baseline (W09–W10): ~25/week
  • Peak (W12–W13): ~480/week — 19x baseline
  • Total across 50 sessions: 1,400+ events
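
The summary figures above can be recomputed from the weekly counts. A minimal Python sketch, assuming the seven weekly values in the chart are the complete dataset (the variable names are illustrative, not taken from the author's tooling):

```python
# Weekly event counts copied from the chart above.
weekly = {
    "W09": 26, "W10": 23, "W11": 221, "W12": 484,
    "W13": 479, "W14": 65, "W15": 294,
}

baseline = (weekly["W09"] + weekly["W10"]) / 2   # mean of the two baseline weeks
peak = (weekly["W12"] + weekly["W13"]) / 2       # mean of the two peak weeks
total = sum(weekly.values())                     # all events in the chart

print(f"baseline  ~{baseline:.1f}/week")
print(f"peak      ~{peak:.1f}/week ({peak / baseline:.1f}x baseline)")
print(f"W11 jump  {weekly['W11'] / weekly['W09']:.1f}x W09")
print(f"total     {total} events")
```

With unrounded averages the peak works out to roughly 19.7x baseline; the 19x figure follows from the rounded values 480 and 25. The 8.5x figure for W11 matches 221/26, and the seven weeks shown sum to 1,592 events, which is consistent with the 1,400+ claim.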

Affected models

I've been forced to switch from Sonnet to Opus as my primary model. Sonnet 4.6 is basically unusable now. My subjective rating of current model quality:

  • Opus 4.6 now = Sonnet 4.6 before
  • Sonnet 4.6 now = Haiku before
  • Haiku = Haiku (unchanged — nothing left to degrade)

This means I'm paying Opus prices for what used to be Sonnet-level performance.

What "regression" looks like in practice

The model consistently fails to:

  1. Follow its own reasoning loop (OODA) despite explicit CLAUDE.md instructions
  2. Read files before modifying them — guesses instead
  3. Stop repeating the same mistake — same error 5–8 times per session without self-correction
  4. Follow explicit behavioral constraints across sessions (#41217)
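
Item 3 can be measured rather than estimated. A minimal sketch, assuming a hypothetical log of (session_id, error_message) pairs; the record format, the sample data, and the threshold of 5 are illustrative assumptions, not the author's actual tooling:

```python
from collections import Counter

def repeated_errors(records, threshold=5):
    """Map each session to the errors that occur at least `threshold` times."""
    counts = Counter(records)  # counts each (session_id, error) pair
    result = {}
    for (session, error), n in counts.items():
        if n >= threshold:
            result.setdefault(session, {})[error] = n
    return result

# Hypothetical sample: session "s1" hits the same error six times.
sample = [("s1", "edited file without reading it")] * 6 + [
    ("s1", "ignored CLAUDE.md rule"),
    ("s2", "edited file without reading it"),
]
print(repeated_errors(sample))
```

Sessions that never reach the threshold are left out of the result, so the output lists only the sessions showing the repetition pattern described above.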

Timeline alignment

The W12 peak (week of March 16) coincides with:

  • The March 17 outage acknowledged on status.claude.com
  • r/ClaudeCode thread "After the outage today, does Claude feel dumber?"
  • Multiple GitHub issues filed the same week (#37052, #35271, #32290)
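
The week labels can be checked against the calendar. A small sketch, assuming the W-numbers are ISO 8601 week numbers (the issue does not say so explicitly):

```python
from datetime import date

def iso_week_label(d: date) -> str:
    """Return a label such as 'W12' for the ISO 8601 week containing d."""
    return f"W{d.isocalendar()[1]:02d}"

outage = date(2026, 3, 17)      # outage date cited in the issue
onset = date(2026, 3, 9)        # start of the week the regression began

print(iso_week_label(outage))   # W12
print(iso_week_label(onset))    # W11
```

Under that assumption the outage falls in W12 and the onset in W11, so the rise in events began the week before the outage.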

Environment

  • Claude Code CLI v2.1.94
  • Linux
  • bypassPermissions mode
  • Heavy skill/hook usage with structured CLAUDE.md rules

What I'm asking

  1. Acknowledge that model quality has regressed — the data is clear
  2. Explain whether this is a compute constraint issue (as widely suspected) or a checkpoint/RLHF regression
  3. Provide a model versioning mechanism so users can pin to a known-good checkpoint

Related issues

  • #46106 Opus 4.6 is getting dumber
  • #41217 Systematic Failure to Follow Explicit Behavioral Constraints
  • #42542 Silent context degradation
  • #38338 Opus 4.6 acts dumber than Sonnet 3.5
  • #37052 Claude Code model regression
  • #21046 Opus 4.5 "Shadow Downgrade"

Extent analysis

TL;DR

The issue asks Anthropic for a model versioning mechanism that lets users pin to a known-good checkpoint. Pinning would sidestep a checkpoint/RLHF regression but may not help if the cause is a compute constraint, and the root cause is not confirmed in the issue.

Guidance

  • Investigate the relationship between the March 17 outage and the model quality regression, as the timeline alignment suggests a potential causal link.
  • Review the differences in behavior between Sonnet 4.6 and Opus 4.6, as the subjective rating suggests a significant degradation in Sonnet's performance.
  • Consider implementing a rollback or fallback mechanism to a previous model version, allowing users to temporarily bypass the regressed model until a fix is available.
  • Examine the structured CLAUDE.md rules and heavy skill/hook usage to determine if there are any potential interactions or conflicts contributing to the model's failure to follow its own reasoning loop or respect explicit behavioral constraints.

Example

No specific code snippet can be provided without further information on the model's implementation details. However, a potential approach could involve introducing versioning or checkpointing mechanisms to the model, allowing users to specify a particular version or checkpoint to use.

Notes

The peak week (W12) coincides with the March 17 outage, but the weekly counts show the rise starting in W11, the week of March 9, so the outage alone does not explain the onset. Without further information on the model's internal workings or the specifics of the outage, the root cause cannot be determined from this data.

Recommendation

Users cannot implement checkpoint pinning themselves; it is a request to Anthropic. Until such a mechanism exists, the only workaround reported in the issue is switching from Sonnet 4.6 to Opus 4.6, which the author notes means paying Opus prices for what used to be Sonnet-level performance.
