claude-code - 💡(How to fix) Fix [MODEL] Quantified evidence: Sonnet 4.6 quality regression since March 9 — 1400+ frustration events across 50 sessions [3 comments, 2 participants]

claude-code, 2026-04-12 09:44:05

GitHub stats

anthropics/claude-code#46935•Fetched 2026-04-12 13:29:17

Author

woolkingx

Participants

github-actions[bot]

woolkingx

Timeline (top)

commented ×3, labeled ×3

Summary

I have quantified data showing a dramatic quality regression starting the week of March 9, 2026. This is not a vibe check — it's measured from 60 days of conversation logs across 50 sessions.

Data

I track how often I need to repeat instructions or correct Claude (my "WTF frequency"). Here's the weekly distribution:

60-day data across 50 sessions:

W09 (Feb 23)    26  ██                                ← baseline
W10 (Mar 02)    23  ██                                ← baseline
W11 (Mar 09)   221  ██████████████████████            ← started (8.5x baseline)
W12 (Mar 16)   484  ████████████████████████████████  ← peak (outage week)
W13 (Mar 23)   479  ████████████████████████████████  ← still bad
W14 (Mar 30)    65  ██████                            ← eased
W15 (Apr 06)   294  █████████████████████████████     ← under way

Baseline (W09–W10): ~25/week
Peak (W12–W13): ~480/week — 19x baseline
Total across 50 sessions: 1,400+ events
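
The multiples quoted above can be re-derived from the weekly counts. The following is a minimal sketch that only redoes the arithmetic; the variable names are illustrative and not part of the original report. It suggests that the "8.5x" figure for W11 is relative to W09 alone (221 / 26), while the "19x" figure uses the rounded averages (~480 / ~25).

```python
# Weekly event counts copied from the chart above (week label -> events).
weekly_events = {
    "W09": 26,
    "W10": 23,
    "W11": 221,
    "W12": 484,
    "W13": 479,
    "W14": 65,
    "W15": 294,
}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


baseline = mean(weekly_events[w] for w in ("W09", "W10"))  # quiet weeks
peak = mean(weekly_events[w] for w in ("W12", "W13"))      # worst weeks
total = sum(weekly_events.values())                        # all seven weeks

print(f"baseline: {baseline:.1f}/week")
print(f"peak:     {peak:.1f}/week ({peak / baseline:.1f}x baseline)")
print(f"W11 vs W09: {weekly_events['W11'] / weekly_events['W09']:.1f}x")
print(f"total:    {total}")
```

With unrounded averages the peak comes out slightly above 19x, and the seven weekly figures sum to 1,592, which is consistent with the "1,400+" lower bound.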

Affected models

I've been forced to switch from Sonnet to Opus as my primary model. Sonnet 4.6 is basically unusable now. My subjective rating of current model quality:

Opus 4.6 now = Sonnet 4.6 before
Sonnet 4.6 now = Haiku before
Haiku = Haiku (unchanged — nothing left to degrade)

This means I'm paying Opus prices for what used to be Sonnet-level performance.

What "regression" looks like in practice

The model consistently fails to:

Follow its own reasoning loop (OODA) despite explicit CLAUDE.md instructions
Read files before modifying them — guesses instead
Stop repeating the same mistake — same error 5–8 times per session without self-correction
Follow explicit behavioral constraints across sessions (#41217)
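
The report does not say how repeated mistakes were counted. As a sketch under stated assumptions, one way to flag "same error 5 or more times per session" is to tally (session, mistake) pairs. The record format, the session ids, and the mistake labels below are all hypothetical and made up for illustration; they are not data from the issue.

```python
from collections import Counter

# Hypothetical correction log: (session id, short label of the mistake).
# These records are illustrative only, not taken from the report.
corrections = (
    [("s1", "edited file without reading it")] * 6
    + [("s1", "ignored CLAUDE.md rule")] * 2
    + [("s2", "edited file without reading it")] * 5
)


def repeated_mistakes(records, threshold=5):
    """Return the (session, mistake) pairs seen at least `threshold` times."""
    counts = Counter(records)
    return {key: n for key, n in counts.items() if n >= threshold}


for (session, mistake), n in repeated_mistakes(corrections).items():
    print(f"{session}: '{mistake}' repeated {n} times")
```

The threshold of 5 mirrors the lower end of the "5–8 times" range quoted in the list above.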

Timeline alignment

The W12 peak (week of March 16) coincides with:

The March 17 outage acknowledged on status.claude.com
r/ClaudeCode thread "After the outage today, does Claude feel dumber?"
Multiple GitHub issues filed the same week (#37052, #35271, #32290)
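
The week labels can be checked against the calendar. The sketch below assumes the W-numbers in the chart are ISO 8601 week numbers for 2026, which the report does not state explicitly; the helper names are illustrative.

```python
from datetime import date


def week_label(d: date) -> str:
    """Return an ISO-week label such as 'W12' for a calendar date."""
    return f"W{d.isocalendar()[1]:02d}"


def week_start(year: int, week: int) -> date:
    """Return the Monday that opens the given ISO week."""
    return date.fromisocalendar(year, week, 1)


outage = date(2026, 3, 17)  # outage date given in the issue

print(week_label(outage))    # which week the outage falls in
print(week_start(2026, 12))  # first day of W12
print(week_start(2026, 11))  # first day of W11, where the increase begins
```

Under that assumption the outage falls in W12, which opens on March 16, while W11 opens on March 9, one week before the outage.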

Environment

Claude Code CLI v2.1.94
Linux
bypassPermissions mode
Heavy skill/hook usage with structured CLAUDE.md rules

What I'm asking

Acknowledge that model quality has regressed — the data is clear
Explain whether this is a compute constraint issue (as widely suspected) or a checkpoint/RLHF regression
Provide a model versioning mechanism so users can pin to a known-good checkpoint

Related issues

#46106 Opus 4.6 is getting dumber
#41217 Systematic Failure to Follow Explicit Behavioral Constraints
#42542 Silent context degradation
#38338 Opus 4.6 acts dumber than Sonnet 3.5
#37052 Claude Code model regression
#21046 Opus 4.5 "Shadow Downgrade"

extent analysis

TL;DR

No user-side fix is confirmed. The issue asks Anthropic for a model versioning mechanism that lets users pin to a known-good checkpoint, which would mitigate the regression whether its cause is the suspected compute constraint or a checkpoint/RLHF change.

Guidance

Investigate the relationship between the March 17 outage and the model quality regression, as the timeline alignment suggests a potential causal link.
Review the differences in behavior between Sonnet 4.6 and Opus 4.6, as the subjective rating suggests a significant degradation in Sonnet's performance.
Consider implementing a rollback or fallback mechanism to a previous model version, allowing users to temporarily bypass the regressed model until a fix is available.
Examine the structured CLAUDE.md rules and heavy skill/hook usage to determine if there are any potential interactions or conflicts contributing to the model's failure to follow its own reasoning loop or respect explicit behavioral constraints.

Example

No specific code snippet can be provided without further information on the model's implementation details. However, a potential approach could involve introducing versioning or checkpointing mechanisms to the model, allowing users to specify a particular version or checkpoint to use.

Notes

The peak in the reported data (W12) coincides with the March 17 outage, but the increase begins a week earlier, in W11 (March 9), so the outage alone does not account for the onset. Without further information on the model's internals or the specifics of the outage, the root cause cannot be determined from this data.

Recommendation

Users cannot implement model versioning themselves; pinning to a known-good checkpoint is a request to the vendor and would only be a stopgap until the underlying cause of the regression is identified and resolved. The only workaround described in the issue is the author's switch from Sonnet to Opus as the primary model, at higher cost.
