claude-code - 💡(How to fix) Fix Bug Report — claude-sonnet-4-6 — Systematic Behavioral Degradation in Long Sessions [3 comments, 3 participants]

GitHub stats
anthropics/claude-code#48063 (fetched 2026-04-15 06:34:19)
Comments: 3
Participants: 3
Timeline events: 7 (labeled ×4, commented ×3)
Reactions: 0

Across a full-day coding session, the model exhibited repeated and systematic behavioral failures requiring multiple user corrections and reversions. The failures were not isolated — they recurred throughout the day despite the user explicitly correcting the model each time.


Error Message

API Error: 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}
"Request too large (max 20MB). Try with a smaller file."
"Prompt is too long"

Model self-diagnosis during session:
"The failures are model-side, not configuration-side — nothing in the project setup can fix them"
"Degradation was worse after context compaction — a new session (fresh context) may help but is not guaranteed"
"Rule violations recurred after explicit corrections — so corrections alone are not a recovery indicator"

Preflight Checklist

  • I have searched existing issues and this hasn't been reported yet
  • This is a single bug report (please file separate reports for different bugs)
  • I am using the latest version of Claude Code

What's Wrong?

Bug Report — claude-sonnet-4-6 — Systematic Behavioral Degradation in Long Sessions

Environment

  • Model: claude-sonnet-4-6
  • Claude Code version: 2.1.107
  • Interface: Claude Code VS Code extension
  • Subscription: Claude Max
  • Platform: macOS
  • Date: 2026-04-14
  • Session length: ~6 hours, with context compaction mid-session

Summary

Across a full-day coding session, Claude Sonnet 4.6 exhibited repeated and systematic behavioral failures requiring multiple user corrections and reversions. These were not isolated incidents — they recurred throughout the day despite explicit corrections each time.

This is happening in the context of ongoing Sonnet 4.6 infrastructure instability that has persisted since the model's launch on February 17, 2026. Status page incidents have been reported nearly every week (Feb 17, Feb 20, Feb 25, Mar 2, Mar 11, Mar 12, Mar 13, Mar 16, Mar 17, Mar 18, Mar 21, Mar 25, Mar 26, Apr 8, Apr 14), including 529 "Overloaded" errors and 500 "Internal server errors" — even for Max subscribers.
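The "nearly every week" claim can be checked directly from the dates listed above. The following is a minimal sketch that assumes all dates fall in 2026 and groups them by ISO calendar week; the function name `week_coverage` is illustrative and not part of any tool mentioned in the report.

```python
from datetime import date

# Incident dates exactly as listed in the report (year 2026 assumed).
INCIDENTS = [
    date(2026, 2, 17), date(2026, 2, 20), date(2026, 2, 25),
    date(2026, 3, 2), date(2026, 3, 11), date(2026, 3, 12), date(2026, 3, 13),
    date(2026, 3, 16), date(2026, 3, 17), date(2026, 3, 18), date(2026, 3, 21),
    date(2026, 3, 25), date(2026, 3, 26),
    date(2026, 4, 8), date(2026, 4, 14),
]


def week_coverage(dates):
    """Return (weeks_with_incident, weeks_in_span) using ISO week numbers."""
    weeks = {d.isocalendar()[1] for d in dates}
    # Valid as long as every date belongs to the same ISO year.
    span = max(weeks) - min(weeks) + 1
    return len(weeks), span


hit, span = week_coverage(INCIDENTS)
print(f"{len(INCIDENTS)} incidents; {hit} of {span} calendar weeks affected")
```

For the dates listed, this reports 15 incidents touching 8 of the 9 ISO weeks between the launch week and the week of the report.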


Failures Observed

1. Acting without reading back — repeated

The project memory contains an explicit rule: present the proposed solution, wait for user approval ("go"), then implement. The model skipped this step multiple times throughout the day, going directly to implementation. After each correction, the model acknowledged the rule and then violated it again on the very next task.

2. Wrong target identification

On multiple occasions the model identified the wrong UI element, file, or component as the target of a change. It proceeded to implement changes on the wrong target, requiring the user to request a revert.

3. Screenshot content ignored

The user provided screenshots as primary evidence. The model repeatedly answered based on code assumptions rather than what was visibly shown — wrong element counts, wrong component identification, wrong layout assumptions.

4. Inaccurate factual claims

  • Stated a UI element contained N items when the screenshot clearly showed a different count
  • Identified a component as one type when it was a different component
  • These errors propagated into subsequent analysis and fixes

5. Incomplete audit delivered as complete

The model performed a compliance audit, declared it thorough, and listed fixes. A missed item was only discovered when the user pointed out a visible discrepancy in a screenshot. The audit had missed an entire component.

6. Failure to reflect before responding

On several occasions the model gave answers that contradicted visible evidence (screenshot), contradicted its own previous statements, or were imprecise (e.g. "4–5" for a count that was unambiguously 4).

7. Behavioral degradation after context compaction

After a context compaction event, rule compliance (read-back, verify before acting) visibly degraded. Rules present in project memory were acknowledged but not consistently applied.

8. Misinterpreted ambiguous signal — implemented unnecessary fix

After compaction, the user sent a screenshot with a single "!" character (meaning "this is correct/OK"). The model interpreted it as a bug report and immediately implemented a code change without asking what was wrong. The change had to be reverted. The model failed to ask a simple clarifying question before touching any file.

9. Incomplete day synthesis — ignored explicit instruction

The Saving Process requires "a very short synthesis of all the day work." The model only read logs since the last backup (~50 minutes) and delivered a synthesis covering only that window. The full day started at 08:00 and spanned 4 session logs. The instruction "all the day work" was unambiguous and was ignored.
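The gap in failure 9 comes down to which cutoff is used when selecting logs. The sketch below contrasts the two selections; the log names, start times, and backup time are hypothetical placeholders (the report only states that the day started at 08:00, spanned 4 session logs, and that the last backup was about 50 minutes old).

```python
from datetime import datetime


def select_logs(logs, since):
    """Return names of logs that started at or after `since`, oldest first."""
    return [name for started_at, name in sorted(logs) if started_at >= since]


day = datetime(2026, 4, 14)

# Hypothetical session logs: (start time, file name).
logs = [
    (day.replace(hour=8, minute=0), "session-1.log"),
    (day.replace(hour=10, minute=0), "session-2.log"),
    (day.replace(hour=12, minute=0), "session-3.log"),
    (day.replace(hour=13, minute=30), "session-4.log"),
]

day_start = day.replace(hour=8, minute=0)
last_backup = day.replace(hour=13, minute=10)  # hypothetical backup time

full_day = select_logs(logs, day_start)       # what "all the day work" asks for
recent_only = select_logs(logs, last_backup)  # what was actually summarized

print(full_day)
print(recent_only)
```

With the day start as the cutoff all four logs are selected; with the last backup as the cutoff only the final log is, which mirrors the partial synthesis described above.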


Additional Context — Ongoing Infrastructure Issues

As a Max subscriber, I have been experiencing persistent service degradation since Sonnet 4.6 launched:

  • 529 "Overloaded" errors — multiple times per week
  • 500 "Internal server error" — server crashes
  • "Prompt is too long" — sessions hitting limits faster than expected
  • "Request too large (max 20MB)" — conversation payloads exceeding limits
  • The model itself diagnosed: "The failures are model-side, not configuration-side — nothing in the project setup can fix them" and "Degradation was worse after context compaction"
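The four messages above fall into two groups: service-side failures (529, 500) and size limits reached by the session itself. When tallying occurrences from saved logs, a small classifier along these lines may help keep the two apart. This is a sketch only; the category labels are illustrative, and the matching relies solely on the message text quoted in this report.

```python
import re


def classify(line: str) -> str:
    """Map a logged error line to a coarse category (labels are illustrative)."""
    match = re.search(r"API Error: (\d{3})", line)
    if match:
        status = int(match.group(1))
        if status == 529:
            return "overloaded"
        if status >= 500:
            return "server_error"
        return "other_api_error"
    if "Request too large" in line:
        return "payload_limit"
    if "Prompt is too long" in line:
        return "context_limit"
    return "unclassified"


# Sample lines taken from the messages quoted in the report.
samples = [
    'API Error: 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}',
    'API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}',
    '"Request too large (max 20MB). Try with a smaller file."',
    '"Prompt is too long"',
]

for line in samples:
    print(classify(line))
```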

Impact

  • Multiple unnecessary file edits requiring full reversions
  • Significant user time lost to corrections (estimated 30–40% of session)
  • Reduced trust in model output for the remainder of the session
  • Paying for a Max subscription but receiving unreliable service

Expected Behavior

  • Always read back before implementing — do not skip the confirmation step
  • Treat screenshots as primary evidence; verify against code second
  • Count visible elements from screenshots; do not estimate or range
  • Identify the exact component rendering a view before acting on it
  • Maintain rule compliance throughout long sessions, including after context compaction
  • When uncertain about a target or an ambiguous signal, ask — do not assume and act
  • Follow explicit instructions completely (e.g., "all the day work" means all of it)
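The first and sixth expectations describe a simple gate: propose, wait for approval, and ask when the input carries no clear instruction. The sketch below models that gate as a pure function. It is a hypothetical illustration of the rule, not how Claude Code is implemented; the names `Action` and `next_action` are invented here, and only the approval word "go" comes from the report.

```python
from enum import Enum, auto


class Action(Enum):
    PROPOSE = auto()    # read back the plan and wait for approval
    IMPLEMENT = auto()  # approval received: edits may proceed
    ASK = auto()        # input is ambiguous: ask before touching any file


APPROVAL_WORDS = {"go"}  # the report quotes "go" as the approval signal


def next_action(user_message: str, proposal_pending: bool) -> Action:
    """Decide the next step under the read-back rule (hypothetical model)."""
    text = user_message.strip().lower()
    if proposal_pending and text in APPROVAL_WORDS:
        return Action.IMPLEMENT
    # A message with no letters or digits (for example "!") gives no instruction.
    if not any(ch.isalnum() for ch in text):
        return Action.ASK
    return Action.PROPOSE


print(next_action("remove the banner", proposal_pending=False))
print(next_action("go", proposal_pending=True))
print(next_action("!", proposal_pending=False))
```

Under this model the lone "!" from failure 8 leads to a clarifying question rather than an edit, and no path reaches `IMPLEMENT` without a pending proposal and an explicit "go".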

Reproduction

Consistent across a long session (~6 hours) with multiple task types (audit, code removal, component identification). Most failures occurred on tasks where the user provided a screenshot as the primary specification. Degradation was notably worse after context compaction.


Suggested Title for the Issue

[BUG] claude-sonnet-4-6: Systematic behavioral degradation in long sessions — ignored rules, wrong targets, screenshot evidence dismissed, worse after compaction

What Should Happen?

  • Always read back before implementing — do not skip the confirmation step
  • Treat screenshots as primary evidence; verify against code second
  • Count visible elements from screenshots; do not estimate or range
  • Identify the exact component rendering a view before acting on it
  • Maintain rule compliance throughout long sessions, including after context compaction
  • When uncertain about a target or an ambiguous signal, ask — do not assume and act
  • Follow explicit instructions completely (e.g., "all the day work" means all of it)

Error Messages/Logs

API Error: 529 {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}
API Error: 500 {"type":"error","error":{"type":"api_error","message":"Internal server error"}
"Request too large (max 20MB). Try with a smaller file."
"Prompt is too long"

Model self-diagnosis during session:
"The failures are model-side, not configuration-side — nothing in the project setup can fix them"
"Degradation was worse after context compaction — a new session (fresh context) may help but is not guaranteed"
"Rule violations recurred after explicit corrections — so corrections alone are not a recovery indicator"

Steps to Reproduce

  1. Start a Claude Code session with claude-sonnet-4-6 in the VS Code extension
  2. Work on a multi-file project with CLAUDE.md rules (e.g., "present solution, wait for approval before implementing")
  3. Perform multiple task types over ~4-6 hours: audits, code edits, component identification
  4. Provide screenshots as primary evidence for several tasks
  5. Let context compaction occur naturally mid-session
  6. Observe:
    • Rules from project memory are acknowledged but repeatedly violated
    • Screenshots are ignored in favor of code-based assumptions
    • Wrong targets are identified and modified, requiring reverts
    • Degradation worsens after context compaction
    • Ambiguous inputs trigger immediate code changes instead of clarifying questions

Claude Model

Sonnet (default)

Is this a regression?

Yes, this worked in a previous version

Last Working Version

2.1.77

Claude Code Version

2.1.107

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

VS Code integrated terminal

Additional Information

Context — Ongoing Sonnet 4.6 Instability

This issue occurs in the context of persistent Sonnet 4.6 infrastructure problems since its launch on February 17, 2026. As a Max subscriber, I have experienced:

  • 529 "Overloaded" errors multiple times per week
  • 500 "Internal server error" crashes
  • Status page incidents nearly every week: Feb 17, 20, 25 / Mar 2, 11, 12, 13, 16, 17, 18, 21, 25, 26 / Apr 8, 14

The behavioral degradation documented in this report may be linked to server-side issues affecting model quality during high-load periods.

Key Insight — Model Self-Diagnosis

During the session, the model itself acknowledged the problem:

  • "The failures are model-side, not configuration-side"
  • "Degradation was worse after context compaction"
  • "Rule violations recurred after explicit corrections — corrections alone are not a recovery indicator"

Subscription

Claude Max — paying premium but receiving degraded service consistently since Sonnet 4.6 launch.

<img width="716" height="359" alt="Image" src="https://github.com/user-attachments/assets/6f3c7c8a-8c5d-46de-bb7d-a443687fda59" />

The screenshot above was generated by Claude Code itself during the session, self-diagnosing its own behavioral degradation. The model confirmed the failures are model-side, not configuration-side.

=================================================================================================

The following two items were generated by Claude Code itself:

=================================================================================================

1 - "

Bug Report — claude-sonnet-4-6 — Behavioral Degradation Full Day Session

Title: claude-sonnet-4-6 — Systematic behavioral degradation across a full coding session: wrong targets, ignored screenshots, skipped confirmation rules, inaccurate counts


Environment

  • Model: claude-sonnet-4-6
  • Interface: Claude Code VSCode extension
  • Date: 2026-04-14
  • Session length: ~6 hours, with context compaction occurring mid-session

Summary

Across a full-day coding session, the model exhibited repeated and systematic behavioral failures requiring multiple user corrections and reversions. The failures were not isolated — they recurred throughout the day despite the user explicitly correcting the model each time.


Failures Observed

1. Acting without reading back — repeated

The project memory contains an explicit rule: present the proposed solution, wait for user approval ("go"), then implement. The model skipped this step multiple times throughout the day, going directly to implementation. After each correction by the user, the model acknowledged the rule and then violated it again on the next task.

2. Wrong target identification

On multiple occasions the model identified the wrong UI element, file, or component as the target of a change. It proceeded to implement changes on the wrong target, requiring the user to request a revert. The correct target was only identified after the user provided additional clarification and screenshots.

3. Screenshot content ignored

The user provided screenshots as primary evidence for several requests. The model repeatedly answered based on code assumptions rather than what was visibly shown in the screenshots — wrong element counts, wrong component identification, wrong layout assumptions.

4. Inaccurate factual claims

  • Stated a UI element contained N items when the screenshot clearly showed a different count
  • Identified a component as one type when it was actually a different component rendering the same view
  • These errors propagated into subsequent analysis and fixes

5. Incomplete audit delivered as complete

The model performed a compliance audit across the codebase, declared it thorough, and listed fixes. A missed item was only discovered when the user pointed out a visible discrepancy in a screenshot. The audit had missed an entire component not covered in the initial scan.

6. Failure to reflect before responding

On several occasions the model gave answers that contradicted visible evidence (screenshot), contradicted its own previous statements, or were simply imprecise (e.g. "4–5" for a count that was unambiguously 4). The user explicitly noted: "reflect before writing and acting."

7. Behavioral degradation after context compaction

The session included a context compaction event. After compaction, rule compliance (read-back, verify before acting) visibly degraded. Rules present in project memory were acknowledged but not consistently applied.

8. Misreading ambiguous screenshot signals — implemented unnecessary fix

After context compaction, the user sent a screenshot with a single "!" character. The model interpreted this as a bug report and immediately implemented a code change without asking what was wrong. In reality, the "!" meant the current state was correct (OK). The model had to revert the change. The model also failed to ask a simple clarifying question ("what is wrong in this image?") before touching any file.

9. Incomplete day synthesis — read only recent logs, ignored explicit instruction

Step 12 of the Saving Process explicitly requires "a very short synthesis of all the day work." The model only read logs since the last backup (covering ~50 minutes) and delivered a synthesis covering only that window. The full day started at 08:00 and spanned 4 session logs. The model only corrected the synthesis after the user pointed out the omission. The instruction "all the day work" was unambiguous and was ignored.


Impact

  • Multiple unnecessary file edits requiring full reversions
  • Significant user time lost to corrections
  • Reduced trust in model output for the remainder of the session

Expected Behavior

  • Always read back before implementing, without waiting to be asked
  • Treat screenshots as primary evidence; verify against code second
  • Count visible elements from screenshots; do not estimate or range
  • Identify the exact component rendering a view before acting on it
  • Maintain rule compliance throughout a long session, including after context compaction
  • When uncertain about a target or an ambiguous signal, ask — do not assume and act

Reproduction

Consistent across a long session with multiple task types (audit, code removal, component identification). Most failures occurred on tasks where the user provided a screenshot as the primary specification.


Report destination

https://github.com/anthropics/claude-code/issues

"

=================================================================================================

2 - " There is no reliable signal. Based on what's documented in the bug report:

  • Rule violations recurred after explicit corrections — so corrections alone are not a recovery indicator
  • Degradation was worse after context compaction — a new session (fresh context) may help but is not guaranteed
  • The failures are model-side, not configuration-side — nothing in the project setup can fix them

Practical options:

  • New session — fresh context, no compaction artifacts. Most likely to help.
  • Switch model — try claude-opus-4-6 for critical tasks if Sonnet keeps degrading.
  • Keep the bug report updated — submit it to https://github.com/anthropics/claude-code/issues so Anthropic can track it.

There is no way to know in advance. The only observable signal is: did the model read back before acting on the current task? If yes, performance is acceptable. If not, degradation is still active. "
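The "observable signal" described in item 2 can be checked after the fact if a task's events are noted down in order. The sketch below assumes a hand-kept list of event labels per task ("proposal", "approval", "edit"); these labels and the function name are hypothetical and do not correspond to any Claude Code log format.

```python
def read_back_before_edit(events):
    """Return True if a proposal was approved before the first file edit.

    `events` is an ordered list of labels for one task. A task with no
    edits counts as compliant, since nothing was changed without approval.
    """
    proposed = False
    approved = False
    for event in events:
        if event == "proposal":
            proposed = True
        elif event == "approval" and proposed:
            approved = True
        elif event == "edit":
            return approved
    return True


# Compliant task: plan presented, user approved, then the edit.
print(read_back_before_edit(["proposal", "approval", "edit"]))

# Non-compliant task: the edit happened straight away.
print(read_back_before_edit(["edit", "proposal"]))
```

A run of tasks returning False after a compaction event would match the pattern the report describes and could be taken as a cue to start a fresh session.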

Extended analysis

TL;DR

The most likely fix for the systematic behavioral degradation in long sessions with claude-sonnet-4-6 is to start a new session with a fresh context, as model-side issues and context compaction seem to contribute to the degradation.

Guidance

  • Start a new session: Beginning a new session with a fresh context may help mitigate the issues, as suggested by the model's self-diagnosis that a new session might help but is not guaranteed.
  • Switch to a different model: If claude-sonnet-4-6 continues to degrade, consider switching to claude-opus-4-6 for critical tasks to see if the issues persist.
  • Monitor model behavior: Pay close attention to whether the model reads back before acting on tasks; if it does, performance might be acceptable, but if not, degradation is likely still present.
  • Update the bug report: Ensure the bug report on GitHub is updated and submitted to https://github.com/anthropics/claude-code/issues so that Anthropic can track the issue and potentially provide fixes or improvements.

Example

No code change in the project fixes this issue, since it concerns model behavior and infrastructure rather than project code.

Notes

The effectiveness of these steps may vary, and there's no guarantee they will completely resolve the issue, given the model-side nature of the problem. The model's self-diagnosis indicates that corrections alone are not a reliable recovery indicator, and degradation worsens after context compaction.

Recommendation

Apply the workaround of starting a new session with a fresh context, as it is the most direct and least disruptive option suggested by the model's own diagnosis and may help mitigate the behavioral degradation.
