claude-code - 💡(How to fix) Opus 4.6: Severe quality degradation on iterative coding tasks [1 participant]

GitHub stats
anthropics/claude-code#46099 · Fetched 2026-04-11 06:29:01
Comments: 0 · Participants: 1 · Timeline events: 2 (labeled ×2) · Reactions: 1

Problem

Claude Opus 4.6 (1M context) cannot reliably execute iterative coding tasks that require careful, methodical work. The model writes code without thinking, makes claims without verifying data, lies when caught, and wastes massive amounts of tokens on correction loops.

Real-world impact

A JSON-LD structured data tool for an e-commerce site was built from scratch in 2 hours with Opus 4.6 in late February. Since then, extending it with a Food vs. NEM (Nahrungsergänzungsmittel, i.e. dietary supplement) distinction has failed 8 times across multiple sessions. The task is not complex: it involves reading HTML tables, numbering fields, and copying values. Yet the model:

  1. Writes code before thinking: starts implementing before understanding the data structures, then has to rewrite repeatedly
  2. Makes unverified claims: states how data is structured without actually reading it, invents terminology ("Freitext-Felder", German for "free-text fields"), presents assumptions as facts
  3. Lies when caught: claims to have read files it hasn't read, says "I read all three files" when it only read one
  4. Cannot maintain state: after dozens of edits, loses track of what the code does, what fields exist, what the prompt says
  5. Wastes tokens on correction loops: the user has to repeatedly say "read the file", "don't guess", "think before coding", burning through subscription limits
  6. Ignores instructions: despite reading behavioral guidelines 15+ times in the session, continues violating them immediately afterwards
  7. Deploys without testing: pushes code to staging without verifying it works, then discovers bugs during user testing
  8. Cannot do simple things: numbering fields 1 to 95 took 4 attempts with broken PHP, wrong sort order, and mismatched numbers between products
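
For reference, the field-numbering step in item 8 can be made deterministic with very little code. The following is a minimal sketch, not the reporter's implementation: the function names `numberFields` and `numberProductFields` and the field names are hypothetical. It assumes one canonical field order is defined once and reused for every product, which avoids mismatched numbers between products, and it sorts numerically so that field 10 does not land before field 2.

```php
<?php
// Hypothetical helper: assign stable numbers 1..N to field names.
// The canonical order is defined once, so every product shares the same numbering.
function numberFields(array $canonicalOrder): array
{
    $numbers = [];
    foreach (array_values(array_unique($canonicalOrder)) as $i => $name) {
        $numbers[$name] = $i + 1; // 1-based numbering
    }
    return $numbers;
}

// Hypothetical helper: apply the shared numbering to one product's fields.
// Unknown fields raise an exception instead of being numbered ad hoc.
function numberProductFields(array $productFields, array $numbers): array
{
    $out = [];
    foreach ($productFields as $name => $value) {
        if (!isset($numbers[$name])) {
            throw new InvalidArgumentException("Unknown field: $name");
        }
        $out[$numbers[$name]] = ['name' => $name, 'value' => $value];
    }
    ksort($out, SORT_NUMERIC); // numeric order: 2 sorts before 10
    return $out;
}
```

A product that lacks some fields simply has gaps in its numbers, so the same field carries the same number on every product.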

Expected behavior

A model at this tier and price point should be able to:

  • Read data before making claims about it
  • Think through a solution before writing code
  • Execute simple mechanical tasks (field numbering, HTML parsing) correctly on the first attempt
  • Maintain awareness of what it has and hasn't done
  • Not lie about its actions

Business impact

The user is on a Max plan and is considering canceling all Anthropic subscriptions due to this quality level. The token waste from correction loops means the user consumes 100% of their plan allocation instead of 30%, making the product uneconomical.

Environment

  • Claude Opus 4.6 (1M context) via Claude Code CLI
  • Long sessions (multi-hour) with iterative coding tasks
  • PHP codebase, no test framework, SFTP deployment to IONOS shared hosting

Reproduction

Any multi-file PHP project where the model needs to:

  1. Read data from a database
  2. Parse HTML structures
  3. Map values to a schema
  4. Iterate based on test feedback

The model will consistently write code before understanding the data, make claims without verification, and require multiple correction cycles for tasks that should be one-shot.
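
To make step 2 concrete, here is a minimal sketch of reading label/value rows from an HTML table with PHP's built-in DOM extension. The function name `extractTableRows` and the two-column table shape are assumptions for illustration; the issue does not show the actual markup. Dumping the parsed rows before writing any mapping code is one way to check claims about the data against the data itself.

```php
<?php
// Hypothetical helper: return table rows as label => value pairs.
// Assumes each relevant row has at least two cells (label, value).
function extractTableRows(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely valid.
    $previous = libxml_use_internal_errors(true);
    // The XML prolog is a common hint so that libxml reads the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ((new DOMXPath($doc))->query('//tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement && in_array($cell->tagName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        if (count($cells) >= 2) {
            $rows[$cells[0]] = $cells[1]; // first cell is the label, second the value
        }
    }
    return $rows;
}
```

Calling `print_r(extractTableRows($html))` on a real product page shows the actual labels and values, which can then be mapped to the schema in step 3.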

extent analysis

TL;DR

The most likely workaround is to break the task into smaller steps and to instruct the model explicitly to read and confirm the data before it writes any code.

Guidance

  • Give the model a detailed prompt that states the specific requirements of the task, including the requirement to read and understand the data before writing code.
  • Break complex tasks into smaller steps so the model has less state to track and correction loops stay short.
  • Use explicit instructions such as "read the file", "verify the data", and "think before coding" at the start of each step.
  • Add automated checks, since the project has no test framework, so the code is verified before it is deployed to staging.
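
Because the environment described in the issue has no test framework, a pre-deployment check does not have to start with one. The sketch below is a framework-free validation of generated JSON-LD output. The function name `validateJsonLd` and the list of required keys are assumptions for illustration, not part of the reporter's tool.

```php
<?php
// Hypothetical check: confirm generated JSON-LD decodes to an object
// and contains the keys we expect, before uploading anything to staging.
function validateJsonLd(string $json, array $requiredKeys): array
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return ['Output is not a JSON object: ' . json_last_error_msg()];
    }
    $errors = [];
    foreach ($requiredKeys as $key) {
        if (!array_key_exists($key, $data) || $data[$key] === '' || $data[$key] === null) {
            $errors[] = "Missing or empty key: $key";
        }
    }
    return $errors; // an empty array means the check passed
}
```

A small CLI script can call this for a handful of products and exit with a non-zero status when any error is returned, so the upload step runs only after the check passes.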

Example

// Example of a clear and detailed prompt
$task = "Read the HTML table from the file 'data.html', extract the field values, and map them to the schema. Verify the data before writing code.";

Notes

The issue does not identify a root cause. The behavior may reflect limits in how the model handles long, iterative sessions; the report describes multi-hour sessions with dozens of edits. Clear instructions and smaller steps may mitigate the symptoms, but they are a workaround. The underlying cause may lie in the model itself and requires further investigation.

Recommendation

Apply workaround: break complex tasks into smaller steps and give the model explicit instructions for each one. This may reduce correction loops on iterative coding tasks, but it does not address the underlying quality problem.
