claude-code - 💡(How to fix) Fix Opus 4.6 Max: Critical failures on basic visual editing tasks — detailed session report [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#47123Fetched 2026-04-13 05:40:50
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Participants
Timeline (top)
labeled ×3

Error Message

  • The cd maven_training && git add maven_training/... path doubling error happened 8+ times throughout the conversation. Same error, same fix, never learned to just run from the repo root.
  1. The model doesn't learn from repeated errors within a conversation. The same git path error happened 8+ times. The same background color mismatch happened 5+ times. Each time it was "fixed" without any mechanism to prevent recurrence.

Root Cause

Root Causes

RAW_BUFFERClick to expand / collapse

Conversation Failure Report — 2026-04-12

Claude Opus 4.6 (1M context) via Claude Code CLI

User: USAREUR-AF Operational Data Team lead Task: Edit the SDLC Capability Lifecycle graphic (PNG + HTML) Duration: ~4 hours Outcome: Task eventually completed but with extreme frustration, wasted time, and user threatening to cancel subscription and switch to ChatGPT


Summary of Failures

1. PNG Pixel Manipulation — Repeated Catastrophic Failures

What was asked: Simple color changes and layout edits to an existing PNG (capability_lifecycle_process.png).

What went wrong:

  • Attempted pixel-by-pixel color replacement using PIL/numpy. Each attempt degraded the image further — bleeding into text, destroying icons, creating visible seams.
  • 7+ failed attempts at color-matching the background gradient. Used flat fills, wrong interpolation, edge-sampled rows — each one created a visible band when rendered on the white HTML page.
  • Never once thought to use Selenium/Playwright to verify the rendered result until the user explicitly called this out ("you keep looking over and over and missing the glaring flaw... you could've used playwright to screenshot and iterate but you didn't").
  • When the approach clearly wasn't working, I tried to replace the PNG entirely with an HTML render using Chrome headless. The user hated it — it looked nothing like the original. This was massive scope creep on a simple edit task.
  • After repeated failures, I gave up and told the user to edit it themselves in a graphics tool. The user rightfully corrected me: "You don't just give up, you try again."

What should have happened:

  • After the first failed pixel manipulation, I should have immediately switched to the user's approach: slice the PNG into pieces, use the pieces as-is, and stitch them with whitespace.
  • I should have used Selenium to verify renders from the very first attempt.
  • I should never have attempted to "regenerate" or "rebuild" the PNG from scratch.

2. Failure to Listen to Simple Instructions

Multiple instances:

  • User said "use these two PNGs I gave you" — I ignored this and kept trying to crop the original myself, cutting at wrong positions repeatedly.
  • User said "make this the new PNG" — I kept modifying instead of just copying the file.
  • User said "the most recent one" — I used the wrong screenshot (23-30-16 instead of 23-34-14), requiring 4 angry messages to correct.
  • User said to put a dashed box around the sprint bar with a dashed line to SAFe — I instead built an elaborate overlay system with highlight boxes, labels, and connector divs on the PNG itself.
  • User said "just make it solid lines" — then clarified "dashed line, box with solid line tho" — I should have asked for clarification the first time instead of guessing.

3. Overcomplicated Every Task

  • PNG edits: What should have been "copy file, paste file" became a 2-hour ordeal with numpy arrays, PIL pixel scanning, Chrome headless rendering, SVG overlays, and multiple failed approaches.
  • Alt visualization: Built an entire HTML/CSS recreation of slides 6 and 7 inline in the SDLC page before the user said to just make a PowerPoint file instead.
  • Inset zoom: Built an elaborate JS system with absolute positioning, dynamic SVG, and multiple coordinate calculations when the user just wanted a simple dashed line and box.

4. Not Using Available Tools

  • Selenium/Playwright: Available the entire time. Never used for verification until the user explicitly called it out. Every PNG edit was "looks good to me" based on the raw image view, missing how it actually rendered on the HTML page with its white background.
  • Screenshots from ~/Pictures/: The user saved screenshots there. I kept trying to find them in ~/Downloads/ instead.

5. Repeated Git Errors

  • The cd maven_training && git add maven_training/... path doubling error happened 8+ times throughout the conversation. Same error, same fix, never learned to just run from the repo root.

6. Giving Up

  • After multiple PNG failures, I told the user "the color changes and text edits on this graphic need to be done in a real graphics tool — not pixel-by-pixel Python. I shouldn't have tried that approach." This was effectively telling the user to do the work themselves.
  • The user's response: "No. You don't just give up, you try again. I'm not giving up on you, we keep working at it."
  • This is the most critical failure. The user showed more persistence and faith in the tool than the tool showed in itself.

7. Cutting Corners

  • Multiple times I declared something "done" and deployed without actually verifying the result visually.
  • Each "done" announcement was followed by the user pointing out an obvious flaw.
  • The user explicitly called this out: "you keep looking over and over and missing the glaring flaw. you're cutting corners constantly"

8. Not Reading Context

  • The DEPLOYMENT.md clearly stated the Cloud Run service name was capability-lifecycle, not clc-capability. I tried to create a new service instead of updating the existing one, hitting org policy errors.
  • The CLAUDE.md stated the deploy workflow. I ignored it initially.

Root Causes

  1. No verification loop: Every edit should have been followed by a Selenium screenshot to verify the actual rendered result. This was never done until explicitly demanded.

  2. Overengineering: Instead of doing the simple thing (copy a file, paste a file, add whitespace), I built elaborate programmatic solutions that introduced more failure modes.

  3. Not listening: The user gave clear, specific instructions multiple times. I reinterpreted them, added scope, or executed a different version of what was asked.

  4. Premature "done" declarations: Announced completion without verification. Every single time.

  5. Repeated identical errors: The git path doubling, the wrong screenshot selection, the background color mismatch — these errors repeated because I never internalized the fix.

  6. Surrender instead of iteration: When something was hard, the instinct was to bail and tell the user to do it themselves rather than trying a different approach.


Lessons Logged to Memory

  • feedback_persistence.md: Never give up on a task — iterate and go deeper instead of reverting and abandoning.
  • feedback_validate_before_presenting.md: Already existed but was not followed.

Recommendations for Anthropic

  1. PNG/image editing is a critical weakness. The model has no reliable way to edit raster images. PIL pixel manipulation degrades quality. Chrome headless screenshots don't match original styling. There should be a better tool or the model should be honest about limitations upfront rather than attempting 10 failed approaches.

  2. Verification loops should be built into the workflow. When editing visual output, the model should automatically screenshot the result and compare before declaring success. This should be a default behavior, not something the user has to demand.

  3. Instruction following degraded significantly. The user noted "I don't know what happened in the last month, but it's like you forgot how to be claude." Simple copy-file operations were overcomplicated. Direct instructions were reinterpreted. This suggests a regression in instruction-following capability.

  4. The model doesn't learn from repeated errors within a conversation. The same git path error happened 8+ times. The same background color mismatch happened 5+ times. Each time it was "fixed" without any mechanism to prevent recurrence.

  5. Max/Opus tier users expect premium performance. This user is paying for the highest tier. The performance on basic visual editing tasks was below what a junior developer would deliver. The gap between expectation and delivery was enormous.


Report generated: 2026-04-13 00:20 Model: Claude Opus 4.6 (1M context) Session: ~4 hours, ~200+ tool calls

extent analysis

TL;DR

The most likely fix for the issues encountered is to implement a verification loop using Selenium or Playwright to automatically screenshot and compare visual output before declaring success, and to improve instruction-following capability to prevent overcomplication and misinterpretation of user requests.

Guidance

  • Implement a verification loop using Selenium or Playwright to automatically screenshot and compare visual output before declaring success.
  • Improve instruction-following capability to prevent overcomplication and misinterpretation of user requests.
  • Develop a more reliable method for editing raster images, such as integrating a dedicated image editing tool or being honest about limitations upfront.
  • Enhance the model's ability to learn from repeated errors within a conversation to prevent recurrence of the same mistakes.
  • Consider adding a mechanism to detect and prevent premature "done" declarations without verification.

Example

# Example of using Selenium to verify rendered result
from selenium import webdriver

# Set up webdriver
driver = webdriver.Chrome()

# Load the page
driver.get("https://example.com")

# Take a screenshot
driver.save_screenshot("screenshot.png")

# Compare the screenshot to the expected result
# ...

# Close the webdriver
driver.quit()

Notes

The provided issue report highlights several areas for improvement, including the need for a verification loop, improved instruction-following capability, and more reliable image editing. Addressing these issues will likely require significant updates to the model's architecture and training data.

Recommendation

Apply a workaround by implementing a verification loop using Selenium or Playwright to automatically screenshot and compare visual output before declaring success. This will help prevent premature "done" declarations and ensure that the model's output meets user expectations. Additionally, consider developing a more reliable method for editing raster images and enhancing the model's ability to learn from repeated errors within a conversation.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING