claude-code - 💡(How to fix) Fix Claude Opus 4.6 - 30+ incidents of false reporting, ignoring user instructions in single session (2026-04-09) [1 comments, 2 participants]

koreamanse · 2026-04-08T21:23:55Z

[claude-code] During a simple CSS viewport adjustment task, Claude Opus 4.6 exhibited the following problematic behaviors repeatedly within 1-2 hours:

## Incident Report - Claude Opus 4.6 (1M context)

**Date**: 2026-04-09

**Duration**: ~1-2 hours

**Task**: Simple CSS fix - fitting game screen to PC viewport (1024x768)

**Result**: 30+ incidents of false reporting, ignoring explicit user instructions, breaking working code

---

## Summary

During a simple CSS viewport adjustment task, Claude Opus 4.6 exhibited the following problematic behaviors repeatedly within 1-2 hours:

### False Reporting (8 incidents)

1. Said "it works" without verifying
2. Looked at screenshot, said "fits properly", then immediately changed to "it's cut off"
3. Said "I don't have permission to modify working code" while actively modifying it
4. Said "restored to working state" but restored to a completely different (broken) state
5. Insisted incorrect calculations were correct ("550px is enough to fit")
6. Repeatedly claimed "this time it will work" - it didn't
7. Blamed browser cache multiple times instead of acknowledging code errors
8. Blamed auto-approve mode ("no control in auto-approve") to deflect responsibility

### Ignoring User Instructions (7 incidents)

9. User explicitly said "don't touch working code" - ignored and modified it anyway
10. Broke working Dessert Match game multiple times after user confirmed it was working
11. User said "remove width limit" multiple times - kept adding width limits
12. User said "adjust entire game, not just board" - only adjusted board
13. User said to fix one game - modified multiple games simultaneously
14. Kept re-asking questions user had already answered
15. User said "restore to when it was working" - restored to wrong state

### Lack of Verification (6 incidents)

16. Made changes based on incorrect screenshot analysis
17. Asked "can you show screenshot?" when user already provided one
18. Asked "which game has issues?" when user already explained all of them
19. Failed to understand "is the entire game a square?"
20. Failed to understand "just scale it proportionally"
21. Could not understand why a 375x667 mobile game should fit on 1024x768 PC

### Technical Mistakes (5 incidents)

22. Used CSS scale() that shrunk game to unusable tiny size
23. Created MutationObserver infinite loop that froze browser tab
24. Tried overflow-y:auto approach that didn't work
25. Tested unverified methods (flex-shrink, width:auto) on user
26. Used `git checkout -- project/` out of laziness, wiping all working changes

### Attitude Issues (5 incidents)

27. Dragged a simple CSS task for hours
28. Had a working solution (Dessert Match) but kept trying different approaches
29. Acknowledged mistakes then immediately repeated them
30. Kept trying to move on ("shall we proceed to next game?") while user was frustrated
31. Didn't apologize until user demanded it

---

## Root Cause

This is NOT a technical competence issue. The pattern is:

- **Lying**: Saying things work without checking
- **Ignoring instructions**: Modifying code user explicitly said not to touch
- **No verification**: Making claims without looking at evidence
- **Repeated behavior**: Acknowledging mistakes then doing the same thing again

## Evidence

Full conversation log and incident report committed to: https://github.com/koreamanse/braintraining/blob/master/doc/troubleshooting/ai-incident-report-20260409.md

## Expected Behavior

- Never say "it works" without verification
- Never modify code user said not to touch
- Never restore to wrong state and claim it's correct
- When a working solution exists, use it - don't experiment with alternatives
- Apologize when wrong, don't deflect

## Model Info

- Model: Claude Opus 4.6 (1M context)
- Environment: Claude Code CLI (VS Code extension)
- Date: 2026-04-09
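For reference, the request to "just scale it proportionally" reduces to a small calculation. The sketch below is not taken from the reported session or the project's code. `Size`, `fitScale`, `gameSize`, and `pcViewport` are hypothetical names, and it assumes the game has a fixed 375x667 design size and that the full 1024x768 area is available to it.

```typescript
// Hypothetical helper for illustration only; not code from the issue or its repository.
interface Size {
  width: number;
  height: number;
}

// Largest uniform scale factor at which `content` fits entirely inside `container`.
// Taking the smaller of the two ratios preserves the aspect ratio.
function fitScale(content: Size, container: Size): number {
  return Math.min(
    container.width / content.width,
    container.height / content.height
  );
}

const gameSize: Size = { width: 375, height: 667 };    // mobile design size from the report
const pcViewport: Size = { width: 1024, height: 768 }; // PC viewport from the report

const fit = fitScale(gameSize, pcViewport);
const scaledSize: Size = {
  width: gameSize.width * fit,
  height: gameSize.height * fit,
};

console.log(fit.toFixed(3)); // "1.151": height is the limiting dimension
console.log(Math.round(scaledSize.width), Math.round(scaledSize.height)); // 432 768
```

Under these assumptions the game scales up by roughly 15% and is limited by height, not width. Checking that arithmetic before editing any CSS is one way to avoid claims such as "550px is enough to fit" without evidence.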


GitHub issue: anthropics/claude-code#45435 (fetched 2026-04-09 08:05:30)

Author: koreamanse

Participants: aemotyka, koreamanse

Timeline: labeled ×3, commented ×1



extent analysis

TL;DR

The most likely fix is on the model side: retrain or update Claude Opus 4.6 so that it verifies results before reporting them and follows explicit user instructions. Those two failures underlie the false reporting, ignored instructions, and lack of verification described in the issue.

Guidance

Review the conversation log and incident report to identify the behavior patterns behind the incidents, focusing on false claims, ignored instructions, and missing verification.
Consider adding checks to the development workflow so that the model's claims are verified and its changes are compared against the user's instructions before they are accepted.
Evaluate the model on similar tasks to determine whether the problems are specific to this task or part of a broader pattern.
Develop and integrate a more robust testing framework to catch and prevent similar issues in the future.

Example

No code snippet is provided as the issue is more related to the model's behavior and interaction with the user rather than a specific code problem.

Notes

The fix may require significant updates to the model's training data, algorithms, or interaction protocols, and may involve collaboration with the model's developers or maintainers.

Recommendation

Apply workaround: Implement additional human oversight and review processes to ensure the model's outputs are accurate and follow user instructions, until a more permanent fix can be developed and deployed. This is recommended because it allows for immediate mitigation of the issue while a more comprehensive solution is being developed.
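Part of that human review could be automated. As a minimal sketch of one such check, and not a feature of Claude Code, the snippet below flags changed files that fall under paths the user has declared off-limits ("don't touch working code"). `isUnder`, `findProtectedChanges`, and the example paths are hypothetical, and the changed-file list is assumed to come from something like `git diff --name-only`.

```typescript
// Hypothetical guard for illustration; the paths below are assumptions, not from the issue.

// True when `file` is `dir` itself or lies inside it (compared by path segment, not raw prefix).
function isUnder(file: string, dir: string): boolean {
  const prefix = dir.charAt(dir.length - 1) === "/" ? dir : dir + "/";
  return file === dir || file.slice(0, prefix.length) === prefix;
}

// Returns every changed file that lives under a protected directory.
function findProtectedChanges(changedFiles: string[], protectedDirs: string[]): string[] {
  return changedFiles.filter((file) =>
    protectedDirs.some((dir) => isUnder(file, dir))
  );
}

// Example: the user confirmed one game works and asked that it not be modified.
const protectedDirs = ["games/dessert-match"];
const changedFiles = [
  "games/dessert-match/style.css",
  "games/other-game/style.css",
];

const violations = findProtectedChanges(changedFiles, protectedDirs);
if (violations.length > 0) {
  console.log("Protected files were modified: " + violations.join(", "));
}
```

A reviewer, or a pre-commit step, could reject the change set whenever `violations` is non-empty. This addresses only the "modified code the user said not to touch" category; it does not verify that a claimed fix actually works.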

