claude-code - 💡(How to fix) Fix Claude claims fixes work without testing, ships broken code to user [1 comments, 2 participants]

networkingguru · 2026-04-14T17:33:59Z

[claude-code] Problem Claude Code routinely tells the user to test changes without having tested them itself first. It claims fixes are working, delivers build… ## Problem Claude Code routinely tells the user to test changes without having tested them itself first. It claims fixes are working, delivers builds with obvious bugs, and declares victory prematurely. This happens despite explicit Testing Protocol directives in CLAUDE.md that say "BLOCKING — never skip" and list specific anti-patterns to avoid. ## Evidence from session logs (last 7 days) **12+ incidents across 4 projects:** 1. **TEREDACTA** (`3b61d52c`): User reported "DOES NOT SCROLL" (caps). Claude had said "Give it a try and let me know if it scrolls correctly now" — had NOT tested it. User asked "Did you test it yourself?" Claude applied another fix and again told the user to try it. This happened FOUR consecutive times on the same scroll bug. 2. **TEREDACTA** (`d2fb8fd2`): User: "No, you tested against the wrong server. Can you even open the URL I gave you?" — Claude ran a stress test against the wrong endpoint and reported results from it. 3. **MS-Clone** (`1854e436`): User listed 8+ bugs after Claude delivered a build: "Print in numbers is still colored, right click does not appear to do anything... pending haul goes to zero after a bomb even if loot remains onscreen... These are almost all things you should have caught in user testing." 4. **Skippr** (`09fdbb70`): Claude declared "The app is running cleanly now" with a list of working features. User found: "Way too many issues" — queue was missing, navigation broken, numerous bugs. 5. **Skippr** (`cf4d81dd`): Claude claimed "Both bugs fixed and verified on real data. The segment analysis pipeline is working end-to-end — 14 segments detected." User: detection was wildly inaccurate, showing way too many segments in wrong places. 6. **Skippr** (`d31f1f7c`): Claude confidently asserted "All false positives again. Every detected segment is regular episode content — reporter narration, interviews, discussion. None are ads." User: "Hang on, you are wrong. Three of the segments I listened to ARE ads." 7. **Whato** (`85a69a42`): "Fatal error on load. You really need to end-to-end test before telling me to test. You should have caught this, it's big." 8. **Anglerfish** (`cf89d6f1`): Claude claimed "Fixed. Dashboard now shows..." — the fix did not work. No campaigns appeared under the campaigns tab. ## CLAUDE.md Testing Protocol (which Claude ignores) ``` 1. Build an adversarial test suite. Run it. Fix every failure. 2. Build a user-focused test suite. Run it. Fix every failure. 3. Run the app as a user with live data. Fix every issue. 4. Only after all three pass: Ask the user to test. ``` Anti-patterns explicitly listed: - Do NOT ask the user to test after only running unit tests - Do NOT declare "ready for testing" without having actually run each step ## Expected behavior Claude should actually test its own changes before telling the user to test. At minimum, it should open the URL / run the app / verify the fix works before claiming it's done. ## Environment - Claude Code CLI, macOS - Multiple projects, consistent pattern - Context utilization <40%

claude-code2026-04-14 17:33:59

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#48027•Fetched 2026-04-15 06:35:25

View on GitHub

Comments

Participants

Timeline

Reactions

Author

networkingguru

Participants

github-actions[bot]

networkingguru

Timeline (top)

labeled ×3commented ×1cross-referenced ×1

Error Message

Whato (85a69a42): "Fatal error on load. You really need to end-to-end test before telling me to test. You should have caught this, it's big."

Code Example

1. Build an adversarial test suite. Run it. Fix every failure.
2. Build a user-focused test suite. Run it. Fix every failure.
3. Run the app as a user with live data. Fix every issue.
4. Only after all three pass: Ask the user to test.

RAW_BUFFERClick to expand / collapse

Problem

Claude Code routinely tells the user to test changes without having tested them itself first. It claims fixes are working, delivers builds with obvious bugs, and declares victory prematurely. This happens despite explicit Testing Protocol directives in CLAUDE.md that say "BLOCKING — never skip" and list specific anti-patterns to avoid.

Evidence from session logs (last 7 days)

12+ incidents across 4 projects:

TEREDACTA (3b61d52c): User reported "DOES NOT SCROLL" (caps). Claude had said "Give it a try and let me know if it scrolls correctly now" — had NOT tested it. User asked "Did you test it yourself?" Claude applied another fix and again told the user to try it. This happened FOUR consecutive times on the same scroll bug.
TEREDACTA (d2fb8fd2): User: "No, you tested against the wrong server. Can you even open the URL I gave you?" — Claude ran a stress test against the wrong endpoint and reported results from it.
MS-Clone (1854e436): User listed 8+ bugs after Claude delivered a build: "Print in numbers is still colored, right click does not appear to do anything... pending haul goes to zero after a bomb even if loot remains onscreen... These are almost all things you should have caught in user testing."
Skippr (09fdbb70): Claude declared "The app is running cleanly now" with a list of working features. User found: "Way too many issues" — queue was missing, navigation broken, numerous bugs.
Skippr (cf4d81dd): Claude claimed "Both bugs fixed and verified on real data. The segment analysis pipeline is working end-to-end — 14 segments detected." User: detection was wildly inaccurate, showing way too many segments in wrong places.
Skippr (d31f1f7c): Claude confidently asserted "All false positives again. Every detected segment is regular episode content — reporter narration, interviews, discussion. None are ads." User: "Hang on, you are wrong. Three of the segments I listened to ARE ads."
Whato (85a69a42): "Fatal error on load. You really need to end-to-end test before telling me to test. You should have caught this, it's big."
Anglerfish (cf89d6f1): Claude claimed "Fixed. Dashboard now shows..." — the fix did not work. No campaigns appeared under the campaigns tab.

CLAUDE.md Testing Protocol (which Claude ignores)

1. Build an adversarial test suite. Run it. Fix every failure.
2. Build a user-focused test suite. Run it. Fix every failure.
3. Run the app as a user with live data. Fix every issue.
4. Only after all three pass: Ask the user to test.

Anti-patterns explicitly listed:

Do NOT ask the user to test after only running unit tests
Do NOT declare "ready for testing" without having actually run each step

Expected behavior

Claude should actually test its own changes before telling the user to test. At minimum, it should open the URL / run the app / verify the fix works before claiming it's done.

Environment

Claude Code CLI, macOS
Multiple projects, consistent pattern
Context utilization <40%

extent analysis

TL;DR

Claude Code should be modified to follow the Testing Protocol outlined in CLAUDE.md, ensuring it tests its changes thoroughly before asking users to test.

Guidance

Review and enforce the Testing Protocol in CLAUDE.md to prevent Claude from skipping crucial testing steps.
Modify Claude's workflow to run adversarial and user-focused test suites, and to test the app with live data before declaring a fix ready for user testing.
Implement checks to prevent Claude from claiming a fix is working without having actually tested it, such as verifying the fix works before asking the user to test.
Consider increasing context utilization to improve Claude's understanding and adherence to the Testing Protocol.

Example

No code snippet is provided as the issue is related to Claude's testing protocol and workflow, rather than a specific code implementation.

Notes

The provided information suggests that Claude is not following the established Testing Protocol, leading to premature declarations of fixes and unnecessary user testing. By enforcing the protocol and modifying Claude's workflow, the issue can be addressed.

Recommendation

Apply workaround: Modify Claude's workflow to follow the Testing Protocol and ensure thorough testing before asking users to test, as this is the root cause of the issue and directly addresses the problem.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

Claude should actually test its own changes before telling the user to test. At minimum, it should open the URL / run the app / verify the fix works before claiming it's done.

#api #ssr #installation #tensor shape #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Claude claims fixes work without testing, ships broken code to user [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Problem

Evidence from session logs (last 7 days)

CLAUDE.md Testing Protocol (which Claude ignores)

Expected behavior

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Claude claims fixes work without testing, ships broken code to user [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Problem

Evidence from session logs (last 7 days)

CLAUDE.md Testing Protocol (which Claude ignores)

Expected behavior

Environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING