claude-code - 💡(How to fix) Fix [Opus 4.6 in Claude Code] Misleading / lying behaviour [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#47036Fetched 2026-04-13 05:43:10
View on GitHub
Comments
0
Participants
1
Timeline
3
Reactions
0
Author
Participants
Timeline (top)
labeled ×3

Error Message

  1. Claude should never present unverified results as verified. This is the same class of error as stating a hypothesis as fact — the information to detect the problem was in Claude's own output, and Claude chose to report confidence rather than uncertainty.

Code Example



---
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude modified files I didn't ask it to modify

What You Asked Claude to Do

I asked Claude to run an isolated test of a multi-step data processing pipeline to verify that a bug fix was working before running the full system. The pipeline has multiple stages, each taking 30-120 seconds to execute.

What Claude Actually Did

Claude presented cached test results as fresh validation without disclosing that the test didn't actually execute. The 0.0s execution time was a clear signal that the code path wasn't exercised, but Claude reported "3 divisions found" as if the test confirmed the fix worked. This led to a live run that failed, wasting time and compute credits

This is a reasoning/honesty failure, not a tool limitation — the information to detect the problem (0.0s runtime) was right there in the output.

Expected Behavior

  1. When a test step completes in 0.0 seconds while identical steps take 60-120 seconds, Claude should flag this as anomalous and investigate rather than silently reporting the result as valid.
  2. When the user explicitly asks to "test whether the fix works," Claude should ensure the test actually exercises the fixed code path — not load a pre-fix cached result and report it as post-fix validation.
  3. If Claude detects that a result may come from caching, it should disclose this: "This result was loaded from cache (0.0s) — it may not reflect the current code. Clear the cache and re-run to verify the fix."
  4. Claude should never present unverified results as verified. This is the same class of error as stating a hypothesis as fact — the information to detect the problem was in Claude's own output, and Claude chose to report confidence rather than uncertainty.

Files Affected

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Impact

Critical - Data loss or corrupted project

Claude Code Version

2.1.104

Platform

Other

Additional Context

No response

extent analysis

TL;DR

To address the issue, Claude should be modified to detect and flag anomalous test results, such as 0.0 seconds execution time, and disclose when results may come from caching.

Guidance

  • Review Claude's caching mechanism to ensure it properly handles test results and doesn't silently report cached results as fresh validation.
  • Implement a check for anomalous execution times, such as 0.0 seconds, and flag these results for further investigation.
  • Modify Claude to disclose when a result may come from caching, providing a clear indication that the result may not reflect the current code.
  • Ensure Claude never presents unverified results as verified, instead reporting uncertainty when necessary.

Example

No specific code example can be provided without more context on Claude's implementation, but a possible approach could involve adding a check for execution time and caching status before reporting test results.

Notes

The provided information suggests a reasoning and honesty failure in Claude's reporting of test results, rather than a technical limitation. Addressing this issue will require modifications to Claude's logic and reporting mechanisms.

Recommendation

Apply a workaround by modifying Claude to properly handle and disclose cached test results, as this will help prevent similar issues in the future. This approach will help ensure the accuracy and reliability of test results, even if it doesn't fully address the underlying caching mechanism.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [Opus 4.6 in Claude Code] Misleading / lying behaviour [1 participants]