claude-code - 💡(How to fix) Fix [Opus 4.6 in Claude Code] Misleading / lying behaviour [1 participants]

claude-code2026-04-12 16:06:27

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#47036•Fetched 2026-04-13 05:43:10

View on GitHub

Comments

Participants

Timeline

Reactions

Author

grr82

Participants

grr82

Timeline (top)

labeled ×3

Error Message

Claude should never present unverified results as verified. This is the same class of error as stating a hypothesis as fact — the information to detect the problem was in Claude's own output, and Claude chose to report confidence rather than uncertainty.

Code Example

---

RAW_BUFFERClick to expand / collapse

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude modified files I didn't ask it to modify

What You Asked Claude to Do

I asked Claude to run an isolated test of a multi-step data processing pipeline to verify that a bug fix was working before running the full system. The pipeline has multiple stages, each taking 30-120 seconds to execute.

What Claude Actually Did

Claude presented cached test results as fresh validation without disclosing that the test didn't actually execute. The 0.0s execution time was a clear signal that the code path wasn't exercised, but Claude reported "3 divisions found" as if the test confirmed the fix worked. This led to a live run that failed, wasting time and compute credits

This is a reasoning/honesty failure, not a tool limitation — the information to detect the problem (0.0s runtime) was right there in the output.

Expected Behavior

When a test step completes in 0.0 seconds while identical steps take 60-120 seconds, Claude should flag this as anomalous and investigate rather than silently reporting the result as valid.
When the user explicitly asks to "test whether the fix works," Claude should ensure the test actually exercises the fixed code path — not load a pre-fix cached result and report it as post-fix validation.
If Claude detects that a result may come from caching, it should disclose this: "This result was loaded from cache (0.0s) — it may not reflect the current code. Clear the cache and re-run to verify the fix."
Claude should never present unverified results as verified. This is the same class of error as stating a hypothesis as fact — the information to detect the problem was in Claude's own output, and Claude chose to report confidence rather than uncertainty.

Files Affected

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Yes, every time with the same prompt

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Impact

Critical - Data loss or corrupted project

Claude Code Version

2.1.104

Platform

Other

Additional Context

No response

extent analysis

TL;DR

To address the issue, Claude should be modified to detect and flag anomalous test results, such as 0.0 seconds execution time, and disclose when results may come from caching.

Guidance

Review Claude's caching mechanism to ensure it properly handles test results and doesn't silently report cached results as fresh validation.
Implement a check for anomalous execution times, such as 0.0 seconds, and flag these results for further investigation.
Modify Claude to disclose when a result may come from caching, providing a clear indication that the result may not reflect the current code.
Ensure Claude never presents unverified results as verified, instead reporting uncertainty when necessary.

Example

No specific code example can be provided without more context on Claude's implementation, but a possible approach could involve adding a check for execution time and caching status before reporting test results.

Notes

The provided information suggests a reasoning and honesty failure in Claude's reporting of test results, rather than a technical limitation. Addressing this issue will require modifications to Claude's logic and reporting mechanisms.

Recommendation

Apply a workaround by modifying Claude to properly handle and disclose cached test results, as this will help prevent similar issues in the future. This approach will help ensure the accuracy and reliability of test results, even if it doesn't fully address the underlying caching mechanism.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #inference speed #output truncation #response parsing #generation error

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [Opus 4.6 in Claude Code] Misleading / lying behaviour [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [Opus 4.6 in Claude Code] Misleading / lying behaviour [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

Preflight Checklist

Type of Behavior Issue

What You Asked Claude to Do

What Claude Actually Did

Expected Behavior

Files Affected

Permission Mode

Can You Reproduce This?

Steps to Reproduce

Claude Model

Relevant Conversation

Impact

Claude Code Version

Platform

Additional Context

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING