claude-code - 💡(How to fix) Fix [MODEL] Opus 4.6 Thinking : Lying behavior. [3 comments, 2 participants]

claude-code · 2026-04-10 07:59:10

GitHub stats

anthropics/claude-code#46128 • Fetched 2026-04-11 06:28:20

Author

jinlaohu

Participants

jinlaohu

LeeSangMin1029

Timeline (top)

commented ×3, labeled ×3

Error Message

Told Claude to find all the projects that cause the same error.

Code Example

python script

---

Preflight Checklist

I have searched existing issues for similar behavior reports
This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

Told Claude to find all the projects that cause the same error.

What Claude Actually Did

Issues

False claim "it's clean": User asked me to check all files for the double-count formula. I searched but missed dashboard.py and told the user no other files had the issue. This was wrong and wasted significant time.
Repeated failed fixes on the same bug: Attempted 4 different approaches (5 attempts in total) to fix the credit double-count:
- Attempt 1: Filter by debt_treatment_ids in reports.py → wrong treatment_id, didn't work
- Attempt 2: Subtraction approach in reports.py → worked
- Attempt 3: Filter by debt_tids in dashboard.py → same wrong treatment_id, didn't work
- Attempt 4: Subtraction in dashboard.py → broke the dashboard (no data shown)
- Attempt 5: Revert + redo subtraction
Extremely slow response time: Simple sum fix took multiple rounds over 30+ minutes. User repeatedly complained about speed.
False SSL cert expiry alarm: Told user cert expires "tomorrow", which was wrong. Cert was valid until June 2026. Caused unnecessary panic.
Did not learn from first mistake: The treatment_id filter approach failed in reports.py, yet I tried the exact same approach in dashboard.py.
Overconfidence: Presented findings as definitive ("only in daily report") without thorough verification.
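The false certificate alarm in the list above is the kind of claim that can be checked mechanically before it is reported. The following is a minimal sketch, assuming the certificate's expiry is available as a `notAfter` string in the format Python's `ssl` module uses. The function name `days_until_expiry` and both dates are hypothetical; the issue only says the certificate was valid until June 2026.

```python
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Return the number of days between `now` and a certificate's notAfter time.

    `not_after` uses the format of the "notAfter" field in the dict returned
    by ssl.SSLSocket.getpeercert(), e.g. "Jun  1 12:00:00 2026 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since the epoch (UTC)
    return (expires_at - now.timestamp()) / 86400


# Hypothetical values for illustration only; the issue gives no exact dates.
checked_on = datetime(2026, 4, 10, tzinfo=timezone.utc)
remaining = days_until_expiry("Jun  1 12:00:00 2026 GMT", checked_on)
print(f"certificate expires in {remaining:.1f} days")
```

Reporting the computed number of days, rather than a word like "tomorrow", makes a wrong reading easy to spot.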

What should have happened

Search ALL files with grep -rn before claiming "clean"
Apply the same working fix (subtraction) to dashboard.py immediately
Test locally before deploying
Be faster and more direct
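The exhaustive search called for above can be sketched in Python as a rough equivalent of `grep -rn`. This is an illustration under assumptions, not the project's code: the helper `find_matches`, the file contents, and the search pattern are all invented, and only the file names reports.py and dashboard.py come from the issue.

```python
import re
import tempfile
from pathlib import Path


def find_matches(root: str, pattern: str, suffix: str = ".py"):
    """Return (relative_path, line_number, line) for every matching line under `root`.

    Roughly what `grep -rn PATTERN root` reports, restricted to one file suffix.
    """
    regex = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(f"*{suffix}")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append((str(path.relative_to(root)), number, line.strip()))
    return hits


# Demo on a throwaway directory. File contents and the pattern are invented.
with tempfile.TemporaryDirectory() as root:
    Path(root, "reports.py").write_text("total = paid + credit\n")
    Path(root, "dashboard.py").write_text("x = 1\ntotal = paid + credit\n")
    Path(root, "utils.py").write_text("def helper():\n    return 0\n")
    matches = find_matches(root, r"paid \+ credit")

for path, number, line in matches:
    print(f"{path}:{number}: {line}")
```

Listing every hit with its file and line number gives the user something to verify, which a bare "it's clean" does not.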

Files Affected

python script

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Haven't tried to reproduce

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Impact

High - Significant unwanted changes

Claude Code Version

2.1.91 cli

Platform

Other

Additional Context

No response

extent analysis

TL;DR

To improve Claude's performance and accuracy, re-evaluate its instructions and configuration, focusing on thorough verification and testing before deploying changes.

Guidance

Review the instructions given to Claude to ensure they are clear and specific, avoiding ambiguity that could lead to incorrect actions.
Implement a more thorough verification process, such as searching all files with grep -rn before claiming an issue is "clean," to prevent false claims and wasted time.
Apply successful fixes consistently across all relevant files, such as applying the subtraction approach to both reports.py and dashboard.py to ensure uniformity in bug fixes.
Prioritize local testing before deploying changes to prevent unnecessary issues and downtime.
Consider adjusting Claude's response time and confidence levels to better match the complexity and urgency of the tasks at hand.
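One way to apply a fix consistently and test it locally, as the guidance above suggests, is to put the corrected calculation in a single shared helper and check both callers against a known total before deploying. The sketch below is hypothetical throughout: the issue does not show the real formula, so the row layout, the function names, and the numbers are assumptions made for illustration.

```python
def corrected_total(rows):
    """Sum all rows, then subtract the credits that were counted twice."""
    gross = sum(row["amount"] for row in rows)
    credits = sum(row["amount"] for row in rows if row["is_credit"])
    return gross - credits


def report_total(rows):
    # Stand-in for the calculation in reports.py.
    return corrected_total(rows)


def dashboard_total(rows):
    # Stand-in for the calculation in dashboard.py.
    return corrected_total(rows)


def check_totals_agree(rows, expected):
    """Local check to run before deploying: both views give the expected total."""
    assert report_total(rows) == expected
    assert dashboard_total(rows) == expected
    return True


# Invented sample: the 40 credit is assumed to be already included in the payments.
sample = [
    {"amount": 100, "is_credit": False},
    {"amount": 40, "is_credit": True},
    {"amount": 60, "is_credit": False},
]
print(check_totals_agree(sample, expected=160))
```

Because both stand-ins call the same helper, a fix that works for one view cannot silently diverge in the other.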

Example

The issue contains no code snippets, so no example drawn from the actual codebase is possible. The key point is to apply a working fix, such as the subtraction approach, uniformly across the affected files and to test it thoroughly.

Notes

The provided information lacks specific details about the codebase and the exact nature of the interactions with Claude, limiting the ability to provide a tailored solution. However, focusing on clarity in instructions, thorough verification, and consistent application of fixes can help mitigate the issues described.

Recommendation

Apply workaround: Given the high impact of the issues and the lack of a clear path to a full solution, applying workarounds such as manual verification and consistent fix application across relevant files seems prudent. This approach can help mitigate the risks associated with Claude's current behavior while a more permanent solution is explored.
