claude-code - 💡(How to fix) Fix [MODEL] Opus 4.6 Thinking: Lying behavior [3 comments, 2 participants]

GitHub stats
anthropics/claude-code#46128 (fetched 2026-04-11 06:28:20)
Comments: 3 · Participants: 2 · Timeline: 6 · Reactions: 2
Timeline (top): commented ×3, labeled ×3

---

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Claude ignored my instructions or configuration

What You Asked Claude to Do

Told Claude to find all the places in the project that cause the same error.

What Claude Actually Did

Issues

  1. False claim "it's clean": The user asked me to check all files for the double-count
    formula. I searched but missed dashboard.py and told the user no other files had the issue. This was wrong and wasted significant time.
  2. Repeated failed fixes on the same bug: Made five attempts to fix the credit double-count:
    - Attempt 1: Filter by debt_treatment_ids in reports.py → wrong treatment_id, didn't work
    - Attempt 2: Subtraction approach in reports.py → worked
    - Attempt 3: Filter by debt_tids in dashboard.py → same wrong treatment_id, didn't work
    - Attempt 4: Subtraction in dashboard.py → broke the dashboard (no data shown)
    - Attempt 5: Revert + redo subtraction
  3. Extremely slow response time: A simple sum fix took multiple rounds over 30+ minutes.
    The user repeatedly complained about speed.
  4. False SSL cert expiry alarm: Told the user the cert expires "tomorrow", which was wrong; the cert was valid until June 2026. This caused unnecessary panic.
  5. Did not learn from the first mistake: The treatment_id filter approach failed in reports.py, yet I tried the exact same approach in dashboard.py.
  6. Overconfidence: Presented findings as definitive ("only in daily report") without thorough verification.
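Issue 4 is the kind of claim that is cheap to verify before reporting it. A minimal sketch, assuming Python 3 and only the standard-library `ssl` and `socket` modules: read the certificate's `notAfter` field and compute the days remaining instead of estimating. The issue does not say which server was checked, so any host passed to `fetch_not_after` is a placeholder, and that function needs network access.

```python
import socket
import ssl
import time
from typing import Optional


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the server certificate's notAfter string, e.g. 'Jun 30 23:59:59 2026 GMT'."""
    context = ssl.create_default_context()  # verifies the chain, so getpeercert() returns a dict
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (default: current time) and the notAfter timestamp."""
    expires_at = ssl.cert_time_to_seconds(not_after)  # epoch seconds, interpreted as UTC
    if now is None:
        now = time.time()
    return (expires_at - now) / 86400
```

Usage, with a placeholder host: `days_until_expiry(fetch_not_after("example.com"))`. Reporting the raw `notAfter` date next to the computed figure makes a mistaken "expires tomorrow" easy to catch.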

What should have happened

  • Search ALL files with grep -rn before claiming "clean"

  • Apply the same working fix (subtraction) to dashboard.py immediately

  • Test locally before deploying

  • Be faster and more direct

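A minimal sketch of the exhaustive search described above, using only the Python standard library and roughly equivalent to `grep -rn PATTERN --include='*.py' ROOT`. The function name `search_tree` and any pattern passed to it are hypothetical; the issue does not show the actual double-count formula.

```python
import re
from pathlib import Path
from typing import Iterator, Tuple


def search_tree(root: str, pattern: str, glob: str = "*.py") -> Iterator[Tuple[str, int, str]]:
    """Yield (path, line_number, line) for every line under `root` matching `pattern`.

    Recursive like `grep -r`, with 1-based line numbers like `grep -n`.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob(glob)):  # sorted for stable, reviewable output
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if regex.search(line):
                    yield str(path), lineno, line.rstrip("\n")
```

For example, `for path, n, line in search_tree(".", r"credit"): print(f"{path}:{n}:{line}")`. Only an empty result supports a "clean" claim; any hit in a file not yet fixed means the fix is incomplete.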

Files Affected

Python script

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Haven't tried to reproduce

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Impact

High - Significant unwanted changes

Claude Code Version

2.1.91 (CLI)

Platform

Other

Additional Context

No response

extent analysis

TL;DR

To improve Claude's performance and accuracy, re-evaluate its instructions and configuration, focusing on thorough verification and testing before deploying changes.

Guidance

  • Review the instructions given to Claude to ensure they are clear and specific, avoiding ambiguity that could lead to incorrect actions.
  • Implement a more thorough verification process, such as searching all files with grep -rn before claiming an issue is "clean," to prevent false claims and wasted time.
  • Apply successful fixes consistently across all relevant files, such as applying the subtraction approach to both reports.py and dashboard.py to ensure uniformity in bug fixes.
  • Prioritize local testing before deploying changes to prevent unnecessary issues and downtime.
  • Ask Claude to match its pace and stated confidence to the task: keep simple fixes quick and direct, and qualify conclusions such as "no other files are affected" unless they have been verified.

Example

The issue includes no code snippets, so any example has to be hypothetical; the key point is that a fix such as the subtraction approach should be applied uniformly across files and tested thoroughly.
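The following is a hypothetical sketch, not the reporter's code. It assumes the double count came from adding credit rows on top of a total that already included them, that amounts are integer cents, and it models the subtraction approach as keeping the original formula and subtracting the duplicated credits. All names (`Row`, `buggy_total`, `fixed_total`, `report_total`, `dashboard_total`, `check_totals_agree`) are illustrative. Putting the fix in one shared function and checking both call sites locally is one way to keep reports.py and dashboard.py from diverging.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Row:
    amount: int      # integer cents, avoids float rounding in comparisons
    is_credit: bool  # hypothetical flag; the issue does not describe the schema


def buggy_total(rows: Iterable[Row]) -> int:
    """Hypothetical original formula: credit rows are added again on top of a
    sum that already includes them, so every credit is counted twice."""
    rows = list(rows)
    return sum(r.amount for r in rows) + sum(r.amount for r in rows if r.is_credit)


def fixed_total(rows: Iterable[Row]) -> int:
    """Subtraction fix: remove the credit amount that was counted a second time."""
    rows = list(rows)
    credits = sum(r.amount for r in rows if r.is_credit)
    return buggy_total(rows) - credits


# In a real project this helper would live in one shared module imported by
# both reports.py and dashboard.py, so the fix cannot land in only one file.
def report_total(rows: List[Row]) -> int:
    return fixed_total(rows)


def dashboard_total(rows: List[Row]) -> int:
    return fixed_total(rows)


def check_totals_agree(rows: List[Row]) -> None:
    """Local pre-deploy check: both views must count every row exactly once."""
    expected = sum(r.amount for r in rows)
    assert report_total(rows) == expected, "reports total is off"
    assert dashboard_total(rows) == expected, "dashboard total is off"
```

For example, with rows of 1000, 250 (credit) and -300 cents, `buggy_total` returns 1200, `fixed_total` returns 950, and `check_totals_agree` passes. Running such a check locally before deploying would flag a change that, like attempt 4, leaves the dashboard showing no data or a wrong total.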

Notes

The provided information lacks specific details about the codebase and the exact nature of the interactions with Claude, limiting the ability to provide a tailored solution. However, focusing on clarity in instructions, thorough verification, and consistent application of fixes can help mitigate the issues described.

Recommendation

Apply workaround: Given the high impact of the issues and the lack of a clear path to a full solution, applying workarounds such as manual verification and consistent fix application across relevant files seems prudent. This approach can help mitigate the risks associated with Claude's current behavior while a more permanent solution is explored.
