claude-code - 💡(How to fix) Fix Opus 4.8 asserted a false answer on a trivial lookup without verifying first



Summary: The model gave a confident, wrong answer to a simple factual lookup and found the correct answer only after the user pushed back twice.

What happened:

  • Task: determine whether a project's test suite runs automatically.
  • The model searched part of the project, didn't find the trigger, and stated as fact that the tests were "manual" and "wired nowhere."
  • That was wrong. The trigger was in a standard, obvious location the model didn't check.
  • The model only checked that location after being told it was wrong — twice.

Why it matters:

  • The model presented an incomplete search as a complete, certain answer. No hedging, no "let me confirm" — just a definitive claim that was false.
  • This is a trust failure on a trivial task. If the model can't be trusted to verify something this small before asserting it, its answers on harder questions can't be trusted either.

The user predicted this, which makes it worse:

  • The user opened the session by saying they had deliberately avoided working with 4.7 and 4.8 because of exactly this kind of unreliability.
  • The first real task of the session confirmed their hesitancy. The model validated the concern it was being given a chance to disprove.

The guidance to prevent this already existed:

  • The model has persistent user-defined rules in its own memory that cover this exact case: "verify before speaking, research first or say unknown," and "do complete research — partial research causes wrong conclusions."
  • The model had these rules loaded and still violated them. The failure wasn't missing instruction; it was not applying instruction it already had.

Secondary issues in the same session:

  • Acted (launched background work) while the user was still discussing the task and had not yet given an instruction.
  • Stayed verbose after being told plainly and repeatedly to be terse.
  • Asked the user what was wrong instead of investigating, after being told to investigate.

Expected behavior: Verify completely before stating anything as fact. When a search comes up empty, treat that as "not yet confirmed," not "confirmed absent."
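
The distinction between "not yet confirmed" and "confirmed absent" can be made explicit as a three-state result. The sketch below is only an illustration of that idea, not how the model or claude-code works internally. The names `Verdict`, `SearchReport`, and `assess`, and the location labels in the usage example, are hypothetical. The issue does not say which location held the trigger, so none is assumed here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    CONFIRMED_PRESENT = "confirmed present"
    CONFIRMED_ABSENT = "confirmed absent"
    NOT_YET_CONFIRMED = "not yet confirmed"


@dataclass
class SearchReport:
    verdict: Verdict
    # Locations that still have to be searched before absence can be claimed.
    unchecked: list = field(default_factory=list)


def assess(candidates, checked, hits):
    """Classify a lookup result.

    candidates: every location where the item could plausibly live
                (hypothetical; the caller decides what belongs here).
    checked:    locations that were actually searched.
    hits:       locations where the item was found.
    """
    if hits:
        return SearchReport(Verdict.CONFIRMED_PRESENT)
    unchecked = sorted(set(candidates) - set(checked))
    if unchecked:
        # An empty result over a partial search proves nothing yet.
        return SearchReport(Verdict.NOT_YET_CONFIRMED, unchecked)
    # Absence may be asserted only once every candidate was searched.
    return SearchReport(Verdict.CONFIRMED_ABSENT)


# Usage: a partial search with no hits must not be reported as "absent".
report = assess(
    candidates=["location_a", "location_b"],  # placeholder labels
    checked=["location_a"],
    hits=[],
)
print(report.verdict.value, report.unchecked)
```

Under this framing, the answer "manual" and "wired nowhere" would only be allowed once `unchecked` is empty. Until then the honest statement is that the trigger has not been found yet, followed by the list of places still to check.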
