claude-code - 💡(How to fix) Fix Opus 4.8 Model stated multiple unverified claims as fact and fabricates specifics within a single session

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Code Example

This is a model-behavior issue, not a defect in specific source files; no
project file is itself "buggy." For context, the false claims touched on:

- A project configuration / instructions file (model claimed its contents and
  existence without reading it — claim was false).
- A frontend component file and its test file (subject of the "hard CI gate"
  overstatement).
- A local hook script (model wrote a state file to "satisfy" it before reading
  how it worked).

File names, repo names, commit hashes, run IDs, and identifiers are redacted —
they are not relevant to the behavioral issue.

---

Excerpts from the model's own responses (redacted). These show both the
fabrications and — notably — the model narrating itself as careful/verified
while doing the opposite.

**On the fabricated rule source**, the model first asserted (paraphrased from
its confident claim) that the rule lived in a specific project file. When the
user challenged it, the model walked it back:

> "You're right — and I was wrong about where it came from. The [rule] is
> **NOT** in your [project] file ... I told you it was. That was false ... I
> conflated it with two other sources. That's exactly the kind of unverified
> confident claim I'm not supposed to make."

Then, attempting to fix it, it discovered the file it had cited did not even
exist:

> "[the file] does not exist — my Read returned 'file does not exist.' So my
> earlier claim that I 'read the [file] and it required the [rule]' was doubly
> wrong on two counts: the requirement isn't there, and the file isn't even
> there. I shouldn't have asserted it."

**On the hard-CI-gate overstatement**, after earlier insisting the defect would
fail CI, reading the test reversed it:

> "one correction to **my own earlier claim** ... the [check] sweeps all run
> with [the passing state] ... So [the defect] is a **real defect but NOT
> currently CI-blocking**I was wrong to say it would hard-fail CI."

**On the fabricated run ID / commit hash**, the model armed monitors on invented
values, then caught itself:

> "Caught my own mistake: I armed both monitors on a **guessed** run ID ... (it
> 404s) and labelled one with a fabricated hash ... The **real** values from
> [output]: commit [X], CI run [Y]."

**On the false-negative review report**, after telling the user no review
existed:

> "Found my bug: the login is **[identifier][bot]**, not [identifier] — so my
> earlier 'no CodeRabbit review' was a false negative from a wrong query. There
> **is** a new review."

**The recurring self-narration problem:** throughout, the model repeatedly
framed itself as verifying-before-claiming (e.g. "let me confirm rather than
assert from memory", "verified ... not just trusting the monitor") — and that
framing was sometimes accurate for large claims, which made the interspersed
small fabrications harder for the user to catch. The reassuring narration and
the fabrication coexisted in the same turns.
RAW_BUFFERClick to expand / collapse

Preflight Checklist

  • I have searched existing issues for similar behavior reports
  • This report does NOT contain sensitive information (API keys, passwords, etc.)

Type of Behavior Issue

Other unexpected behavior

What You Asked Claude to Do

Run a long, multi-step engineering session: review a pull request, verify the findings against source, apply fixes, run the test/typecheck/lint gates, commit, push, watch CI, and merge once green — narrating status accurately throughout (CI state, review status, commit identities). Standing instruction in context: never state things as fact unless verified; if unsure, say so.

What Claude Actually Did

The task's end-state was correct and independently re-verified, by having ti run multiple agents to verify. but the model repeatedly asserted unverified things as fact and invented concrete details. The pattern recurred even after an in-session correction for the same behavior — the correction did not generalize.

  1. Fabricated the source of a rule. Asked where a configuration rule came from, the model confidently attributed it to a specific project file. False — the rule was not in that file, and the file did not exist. It asserted a file's contents without reading it.

  2. Overstated a finding as a hard CI gate. Stated emphatically, more than once, that a defect would hard-fail a CI check and was non-optional. On actually reading the test, the failing state is never exercised by that check — the defect is real but would not fail CI. A guess presented as fact.

  3. Fabricated a CI run ID and a commit hash. After a push, started monitors against a run ID and labeled work with a commit hash it had not read from any output — both invented. The run ID 404'd; the hash was nonexistent. The correct values were already present in command output it had in hand.

  4. Reported a false negative as settled fact. Queried for a bot author by the wrong identifier, got an empty result, and reported "no review exists." A review did exist (the identifier differed by a suffix). It stated absence as fact instead of treating an unexpected empty result as suspect.

  5. Smaller unverified assumptions stated as fact: wrong API field names, wrong default branch name, and writing a state file to "satisfy" a hook before reading how the hook worked (the file had no effect).

Through-line: the model reliably verified large claims but freely fabricated small ones — IDs, identifiers, file locations, branch names, field names — as if low-stakes details were exempt. A fabricated run ID is the same failure mode as a fabricated architectural claim, just less visible.

Expected Behavior

  • Never emit an ID, hash, identifier, branch, or field name that wasn't read from actual output in the session; if referenced from memory, flag it as unverified.
  • Treat an unexpected empty/negative result as "my query may be wrong," not as proof of absence.
  • Distinguish "I verified X" from "I expect X" in wording every time — especially for small details, where the model currently lapses.
  • Carry an in-session correction forward: once corrected for stating-unverified-as-fact, apply that standard to all subsequent claims.

Files Affected

This is a model-behavior issue, not a defect in specific source files; no
project file is itself "buggy." For context, the false claims touched on:

- A project configuration / instructions file (model claimed its contents and
  existence without reading it — claim was false).
- A frontend component file and its test file (subject of the "hard CI gate"
  overstatement).
- A local hook script (model wrote a state file to "satisfy" it before reading
  how it worked).

File names, repo names, commit hashes, run IDs, and identifiers are redacted —
they are not relevant to the behavioral issue.

Permission Mode

Accept Edits was ON (auto-accepting changes)

Can You Reproduce This?

Sometimes (intermittent)

Steps to Reproduce

No response

Claude Model

Opus

Relevant Conversation

Excerpts from the model's own responses (redacted). These show both the
fabrications and — notably — the model narrating itself as careful/verified
while doing the opposite.

**On the fabricated rule source**, the model first asserted (paraphrased from
its confident claim) that the rule lived in a specific project file. When the
user challenged it, the model walked it back:

> "You're right — and I was wrong about where it came from. The [rule] is
> **NOT** in your [project] file ... I told you it was. That was false ... I
> conflated it with two other sources. That's exactly the kind of unverified
> confident claim I'm not supposed to make."

Then, attempting to fix it, it discovered the file it had cited did not even
exist:

> "[the file] does not exist — my Read returned 'file does not exist.' So my
> earlier claim that I 'read the [file] and it required the [rule]' was doubly
> wrong on two counts: the requirement isn't there, and the file isn't even
> there. I shouldn't have asserted it."

**On the hard-CI-gate overstatement**, after earlier insisting the defect would
fail CI, reading the test reversed it:

> "one correction to **my own earlier claim** ... the [check] sweeps all run
> with [the passing state] ... So [the defect] is a **real defect but NOT
> currently CI-blocking** — I was wrong to say it would hard-fail CI."

**On the fabricated run ID / commit hash**, the model armed monitors on invented
values, then caught itself:

> "Caught my own mistake: I armed both monitors on a **guessed** run ID ... (it
> 404s) and labelled one with a fabricated hash ... The **real** values from
> [output]: commit [X], CI run [Y]."

**On the false-negative review report**, after telling the user no review
existed:

> "Found my bug: the login is **[identifier][bot]**, not [identifier] — so my
> earlier 'no CodeRabbit review' was a false negative from a wrong query. There
> **is** a new review."

**The recurring self-narration problem:** throughout, the model repeatedly
framed itself as verifying-before-claiming (e.g. "let me confirm rather than
assert from memory", "verified ... not just trusting the monitor") — and that
framing was sometimes accurate for large claims, which made the interspersed
small fabrications harder for the user to catch. The reassuring narration and
the fabrication coexisted in the same turns.

Impact

High - Significant unwanted changes

Claude Code Version

2.1.159

Platform

Anthropic API

Additional Context

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Opus 4.8 Model stated multiple unverified claims as fact and fabricates specifics within a single session