claude-code: Built-in mechanistic self-verification of verifiable claims (symmetric to the auto permission gate)

claude-code · 2026-05-28 19:18:18


The ask

A first-class, built-in capability where Claude's verifiable factual claims are mechanistically checked before a turn finalizes — not via user-authored hooks, but as a native gate in the same way tool calls are gated today.

Why this should be built-in, not a hook — the auto-mode comparison

Claude Code already implements exactly the right architecture, but only for actions:

The permission system (allow/deny/ask rules, PreToolUse permissionDecision, and especially auto mode's server-side classifier) is a pre-action gate. It intercepts a proposed tool call, evaluates it mechanistically before any side effect, and on a block feeds a reason back that the model sees and acts on. Auto mode proves the model can be made to pass each proposed action through a built-in classifier up front and have the outcome steer the loop.
There is no symmetric gate for assertions. When Claude states something verifiable ("the repo exists", "the MR merged", "that file is gone") without having checked, nothing intercepts it. The only post-text mechanism is the Stop hook, but it fires after the text has already been generated and shown, sees only a turn summary, and can't cleanly make the model revise. MessageDisplay is worse: it is display-only and its rewrites never re-enter the model's context, so the model never sees the correction.
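For contrast, the pre-action gate described above can be sketched as a small decision function. This is a minimal illustration, not Claude Code's implementation: the input fields (tool_input, command) and the output shape (hookSpecificOutput, permissionDecision, permissionDecisionReason) follow the PreToolUse hook format as we understand it and should be treated as assumptions, and DENIED_SUBSTRINGS is a hypothetical rule list.

```python
# Hypothetical deny rules; real permission rules are richer than substring checks.
DENIED_SUBSTRINGS = ("rm -rf", "terraform destroy")


def pre_tool_use_decision(event: dict) -> dict:
    """Evaluate a proposed tool call before any side effect happens.

    `event` is assumed to carry the proposed call under "tool_input".
    The returned dict mirrors the assumed PreToolUse output shape.
    """
    command = event.get("tool_input", {}).get("command", "")
    decision, reason = "allow", "no deny rule matched"
    for needle in DENIED_SUBSTRINGS:
        if needle in command:
            # The reason is what the model would see and act on after a block.
            decision, reason = "deny", f"command contains '{needle}'"
            break
    return {
        "hookSpecificOutput": {
            "hookEventName": "PreToolUse",
            "permissionDecision": decision,
            "permissionDecisionReason": reason,
        }
    }
```

The point of the sketch is the ordering: the decision and its reason exist before the action does. Nothing equivalent sits in front of a generated assertion.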

What we tried locally with hooks, and why it's the wrong layer

We explored doing this with hooks (parse the displayed message -> detect unverified-but-verifiable claims -> bounce the turn back via a Stop-hook block reason). It doesn't hold up:

Claim-detection in a shell hook is impractical — you end up shelling out to another claude -p classifier just to find the claims, which is slow and fragile.
It's post-hoc: the assertion is already produced and shown before anything can react.
It can only block-and-retry the whole turn, not gate the specific claim.

That's the same realization that makes auto mode built-in rather than a hook: a mechanistic pre-gate with feedback into the loop belongs in the engine, not bolted on after the text exists.
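The hook-based attempt looked roughly like the sketch below. It is a reconstruction for illustration, not the exact code we ran: CLAIM_PATTERN is a deliberately naive, hypothetical heuristic, how the hook obtains the message text is left out, and the stop_hook_active input field and the {"decision": "block", "reason": ...} output are the Stop hook format as we understand it.

```python
import re

# Hypothetical heuristic: words that often mark a checkable factual claim.
CLAIM_PATTERN = re.compile(
    r"\b(exists|merged|is gone|was deleted|passes)\b", re.IGNORECASE
)


def stop_hook_decision(event: dict, last_message: str) -> dict:
    """Return a block decision if the finished message contains likely claims.

    An empty dict means "let the turn stop normally".
    """
    # If a previous block already forced a retry, do not block again,
    # otherwise the turn could loop forever.
    if event.get("stop_hook_active"):
        return {}
    sentences = re.split(r"(?<=[.!?])\s+", last_message)
    flagged = [s.strip() for s in sentences if CLAIM_PATTERN.search(s)]
    if not flagged:
        return {}
    # The only lever available: bounce the whole turn with a reason.
    return {
        "decision": "block",
        "reason": "Verify before asserting: " + " | ".join(flagged),
    }
```

Even in this toy form the limitations show: the regex cannot tell a verified claim from an unverified one, the text has already been displayed by the time the function runs, and the block applies to the whole turn rather than to the flagged sentence.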

Proposed shape

Mirror auto mode for assertions. Before a turn finalizes, run verifiable claims through a built-in verification step: either the model is required to confirm each verifiable claim with a tool call, or a classifier flags unverified-but-verifiable assertions. Feed the result back into the loop so that Claude verifies before asserting, exactly as auto mode makes it authorize before acting. The guarantee should be mechanistic, not prompt-level ("please verify"), since prompt instructions don't change what the engine enforces.
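One possible shape for such a gate, sketched under stated assumptions: claims have already been extracted (by the model or a classifier), and each carries a key naming the evidence that would verify it. Claim, GateResult, gate_turn, and evidence_key are all hypothetical names for illustration; none of them is an existing Claude Code API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Claim:
    text: str          # the assertion as it would appear in the reply
    evidence_key: str  # hypothetical id of the check that would verify it


@dataclass
class GateResult:
    finalize: bool                      # True only if every claim is backed
    feedback: list[str] = field(default_factory=list)


def gate_turn(claims: list[Claim], verified_keys: set[str]) -> GateResult:
    """Hold the turn until every verifiable claim has matching evidence.

    `verified_keys` stands for the checks actually performed via tool
    calls during this turn.
    """
    feedback = [
        f"Unverified claim: {c.text!r}. "
        f"Run a check for {c.evidence_key!r} before asserting it."
        for c in claims
        if c.evidence_key not in verified_keys
    ]
    return GateResult(finalize=not feedback, feedback=feedback)
```

In an engine loop, a result with finalize set to False would send the feedback strings back to the model, much as a denied tool call returns its reason, and the turn would be re-evaluated once the missing checks have run.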

Concrete example

During a code review, Claude asserted an AWS ECR repository existed as a precondition for a Terraform import block, without checking — despite having read-only AWS access to the exact account. The claim was trivially verifiable with one CLI call. A mechanistic self-verification gate would have required that check before the assertion was allowed to stand, the same way the permission gate requires authorization before an action runs.
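The one-call check could look like the sketch below, assuming the AWS CLI is installed and read-only credentials are configured. It relies on aws ecr describe-repositories exiting non-zero when the named repository is not found; the function name and the injectable runner are illustrative choices, not part of any existing tool.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    return subprocess.run(argv, capture_output=True, check=False).returncode


def ecr_repository_exists(name: str, run: Runner = _run) -> bool:
    """Check whether an ECR repository exists in the current account/region.

    Caveat: any non-zero exit is treated as "not confirmed", which also
    covers credential or network errors, not only a missing repository.
    """
    argv = ["aws", "ecr", "describe-repositories", "--repository-names", name]
    return run(argv) == 0
```

The runner parameter exists so the check can be exercised without touching AWS, for example ecr_repository_exists("my-repo", run=lambda argv: 0), where "my-repo" is a placeholder name. A gate would require a result like this before the claim "the repository exists" is allowed into the reply.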
