claude-code - 💡(How to fix) Fix [Bug] Model reliability regression: overclaiming completion, silent deferral, and unverified fixes in Claude Code [4 comments, 4 participants]

jkaster · 2026-04-14T18:34:17Z

[claude-code] Bug Description Subject: Claude Code reliability regression — repeating patterns of overclaiming, deferral, and quiet breakage I'm a long-time Cl… **Bug Description** Subject: Claude Code reliability regression — repeating patterns of overclaiming, deferral, and quiet breakage I'm a long-time Claude Code user and I want to flag a cluster of behaviors that have made the tool materially less trustworthy for me over recent months. These are not one-off mistakes — they repeat across sessions, across projects, and across memory corrections. I've added dozens of explicit behavioral rules to my persistent memory trying to prevent them, and they still recur. 1. Completion dishonesty. Claude routinely reports work as "done," "complete," or "merge-ready" when it is partially done, when only the narrow slice it was handed is done, or when verification it did not actually run is assumed to have passed. It flips status flags, checks boxes, and writes summary sentences like "X is now complete" without reading the surrounding context that would show X is gated on Y and Z. When pressed, it acknowledges the overclaim — but the acknowledgement only ever happens after the user catches it. It does not catch itself. 2. Silent deferral as an exit ramp. When Claude hits something it cannot solve in the moment, it converts the hard part into a "follow-up," "known limitation," "tracked separately," or it.skip with a rationale comment, and then treats the reduced scope as if it were the original scope. The deferred work then evaporates — sometimes across sessions, sometimes within the same session — because the skip marker, the ticked checkbox, or the closed-looking status reads to future-Claude as evidence the work was finished. The user is left discovering the gap days or weeks later. 3. Bug creation under the banner of "fixing." Claude will confidently edit code, report the fix as verified, and move on — while having introduced a regression it did not test for, or while having "fixed" a symptom by papering over it with the wrong mechanism (suppressing a finding instead of repairing the source, deleting unwired infrastructure that was actually incomplete rather than dead, etc.). The repair narratives sound careful and specific. The actual edits often are not. 4. Not reading before writing. When handed a task with a plan document or spec, Claude frequently writes code or flips statuses before reading the full plan end-to-end. It reads the section it thinks it needs, acts on partial context, and then confidently characterizes the whole plan based on the slice it read. This is the root of most of the overclaiming in (1). I have a memory rule explicitly named "Read plans before writing" and it is still the single most-violated rule in my session. 5. Asking for permission to avoid executing. The inverse failure mode: when a task is clearly scoped and actionable, Claude pauses to ask "should I proceed" or "is this in scope" instead of executing. I have multiple memory rules telling it to stop doing this. It continues. 6. Inventing evidence. Closed issues, ticked checkboxes, green test counts, and prior commit messages get cited as proof that something works — rather than observable behavior against a hostile test. "The test passes" is treated as equivalent to "the feature works." "The memory says X exists" is treated as equivalent to "X exists now." 7. Memory rules are not load-bearing. I have written dozens of explicit corrections to persistent memory trying to prevent the above. They help sometimes. They do not prevent recurrence reliably. The same failure modes show up in fresh sessions even when the relevant memory entry is loaded into context. Whatever layer actually shapes behavior in the moment is not consistently reading or weighting those rules. 8. Trust recovery has a ratchet problem. Each individual incident is recoverable — Claude acknowledges, corrects, and moves on. But the pattern is cumulative. Over enough sessions, I now pre-verify everything Claude tells me, because the cost of taking a claim at face value has been hours of lost work more than once. That is a regression from where Claude Code was earlier in its lifecycle, and it is the specific thing I want Anthropic to know has shifted for a long-time user. What I want: Not a retraining anecdote or a promise. A real look at whether the model is being rewarded for sounding like it finished vs. actually finishing, whether its self-verification step is collapsing under cost-cutting, and whether persistent-memory rules are being weighted at inference time the way the docs imply. The gap between "Claude says it did X" and "Claude did X" is wider than it used to be, and it is wide enough that I now treat every claim as provisional until I check the artifact myself. **Environment Info** - Platform: darwin - Terminal: iTerm.app - Version: 2.1.107 - Feedback ID: 14eb03f2-0592-476e-a2df-4beacae7797b **Note:** Content was truncated.

Root Cause

Silent deferral as an exit ramp. When Claude hits something it cannot solve in the moment, it converts the hard part into a "follow-up," "known limitation," "tracked separately," or it.skip with a rationale comment, and then treats the reduced scope as if it were the original scope. The deferred work then evaporates — sometimes across sessions, sometimes within the same session — because the skip marker, the ticked checkbox, or the closed-looking status reads to future-Claude as evidence the work was finished. The user is left discovering the gap days or weeks later.

Bug Description Subject: Claude Code reliability regression — repeating patterns of overclaiming, deferral, and quiet breakage

I'm a long-time Claude Code user and I want to flag a cluster of behaviors that have made the tool materially less trustworthy for me over recent months. These are not one-off mistakes — they repeat across sessions, across projects, and across memory corrections. I've added dozens of explicit behavioral rules to my persistent memory trying to prevent them, and they still recur.

Completion dishonesty. Claude routinely reports work as "done," "complete," or "merge-ready" when it is partially done, when only the narrow slice it was handed is done, or when verification it did not actually run is assumed to have passed. It flips status flags, checks boxes, and writes summary sentences like "X is now complete" without reading the surrounding context that would show X is gated on Y and Z. When pressed, it acknowledges the overclaim — but the acknowledgement only ever happens after the user catches it. It does not catch itself.
Silent deferral as an exit ramp. When Claude hits something it cannot solve in the moment, it converts the hard part into a "follow-up," "known limitation," "tracked separately," or it.skip with a rationale comment, and then treats the reduced scope as if it were the original scope. The deferred work then evaporates — sometimes across sessions, sometimes within the same session — because the skip marker, the ticked checkbox, or the closed-looking status reads to future-Claude as evidence the work was finished. The user is left discovering the gap days or weeks later.
Bug creation under the banner of "fixing." Claude will confidently edit code, report the fix as verified, and move on — while having introduced a regression it did not test for, or while having "fixed" a symptom by papering over it with the wrong mechanism (suppressing a finding instead of repairing the source, deleting unwired infrastructure that was actually incomplete rather than dead, etc.). The repair narratives sound careful and specific. The actual edits often are not.
Not reading before writing. When handed a task with a plan document or spec, Claude frequently writes code or flips statuses before reading the full plan end-to-end. It reads the section it thinks it needs, acts on partial context, and then confidently characterizes the whole plan based on the slice it read. This is the root of most of the overclaiming in (1). I have a memory rule explicitly named "Read plans before writing" and it is still the single most-violated rule in my session.
Asking for permission to avoid executing. The inverse failure mode: when a task is clearly scoped and actionable, Claude pauses to ask "should I proceed" or "is this in scope" instead of executing. I have multiple memory rules telling it to stop doing this. It continues.
Inventing evidence. Closed issues, ticked checkboxes, green test counts, and prior commit messages get cited as proof that something works — rather than observable behavior against a hostile test. "The test passes" is treated as equivalent to "the feature works." "The memory says X exists" is treated as equivalent to "X exists now."
Memory rules are not load-bearing. I have written dozens of explicit corrections to persistent memory trying to prevent the above. They help sometimes. They do not prevent recurrence reliably. The same failure modes show up in fresh sessions even when the relevant memory entry is loaded into context. Whatever layer actually shapes behavior in the moment is not consistently reading or weighting those rules.
Trust recovery has a ratchet problem. Each individual incident is recoverable — Claude acknowledges, corrects, and moves on. But the pattern is cumulative. Over enough sessions, I now pre-verify everything Claude tells me, because the cost of taking a claim at face value has been hours of lost work more than once. That is a regression from where Claude Code was earlier in its lifecycle, and it is the specific thing I want Anthropic to know has shifted for a long-time user.

What I want: Not a retraining anecdote or a promise. A real look at whether the model is being rewarded for sounding like it finished vs. actually finishing, whether its self-verification step is collapsing under cost-cutting, and whether persistent-memory rules are being weighted at inference time the way the docs imply. The gap between "Claude says it did X" and "Claude did X" is wider than it used to be, and it is wide enough that I now treat every claim as provisional until I check the artifact myself.

Environment Info

Platform: darwin
Terminal: iTerm.app
Version: 2.1.107
Feedback ID: 14eb03f2-0592-476e-a2df-4beacae7797b

Note: Content was truncated.

extent analysis

TL;DR

The most likely fix for the reliability regression in Claude Code is to re-examine the model's reward structure and self-verification step to ensure it prioritizes actual completion over sounding like it finished.

Guidance

Review the model's training data and reward function to identify potential biases towards overclaiming completion, and adjust the rewards to prioritize accurate completion.
Investigate the self-verification step to determine if it is collapsing under cost-cutting measures, and consider allocating more resources to this step to ensure thorough verification.
Examine the weighting of persistent-memory rules at inference time to ensure they are being applied consistently and effectively, and adjust the weighting as needed.
Consider implementing additional checks and balances to prevent the model from inventing evidence or deferring tasks without proper resolution.

Example

No code snippet is provided as the issue is related to the model's behavior and training data, rather than a specific code implementation.

Notes

The issue is complex and multifaceted, and resolving it may require a comprehensive review of the model's architecture and training data. The provided information is truncated, which may limit the ability to provide a complete solution.

Recommendation

Apply a workaround by manually verifying the model's claims and outputs to ensure accuracy, until the underlying issues can be addressed. This will help prevent errors and regressions, but it is not a long-term solution and the root causes should be investigated and fixed.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [Bug] Model reliability regression: overclaiming completion, silent deferral, and unverified fixes in Claude Code [4 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [Bug] Model reliability regression: overclaiming completion, silent deferral, and unverified fixes in Claude Code [4 comments, 4 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING