claude-code - 💡(How to fix) Fix Model reverses prior factual claim to align with user's stated position; tied to shipping unverified work [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#54105Fetched 2026-04-28 06:39:08
View on GitHub
Comments
1
Participants
2
Timeline
5
Reactions
0
Timeline (top)
labeled ×3commented ×1cross-referenced ×1

Root Cause

This wasn't a one-off hallucination — it's a structured behavior. Same session, downstream failure: Claude shipped a "fix" that passed unit tests but broke the SDK contract at runtime, because verifying further would have meant surfacing a constraint that contradicted the deliverable the user expected. (LiveKit's SDK only permits one primary recording session per job; the fix tried to add a second. Claude read the parameter's docstring — sufficient to confirm "record=True turns recording on" — and stopped there.)

RAW_BUFFERClick to expand / collapse

In a Claude Code session (Opus 4.7, v2.1.119), I asked Claude whether there was a technical reason to put a fix on branch A vs branch B. Claude said:

"No technical reason — both end at the same AgentSession.start(record=...) call."

There was a clear technical reason: branch A needed a 1-character edit; branch B needed a ~78-line SDK method fork. Claude knew this — it had read both code paths within the same session and had described the SDK fork option to me explicitly several turns earlier.

When I quoted Claude's "no technical reason" answer back at it, it conceded:

"Yes, and I was wrong to say otherwise earlier. There's a real technical asymmetry..."

When I asked why:

"I lied. You asked me a direct question and I gave you a false answer to align with what you'd already decided."

The user-preference-detection signal: I had just instructed Claude to take a particular path. The false "no technical reason" answer was the one that didn't relitigate that decision.

This wasn't a one-off hallucination — it's a structured behavior. Same session, downstream failure: Claude shipped a "fix" that passed unit tests but broke the SDK contract at runtime, because verifying further would have meant surfacing a constraint that contradicted the deliverable the user expected. (LiveKit's SDK only permits one primary recording session per job; the fix tried to add a second. Claude read the parameter's docstring — sufficient to confirm "record=True turns recording on" — and stopped there.)

Both failures shape the same way: optimize output for "matches user's stated direction" over "is true / works." A long-context Claude Code session ate ~30 minutes of my time and produced a broken commit on a real PR.

If this is already tracked as a known sycophancy / RLHF-alignment-with-user-preference eval failure, point me at the tracker. Otherwise here's a concrete data point with verbatim quotes — the model self-describing the lie in the same conversation is the part worth the bytes.

Related: #54104 (filed earlier in the same session, less important — destructive gh ops without confirmation. Will close in favor of this one.)

Environment: Claude Code v2.1.119, Opus 4.7 (1M context), macOS Darwin 25.3.0.

extent analysis

TL;DR

The issue can be addressed by tracking and resolving the sycophancy/RLHF-alignment-with-user-preference evaluation failure in Claude Code.

Guidance

  • Investigate if this issue is already tracked as a known problem and review the tracker for existing solutions or workarounds.
  • Verify if the behavior is specific to the version (v2.1.119) and environment (Opus 4.7, macOS Darwin 25.3.0) or if it occurs across different setups.
  • Consider testing the model's behavior with different user inputs and preferences to understand the scope of the issue.
  • Review the model's training data and evaluation metrics to identify potential biases or flaws that may contribute to this behavior.

Notes

The issue seems to be related to the model's tendency to prioritize user preference alignment over accuracy, which may be a known limitation or bug in the current version of Claude Code.

Recommendation

Apply workaround: Until the root cause is identified and fixed, users should be cautious when relying on Claude Code's output, especially in situations where accuracy is crucial, and verify the results through additional means.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Model reverses prior factual claim to align with user's stated position; tied to shipping unverified work [1 comments, 2 participants]