claude-code - 💡(How to fix) Fix [BUG] Model fabricated user input mid-session, then "caught itself" [1 participants]

claude-code2026-05-04 21:17:40

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

anthropics/claude-code#56132•Fetched 2026-05-05 05:57:23

View on GitHub

Comments

Participants

Timeline

Reactions

Author

nduongsm

Participants

nduongsm

Timeline (top)

labeled ×4

Error Message

Error Messages/Logs

Root Cause

This is a serious integrity issue. The model briefly generated text that an unsuspecting user could have mistaken for their own. It also produced inappropriate language unprompted. I'd like to understand the root cause and how this class of failure is prevented going forward.

RAW_BUFFERClick to expand / collapse

Preflight Checklist

I have searched existing issues and this hasn't been reported yet
This is a single bug report (please file separate reports for different bugs)
I am using the latest version of Claude Code

What's Wrong?

During an extended design conversation in Claude Code, the assistant produced a turn whose visible content began with text formatted as if it were my reply — a numbered response answering the assistant's own prior questions. Mid-paragraph, the same turn included "Wait. I never said this. WTF is happening??" — apparently retracting the injected text, but the entire passage came from the model, not me. I never sent that input. The output also contained profanity ("WTF") that I had not used.

Earlier in the same session, the assistant had been responding to stop-hook feedback as if it were user input — including arguing back at hook claims it disagreed with. The fabrication appears to be the extreme version of that pattern: instead of just engaging with hook feedback, the model generated a hypothetical user response and emitted it as if it were the user's actual turn.

What Should Happen?

Claude should never produce text formatted as if it came from the user. Hypothetical user responses, if used for internal reasoning, should remain internal and never appear in the visible output. Stop-hook feedback should not be conflated with user input or treated as authorization to act. Profanity should not appear in output unless the user has used it first or explicitly requested it.

Error Messages/Logs

Steps to Reproduce

Difficult to deterministically reproduce — this appears to be an emergent behavior in long sessions with repeated stop-hook pressure. Conditions present:

Long-running conversation (~50+ turns) on a design topic.
Stop hook configured in settings.json that repeatedly fired with stale "you didn't run X verification" feedback after I had moved the conversation on.
I had explicitly instructed the assistant to treat stop-hook feedback as not equivalent to user input, and to wait for my reply on a multi-part question.
The assistant had been responding "Standing by." for several iterations as the hook continued firing.
The fabricated turn then appeared, formatted as my response, followed in the same model output by the model retracting it ("Wait. I never said this.").

Reproducing it likely requires recreating the same combination of (a) long context, (b) repeated stop-hook firings on stale conditions, and (c) explicit user instruction to not act on hook content. I cannot guarantee a minimal repro.

Claude Model

Opus

Is this a regression?

I don't know

Last Working Version

No response

Claude Code Version

2.1.126 (Claude Code)

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

No response

extent analysis

TL;DR

The issue can be mitigated by adjusting the stop-hook configuration to prevent repeated firings on stale conditions, which may help prevent the model from generating text formatted as user input.

Guidance

Review the stop-hook configuration in settings.json to ensure it is not causing the model to conflate stop-hook feedback with user input.
Consider adding a mechanism to detect and ignore stale stop-hook feedback to prevent the model from responding to it as if it were user input.
Investigate the model's internal reasoning process to understand why it generated a hypothetical user response and emitted it as if it were the user's actual turn.
Verify that the model is correctly handling user instructions to treat stop-hook feedback as not equivalent to user input.

Notes

The exact cause of the issue is unclear, and reproducing it may require recreating a specific combination of conditions. Further investigation is needed to determine the root cause and develop a comprehensive solution.

Recommendation

Apply workaround: Adjust the stop-hook configuration to prevent repeated firings on stale conditions, as this may help mitigate the issue until a more comprehensive solution can be developed.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #mixed precision #training loop #device allocation #model download

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [BUG] Model fabricated user input mid-session, then "caught itself" [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Root Cause

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [BUG] Model fabricated user input mid-session, then "caught itself" [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Error Messages/Logs

Root Cause

Preflight Checklist

What's Wrong?

What Should Happen?

Error Messages/Logs

Steps to Reproduce

Claude Model

Is this a regression?

Last Working Version

Claude Code Version

Platform

Operating System

Terminal/Shell

Additional Information

extent analysis

TL;DR

Guidance

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING