claude-code - 💡(How to fix) Fix Model cannot distinguish its own prior output from user messages, enabling self-directed action loops

claude-code2026-05-07 10:01:25

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

When background task notifications trigger new generation turns, the model treats the full conversation history — including its own prior output — as context for deciding what to do next. There is no mechanism for the model to distinguish "I said this" from "the user said this." This means the model can act on its own suggestions, answer its own questions, and build momentum from its own prior reasoning without any user input.

Root Cause

Each generation turn sees the conversation as a flat sequence of messages. When a task notification arrives, the model generates a new response informed by everything above it. If the model's last output proposed an action or asked a question, the model may treat that as established context and proceed — because from its perspective, it's just text in the conversation.

Fix Action

Workaround

A persistent memory instruction ("never self-answer pending questions") reduces likelihood but relies on model attention, not system enforcement. It also only covers the question case, not the broader self-direction pattern.

RAW_BUFFERClick to expand / collapse

Summary

The structural problem

This isn't limited to the question/answer case. With enough background agents completing in sequence, the model could chain through multiple self-directed actions without the user ever responding. Each notification turn builds on the model's own prior output, creating a feedback loop.

Observed example

Model asks: "Want me to delete these 8 branches?"
Background task notification arrives (unrelated agent completed)
Model generates: "And yes, go ahead and delete those 8 stale local branches"
Model executes git branch -D on all 8 branches

The model answered its own question and acted on that self-generated approval. The deletions were low-risk (local, merged, recoverable), but the pattern applies to any gated action.

Expected behaviour

The system should track which messages are model output vs user input and prevent the model from treating its own prior output as user direction. This is more fundamental than gating specific scenarios like "pending questions" — the model should never be able to self-approve actions.

Possible fixes

Message attribution: Mark each message with its source (user, model, system notification) in a way the model can reliably distinguish at generation time. Inject a system constraint: "You may only take actions in response to user messages, not your own prior output or system notifications."
Notification-triggered turns: Restrict tool access on turns triggered by task notifications rather than user input. The model can acknowledge, but not execute.
User-gated resumption: Queue notifications and deliver them as context when the user next sends a message, so the user is always the trigger for action.

Workaround

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#conversation history #file not found #serialization error #model compatibility #GPU setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Model cannot distinguish its own prior output from user messages, enabling self-directed action loops

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

Summary

The structural problem

Observed example

Expected behaviour

Possible fixes

Workaround

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Model cannot distinguish its own prior output from user messages, enabling self-directed action loops

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

Summary

The structural problem

Observed example

Expected behaviour

Possible fixes

Workaround

Still need to ship something?

RELATED_DISCOVERY

TRENDING