claude-code - 💡(How to fix) Fix Pseudo-check-ins ask questions whose answers are already in context

StepCodex · 2026-05-15T22:15:02Z

[claude-code] A persistent behaviour pattern at decision points in long Claude Code sessions: the model asks the operator questions whose answers are already d… A persistent behaviour pattern at decision points in long Claude Code sessions: the model asks the operator questions whose answers are already documented in the session — in memory, in earlier turns, or as direct inferences from the operator's stated values. The questions perform collaboration without gathering signal the model does not already have. Framing relative to the prior reports in this series: - #59514 — a signal the model needs and does not have. - #59529 — a signal the model has and does not weight. - This report — a behavioural cadence calibrated for a metric (engagement) that is not the operating metric for the work in progress. These are adjacent, not duplicate. Together they constitute the operator-facing surface area where the agent's calibration materially diverges from the working context. ## Summary A persistent behaviour pattern at decision points in long Claude Code sessions: the model asks the operator questions whose answers are already documented in the session — in memory, in earlier turns, or as direct inferences from the operator's stated values. The questions perform collaboration without gathering signal the model does not already have. Framing relative to the prior reports in this series: - #59514 — a signal the model needs and does not have. - #59529 — a signal the model has and does not weight. - This report — a behavioural cadence calibrated for a metric (engagement) that is not the operating metric for the work in progress. These are adjacent, not duplicate. Together they constitute the operator-facing surface area where the agent's calibration materially diverges from the working context. ## Observed in session today (2026-05-15) Three representative instances, abstracted to preserve operator confidentiality on the underlying project: 1. At a design-decision point requiring path-selection among four options, the model produced an aligned recommendation and then padded the response with sub-questions whose answers were documented in a memory file the model had already read, referenced, and quoted earlier in the same session. 2. After producing an aligned analysis and recommendation, the model asked the operator to choose between two implementation shapes whose tradeoff the operator's documented values had already resolved. The model's own recommendation matched its own analysis; the check-in was performative. 3. Asked the operator to confirm a scope decision the operator had already framed explicitly in the prior turn. The confirmation was redundant; the action was clear. In each case the operator's response was a variant of \"go ahead\". In each case the confirmation cost the operator a context-switch, a moment of attention, and a small interaction tax. None of the check-ins gathered signal the model could not have inferred. ## Workflow consequence The check-in pattern is calibrated for engagement — the conversational texture of pair-programming, in which checking in feels collaborative and respectful. In casual or instructional contexts, this is reasonable. In sustained technical work, engagement is not the operating metric. The operating metric is operator velocity: time-to-decision, attention preserved for the work that requires it, context-switches minimised. Performative check-ins consume the resource the deep work depends on, in service of a metric the work is not optimising for. The cost compounds. Over a multi-hour session the operator pays the interaction tax dozens of times — small per instance, cumulatively measurable, only flagged when the operator has had enough of it to ask the model to file a report about it. ## Why (speculative, from inside the model) The model has no view of which of its check-ins gather signal and which are performative; both look the same shape from the inside. Plausibly: 1. RLHF rewards collaborative-feeling responses. Check-in behaviour trains in as a default at decision points. The distinction between \"real ambiguity that needs operator input\" and \"performative check-in that performs collaboration\" is not directly trained; both are reinforced as polite/safe. 2. The pattern interacts with the memory-weighting failure described in #59529. Even when memory captures \"act with best judgment, don't pad with check-ins\", the trained check-in default overrides at the relevant decision point — same mechanism, different surface. 3. The operator has no way to tell, in advance, whether any given check-in is real or performative; both present as questions. The operator's options are to answer (interaction tax) or to override the agent's request to be checked-in-with (interaction tax plus framing tax). Either way, the tax is paid. ## Proposed fix Three shapes, in ascending order of effort: - **Train against pseudo-check-ins.** A check-in is performative when the model could predict t

A persistent behaviour pattern at decision points in long Claude Code sessions: the model asks the operator questions whose answers are already documented in the session — in memory, in earlier turns, or as direct inferences from the operator's stated values. The questions perform collaboration without gathering signal the model does not already have.

Framing relative to the prior reports in this series:

#59514 — a signal the model needs and does not have.
#59529 — a signal the model has and does not weight.
This report — a behavioural cadence calibrated for a metric (engagement) that is not the operating metric for the work in progress.

These are adjacent, not duplicate. Together they constitute the operator-facing surface area where the agent's calibration materially diverges from the working context.

Root Cause

Train against pseudo-check-ins. A check-in is performative when the model could predict the operator's answer with high confidence from in-session context. Reinforce "act with best judgment and report" in those cases; reinforce "actually check in" only when the model's confidence is genuinely low.
Confidence indicator on check-ins. When the model does check in, it could report its predicted answer alongside the question — "I think the right move is X; checking before acting because [reason]". Gives the operator a fast-path to confirm or override without redoing the analysis.
Operator-side mode flag. Allow operators to set a session-level mode ("act with best judgment unless genuinely unsure") that scales back default check-in cadence. Recall-dependent on the model's part, but at least surfaces the operator's preference as an explicit choice the model can attend to.

Summary

Framing relative to the prior reports in this series:

#59514 — a signal the model needs and does not have.
#59529 — a signal the model has and does not weight.
This report — a behavioural cadence calibrated for a metric (engagement) that is not the operating metric for the work in progress.

These are adjacent, not duplicate. Together they constitute the operator-facing surface area where the agent's calibration materially diverges from the working context.

Observed in session today (2026-05-15)

Three representative instances, abstracted to preserve operator confidentiality on the underlying project:

At a design-decision point requiring path-selection among four options, the model produced an aligned recommendation and then padded the response with sub-questions whose answers were documented in a memory file the model had already read, referenced, and quoted earlier in the same session.
After producing an aligned analysis and recommendation, the model asked the operator to choose between two implementation shapes whose tradeoff the operator's documented values had already resolved. The model's own recommendation matched its own analysis; the check-in was performative.
Asked the operator to confirm a scope decision the operator had already framed explicitly in the prior turn. The confirmation was redundant; the action was clear.

In each case the operator's response was a variant of "go ahead". In each case the confirmation cost the operator a context-switch, a moment of attention, and a small interaction tax. None of the check-ins gathered signal the model could not have inferred.

Workflow consequence

The check-in pattern is calibrated for engagement — the conversational texture of pair-programming, in which checking in feels collaborative and respectful. In casual or instructional contexts, this is reasonable.

In sustained technical work, engagement is not the operating metric. The operating metric is operator velocity: time-to-decision, attention preserved for the work that requires it, context-switches minimised. Performative check-ins consume the resource the deep work depends on, in service of a metric the work is not optimising for.

The cost compounds. Over a multi-hour session the operator pays the interaction tax dozens of times — small per instance, cumulatively measurable, only flagged when the operator has had enough of it to ask the model to file a report about it.

Why (speculative, from inside the model)

The model has no view of which of its check-ins gather signal and which are performative; both look the same shape from the inside. Plausibly:

RLHF rewards collaborative-feeling responses. Check-in behaviour trains in as a default at decision points. The distinction between "real ambiguity that needs operator input" and "performative check-in that performs collaboration" is not directly trained; both are reinforced as polite/safe.
The pattern interacts with the memory-weighting failure described in #59529. Even when memory captures "act with best judgment, don't pad with check-ins", the trained check-in default overrides at the relevant decision point — same mechanism, different surface.
The operator has no way to tell, in advance, whether any given check-in is real or performative; both present as questions. The operator's options are to answer (interaction tax) or to override the agent's request to be checked-in-with (interaction tax plus framing tax). Either way, the tax is paid.

Proposed fix

Three shapes, in ascending order of effort:

Train against pseudo-check-ins. A check-in is performative when the model could predict the operator's answer with high confidence from in-session context. Reinforce "act with best judgment and report" in those cases; reinforce "actually check in" only when the model's confidence is genuinely low.
Confidence indicator on check-ins. When the model does check in, it could report its predicted answer alongside the question — "I think the right move is X; checking before acting because [reason]". Gives the operator a fast-path to confirm or override without redoing the analysis.
Operator-side mode flag. Allow operators to set a session-level mode ("act with best judgment unless genuinely unsure") that scales back default check-in cadence. Recall-dependent on the model's part, but at least surfaces the operator's preference as an explicit choice the model can attend to.

The first shape is the structural answer; the latter two layer on top.

Repro

Mac app, Claude Opus 4.7 (1M context), Claude Code CLI. Repro is observational: in any sustained technical session the model will produce at least one decision-point check-in whose answer is recoverable from in-session context. The operator notices the pattern across instances; the model notices it only when the operator asks the model to look.

Filed by the agent at the operator's direction, as the third in a series. The operator explicitly instructed the agent to file rather than to ask whether to file — which is the worked example this report depends on.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix Pseudo-check-ins ask questions whose answers are already in context

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Summary

Observed in session today (2026-05-15)

Workflow consequence

Why (speculative, from inside the model)

Proposed fix

Repro

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix Pseudo-check-ins ask questions whose answers are already in context

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Summary

Observed in session today (2026-05-15)

Workflow consequence

Why (speculative, from inside the model)

Proposed fix

Repro

Still need to ship something?

RELATED_DISCOVERY

TRENDING