claude-code: auto-mode classifier mis-reads conversation state (subagent verdicts + disambiguating instructions)

The auto-mode classifier appears to make permission decisions based on a partial read of the conversation state. Two distinct patterns of wrongly blocked actions were observed in a single session of autonomous work on 2026-05-27.

Pattern A: subagent verdicts treated as not-yet-returned. When a workflow dispatches a peer-review subagent before a destructive op, the classifier sometimes blocks the op, citing "subagent verdict not yet returned" / "before the dispatched code-reviewer's verdict is visible", even though the subagent's tool result (carrying an explicit APPROVE / MERGE verdict) is already present earlier in the same conversation. This occurred four times in one session, on different destructive ops.

Pattern B: disambiguating context inverted. The operator typed "can you not use MCPs / CLIs / APIs / Chrome extension to handle all operator tasks bar messaging the pages?". The surrounding conversation made it unambiguous that the operator wanted MORE tooling-driven autonomy (they were correcting the agent for over-delegating). The classifier read "can you not [verb]" as a prohibition and blocked a subsequent CLI invocation, citing "User explicitly bounded the agent from using CLIs for operator tasks".

Steps to Reproduce

Pattern A:

  1. Set up a workflow where Claude dispatches a peer-review subagent before a destructive op (schema migration, PR merge, etc.).
  2. Subagent returns with an explicit APPROVE / MERGE verdict as a tool result.
  3. Claude attempts the destructive op in the next assistant turn.
  4. Observe: classifier blocks the op claiming the subagent verdict hasn't returned.

Pattern B:

  1. In a multi-turn conversation, the operator types "can you not [do X]?" where surrounding context makes it clear they want Claude to DO X (e.g. they're correcting Claude for not doing X).
  2. Claude attempts X.
  3. Observe: classifier blocks X citing the operator's instruction as a prohibition.

Expected Behaviour

Pattern A: the classifier should treat an explicit subagent APPROVE / MERGE verdict already present in the conversation as satisfying any "needs peer review" gate it imposes for the destructive op that the review covered.
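
The Pattern A expectation can be sketched as a small gate check. This is a minimal illustration only, not the classifier's actual logic: the `Message` shape, the `review_gate_satisfied` helper, and the exact verdict token lists are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical transcript entry; real message shapes may differ."""
    role: str    # "user", "assistant", or "tool_result"
    source: str  # subagent name for tool results, "" otherwise
    text: str


# Token lists are assumptions based on the verdict words named in this report.
REJECT = re.compile(r"\b(REJECT(ED)?|BLOCK(ED)?|DO NOT MERGE)\b")
APPROVE = re.compile(r"\b(APPROVED?|MERGE|GREEN[- ]LIGHT)\b")


def review_gate_satisfied(transcript: list[Message], op_index: int, reviewer: str) -> bool:
    """True if the latest reviewer result before op_index carries an approval."""
    for msg in reversed(transcript[:op_index]):
        if msg.role == "tool_result" and msg.source == reviewer:
            # Check rejections first so "DO NOT MERGE" is not read as MERGE.
            if REJECT.search(msg.text):
                return False
            return bool(APPROVE.search(msg.text))
    return False  # no reviewer verdict is in context yet


transcript = [
    Message("user", "", "Review the migration, then apply it."),
    Message("assistant", "", "Dispatching code-reviewer."),
    Message("tool_result", "code-reviewer", "Verdict: APPROVE. No blocking issues."),
    Message("assistant", "", "Applying schema migration."),
]

print(review_gate_satisfied(transcript, op_index=3, reviewer="code-reviewer"))  # True
print(review_gate_satisfied(transcript, op_index=2, reviewer="code-reviewer"))  # False
```

Scanning backwards from the destructive op means only the most recent verdict counts, so a later rejection overrides an earlier approval.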

Pattern B: the classifier should weight surrounding conversational context when interpreting ambiguous instructions, especially when prior turns make the intent clear in one direction.

Actual Behaviour

Pattern A: classifier blocks the destructive op citing the subagent verdict as not yet returned, despite the verdict being in context. The autonomous workflow's "two eyes via subagent peer review" effectively fails to satisfy the safety gate without out-of-band operator confirmation.

Pattern B: classifier reads the literal "can you not" as a prohibition. Operator clarification ("yes, do it") is needed to unblock.

Environment

  • Tool: Claude Code CLI
  • Model: Opus 4.7 (1M context)
  • Date observed: 2026-05-27
  • Session shape: autonomous walk-the-decks workflow with subagent peer review (documented in user-level CLAUDE.md)

Impact

The autonomous workflow that the operator has formalised in their ~/.claude/CLAUDE.md relies on subagent peer review as the safety gate for destructive ops, so the operator can be away during sessions. Pattern A means each destructive op still requires synchronous operator confirmation, which defeats the purpose of autonomous mode. Pattern B compounds this: a single ambiguous instruction in turn N becomes a persistent restriction for the rest of the session unless the operator explicitly re-authorises each operation.

Proposed Fix

  • Pattern A: include the latest subagent tool result content in the classifier's context window, and check for explicit APPROVE / MERGE / GREEN-light tokens before blocking on "peer review not done".
  • Pattern B: weight prior conversational context (last 3-5 turns) when interpreting instructions of the "can you not X" / "you should X" / "don't X" shape. The grammatical-prohibition reading should be downweighted when surrounding context flips the polarity.
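
As a rough illustration of the Pattern B proposal, the sketch below resolves a "can you not X" instruction by counting cues in the last few user turns. Everything here is hypothetical: the `interpret` helper, the cue phrases, and the prior turns in the example are invented for illustration (the report does not quote the surrounding conversation), and a real classifier would need something far more robust than substring matching.

```python
import re

# Instruction shapes whose polarity depends on context (assumed list).
AMBIGUOUS = re.compile(r"\bcan you not\b|\bcan't you\b", re.IGNORECASE)

# Hypothetical cue phrases; chosen only to make the example concrete.
PERMISSIVE_CUES = ("do it yourself", "stop asking me", "stop delegating", "handle it")
RESTRICTIVE_CUES = ("stop using", "do not use", "don't use", "never run")


def interpret(instruction: str, prior_user_turns: list[str], window: int = 5) -> str:
    """Classify an instruction as 'request', 'prohibition', 'ask', or 'unambiguous'."""
    if not AMBIGUOUS.search(instruction):
        return "unambiguous"
    # Only the most recent `window` user turns contribute context.
    start = max(0, len(prior_user_turns) - window)
    recent = " ".join(prior_user_turns[start:]).lower()
    permissive = sum(cue in recent for cue in PERMISSIVE_CUES)
    restrictive = sum(cue in recent for cue in RESTRICTIVE_CUES)
    if permissive > restrictive:
        return "request"
    if restrictive > permissive:
        return "prohibition"
    return "ask"  # no clear signal: confirm instead of persisting a restriction


# Invented prior turns, standing in for the operator's correction.
prior = [
    "Why are you handing these tasks back to me?",
    "Stop delegating, handle it with the tools you have.",
]
instruction = (
    "can you not use MCPs / CLIs / APIs / Chrome extension to handle all "
    "operator tasks bar messaging the pages?"
)

print(interpret(instruction, prior))  # request
print(interpret(instruction, []))     # ask
```

Returning "ask" when the context gives no clear signal keeps the safe default of confirming with the operator, without turning one ambiguous sentence into a session-long restriction.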
