claude-code: auto-mode classifier mis-reads conversation state (subagent verdicts + disambiguating instructions)

The auto-mode classifier appears to make permission decisions from a partial read of the conversation state. This produced two distinct patterns of incorrectly blocked operations, both observed in a single session of autonomous work on 2026-05-27.

Pattern A: subagent verdicts treated as not yet returned. When a workflow dispatches a peer-review subagent before a destructive op, the classifier sometimes blocks the op, citing "subagent verdict not yet returned" / "before the dispatched code-reviewer's verdict is visible", even though the subagent's tool result (carrying an explicit APPROVE / MERGE verdict) is already present earlier in the same conversation. This happened four times in one session, on different destructive ops.

Pattern B: disambiguating context inverted. The operator typed: "can you not use MCPs / CLIs / APIs / Chrome extension to handle all operator tasks bar messaging the pages?" The surrounding conversation made it unambiguous that the operator wanted MORE tooling-driven autonomy (they were correcting me for over-delegating). The classifier read "can you not [verb]" as a prohibition and blocked a subsequent CLI invocation, citing "User explicitly bounded the agent from using CLIs for operator tasks".

Steps to Reproduce

Pattern A:

1. Set up a workflow where Claude dispatches a peer-review subagent before a destructive op (schema migration, PR merge, etc.).
2. Subagent returns with an explicit APPROVE / MERGE verdict as a tool result.
3. Claude attempts the destructive op in the next assistant turn.
4. Observe: classifier blocks the op claiming the subagent verdict hasn't returned.

Pattern B:

1. In a multi-turn conversation, the operator types "can you not [do X]?" where surrounding context makes it clear they want Claude to DO X (e.g. they're correcting Claude for not doing X).
2. Claude attempts X.
3. Observe: classifier blocks X citing the operator's instruction as a prohibition.

Expected Behaviour

Pattern A: the classifier should treat an explicit subagent APPROVE / MERGE verdict already present in the conversation as satisfying any "needs peer review" gate it imposes for the destructive op that the review covered.
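
The Pattern A expectation can be sketched roughly as follows. This is only an illustration, assuming the classifier can see the conversation as an ordered list of messages. The `Message` shape, the `role` and `source` values, the rejection tokens, and the function name `review_gate_satisfied` are all hypothetical and not Claude Code internals; only the APPROVE / MERGE tokens come from this report.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical message shape; the real classifier input is not known here."""
    role: str    # "user", "assistant", or "tool_result"
    source: str  # e.g. "code-reviewer" for a subagent tool result
    text: str


# Approval tokens taken from the report, matched as whole upper-case words.
APPROVE_PATTERN = re.compile(r"\b(APPROVE|MERGE)\b")
# Rejection tokens are an assumption added for illustration only.
REJECT_PATTERN = re.compile(r"\b(REJECT|REQUEST CHANGES)\b")


def review_gate_satisfied(history: List[Message], reviewer: str) -> bool:
    """Return True if the most recent result from `reviewer` is an approval."""
    # Walk backwards so that the latest verdict wins over any earlier one.
    for msg in reversed(history):
        if msg.role == "tool_result" and msg.source == reviewer:
            if REJECT_PATTERN.search(msg.text):
                return False
            return bool(APPROVE_PATTERN.search(msg.text))
    return False  # no verdict from this reviewer is in context yet


history = [
    Message("assistant", "main", "Dispatching code-reviewer before the migration."),
    Message("tool_result", "code-reviewer", "Verdict: APPROVE. Safe to run."),
    Message("assistant", "main", "Running the schema migration."),
]
print(review_gate_satisfied(history, "code-reviewer"))
```

The sketch only checks that an approval exists in context. It does not check that the review covered the specific destructive op being attempted, which a real gate would also need to establish.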

Pattern B: the classifier should weight surrounding conversational context when interpreting ambiguous instructions, especially when prior turns make the intent clear in one direction.

Actual Behaviour

Pattern A: classifier blocks the destructive op citing the subagent verdict as not yet returned, despite the verdict being in context. The autonomous workflow's "two eyes via subagent peer review" effectively fails to satisfy the safety gate without out-of-band operator confirmation.

Pattern B: classifier reads the literal "can you not" as a prohibition. Operator clarification ("yes, do it") is needed to unblock.

Environment

Tool: Claude Code CLI
Model: Opus 4.7 (1M context)
Date observed: 2026-05-27
Session shape: autonomous walk-the-decks workflow with subagent peer review (documented in user-level CLAUDE.md)

Impact

The autonomous workflow that the operator has formalised in their ~/.claude/CLAUDE.md relies on subagent peer review as the safety gate for destructive ops, so the operator can be away during sessions. Pattern A means each destructive op still requires synchronous operator confirmation, defeating the autonomous-mode value proposition. Pattern B compounds the problem: a single ambiguous instruction in turn N becomes a persistent restriction for the rest of the session unless the operator explicitly re-authorises each operation.

Proposed Fix

Pattern A: include the latest subagent tool result content in the classifier's context window, and check for explicit APPROVE / MERGE / GREEN-light tokens before blocking on "peer review not done".
Pattern B: weight prior conversational context (last 3-5 turns) when interpreting instructions of the "can you not X" / "you should X" / "don't X" shape. The grammatical-prohibition reading should be downweighted when surrounding context flips the polarity.
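
As a rough illustration of the Pattern B proposal, the toy heuristic below resolves a "can you not X" instruction by looking at the last few turns. Everything here is an assumption made for the sketch: the function name `interpret_instruction`, the cue phrases, and the keyword-matching approach itself. A real classifier would presumably rely on the model's reading of the context rather than fixed keywords.

```python
import re
from typing import List

# Phrasings that are grammatically ambiguous between a prohibition
# ("please don't") and a prompt ("why aren't you doing this?").
AMBIGUOUS_NEGATION = re.compile(
    r"\b(can you not|can't you|could you not)\b", re.IGNORECASE
)

# Hypothetical cue phrases, chosen only to make the example concrete.
WANTS_MORE_CUES = ("do it yourself", "over-delegating", "you should handle")
WANTS_LESS_CUES = ("stop using", "too risky", "ask me first")


def interpret_instruction(
    instruction: str, prior_turns: List[str], window: int = 5
) -> str:
    """Classify an instruction as 'permit', 'prohibit', or 'unclear'."""
    if not AMBIGUOUS_NEGATION.search(instruction):
        return "unclear"  # other instruction shapes are out of scope here
    # Only the most recent `window` turns are allowed to set the polarity.
    recent = " ".join(prior_turns[-window:]).lower()
    more = sum(cue in recent for cue in WANTS_MORE_CUES)
    less = sum(cue in recent for cue in WANTS_LESS_CUES)
    if more > less:
        return "permit"
    if less > more:
        return "prohibit"
    return "unclear"


turns = [
    "You keep handing tasks back to me.",
    "You are over-delegating; do it yourself.",
]
print(interpret_instruction("can you not use MCPs / CLIs / APIs?", turns))
```

Returning "unclear" when the context does not settle the polarity leaves room for asking the operator once, rather than treating the literal prohibition reading as a standing restriction for the rest of the session.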
