claude-code: [FEATURE] Subagent observability gap — parent agents and operators are blind to subagent tool calls (2 comments, 2 participants)

GitHub stats: anthropics/claude-code#46253 (fetched 2026-04-11 06:25:12)
Comments: 2 · Participants: 2 · Timeline events: 6 (labeled ×3, commented ×2, cross-referenced ×1) · Reactions: 0


Preflight Checklist

  • I have searched existing requests and this feature hasn't been requested yet
  • This is a single feature request (not multiple features)

Problem Statement

When Claude Code spawns a subagent via the built-in Agent tool, the parent agent receives only the final text output plus basic metadata (token count, duration, tool use count). The operator, even in --output-format stream-json --verbose mode, sees only 4 event types: task_started, rate_limit_event, task_notification, and tool_use_result. None of these contain the individual tool calls the subagent made.

This creates three cascading problems:

  1. No audit trail — if a subagent overwrites a file, runs a destructive command, or leaks data, there is no record of it outside the subagent's own (inaccessible) context.
  2. No enforcement verification — even if hooks are implemented for subagents (per #45427), you cannot verify they actually fired.
  3. No learning loop input — governance feedback systems that learn from execution history (signal detection, recurrence analysis, pattern optimization) are blind to subagent behavior. Subagents cannot improve because there is no data to improve from.

You cannot govern what you cannot see. You cannot improve what you cannot measure.

Related RFC: #45427 (Dimitri Geelen — Deterministic tool gate). His RFC addresses enforcement; this RFC addresses the complementary observability side.

What the parent agent sees

The parent agent describes the subagent as "essentially a black box." It receives:

  • Final text output from the subagent
  • Basic metadata: token count, duration, number of tool uses

It does not receive: individual tool calls, tool results, reasoning/thinking, conversation history, file modification details, or bash command details.

What the operator sees (stream-json)

Event | Contains
--- | ---
task_started | Subagent was spawned
rate_limit_event | Rate limit encountered (if any)
task_notification | Status: completed
tool_use_result | Final text output from subagent
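One way to confirm which event types reach the operator is to tally a captured stream by type. The sketch below assumes that each line of the capture is one JSON object with a `type` field and, for some events, a `subtype` field. The exact envelope Claude Code emits may differ. `tally_event_types` and the sample lines are hypothetical.

```python
import json
from collections import Counter
from typing import Iterable


def tally_event_types(lines: Iterable[str]) -> Counter:
    """Count event kinds in a newline-delimited JSON capture.

    Assumption: each non-empty line is one JSON object with a "type" key and,
    optionally, a "subtype" key; the counted kind is "type/subtype" when both exist.
    """
    counts: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between events
        event = json.loads(raw)
        kind = event.get("type", "<missing>")
        if "subtype" in event:
            kind = f"{kind}/{event['subtype']}"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, simplified capture; replace with lines from a real stream-json log.
    sample = [
        '{"type": "task_started"}',
        '{"type": "rate_limit_event"}',
        '{"type": "task_notification"}',
        '{"type": "tool_use_result"}',
    ]
    for kind, n in sorted(tally_event_types(sample).items()):
        print(kind, n)
```

When the tally is run against a real capture, the absence of any subagent-level tool event kinds is the symptom described above.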

What external processes provide

Spawning a separate claude -p --output-format stream-json --verbose process instead of using the built-in Agent tool gives full observability: every tool_use event, every tool_result event, every file edit, every bash command. This asymmetry is the core problem.
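An external process gives full observability because its stream carries the individual tool blocks. Below is a minimal extraction sketch. It assumes that the stream-json envelope nests Messages-API-style content blocks under `message.content`: `tool_use` blocks carry `id`, `name`, and `input`, and `tool_result` blocks carry `tool_use_id` and `content`. Verify this against your Claude Code version. `ToolEvent` and `extract_tool_events` are illustrative names.

```python
import json
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Optional


@dataclass
class ToolEvent:
    kind: str             # "tool_use" or "tool_result"
    call_id: str          # id of the tool_use block (or the one a result answers)
    name: Optional[str]   # tool name; only present on tool_use blocks
    payload: Any          # tool input for tool_use, result content for tool_result


def extract_tool_events(lines: Iterable[str]) -> Iterator[ToolEvent]:
    """Yield tool_use / tool_result blocks from a captured stream-json log.

    Assumption: assistant and user events carry message.content as a list of
    Messages-API-style content blocks. Events without such a list are skipped.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        event = json.loads(raw)
        message = event.get("message") if isinstance(event, dict) else None
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # e.g. system/result events, or plain-string content
        for block in content:
            if not isinstance(block, dict):
                continue
            if block.get("type") == "tool_use":
                yield ToolEvent("tool_use", block.get("id", ""), block.get("name"), block.get("input"))
            elif block.get("type") == "tool_result":
                yield ToolEvent("tool_result", block.get("tool_use_id", ""), None, block.get("content"))
```

Iterating over a capture file opened with `open(path, encoding="utf-8")` yields each call and its result in order. That is the per-call audit trail the built-in Agent tool does not currently expose.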

Impact on learning loops

Governance feedback systems that convert runtime outcomes into reusable signals (recurrence detection, pattern confidence adjustment) depend on receipt data. Receipts capture what the dispatched worker reported, not what subagents did internally. When a subagent causes a failure, the learning loop sees the symptom (GATE_FAILURE) but not the cause (which subagent tool call created the problem). The system learns that things fail but not why subagents cause them to fail. This makes feedback loops progressively less useful as subagent usage increases.

Learning Loop Function | Requires Subagent Events? | Status with Built-in Agent Tool
--- | --- | ---
Signal extraction from receipts | No | Works (symptom-level)
Recurrence detection across dispatches | No | Works (pattern-level)
Root cause attribution | Yes | Broken: can't trace to subagent action
Pattern confidence adjustment | Yes | Broken: no subagent pattern data
Failure class derivation | Yes | Degraded: misattributed to parent

Proposed Solution

Forward subagent tool call events to the parent stream in stream-json mode.

Minimum event set per subagent tool call

{
  "type": "tool_use",
  "agent_id": "subagent_abc123",
  "agent_depth": 1,
  "parent_agent_id": "parent_xyz789",
  "tool": "Write",
  "input": { "file_path": "/path/to/file.py", "content": "..." },
  "call_id": "call_001"
}
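Tooling built on the proposed event would likely validate it before use. Below is a sketch of such a check. The field names come from the example above. The Python types, and the rule that a subagent event has `agent_depth >= 1`, are assumptions, because the schema is not yet documented. `parse_subagent_event` is hypothetical.

```python
import json
from typing import Any, Dict, TypedDict, cast


class SubagentToolUse(TypedDict):
    """Shape of the proposed (not yet existing) subagent tool_use event."""
    type: str
    agent_id: str
    agent_depth: int
    parent_agent_id: str
    tool: str
    input: Dict[str, Any]
    call_id: str


# Assumed Python types for each field named in the proposal.
REQUIRED_FIELDS = {
    "type": str,
    "agent_id": str,
    "agent_depth": int,
    "parent_agent_id": str,
    "tool": str,
    "input": dict,
    "call_id": str,
}


def parse_subagent_event(raw: str) -> SubagentToolUse:
    """Parse one JSON line and check the fields named in the proposal."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise ValueError("event must be a JSON object")
    for field, expected in REQUIRED_FIELDS.items():
        if field not in event:
            raise ValueError(f"missing field: {field}")
        value = event[field]
        # bool is a subclass of int, so reject it explicitly for integer fields
        if not isinstance(value, expected) or (expected is int and isinstance(value, bool)):
            raise ValueError(f"bad type for {field}: {type(value).__name__}")
    if event["agent_depth"] < 1:
        raise ValueError("a subagent event needs agent_depth >= 1")
    return cast(SubagentToolUse, event)
```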

Key properties

  1. Agent attribution — every event carries agent_id and agent_depth so operators can distinguish parent from subagent events
  2. Nested forwarding — events bubble up through all levels with incrementing agent_depth
  3. Opt-in granularity — operators can filter by agent_depth if they only want parent events (backward compatible)
  4. No content truncation — tool inputs and outputs forwarded as-is
  5. Append-only semantics — events are ordered and immutable once emitted
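The nested-forwarding, depth-filtering, and append-only properties can be modeled in a few lines. This is a toy model under stated assumptions, not a proposed implementation. Depth 0 is the top-level agent, and every level writes into one shared operator stream. `EventStream`, `Agent`, and `ForwardedEvent` are hypothetical names.

```python
import itertools
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)  # frozen: an emitted event cannot be modified afterwards
class ForwardedEvent:
    seq: int                         # position in the operator stream (append-only order)
    agent_id: str
    agent_depth: int                 # 0 = top-level agent, 1 = its subagent, 2 = nested, ...
    parent_agent_id: Optional[str]   # None for the top-level agent
    tool: str


class EventStream:
    """Hypothetical operator-facing stream that every agent level writes into."""

    def __init__(self) -> None:
        self._events: List[ForwardedEvent] = []
        self._seq = itertools.count()

    def emit(self, agent_id: str, depth: int, parent: Optional[str], tool: str) -> ForwardedEvent:
        event = ForwardedEvent(next(self._seq), agent_id, depth, parent, tool)
        self._events.append(event)  # append only: nothing is edited or removed
        return event

    def events(self, max_depth: Optional[int] = None) -> List[ForwardedEvent]:
        """Return events in order; max_depth=0 reproduces today's parent-only view."""
        if max_depth is None:
            return list(self._events)
        return [e for e in self._events if e.agent_depth <= max_depth]


class Agent:
    """Hypothetical agent handle; spawning a child increments agent_depth by one."""

    def __init__(self, stream: EventStream, agent_id: str,
                 depth: int = 0, parent_id: Optional[str] = None) -> None:
        self.stream = stream
        self.agent_id = agent_id
        self.depth = depth
        self.parent_id = parent_id

    def spawn(self, child_id: str) -> "Agent":
        return Agent(self.stream, child_id, self.depth + 1, self.agent_id)

    def call_tool(self, tool: str) -> ForwardedEvent:
        return self.stream.emit(self.agent_id, self.depth, self.parent_id, tool)
```

Suppose a parent, a subagent, and a nested subagent each make one call, in that order. `events()` then returns depths `[0, 1, 2]` in emission order, and `events(max_depth=0)` returns only the parent's call. A real implementation might relay events through each parent rather than share one stream, but the operator-visible result would be the same.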

The Three Pillars of Agent Governance

Pillar | Purpose | RFC
--- | --- | ---
Enforcement | Prevent unauthorized actions | #45427 (Tool Gate)
Observability | Prove what happened | This RFC (Event Forwarding)
Learning | Improve from execution history | Enabled by this RFC

Minimum viable ask

  1. Forward tool_use and tool_result events from subagents to the parent stream in stream-json mode, with agent_id and agent_depth fields
  2. Include subagent events in --verbose output so operators see the full activity log
  3. Provide a --subagent-events flag (or similar) to control forwarding granularity
  4. Document the event schema for subagent events so tooling can be built on top
  5. Emit subagent lifecycle events (subagent_started, subagent_completed, subagent_failed) with summary metadata
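The granularity control in item 3 could map to a small set of levels. The sketch below assumes three hypothetical values for the proposed flag (`off`, `lifecycle`, `all`). It also assumes that parent events sit at `agent_depth` 0 and that lifecycle events use the type names from item 5. None of these values exist in Claude Code today.

```python
from enum import Enum
from typing import Any, Dict, Iterable, Iterator


class SubagentEvents(Enum):
    """Hypothetical values for the proposed --subagent-events flag."""
    OFF = "off"              # current behaviour: parent events only
    LIFECYCLE = "lifecycle"  # add subagent_started / subagent_completed / subagent_failed
    ALL = "all"              # also forward every subagent tool_use / tool_result


LIFECYCLE_TYPES = {"subagent_started", "subagent_completed", "subagent_failed"}


def forward(events: Iterable[Dict[str, Any]], level: SubagentEvents) -> Iterator[Dict[str, Any]]:
    """Yield the events an operator would see at the given granularity."""
    for event in events:
        if event.get("agent_depth", 0) == 0:
            yield event  # parent events are always forwarded (backward compatible)
        elif level is SubagentEvents.ALL:
            yield event
        elif level is SubagentEvents.LIFECYCLE and event.get("type") in LIFECYCLE_TYPES:
            yield event
```

`SubagentEvents("lifecycle")` turns a flag string into a level. With `OFF`, the output matches the current parent-only stream, so the default stays backward compatible.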

Alternative Solutions

Approach | Verdict | Why
--- | --- | ---
External process spawning (our workaround) | Works but wrong long-term | Requires building full process management; duplicates what Agent tool should provide
Post-hoc session log retrieval | Insufficient | No standard API to retrieve subagent session logs; timing gap between action and detection
Parent-level output parsing | Fragile | Parsing final text output for clues is unreliable and lossy
Periodic polling of filesystem | Fragile | Detecting file changes doesn't tell you which agent made them

We currently work around this in VNX Orchestration by never using the built-in Agent tool. Instead, we spawn all workers as external claude -p --output-format stream-json --verbose processes via a SubprocessAdapter, which gives full stream capture. But this is a workaround: the built-in Agent tool should provide the same observability.
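Below is a minimal sketch of this kind of external capture. It assumes only that the worker prints one JSON event per line on stdout. It is not the VNX SubprocessAdapter; `run_worker` and the `worker_id` tag are hypothetical.

```python
import json
import subprocess
from typing import Any, Dict, List, Sequence


def run_worker(cmd: Sequence[str], worker_id: str, log_path: str) -> List[Dict[str, Any]]:
    """Run one worker process and capture every line of its stream-json output.

    worker_id is our own attribution tag added to each event; it is not a field
    Claude Code emits. Events are appended to a JSONL log as they arrive.
    """
    captured: List[Dict[str, Any]] = []
    with open(log_path, "a", encoding="utf-8") as log:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
            assert proc.stdout is not None
            for line in proc.stdout:
                line = line.strip()
                if not line:
                    continue
                try:
                    event = json.loads(line)
                except json.JSONDecodeError:
                    event = None
                if not isinstance(event, dict):
                    event = {"type": "unparsed", "raw": line}  # keep, never drop
                event["worker_id"] = worker_id
                log.write(json.dumps(event) + "\n")
                log.flush()  # partial output survives if the worker fails midway
                captured.append(event)
    # Popen's context manager waits for the process, so returncode is set here.
    if proc.returncode != 0:
        raise RuntimeError(f"worker {worker_id} exited with code {proc.returncode}")
    return captured
```

A worker could then be launched with something like `run_worker(["claude", "-p", "refactor the utility module", "--output-format", "stream-json", "--verbose"], "T1", "t1.jsonl")`. Unparseable lines are recorded instead of discarded, because a gap in the log would undermine it as an audit trail.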

Priority

Medium - Would be very helpful

Feature Category

CLI commands and flags

Use Case Example

  1. I operate VNX Orchestration (https://github.com/Vinix24/vnx-orchestration), a multi-terminal agent system on Claude Code
  2. An orchestrator (T0) dispatches work to worker terminals (T1/T2/T3) through a structured dispatch queue with human approval
  3. A worker spawns a subagent via the Agent tool to "refactor the utility module"
  4. The subagent rewrites utils.py with a breaking change
  5. Gate fails: tests don't pass
  6. Receipt records: GATE_FAILURE, codex_gate, test_failure
  7. But the root cause — the subagent's destructive rewrite — is invisible
  8. The learning loop recommendation targets the gate or the dispatch, never the subagent behavior
  9. With subagent event forwarding, I would see the exact tool_use event where the subagent called Write on utils.py, enabling root cause attribution
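With forwarded events, step 9 reduces to finding the last write to the failing file. The sketch below makes three assumptions: events are shaped like the proposed `tool_use` event, the `Write` and `Edit` tools both pass their target as `input.file_path`, and a receipt is a plain dict. `last_writer`, `attribute_failure`, and the receipt keys are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional

WRITE_TOOLS = frozenset({"Write", "Edit"})  # assumption: tools that modify a file


def last_writer(events: Iterable[Dict[str, Any]], path: str) -> Optional[Dict[str, Any]]:
    """Return the most recent tool_use event that modified `path`, or None."""
    found: Optional[Dict[str, Any]] = None
    for event in events:  # forwarded events arrive in emission order
        if event.get("type") != "tool_use" or event.get("tool") not in WRITE_TOOLS:
            continue
        if (event.get("input") or {}).get("file_path") == path:
            found = event  # keep scanning: a later write supersedes this one
    return found


def attribute_failure(receipt: Dict[str, Any], events: List[Dict[str, Any]],
                      path: str) -> Dict[str, Any]:
    """Copy a receipt and add which agent call last wrote the failing file."""
    enriched = dict(receipt)
    writer = last_writer(events, path)
    enriched["root_cause"] = None if writer is None else {
        "agent_id": writer.get("agent_id"),
        "agent_depth": writer.get("agent_depth"),
        "call_id": writer.get("call_id"),
        "tool": writer.get("tool"),
        "file_path": path,
    }
    return enriched
```

With `root_cause` set, a learning-loop recommendation can target the subagent call rather than the gate or the dispatch.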

Additional Context

Related issues: #43772, #32376, #32193, #31250, #39903
Related RFC: #45427 (Dimitri Geelen, Deterministic tool gate)
Tested on: Claude Code v2.1.100, --output-format stream-json --verbose

This RFC is complementary to #45427. Enforcement (#45427) without observability (this RFC) is unverifiable. Observability without enforcement is informational only. Together they form a complete governance framework.

Evidence and workaround implementation: https://github.com/Vinix24/vnx-orchestration (see scripts/lib/subprocess_adapter.py for full stream capture)

Extent analysis

TL;DR

Forward subagent tool call events to the parent stream in stream-json mode to provide observability into subagent behavior.

Guidance

  1. Implement event forwarding: Modify the built-in Agent tool to forward tool_use and tool_result events from subagents to the parent stream.
  2. Include subagent events in verbose output: Update the --verbose output to include subagent events, providing a full activity log.
  3. Add a --subagent-events flag: Introduce a flag to control the granularity of subagent event forwarding.
  4. Document the event schema: Define and document the event schema for subagent events to enable tooling development.
  5. Emit subagent lifecycle events: Include events for subagent startup, completion, and failure, along with summary metadata.


Notes

This solution focuses on providing observability into subagent behavior, which is essential for governance and learning loops. The implementation should ensure that subagent events are properly attributed and forwarded to the parent stream.

Recommendation

Apply the workaround by spawning subagents as external claude -p --output-format stream-json --verbose processes via a SubprocessAdapter, as implemented in VNX Orchestration, until the built-in Agent tool is updated to provide subagent event forwarding.
