claude-code - 💡(How to fix) Fix [FEATURE] Subagent observability gap — parent agents and operators are blind to subagent tool calls [2 comments, 2 participants]

Vinix24 · 2026-04-10T14:00:44Z



GitHub stats

anthropics/claude-code#46253 • Fetched 2026-04-11 06:25:12

Author: Vinix24
Participants: github-actions[bot], Vinix24
Timeline (top): labeled ×3, commented ×2, cross-referenced ×1

Root Cause

- No audit trail: if a subagent overwrites a file, runs a destructive command, or leaks data, there is no record of it outside the subagent's own (inaccessible) context.
- No enforcement verification: even if hooks are implemented for subagents (per #45427), you cannot verify they actually fired.
- No learning loop input: governance feedback systems that learn from execution history (signal detection, recurrence analysis, pattern optimization) are blind to subagent behavior. Subagents cannot improve because there is no data to improve from.

Fix Action

Fix / Workaround

Governance feedback systems that convert runtime outcomes into reusable signals (recurrence detection, pattern confidence adjustment) depend on receipt data. Receipts capture what the dispatched worker reported, not what subagents did internally. When a subagent causes a failure, the learning loop sees the symptom (GATE_FAILURE) but not the cause (which subagent tool call created the problem). The system learns that things fail but not why subagents cause them to fail. This makes feedback loops progressively less useful as subagent usage increases.

| Learning Loop Function | Requires Subagent Events? | Status with Built-in Agent Tool |
|---|---|---|
| Signal extraction from receipts | No | Works (symptom-level) |
| Recurrence detection across dispatches | No | Works (pattern-level) |
| Root cause attribution | Yes | Broken: can't trace to subagent action |
| Pattern confidence adjustment | Yes | Broken: no subagent pattern data |
| Failure class derivation | Yes | Degraded: misattributed to parent |

| Approach | Verdict | Why |
|---|---|---|
| External process spawning (our workaround) | Works but wrong long-term | Requires building full process management; duplicates what Agent tool should provide |
| Post-hoc session log retrieval | Insufficient | No standard API to retrieve subagent session logs; timing gap between action and detection |
| Parent-level output parsing | Fragile | Parsing final text output for clues is unreliable and lossy |
| Periodic polling of filesystem | Fragile | Detecting file changes doesn't tell you which agent made them |

Code Example

{
  "type": "tool_use",
  "agent_id": "subagent_abc123",
  "agent_depth": 1,
  "parent_agent_id": "parent_xyz789",
  "tool": "Write",
  "input": { "file_path": "/path/to/file.py", "content": "..." },
  "call_id": "call_001"
}


Preflight Checklist

- [x] I have searched existing requests and this feature hasn't been requested yet
- [x] This is a single feature request (not multiple features)

Problem Statement

When Claude Code spawns a subagent via the built-in `Agent` tool, the parent agent receives only the final text output plus basic metadata (token count, duration, tool use count). The operator, even in `--output-format stream-json --verbose` mode, sees only 4 event types: `task_started`, `rate_limit_event`, `task_notification`, and `tool_use_result`. None of these contain the individual tool calls the subagent made.

This creates three cascading problems:

1. No audit trail: if a subagent overwrites a file, runs a destructive command, or leaks data, there is no record of it outside the subagent's own (inaccessible) context.
2. No enforcement verification: even if hooks are implemented for subagents (per #45427), you cannot verify they actually fired.
3. No learning loop input: governance feedback systems that learn from execution history (signal detection, recurrence analysis, pattern optimization) are blind to subagent behavior. Subagents cannot improve because there is no data to improve from.

You cannot govern what you cannot see. You cannot improve what you cannot measure.

Related RFC: #45427 (Dimitri Geelen — Deterministic tool gate). His RFC addresses enforcement; this RFC addresses the complementary observability side.

What the parent agent sees

The parent agent describes the subagent as "essentially a black box." It receives:

- Final text output from the subagent
- Basic metadata: token count, duration, number of tool uses

It does not receive: individual tool calls, tool results, reasoning/thinking, conversation history, file modification details, or bash command details.

What the operator sees (stream-json)

| Event | Contains |
|---|---|
| `task_started` | Subagent was spawned |
| `rate_limit_event` | Rate limit encountered (if any) |
| `task_notification` | Status: completed |
| `tool_use_result` | Final text output from subagent |

What external processes provide

When spawning a separate `claude -p --output-format stream-json --verbose` process instead of using the built-in `Agent` tool, you get full observability: every `tool_use` event, every `tool_result` event, every file edit, every bash command. This asymmetry is the core problem.
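
A minimal sketch of the external-process approach, assuming the `claude` CLI is on `PATH` and that stream-json output is one JSON object per line. The exact event shape is an assumption: the sketch accepts tool calls either as top-level events with `"type": "tool_use"` or as `tool_use` blocks nested under `message.content`. The helper names (`parse_stream`, `extract_tool_calls`, `spawn_worker`) are hypothetical and are not taken from VNX Orchestration.

```python
import json
import subprocess
from typing import Iterable, Iterator


def parse_stream(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSON line; skip blank lines and non-JSON output."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output on stdout
        if isinstance(event, dict):
            yield event


def extract_tool_calls(event: dict) -> list[dict]:
    """Return tool_use records found in an event (assumed shapes, see above)."""
    if event.get("type") == "tool_use":
        return [event]
    message = event.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if not isinstance(content, list):
        return []
    return [b for b in content if isinstance(b, dict) and b.get("type") == "tool_use"]


def spawn_worker(prompt: str) -> Iterator[dict]:
    """Run a separate claude process and yield its events as they arrive."""
    cmd = ["claude", "-p", prompt, "--output-format", "stream-json", "--verbose"]
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        assert proc.stdout is not None
        yield from parse_stream(proc.stdout)
```

Iterating `spawn_worker("...")` and passing each event through `extract_tool_calls` gives the operator a per-call record while the worker is still running, which is the capture the built-in `Agent` tool does not currently expose.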

Impact on learning loops

| Learning Loop Function | Requires Subagent Events? | Status with Built-in Agent Tool |
|---|---|---|
| Signal extraction from receipts | No | Works (symptom-level) |
| Recurrence detection across dispatches | No | Works (pattern-level) |
| Root cause attribution | Yes | Broken: can't trace to subagent action |
| Pattern confidence adjustment | Yes | Broken: no subagent pattern data |
| Failure class derivation | Yes | Degraded: misattributed to parent |

Proposed Solution

Forward subagent tool call events to the parent stream in `stream-json` mode.

Minimum event set per subagent tool call

{
  "type": "tool_use",
  "agent_id": "subagent_abc123",
  "agent_depth": 1,
  "parent_agent_id": "parent_xyz789",
  "tool": "Write",
  "input": { "file_path": "/path/to/file.py", "content": "..." },
  "call_id": "call_001"
}

Key properties

- Agent attribution: every event carries `agent_id` and `agent_depth` so operators can distinguish parent from subagent events
- Nested forwarding: events bubble up through all levels with incrementing `agent_depth`
- Opt-in granularity: operators can filter by `agent_depth` if they only want parent events (backward compatible)
- No content truncation: tool inputs and outputs forwarded as-is
- Append-only semantics: events are ordered and immutable once emitted
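
The opt-in granularity property could be consumed roughly as follows. This is a sketch against the proposed schema, not an existing Claude Code API: `filter_by_depth` is a hypothetical helper, and treating an event without `agent_depth` as a parent event (depth 0) is an assumption made here to keep existing consumers working.

```python
from typing import Iterable, Iterator


def filter_by_depth(events: Iterable[dict], max_depth: int) -> Iterator[dict]:
    """Keep events whose nesting level does not exceed max_depth.

    Assumption: an event without `agent_depth` comes from the parent (depth 0).
    Input order is preserved, matching the append-only property.
    """
    for event in events:
        if event.get("agent_depth", 0) <= max_depth:
            yield event
```

With `max_depth=0` a consumer sees only parent events; raising the limit adds each deeper level of nested subagents.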

The Three Pillars of Agent Governance

| Pillar | Purpose | RFC |
|---|---|---|
| Enforcement | Prevent unauthorized actions | #45427 (Tool Gate) |
| Observability | Prove what happened | This RFC (Event Forwarding) |
| Learning | Improve from execution history | Enabled by this RFC |

Minimum viable ask

1. Forward `tool_use` and `tool_result` events from subagents to the parent stream in `stream-json` mode, with `agent_id` and `agent_depth` fields
2. Include subagent events in `--verbose` output so operators see the full activity log
3. Provide a `--subagent-events` flag (or similar) to control forwarding granularity
4. Document the event schema for subagent events so tooling can be built on top
5. Emit subagent lifecycle events (`subagent_started`, `subagent_completed`, `subagent_failed`) with summary metadata
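
To illustrate the kind of tooling the lifecycle events would enable, here is a sketch that tracks the last known state of each subagent. It assumes the three event names proposed above and an `agent_id` field on each lifecycle event; neither exists in Claude Code today, and `summarize_lifecycle` is a hypothetical helper.

```python
from typing import Iterable

# Proposed event names that end a subagent's life (not yet emitted by Claude Code).
TERMINAL_EVENTS = {"subagent_completed", "subagent_failed"}


def summarize_lifecycle(events: Iterable[dict]) -> dict[str, str]:
    """Map each agent_id to 'running', 'completed', or 'failed'."""
    state: dict[str, str] = {}
    for event in events:
        kind = event.get("type")
        agent_id = event.get("agent_id")
        if agent_id is None:
            continue  # not attributable to a subagent
        if kind == "subagent_started":
            state[agent_id] = "running"
        elif kind in TERMINAL_EVENTS:
            state[agent_id] = kind.removeprefix("subagent_")
    return state
```

Any agent still marked `running` after the parent stream ends is a subagent that never reported completion, which an operator would likely want to flag.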

Alternative Solutions

| Approach | Verdict | Why |
|---|---|---|
| External process spawning (our workaround) | Works but wrong long-term | Requires building full process management; duplicates what Agent tool should provide |
| Post-hoc session log retrieval | Insufficient | No standard API to retrieve subagent session logs; timing gap between action and detection |
| Parent-level output parsing | Fragile | Parsing final text output for clues is unreliable and lossy |
| Periodic polling of filesystem | Fragile | Detecting file changes doesn't tell you which agent made them |

We currently work around this in VNX Orchestration by never using the built-in `Agent` tool. Instead we spawn all workers as external `claude -p --output-format stream-json --verbose` processes via a `SubprocessAdapter`, which gives full stream capture. But this is a workaround; the built-in `Agent` tool should provide the same observability.

Priority

Medium - Would be very helpful

Feature Category

CLI commands and flags

Use Case Example

1. I operate VNX Orchestration (https://github.com/Vinix24/vnx-orchestration), a multi-terminal agent system on Claude Code
2. An orchestrator (T0) dispatches work to worker terminals (T1/T2/T3) through a structured dispatch queue with human approval
3. A worker spawns a subagent via the `Agent` tool to "refactor the utility module"
4. The subagent rewrites `utils.py` with a breaking change
5. Gate fails: tests don't pass
6. Receipt records: `GATE_FAILURE`, `codex_gate`, `test_failure`
7. But the root cause (the subagent's destructive rewrite) is invisible
8. The learning loop recommendation targets the gate or the dispatch, never the subagent behavior
9. With subagent event forwarding, I would see the exact `tool_use` event where the subagent called `Write` on `utils.py`, enabling root cause attribution
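
The attribution step in this scenario could look like the sketch below, assuming events follow the proposed schema (`type`, `agent_depth`, `tool`, `input.file_path`). `find_writers` is a hypothetical helper, and the set of file-writing tool names is an assumption; only `Write` appears in the proposal.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Assumed names of tools that modify files; only "Write" is named in the proposal.
FILE_WRITING_TOOLS = {"Write", "Edit"}


def find_writers(events: Iterable[dict], filename: str) -> list[dict]:
    """Return subagent tool_use events that wrote to a file with this name."""
    hits = []
    for event in events:
        if event.get("type") != "tool_use":
            continue
        if event.get("agent_depth", 0) < 1:
            continue  # parent-level call, already visible today
        if event.get("tool") not in FILE_WRITING_TOOLS:
            continue
        path = (event.get("input") or {}).get("file_path", "")
        if PurePosixPath(path).name == filename:
            hits.append(event)
    return hits
```

After a `GATE_FAILURE` on the tests for `utils.py`, calling `find_writers(events, "utils.py")` would return the subagent calls to attach to the receipt, so the recommendation can target the subagent rather than the gate or the dispatch.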

Additional Context

Related issues: #43772, #32376, #32193, #31250, #39903
Related RFC: #45427 (Dimitri Geelen, Deterministic tool gate)
Tested on: Claude Code v2.1.100, `--output-format stream-json --verbose`

This RFC is complementary to #45427. Enforcement (#45427) without observability (this RFC) is unverifiable. Observability without enforcement is informational only. Together they form a complete governance framework.

Evidence and workaround implementation: https://github.com/Vinix24/vnx-orchestration (see scripts/lib/subprocess_adapter.py for full stream capture)

extent analysis

TL;DR

Forward subagent tool call events to the parent stream in stream-json mode to provide observability into subagent behavior.

Guidance

- Implement event forwarding: Modify the built-in `Agent` tool to forward `tool_use` and `tool_result` events from subagents to the parent stream.
- Include subagent events in verbose output: Update the `--verbose` output to include subagent events, providing a full activity log.
- Add a `--subagent-events` flag: Introduce a flag to control the granularity of subagent event forwarding.
- Document the event schema: Define and document the event schema for subagent events to enable tooling development.
- Emit subagent lifecycle events: Include events for subagent startup, completion, and failure, along with summary metadata.

Example

A sample tool_use event:

{
  "type": "tool_use",
  "agent_id": "subagent_abc123",
  "agent_depth": 1,
  "parent_agent_id": "parent_xyz789",
  "tool": "Write",
  "input": { "file_path": "/path/to/file.py", "content": "..." },
  "call_id": "call_001"
}

Notes

This solution focuses on providing observability into subagent behavior, which is essential for governance and learning loops. The implementation should ensure that subagent events are properly attributed and forwarded to the parent stream.

Recommendation

Apply the workaround by spawning subagents as external `claude -p --output-format stream-json --verbose` processes via a `SubprocessAdapter`, as implemented in VNX Orchestration, until the built-in `Agent` tool is updated to provide subagent event forwarding.
