claude-code - 💡(How to fix) Fix Harness silently executes duplicated parallel tool_use blocks: subagent fan-out runs N× the intended count (6 → 24)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

In a single assistant turn that fans out a fixed set of parallel subagents (the Task/Agent tool), the model can degenerate into re-emitting the same batch of parallel tool_use blocks multiple times before yielding the turn. Claude Code executes every emitted block, so an intended fan-out of 6 subagents became 24 — each a full subagent burning large token counts. There is no deduplication of identical parallel tool_use calls within a turn, no cap on concurrent subagent fan-out, and no warning, so the blowup is silent until the running-agents count is noticed.

This is triggered by model degeneration, but the cost is a harness concern: the harness is what converts emitted blocks into billed, executed work, and it has no backstop against a stuttered re-emission of an identical batch.

Error Message

  1. Warn when a turn emits the same tool-call signature more than K times — a strong degeneration signal regardless of dedup policy.
  • #20640 / #20693 (tool_use ids must be unique) — same "duplicate tool_use blocks in one turn" family, but those terminate in an API 400 error; here the IDs are unique and all copies execute silently.

Root Cause

In a single assistant turn that fans out a fixed set of parallel subagents (the Task/Agent tool), the model can degenerate into re-emitting the same batch of parallel tool_use blocks multiple times before yielding the turn. Claude Code executes every emitted block, so an intended fan-out of 6 subagents became 24 — each a full subagent burning large token counts. There is no deduplication of identical parallel tool_use calls within a turn, no cap on concurrent subagent fan-out, and no warning, so the blowup is silent until the running-agents count is noticed.

This is triggered by model degeneration, but the cost is a harness concern: the harness is what converts emitted blocks into billed, executed work, and it has no backstop against a stuttered re-emission of an identical batch.

Fix Action

Fix / Workaround

A turn was supposed to dispatch a fixed panel of 6 parallel subagents in one message. Instead the UI showed 24 concurrent subagents — the same 6-member batch repeated ~4× (individual members appearing 3–5 times each), each running to completion at ~70k–220k tokens.

Evidence that this was one non-yielding turn (not legitimate sequential re-dispatch)

  • Between the first subagent dispatch and the first subagent result returning, there were 18 subagent dispatches with ZERO interleaved tool-results. Sequential tool-calling cannot emit call #2 before call #1 returns; emitting 18 with no results in between is only possible by repeatedly emitting the parallel batch within a single non-yielding turn.
  • The dispatch description text degraded across repeats — the first batch carried full descriptions, later batches collapsed to a truncated form. Degrading, repeating output is the fingerprint of autoregressive degeneration.
  • The fanned-out calls differed only in the subagent type — identical short prompt, near-identical description across the batch. That maximally-repetitive shape is exactly what seeds a tool-call repetition loop.
RAW_BUFFERClick to expand / collapse

Summary

In a single assistant turn that fans out a fixed set of parallel subagents (the Task/Agent tool), the model can degenerate into re-emitting the same batch of parallel tool_use blocks multiple times before yielding the turn. Claude Code executes every emitted block, so an intended fan-out of 6 subagents became 24 — each a full subagent burning large token counts. There is no deduplication of identical parallel tool_use calls within a turn, no cap on concurrent subagent fan-out, and no warning, so the blowup is silent until the running-agents count is noticed.

This is triggered by model degeneration, but the cost is a harness concern: the harness is what converts emitted blocks into billed, executed work, and it has no backstop against a stuttered re-emission of an identical batch.

Environment

  • Claude Code CLI 2.1.158
  • Model: Opus (extended thinking enabled)
  • Plain Task/Agent subagent fan-out (not the experimental Agent Teams feature)

What happened

A turn was supposed to dispatch a fixed panel of 6 parallel subagents in one message. Instead the UI showed 24 concurrent subagents — the same 6-member batch repeated ~4× (individual members appearing 3–5 times each), each running to completion at ~70k–220k tokens.

Evidence that this was one non-yielding turn (not legitimate sequential re-dispatch)

From the session transcript:

  • Between the first subagent dispatch and the first subagent result returning, there were 18 subagent dispatches with ZERO interleaved tool-results. Sequential tool-calling cannot emit call #2 before call #1 returns; emitting 18 with no results in between is only possible by repeatedly emitting the parallel batch within a single non-yielding turn.
  • The dispatch description text degraded across repeats — the first batch carried full descriptions, later batches collapsed to a truncated form. Degrading, repeating output is the fingerprint of autoregressive degeneration.
  • The fanned-out calls differed only in the subagent type — identical short prompt, near-identical description across the batch. That maximally-repetitive shape is exactly what seeds a tool-call repetition loop.

Impact

  • Intended fan-out: 6 subagents. Actual: 24. ~4× token spend and wall-clock for one operation.
  • Silent — no cap, no dedup, no warning; only noticeable via the running-agents count.
  • Non-deterministic (model variance). A milder instance (a single subagent emitted 3× in one turn) was observed days earlier in the same setup, so this is a latent class, not a one-off.

Expected behavior / suggested fixes (harness side)

  1. Deduplicate identical parallel tool_use blocks within a single assistant turn before execution — at minimum, collapse exact-duplicate subagent dispatches (same subagent type + identical prompt) emitted in one turn. Highest-value backstop.
  2. Cap concurrent subagent fan-out with a soft limit + confirmation above a threshold (e.g. "About to launch 24 agents — continue?"), so a degenerate emission can't silently run.
  3. Warn when a turn emits the same tool-call signature more than K times — a strong degeneration signal regardless of dedup policy.

A prompt/skill author cannot reliably prevent this, since the degenerating model is the same one that would have to read a "don't repeat" instruction. The reliable fix is at the layer that turns emitted blocks into executed work.

Possibly related (but believed distinct)

  • #55586 (Agent Teams: single spawn creates many duplicate workers) — its "within-turn duplication" mechanism is behaviorally the same N→k×N pattern, but that report's repro is gated on the experimental Agent Teams feature; this occurs on plain Task/Agent fan-out.
  • #20640 / #20693 (tool_use ids must be unique) — same "duplicate tool_use blocks in one turn" family, but those terminate in an API 400 error; here the IDs are unique and all copies execute silently.

Note on transcript logging (secondary)

In the session .jsonl, requestId, message.id, and stop_reason were stamped identically across an entire multi-minute turn-group, including unrelated earlier tool calls that completed with their own results. Those fields therefore look like a turn-group/UI grouping value rather than a raw per-API-response identifier, which makes them unreliable for attributing a truncation to a specific API response. Possibly worth confirming whether that field reuse is intended.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix Harness silently executes duplicated parallel tool_use blocks: subagent fan-out runs N× the intended count (6 → 24)