claude-code - 💡(How to fix) Fix [BUG] Task() tool has no timeout — subagent hang leaves orchestrator stuck indefinitely on Windows (no MCP servers required) (refs #28126) [4 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
anthropics/claude-code#49150Fetched 2026-04-17 08:49:28
View on GitHub
Comments
4
Participants
3
Timeline
12
Reactions
0
Author
Timeline (top)
commented ×4labeled ×3cross-referenced ×2closed ×1

Task() tool blocks the orchestrator indefinitely when a subagent hits a communication failure. No timeout, no watchdog, no heartbeat. Observed: 30+ min hang on Windows while the subagent's work-product sat complete on disk and 35k tokens burned in the meantime.

Refs #28126 (closed + locked — lock message says to file new issue and link). That issue focused on MCP server duplication as the deadlock cause. This case has zero user-configured MCP servers — which strengthens the case that the parent↔subagent stdio fragility is not MCP-specific, and the missing-timeout is a real gap independent of MCP load.

Error Message

  • Silent — no error surfaces to the orchestrator
  1. returns a structured "subagent timed out" error to the orchestrator when exceeded (so the orchestrator code can handle it)
  2. includes, in the error, last-stdout-timestamp + whether any files were created during the attempt (so orchestrators can recover useful work)

Root Cause

Post-mortem evidence (task state dir persists because no cleanup fires on force-kill):

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

Fix Action

Fix / Workaround

Possibilities include: pipe buffer deadlock, Windows-specific handle inheritance, JSON chunk that straddles a buffer boundary and is never flushed, subagent process exited but parent didn't get SIGCHLD-equivalent. Without runtime tracing on the CLI I can't narrow further — but the symptom is reproducible enough to warrant a runtime-level mitigation regardless of the exact cause.

Workarounds (for others hitting this today)

Code Example

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

---

36-01-PLAN.md  14:32  20 KB
36-02-PLAN.md  14:36  33 KB
36-03-PLAN.md  14:42  35 KB
36-04-PLAN.md  14:46  37 KB
36-05-PLAN.md  14:52  28 KB  ← last write, ends mid-conversation with parent
RAW_BUFFERClick to expand / collapse

Summary

Task() tool blocks the orchestrator indefinitely when a subagent hits a communication failure. No timeout, no watchdog, no heartbeat. Observed: 30+ min hang on Windows while the subagent's work-product sat complete on disk and 35k tokens burned in the meantime.

Refs #28126 (closed + locked — lock message says to file new issue and link). That issue focused on MCP server duplication as the deadlock cause. This case has zero user-configured MCP servers — which strengthens the case that the parent↔subagent stdio fragility is not MCP-specific, and the missing-timeout is a real gap independent of MCP load.

Concrete incident (2026-04-16, Windows 11)

Setup:

  • Claude Code (current stable, v2.1.5x)
  • Windows 11 (26200)
  • ~/.claude/settings.json has no mcpServers block
  • No .mcp.json anywhere in the user's config or project
  • mcp-needs-auth-cache.json is empty
  • Running a GSD workflow (/gsd-plan-phase) which spawns gsd-planner via Task() to write 5 PLAN.md files

Timeline:

  • 14:24 Task() spawned gsd-planner subagent
  • 14:29 ~/.claude/tasks/{uuid}/2.json flipped gsd-planner to in_progress
  • 14:32–14:52 All 5 PLAN.md files written to disk by the subagent. Last file is 28 KB, fully formed (frontmatter, tasks, threat_model, verification sections all intact).
  • 14:52 Last filesystem write from the subagent.
  • 14:52 → ~15:22 30 min of dead silence. Orchestrator stayed inside the Task() call. Tokens continued to burn (~35k during the hang). User force-killed the terminal.

Post-mortem evidence (task state dir persists because no cleanup fires on force-kill):

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

Plans written by the subagent:

36-01-PLAN.md  14:32  20 KB
36-02-PLAN.md  14:36  33 KB
36-03-PLAN.md  14:42  35 KB
36-04-PLAN.md  14:46  37 KB
36-05-PLAN.md  14:52  28 KB  ← last write, ends mid-conversation with parent

Every file is syntactically complete. The subagent finished its work. What didn't reach the parent was the tool-call return value.

What this rules out

  • MCP duplication is not required. This system has no MCP servers configured. Whatever is hanging the parent↔subagent channel operates with zero MCP load.
  • Subagent crash before completing work is not the cause. All output files are intact and complete.
  • Network / external services are not involved. The subagent wrote local files only.

Root cause (hypothesis)

Something in the parent↔subagent stdio / IPC path on Windows can enter a state where:

  • The subagent has done its work and emitted its final response
  • The parent CLI process never reads / never reaps that response
  • Task() tool call in the orchestrator blocks forever waiting

Possibilities include: pipe buffer deadlock, Windows-specific handle inheritance, JSON chunk that straddles a buffer boundary and is never flushed, subagent process exited but parent didn't get SIGCHLD-equivalent. Without runtime tracing on the CLI I can't narrow further — but the symptom is reproducible enough to warrant a runtime-level mitigation regardless of the exact cause.

Impact

Every Claude Code user running an orchestration workflow with long-running Task() calls on Windows is exposed. The failure is:

  • Silent — no error surfaces to the orchestrator
  • Expensive — the agent loop continues generating, burning tokens during the hang
  • Unrecoverable from within the session — only terminal-kill works
  • Invisible in logs — task state JSON shows "in_progress" forever

Ask

Expose a timeout option on Task() — e.g. Task(..., timeout_ms=N) or a per-session default — that:

  1. returns a structured "subagent timed out" error to the orchestrator when exceeded (so the orchestrator code can handle it)
  2. SIGTERM/SIGKILLs the subagent process tree
  3. removes the ~/.claude/tasks/{uuid}/ dir on timeout
  4. includes, in the error, last-stdout-timestamp + whether any files were created during the attempt (so orchestrators can recover useful work)

Even a default-off opt-in would unblock self-protection. Orchestrators could wrap Task() in a retry/fallback path; today they cannot.

Per-subagent MCP scoping requests (#4380, #4476) address one cause of hangs. This issue addresses the orchestrator's inability to detect the symptom — which matters even when MCP is not the cause, as proven here.

Workarounds (for others hitting this today)

  • Split long subagent work into multiple short Task() calls. Between calls, the orchestrator is alive and can verify filesystem state, commit incrementally, and detect "files written but no completion marker."
  • After a hang + terminal kill: rm -rf ~/.claude/tasks/* and manually clean orphan node processes (Windows doesn't reap children when the parent is force-killed).

extent analysis

TL;DR

Implementing a timeout option for the Task() function, such as Task(..., timeout_ms=N), would allow the orchestrator to detect and handle subagent timeouts, preventing indefinite hangs.

Guidance

  • The issue is likely caused by a deadlock or missing timeout in the parent↔subagent stdio/IPC path on Windows, resulting in the Task() tool call blocking forever.
  • To mitigate this, consider splitting long subagent work into multiple short Task() calls, allowing the orchestrator to verify filesystem state and detect potential issues between calls.
  • After a hang, manually cleaning up the ~/.claude/tasks/* directory and orphan node processes can help recover from the issue.
  • Implementing a timeout option for Task() would enable the orchestrator to handle subagent timeouts and prevent similar issues in the future.

Example

No code snippet is provided as the issue does not contain sufficient information to create a specific example.

Notes

The exact cause of the issue is unclear without runtime tracing, but implementing a timeout option for Task() would provide a robust mitigation strategy.

Recommendation

Apply a workaround by splitting long subagent work into multiple short Task() calls until a timeout option is implemented for the Task() function. This will allow the orchestrator to detect and handle potential issues, preventing indefinite hangs and token burn.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

claude-code - 💡(How to fix) Fix [BUG] Task() tool has no timeout — subagent hang leaves orchestrator stuck indefinitely on Windows (no MCP servers required) (refs #28126) [4 comments, 3 participants]