claude-code - 💡(How to fix) Fix [BUG] Task() tool has no timeout — subagent hang leaves orchestrator stuck indefinitely on Windows (no MCP servers required) (refs #28126) [4 comments, 3 participants]

Error Message

Silent — no error surfaces to the orchestrator

returns a structured "subagent timed out" error to the orchestrator when exceeded (so the orchestrator code can handle it)
includes, in the error, last-stdout-timestamp + whether any files were created during the attempt (so orchestrators can recover useful work)

Root Cause

Post-mortem evidence (task state dir persists because no cleanup fires on force-kill):

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

Fix Action

Fix / Workaround

Possibilities include: pipe buffer deadlock, Windows-specific handle inheritance, JSON chunk that straddles a buffer boundary and is never flushed, subagent process exited but parent didn't get SIGCHLD-equivalent. Without runtime tracing on the CLI I can't narrow further — but the symptom is reproducible enough to warrant a runtime-level mitigation regardless of the exact cause.

Workarounds (for others hitting this today)

Code Example

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

---

36-01-PLAN.md  14:32  20 KB
36-02-PLAN.md  14:36  33 KB
36-03-PLAN.md  14:42  35 KB
36-04-PLAN.md  14:46  37 KB
36-05-PLAN.md  14:52  28 KB  ← last write, ends mid-conversation with parent

Summary

Task() tool blocks the orchestrator indefinitely when a subagent hits a communication failure. No timeout, no watchdog, no heartbeat. Observed: 30+ min hang on Windows while the subagent's work-product sat complete on disk and 35k tokens burned in the meantime.

Refs #28126 (closed + locked — lock message says to file new issue and link). That issue focused on MCP server duplication as the deadlock cause. This case has zero user-configured MCP servers — which strengthens the case that the parent↔subagent stdio fragility is not MCP-specific, and the missing-timeout is a real gap independent of MCP load.

Concrete incident (2026-04-16, Windows 11)

Setup:

Claude Code (current stable, v2.1.5x)
Windows 11 (26200)
~/.claude/settings.json has no mcpServers block
No .mcp.json anywhere in the user's config or project
mcp-needs-auth-cache.json is empty
Running a GSD workflow (/gsd-plan-phase) which spawns gsd-planner via Task() to write 5 PLAN.md files

Timeline:

14:24 Task() spawned gsd-planner subagent
14:29 ~/.claude/tasks/{uuid}/2.json flipped gsd-planner to in_progress
14:32–14:52 All 5 PLAN.md files written to disk by the subagent. Last file is 28 KB, fully formed (frontmatter, tasks, threat_model, verification sections all intact).
14:52 Last filesystem write from the subagent.
14:52 → ~15:22 30 min of dead silence. Orchestrator stayed inside the Task() call. Tokens continued to burn (~35k during the hang). User force-killed the terminal.

Post-mortem evidence (task state dir persists because no cleanup fires on force-kill):

~/.claude/tasks/{uuid}/
  1.json      pattern-mapper  status: completed
  2.json      gsd-planner     status: in_progress  (mtime 14:29 — never flipped)
  3.json      plan-checker    status: pending      (never spawned)
  4.json      state-update    status: pending
  .lock       still present

Plans written by the subagent:

36-01-PLAN.md  14:32  20 KB
36-02-PLAN.md  14:36  33 KB
36-03-PLAN.md  14:42  35 KB
36-04-PLAN.md  14:46  37 KB
36-05-PLAN.md  14:52  28 KB  ← last write, ends mid-conversation with parent

Every file is syntactically complete. The subagent finished its work. What didn't reach the parent was the tool-call return value.

What this rules out

MCP duplication is not required. This system has no MCP servers configured. Whatever is hanging the parent↔subagent channel operates with zero MCP load.
Subagent crash before completing work is not the cause. All output files are intact and complete.
Network / external services are not involved. The subagent wrote local files only.

Root cause (hypothesis)

Something in the parent↔subagent stdio / IPC path on Windows can enter a state where:

The subagent has done its work and emitted its final response
The parent CLI process never reads / never reaps that response
Task() tool call in the orchestrator blocks forever waiting

Impact

Every Claude Code user running an orchestration workflow with long-running Task() calls on Windows is exposed. The failure is:

Silent — no error surfaces to the orchestrator
Expensive — the agent loop continues generating, burning tokens during the hang
Unrecoverable from within the session — only terminal-kill works
Invisible in logs — task state JSON shows "in_progress" forever

Ask

Expose a timeout option on Task() — e.g. Task(..., timeout_ms=N) or a per-session default — that:

returns a structured "subagent timed out" error to the orchestrator when exceeded (so the orchestrator code can handle it)
SIGTERM/SIGKILLs the subagent process tree
removes the ~/.claude/tasks/{uuid}/ dir on timeout
includes, in the error, last-stdout-timestamp + whether any files were created during the attempt (so orchestrators can recover useful work)

Even a default-off opt-in would unblock self-protection. Orchestrators could wrap Task() in a retry/fallback path; today they cannot.

Per-subagent MCP scoping requests (#4380, #4476) address one cause of hangs. This issue addresses the orchestrator's inability to detect the symptom — which matters even when MCP is not the cause, as proven here.

Workarounds (for others hitting this today)

Split long subagent work into multiple short Task() calls. Between calls, the orchestrator is alive and can verify filesystem state, commit incrementally, and detect "files written but no completion marker."
After a hang + terminal kill: rm -rf ~/.claude/tasks/* and manually clean orphan node processes (Windows doesn't reap children when the parent is force-killed).

extent analysis

TL;DR

Implementing a timeout option for the Task() function, such as Task(..., timeout_ms=N), would allow the orchestrator to detect and handle subagent timeouts, preventing indefinite hangs.

Guidance

The issue is likely caused by a deadlock or missing timeout in the parent↔subagent stdio/IPC path on Windows, resulting in the Task() tool call blocking forever.
To mitigate this, consider splitting long subagent work into multiple short Task() calls, allowing the orchestrator to verify filesystem state and detect potential issues between calls.
After a hang, manually cleaning up the ~/.claude/tasks/* directory and orphan node processes can help recover from the issue.
Implementing a timeout option for Task() would enable the orchestrator to handle subagent timeouts and prevent similar issues in the future.

Example

No code snippet is provided as the issue does not contain sufficient information to create a specific example.

Notes

The exact cause of the issue is unclear without runtime tracing, but implementing a timeout option for Task() would provide a robust mitigation strategy.

Recommendation

Apply a workaround by splitting long subagent work into multiple short Task() calls until a timeout option is implemented for the Task() function. This will allow the orchestrator to detect and handle potential issues, preventing indefinite hangs and token burn.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

claude-code - 💡(How to fix) Fix [BUG] Task() tool has no timeout — subagent hang leaves orchestrator stuck indefinitely on Windows (no MCP servers required) (refs #28126) [4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Workarounds (for others hitting this today)

Code Example

Summary

Concrete incident (2026-04-16, Windows 11)

What this rules out

Root cause (hypothesis)

Impact

Ask

Workarounds (for others hitting this today)

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

claude-code - 💡(How to fix) Fix [BUG] Task() tool has no timeout — subagent hang leaves orchestrator stuck indefinitely on Windows (no MCP servers required) (refs #28126) [4 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Workarounds (for others hitting this today)

Code Example

Summary

Concrete incident (2026-04-16, Windows 11)

What this rules out

Root cause (hypothesis)

Impact

Ask

Workarounds (for others hitting this today)

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING