claude-code - 💡(How to fix) Fix Tool dispatch stalls silently in 2.1.121–2.1.123: tool_use emitted, no tool_result, no disk side-effects, no errors [7 comments, 4 participants]

GitHub stats
anthropics/claude-code#54847 (fetched 2026-04-30 06:34:15)
Comments: 7 · Participants: 4 · Timeline events: 26 · Reactions: 0
Timeline (top): commented ×7, cross-referenced ×5, mentioned ×5, subscribed ×5

In claude CLI versions 2.1.121, 2.1.122, and 2.1.123 (Claude Code), local tool calls (Write, Bash, Edit, Read) intermittently stall after the model emits a tool_use block. The tool handler is never invoked: no disk I/O occurs, no tool_result is emitted, no permission prompt fires, and no error is logged on the CLI side. The stall persists indefinitely from the CLI's perspective; only an external watchdog can recover the subprocess.

This appears to be a regression. We have not observed the stall on 2.1.120 or earlier in the same workload.

Environment

  • OS: macOS (Darwin kernel 25.4.0)
  • Node: (default; can supply on request)
  • Install path: /opt/homebrew/bin/claude (package @anthropic-ai/claude-code)
  • Versions where observed: 2.1.121, 2.1.122, 2.1.123
  • Last known good: 2.1.120
  • Invocation: claude --print --permission-prompt-tool stdio (subprocess invoked from a parent harness that responds to permission prompts on the CLI's stdio channel)

Signature (how to detect the bug from logs)

Three coincident facts in a single CLI subprocess turn:

  1. The session jsonl at ~/.claude/projects/<project>/<sessionId>.jsonl contains an assistant message with a tool_use block ({"type":"tool_use","id":"toolu_…","name":"Write",…}).
  2. No matching tool_result entry for that tool_use_id appears anywhere later in the same jsonl.
  3. The intended side-effect of the tool did not happen — e.g. for Write, the target file_path does not exist on disk (or exists only with content from an unrelated later attempt).

There is no accompanying error on stderr, no permission-prompt-tool error, and no Tool permission stream closed before response received message. The CLI gives no observable signal that anything has gone wrong; from the assistant's perspective the call is simply pending forever.
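The first two parts of the signature can be checked mechanically. The sketch below is a minimal illustration, not a supported tool: it assumes each jsonl line is a JSON object in which tool_use blocks carry an "id" and a "name", and tool_result blocks carry a "tool_use_id", nested somewhere inside the entry. The function names (iter_blocks, find_orphan_tool_uses, missing_write_targets) are hypothetical, and the "file_path" key under "input" is assumed from the Write example above.

```python
import json
import os
from typing import Iterator


def iter_blocks(node) -> Iterator[dict]:
    """Yield every dict nested anywhere inside a parsed JSON value."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_blocks(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_blocks(item)


def find_orphan_tool_uses(lines):
    """Return tool_use blocks that have no later tool_result, in file order."""
    pending = {}  # tool_use id -> block (dicts keep insertion order)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a truncated final line; skip it
        for block in iter_blocks(entry):
            kind = block.get("type")
            if kind == "tool_use" and isinstance(block.get("id"), str):
                pending.setdefault(block["id"], block)
            elif kind == "tool_result" and isinstance(block.get("tool_use_id"), str):
                pending.pop(block["tool_use_id"], None)
    return list(pending.values())


def missing_write_targets(orphans):
    """For orphaned Write calls, list target paths that do not exist on disk."""
    missing = []
    for block in orphans:
        params = block.get("input")
        path = params.get("file_path") if isinstance(params, dict) else None
        if block.get("name") == "Write" and isinstance(path, str) and not os.path.exists(path):
            missing.append(path)
    return missing
```

To use it, open a session file and pass the file handle as lines, for example `find_orphan_tool_uses(open(path, encoding="utf-8"))`. A tool_use that is still legitimately in flight at the moment of the scan also shows up as an orphan, so the result is only meaningful for turns that have ended or been aborted.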

Empirical incidence

In a single host running real interactive workloads:

Date | CLI version(s) | Distinct stalled tool-uses
2026-04-26 | 2.1.120 | 0 (different bug class observed; see "Distinguishing from the prior perm-stream bug" below)
2026-04-27 | 2.1.120 → 2.1.121 | 0
2026-04-28 | 2.1.121 → 2.1.122 → 2.1.123 | 1
2026-04-29 | 2.1.123 | 11

Tool distribution across the 12 events: Write ×6, Bash ×3, Edit ×2, Read ×1. No correlation with input size or content type — Read ×1 was a small local file; Write ×6 ranged from short text to multi-KB content. No correlation with concurrent MCP load that we can see.
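Counts like the per-day and per-tool figures above can be produced from a list of stalled events with a small tally. This is an illustrative sketch only: the (date, tool name) pair format and the tally_stalls name are assumptions, not the reporter's actual tooling.

```python
from collections import Counter


def tally_stalls(events):
    """Count stalled tool calls per day and per tool name.

    `events` is an iterable of (date_string, tool_name) pairs,
    one pair per stalled tool_use.
    """
    per_day = Counter()
    per_tool = Counter()
    for day, tool in events:
        per_day[day] += 1
        per_tool[tool] += 1
    return per_day, per_tool
```

Re-running the same tally after an upgrade gives a like-for-like incidence figure for the new version.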

Reproducer status

We have not been able to reproduce the stall with a minimal claude --print "Write hello to /tmp/x"-style invocation. Out of two minimal-repro attempts on 2026-04-29, both completed cleanly in ~9s. The stall appears to require some combination of: a real model turn with prior context, MCP servers loaded, multiple back-to-back tool calls, or model latency above some threshold. We have not isolated the trigger.

If your team can reproduce with a constructed prompt that emits multiple tool calls under MCP load, that would likely be faster than my isolation effort. Happy to provide:

  • Sanitized session jsonl excerpts containing the orphan tool_use blocks and the surrounding turn structure
  • Full gateway-side log of all 12 stalled events with timestamps and tool IDs
  • Our MCP server config (raven, linear, openclaw, framer, outseta)

Distinguishing from the prior perm-stream bug

We previously observed a different hang class on 2.1.120 where the --permission-prompt-tool stdio stream broke and subsequent tool calls returned is_error: true with body Tool permission stream closed before response received. That class appears patched on 2.1.121+ — zero occurrences of that error string in any session jsonl or gateway log since 2026-04-27. The current dispatch stall is silent (no error string, no perm-stream traffic at all), affects the same tools, and has comparable wall-clock impact.

We're not claiming causation, but the timing is suggestive: the perm-stream class disappears in 2.1.121 and the dispatch-stall class appears the same week. May or may not be related changes.

Workaround in place

External watchdog: parent process tracks tool_use start times and aborts the CLI subprocess when any tool exceeds a configurable max age (default 5 min, currently tuned to 10 min general / 240 s for "fast" tools like Read). On abort, the next user message starts a fresh CLI subprocess and recovers cleanly.

This bounds the loss but doesn't fix the bug. Each incident still costs:

  • 1 full conversational turn discarded
  • ~5 min wall-clock wait
  • Whatever the tool was supposed to do (writes lost, edits lost)
  • Partial-output orphan in the assistant stream

At 11 incidents on 2026-04-29 alone, that's ~55 min/day of wasted runtime and ~11 user-visible dropped tool calls per day on this one host.
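The watchdog described in this section can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the reporter's harness: the ToolWatchdog class and abort_cli function are hypothetical names, the limits mirror the 10 min general and 240 s Read values quoted above, and how the parent observes tool_use and tool_result events is left out.

```python
import subprocess
import time

DEFAULT_MAX_AGE_S = 600.0               # 10 min general limit, per the report
FAST_TOOL_MAX_AGE_S = {"Read": 240.0}   # 240 s for "fast" tools such as Read


class ToolWatchdog:
    """Tracks in-flight tool_use blocks and reports those past their max age."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock   # injectable so the logic can be tested
        self._started = {}    # tool_use id -> (tool name, start time)

    def on_tool_use(self, tool_id, name):
        self._started[tool_id] = (name, self._clock())

    def on_tool_result(self, tool_id):
        self._started.pop(tool_id, None)

    def stalled(self):
        """Return ids of tool calls older than the limit for their tool."""
        now = self._clock()
        return [
            tool_id
            for tool_id, (name, start) in self._started.items()
            if now - start > FAST_TOOL_MAX_AGE_S.get(name, DEFAULT_MAX_AGE_S)
        ]


def abort_cli(proc: subprocess.Popen, grace_s: float = 5.0) -> None:
    """Terminate the CLI subprocess, escalating to kill after a grace period."""
    proc.terminate()
    try:
        proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()
```

A parent loop would call stalled() periodically and, if it returns anything, call abort_cli on the CLI process and start a fresh subprocess on the next user message. Using a monotonic clock keeps the age check unaffected by wall-clock adjustments.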

Asks

  1. Triage: confirm whether anyone else is seeing this on 2.1.121+ and whether you have an internal repro.
  2. If actionable: point me at a code path or a diagnostic flag I can flip to surface the stall on the CLI side. I'd much rather catch it inside the CLI than from a parent watchdog.
  3. If a fix lands: I'll re-run the same watchdog-instrumented workload and report incidence at the new version. Trivial to verify.

Happy to provide additional artifacts. Thanks for the work on Claude Code. It has been excellent overall; this is a noisy edge.


Reporter context: this issue is filed by a power user running Claude Code as a subprocess of a custom harness. All tool_use IDs cited above are from real internal sessions; full logs available on request via private channel if helpful.

extent analysis

TL;DR

Downgrade to version 2.1.120 or implement a more robust watchdog mechanism to mitigate the tool dispatch stall issue in Claude CLI versions 2.1.121-2.1.123.

Guidance

  1. Verify the issue: Check the session jsonl files for tool_use blocks without corresponding tool_result entries and confirm that the intended tool side-effects did not occur.
  2. Check for error patterns: Although no errors are logged on the CLI side, review the gateway logs for any patterns or anomalies that might be related to the stall.
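  3. Test with minimal invocations: Run simple claude --print "Write hello to /tmp/x"-style invocations as a baseline. The reporter's two minimal attempts both completed cleanly in ~9s, so a clean minimal run does not rule the stall out; the trigger may involve prior context, loaded MCP servers, or several back-to-back tool calls.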
  4. Collaborate on reproduction: Work with the development team to create a reproducible test case that emits multiple tool calls under MCP load, which may help identify the root cause.

Example

No code snippet is provided as the issue does not imply a specific code fix but rather a version or configuration issue.

Notes

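The stall appears to be a regression introduced in version 2.1.121, and downgrading to 2.1.120 may avoid it for now. Note that the reporter saw a different hang class on 2.1.120 (the permission-prompt stream breaking, with the error Tool permission stream closed before response received), so a downgrade may trade one failure mode for another. The exact cause and trigger for the stall are still unknown and require further investigation.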

Recommendation

Apply a workaround, such as the external watchdog mechanism already in place, to mitigate the issue until a fix is available. This approach can help bound the loss and provide a temporary solution while the root cause is being investigated.
