claude-code - 💡(How to fix) Fix Parallel Bash batch: 4/7 calls return "internal error" after 14h silent gap; CLI process never restarted

claude-code2026-05-25 07:48:40

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Error Message

A single round of parallel Bash tool calls returned 14 hours after issue, with 4 of 7 calls labeled "internal error" (no payload) and the other 3 returning normally. The Claude CLI process did not restart, the host never slept, and the terminal was local. No ~/.claude/logs/ directory exists on this install, so there is nothing to attach. 4. The remaining 4 calls return with payload "[Tool result missing due to internal error]" — no stdout, no stderr, no exit code surfaced. Not deterministically reproducible from this side. The trigger appears to be a wide parallel Bash batch combined with something (memory pressure? IPC stall?). Filing primarily so the failure mode — silent "internal error" payloads delivered 14h late instead of a loud timeout — is visible to maintainers.

Is the "internal error" payload a known failure mode, and what does it actually mean inside the harness?
Can the parallel tool runner surface a timeout / disconnect error instead of returning empty internal error results 14h later?

Root Cause

RAW_BUFFERClick to expand / collapse

Summary

Environment

Claude Code CLI version: 2.1.141 (path: ~/.local/share/claude/versions/2.1.141)
Model: claude-opus-4-7 (Opus 4.7, 1M context)
OS: Linux 6.14.0-1015-nvidia (Ubuntu-family), x86_64, 12 CPU, 64G RAM
Terminal: local gnome-terminal (no SSH/RDP layer)
A second concurrent claude --chrome --remote-control session was running in another terminal (separate PID, separate parent bash)

Timeline

/clear followed by /session-start-kitchen-sink skill invocation issues one batched round of 7 parallel Bash calls (heartbeat timer check, gateway+DW health curl, recent-memory listing, file size check, git status, worktree inventory, DW heartbeat history curl).
3 of 7 calls return promptly with valid output.
A <system-reminder>The date has changed. Today's date is now 2026-05-25.</system-reminder> appears inside one returned result — indicating the tool round itself spanned a date boundary.
The remaining 4 calls return with payload "[Tool result missing due to internal error]" — no stdout, no stderr, no exit code surfaced.
Wall-clock delta between sibling calls in the same parallel batch: ~13.9 hours.

Evidence ruling out common causes

No host sleep / suspend: journalctl shows zero suspend/resume entries in window. codex-phone-host.sh keepalive fired every 20 min through the gap; mop-heartbeat-pal.service fired every 10 min throughout.
No CLI process restart: ps -o lstart shows the Claude CLI PID started 2026-05-24 08:51:15 and is still running at the time of investigation.
No client disconnect: local terminal, no remote layer.
No OOM: dmesg clean, no killed processes.
Load normal during gap: sar -q shows 1.5–6 range load averages across the window (12 CPU box).
No plausible local hang in the failed calls: the 4 failed commands were test -f, git status on a local repo, ls /opt/dw-ops/worktrees/, and a curl --max-time 5 to localhost. None can legitimately take 14h.

What's missing on disk

~/.claude/logs/ does not exist — no harness logs to attach. If there's a flag to enable persistent CLI-side tool-execution logs, please document it; this is the kind of bug that's only diagnosable from that layer.

Reproduction

Not deterministically reproducible from this side. The trigger appears to be a wide parallel Bash batch combined with something (memory pressure? IPC stall?). Filing primarily so the failure mode — silent "internal error" payloads delivered 14h late instead of a loud timeout — is visible to maintainers.

Asks

Is the "internal error" payload a known failure mode, and what does it actually mean inside the harness?
Can the parallel tool runner surface a timeout / disconnect error instead of returning empty internal error results 14h later?
Is there a way to enable persistent CLI-side logs so future repros are diagnosable?

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering