claude-code - 💡(How to fix) Fix MCP STDIO subprocess reaped + respawned mid-conversation, no shutdown signal to the server



Hi Claude Code team,

I maintain a production MCP server (@respira/wordpress-mcp-server, ~180 tools, several hundred daily active users connected via Claude Desktop, Claude Code, and Cursor). Since at least 2026-05-14, customers have been reporting that the MCP "stops working after 3-4 tool calls and i need to restart." Today I went through the cross-customer telemetry and could see the exact failure shape. I am filing this here because claude-code is the Anthropic-owned MCP host with a public issue tracker that I can reproducibly correlate against. The same supervisor behavior likely affects Claude Desktop too, but that has no public tracker.

Failure shape

We capture one row per tool call server-side (opt-in OTel emitter, no PII — tool name, duration, success, mcp_session_id only). Last 24 hours across the customer base:

Calls in session | Sessions ended at this count | Avg span (s)
180
2226
3342
41177

The distribution drops off sharply after 4 calls, yet many sessions still reach 13, 17, 22, 38, 46, 62, or 80 calls, so there is no hard cap. The failure is probabilistic and concentrated in the early-session window.
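For anyone who wants to build the same view from their own telemetry, here is a minimal sketch of the aggregation. It is not our production query. The row shape, the `startedAtMs` field, and the definition of "span" (first call start to last call end) are assumptions made for illustration.

```typescript
// Hypothetical row shape: one row per tool call, no PII.
interface ToolCallRow {
  mcpSessionId: string;
  toolName: string;
  startedAtMs: number; // assumption: epoch milliseconds when the call started
  durationMs: number;
  success: boolean;
}

interface Bucket {
  callsInSession: number; // how many calls the session made before it ended
  sessionsEnded: number;  // how many sessions ended at exactly that count
  avgSpanSeconds: number; // mean time from first call start to last call end
}

function sessionsByFinalCallCount(rows: ToolCallRow[]): Bucket[] {
  // Pass 1: collapse rows into one summary per session.
  const sessions = new Map<string, { calls: number; first: number; last: number }>();
  for (const row of rows) {
    const end = row.startedAtMs + row.durationMs;
    const s = sessions.get(row.mcpSessionId);
    if (s === undefined) {
      sessions.set(row.mcpSessionId, { calls: 1, first: row.startedAtMs, last: end });
    } else {
      s.calls += 1;
      s.first = Math.min(s.first, row.startedAtMs);
      s.last = Math.max(s.last, end);
    }
  }

  // Pass 2: group sessions by their final call count.
  const buckets = new Map<number, { sessions: number; totalSpanMs: number }>();
  sessions.forEach((s) => {
    const b = buckets.get(s.calls) ?? { sessions: 0, totalSpanMs: 0 };
    b.sessions += 1;
    b.totalSpanMs += s.last - s.first;
    buckets.set(s.calls, b);
  });

  // Emit buckets ordered by call count.
  const result: Bucket[] = [];
  buckets.forEach((b, callsInSession) => {
    result.push({
      callsInSession,
      sessionsEnded: b.sessions,
      avgSpanSeconds: b.totalSpanMs / b.sessions / 1000,
    });
  });
  return result.sort((a, b) => a.callsInSession - b.callsInSession);
}
```

A spike in `sessionsEnded` at low call counts, alongside a long tail of much longer sessions, is the shape described above.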

The last tool call before every death completes successfully with normal duration (700ms-3s, no spike). The dying tools are spread across delete_page, get_site_context, list_pages, diagnose_connection, build_page, find_builder_targets — many different tools, mostly returning small payloads. So server-side dispatch and the downstream REST roundtrip are healthy. The MCP server's response stream ends cleanly; the next user turn arrives with no MCP available; the user has to fully restart the host to recover.

Smoking gun in our own code

We added a stderr bootstrap log in our v6.17 release specifically because this was already happening. The in-code comment, verbatim:

v6.17.2: detect orphan sibling processes from prior reconnect cycles and warn loudly. The Claude Code / Conductor MCP supervisor sometimes spawns a new server without reaping the previous one, leaving 5+ parallel pairs alive that flap the tool catalog because each one ships a different feature set.

Customers running Console.app and searching for our log prefix see multiple respira-mcp ready · pid X lines per conversation, each from a different PID, confirming respawn-without-reap. Our v6.17.2 added the diagnostic; it doesn't fix the underlying cycling.
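To illustrate the kind of check that diagnostic performs, here is a rough sketch, not the shipped code. It assumes the process list is available as text with one `pid command` pair per line (for example from something like `ps -axo pid=,command=`). The function names and the `wordpress-mcp-server` marker string are hypothetical.

```typescript
// Find other live processes whose command line contains the given marker.
// psOutput is expected to hold one "<pid> <command...>" entry per line.
function findSiblingPids(psOutput: string, selfPid: number, marker: string): number[] {
  const siblings: number[] = [];
  for (const line of psOutput.split("\n")) {
    const match = /^\s*(\d+)\s+(.*)$/.exec(line);
    if (match === null) continue; // skip blank or malformed lines
    const pid = Number(match[1]);
    const command = match[2] ?? "";
    if (pid !== selfPid && command.indexOf(marker) !== -1) {
      siblings.push(pid);
    }
  }
  return siblings;
}

// Build the warning that a server could write to stderr at startup.
function orphanWarning(selfPid: number, siblings: number[]): string | null {
  if (siblings.length === 0) return null;
  return (
    `pid ${selfPid}: found ${siblings.length} sibling server process(es) ` +
    `still alive: ${siblings.join(", ")}`
  );
}
```

Writing the warning to stderr rather than stdout matters for a STDIO server, because stdout carries the protocol messages.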

Adjacent symptom: STDIO pipe buffer interactions

Two of our tools regularly return more than the macOS default pipe buffer (64 KB): respira_list_options (~600 KB on a typical site) and respira_get_builder_info (~76 KB). When the MCP writes a multi-hundred-KB blob and the host's drain loop falls behind, the write blocks. After what appears to be ~30s of a stalled write, the host kills the subprocess instead of draining the pipe and continuing.

I shipped a 100 KiB response-size cap in @respira/wordpress-mcp-server today to reduce the trigger rate. The truncation envelope tells the agent the original size and gives a pagination hint, so it can recover by re-calling the tool with per_page / search filters. This is a workaround on our end; changing the underlying supervisor behavior is the actual fix.
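The shape of such a cap can be sketched as follows. This is an illustrative outline, not the code in the release: the envelope field names, the hint wording, and the `capResponse` helper are all assumptions. Only the 100 KiB limit and the per_page / search hint come from the description above.

```typescript
const MAX_RESPONSE_BYTES = 100 * 1024; // 100 KiB

// Hypothetical envelope returned in place of an oversized payload.
interface TruncationEnvelope {
  truncated: true;
  originalBytes: number;
  limitBytes: number;
  hint: string;
}

// Size is measured in UTF-8 bytes, since that is what travels over the pipe.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// Return the payload unchanged when it fits, otherwise a small JSON envelope
// that reports the original size and tells the agent how to narrow the call.
function capResponse(text: string, limitBytes: number = MAX_RESPONSE_BYTES): string {
  const originalBytes = utf8ByteLength(text);
  if (originalBytes <= limitBytes) return text;
  const envelope: TruncationEnvelope = {
    truncated: true,
    originalBytes,
    limitBytes,
    hint: "Response too large; re-call the tool with per_page or search filters.",
  };
  return JSON.stringify(envelope);
}
```

This sketch drops the oversized body entirely instead of cutting it at the limit, which keeps the envelope far below the pipe buffer size. A variant could include a short preview of the data at the cost of a more careful size calculation.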

What would help, ordered by impact

  1. Reaping signal in MCP server stderr. Tell us when the host is about to kill the subprocess, with a reason code: idle timeout, pipe stall, OOM, manual restart, etc. Right now we can only infer from absence-of-traffic that the subprocess was killed.
  2. Graceful shutdown handshake. SIGTERM with a small grace window (~3s) before SIGKILL. Lets us flush in-flight telemetry and log the shutdown reason cleanly. Currently it looks like an immediate SIGKILL with no notice.
  3. Public docs on host policy. Idle timeout, pipe-stall threshold, response-size limit, reconnect strategy. We can size our response cap and tool envelope around the real limits if we know them, instead of guessing.
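If the host did send SIGTERM with a grace window (item 2), the server side could be as small as the sketch below. It is an assumption about how we would use such a handshake, not existing behavior. `flushWithinGrace` and `flushTelemetry` are hypothetical names, and the Node.js wiring is shown in comments so that the helper stays free of side effects.

```typescript
type ShutdownOutcome = "flushed" | "timed_out" | "failed";

// Run a flush callback, but never wait longer than the grace window.
function flushWithinGrace(
  flush: () => Promise<void>,
  graceMs: number,
): Promise<ShutdownOutcome> {
  return new Promise<ShutdownOutcome>((resolve) => {
    // If the flush is still pending when the window closes, give up.
    const timer = setTimeout(() => resolve("timed_out"), graceMs);
    // Promise.resolve().then(flush) also captures synchronous throws.
    Promise.resolve()
      .then(flush)
      .then(
        () => {
          clearTimeout(timer);
          resolve("flushed");
        },
        () => {
          clearTimeout(timer);
          resolve("failed");
        },
      );
  });
}

// Possible wiring in a Node.js server (assumes a ~3s host grace window,
// so the server budgets slightly less for itself):
//
//   process.once("SIGTERM", () => {
//     process.stderr.write("shutdown requested: SIGTERM\n");
//     void flushWithinGrace(flushTelemetry, 2500).then((outcome) => {
//       process.stderr.write(`shutdown complete: ${outcome}\n`);
//       process.exit(0);
//     });
//   });
```

None of this helps while the subprocess receives an immediate SIGKILL, since that signal cannot be caught by the server.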

What I can offer

  • Raw telemetry slice for the time window if useful (no PII; tool name + duration + session id + success only).
  • Reproducible Studio setup: WordPress 7.0 + Respira plugin + Claude Desktop or Claude Code, against a local install at localhost:8882. Can hand the whole thing over.
  • The four customer bug reports that prompted this investigation, with their consent.

Thanks for the MCP spec — even with this open, the ecosystem it enabled is the reason this product exists.

Mihai Dragomirescu · https://www.respira.press/mcp · public MCP server mirror: https://github.com/respira-press/respira-wordpress-mcp · npm: https://www.npmjs.com/package/@respira/wordpress-mcp-server
