claude-code - 💡(How to fix) Fix MCP STDIO subprocess reaped + respawned mid-conversation, no shutdown signal to the server

Error Message

v6.17.2: detect orphan sibling processes from prior reconnect cycles and warn loudly. The Claude Code / Conductor MCP supervisor sometimes spawns a new server without reaping the previous one, leaving 5+ parallel pairs alive that flap the tool catalog because each one ships a different feature set.

Root Cause

I maintain a production MCP server (@respira/wordpress-mcp-server, ~180 tools, several hundred daily active users connected via Claude Desktop, Claude Code, and Cursor). Customers have been reporting since at least 2026-05-14 that the MCP "stops working after 3-4 tool calls and i need to restart." Today I sat down with the cross-customer telemetry and could see the exact failure shape. I am filing this here because claude-code is the Anthropic-owned MCP host with a public issue tracker that I can reproducibly correlate against. The same supervisor behavior likely affects Claude Desktop too, but that has no public tracker.

Fix / Workaround

The last tool call before every death completes successfully with normal duration (700ms-3s, no spike). Those final calls are spread across delete_page, get_site_context, list_pages, diagnose_connection, build_page, and find_builder_targets: many different tools, mostly returning small payloads. So server-side dispatch and the downstream REST roundtrip look healthy. The MCP server's response stream ends cleanly, the next user turn arrives with no MCP available, and the user has to fully restart the host to recover.

I shipped a 100 KiB response-size cap in @respira/wordpress-mcp-server today to reduce the trigger rate. The truncation envelope tells the agent the original size plus a pagination hint, so it can recover by re-calling the tool with per_page / search filters. This is a workaround on our end; the real fix has to come from the supervisor's behavior.
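
A minimal sketch of what such a cap could look like. The envelope field names (`truncated`, `original_bytes`, `returned_bytes`, `hint`, `partial`) and the `capResponse` helper are illustrative assumptions, not the format the package actually ships; only the 100 KiB limit and the per_page / search hint come from the description above.

```typescript
// Hypothetical envelope shape: field names are illustrative, not the shipped format.
interface TruncationEnvelope {
  truncated: true;
  original_bytes: number;
  returned_bytes: number;
  hint: string;
  partial: string;
}

const MAX_RESPONSE_BYTES = 100 * 1024; // 100 KiB

function capResponse(
  body: string,
  maxBytes: number = MAX_RESPONSE_BYTES,
): string | TruncationEnvelope {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= maxBytes) return body; // small payloads pass through untouched

  // Back up to a UTF-8 character boundary so the cut never splits a character.
  let end = maxBytes;
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;

  return {
    truncated: true,
    original_bytes: bytes.length,
    returned_bytes: end,
    hint: "Response truncated. Re-call the tool with per_page / search filters.",
    partial: new TextDecoder().decode(bytes.subarray(0, end)),
  };
}
```

The byte budget here covers only the payload text. The envelope fields add a small fixed overhead on top, so a real cap would want some headroom below whatever limit the host enforces.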

Hi Claude Code team,

Failure shape

We capture one row per tool call server-side (opt-in OTel emitter, no PII — tool name, duration, success, mcp_session_id only). Last 24 hours across the customer base:

Calls in session	Sessions ended at this count	Avg span (s)
1	8	0
2	2	26
3	3	42
4	1	177

Session endings drop off sharply after 4 calls, while many other sessions reach 13, 17, 22, 38, 46, 62, or 80 calls, so there is no hard cap. The failure is probabilistic and concentrated in the early-session window.
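
For anyone who wants to reproduce the table from similar telemetry, here is a rough sketch of the aggregation. It assumes each row also carries a start timestamp (`started_at_ms`, a hypothetical field name) and that "span" means the time between a session's first and last call; the row shape and the `sessionsByCallCount` helper are illustrative, not our actual pipeline.

```typescript
// One row per tool call. started_at_ms is an assumed field used to compute the span.
interface CallRow {
  mcp_session_id: string;
  tool: string;
  duration_ms: number;
  success: boolean;
  started_at_ms: number;
}

interface HistogramRow {
  calls: number;    // total calls seen in a session
  sessions: number; // how many sessions ended at that count
  avgSpanS: number; // mean seconds between first and last call
}

function sessionsByCallCount(rows: CallRow[]): HistogramRow[] {
  // Pass 1: collapse rows into one summary per session.
  const perSession = new Map<string, { calls: number; first: number; last: number }>();
  for (const row of rows) {
    const s = perSession.get(row.mcp_session_id);
    if (s === undefined) {
      perSession.set(row.mcp_session_id, {
        calls: 1,
        first: row.started_at_ms,
        last: row.started_at_ms,
      });
    } else {
      s.calls += 1;
      s.first = Math.min(s.first, row.started_at_ms);
      s.last = Math.max(s.last, row.started_at_ms);
    }
  }

  // Pass 2: bucket sessions by their final call count.
  const totals = new Map<number, { sessions: number; spanMs: number }>();
  perSession.forEach((s) => {
    const t = totals.get(s.calls) ?? { sessions: 0, spanMs: 0 };
    t.sessions += 1;
    t.spanMs += s.last - s.first;
    totals.set(s.calls, t);
  });

  const out: HistogramRow[] = [];
  totals.forEach((t, calls) => {
    out.push({ calls, sessions: t.sessions, avgSpanS: t.spanMs / t.sessions / 1000 });
  });
  return out.sort((a, b) => a.calls - b.calls);
}
```

A low final call count alone does not prove the subprocess was killed; a user may simply have stopped. The signal is the cluster at 1-4 calls relative to the long tail.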

Smoking gun in our own code

We added a stderr bootstrap log in our v6.17 release specifically because this was already happening. The in-code comment, verbatim:

v6.17.2: detect orphan sibling processes from prior reconnect cycles and warn loudly. The Claude Code / Conductor MCP supervisor sometimes spawns a new server without reaping the previous one, leaving 5+ parallel pairs alive that flap the tool catalog because each one ships a different feature set.

Customers running Console.app and searching for our log prefix see multiple respira-mcp ready · pid X lines per conversation, each from a different PID, confirming respawn-without-reap. Our v6.17.2 added the diagnostic; it doesn't fix the underlying cycling.
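
A rough sketch of how a startup check for orphan siblings might work. It assumes the process list comes from something like `ps -A -o pid= -o command=` and that siblings can be recognized by a marker string in their command line; the helper names, the marker, and the warning text are hypothetical. Only the `respira-mcp ready · pid X` line mirrors what customers see in the log.

```typescript
interface ProcEntry {
  pid: number;
  command: string;
}

// Parse lines of the form "<pid> <command...>" (assumed output of
// `ps -A -o pid= -o command=`); lines that do not match are skipped.
function parsePs(output: string): ProcEntry[] {
  const entries: ProcEntry[] = [];
  for (const line of output.split("\n")) {
    const m = /^\s*(\d+)\s+(.*)$/.exec(line);
    if (m) entries.push({ pid: Number(m[1]), command: m[2] });
  }
  return entries;
}

// Every process whose command line contains the marker, except ourselves.
function findSiblingPids(entries: ProcEntry[], ownPid: number, marker: string): number[] {
  return entries
    .filter((e) => e.pid !== ownPid && e.command.includes(marker))
    .map((e) => e.pid);
}

// Lines to write to stderr at startup. The warning wording is illustrative.
function bootstrapLines(ownPid: number, siblingPids: number[]): string[] {
  const lines = [`respira-mcp ready · pid ${ownPid}`];
  if (siblingPids.length > 0) {
    lines.push(
      `respira-mcp WARNING · ${siblingPids.length} sibling server(s) still alive: ${siblingPids.join(", ")}`,
    );
  }
  return lines;
}
```

A check like this only reports the condition. It cannot tell whether a sibling belongs to another legitimate host window or is a leftover from a reconnect cycle, which is why a server killing siblings on its own would be risky.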

Adjacent symptom: STDIO pipe buffer interactions

Two of our tools regularly return more than the macOS default pipe buffer (64 KB): respira_list_options (~600 KB on a typical site) and respira_get_builder_info (~76 KB). When the MCP writes a multi-hundred-KB blob and the host's drain loop falls behind, the write blocks. After what appears to be ~30s of a stalled write, the host kills the subprocess instead of draining the pipe and continuing.
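
One server-side mitigation to consider is writing large responses in pieces and pausing whenever the output stream reports backpressure, instead of queueing the whole blob at once. The sketch below assumes a sink with Node-style `write()` / `'drain'` semantics (a Node `Writable` such as `process.stdout` fits the `DrainableSink` shape structurally); the interface and the `writeChunked` helper are hypothetical names, and whether this avoids the kill depends on host behavior we cannot see.

```typescript
// Minimal structural view of a Node-style writable stream.
interface DrainableSink {
  write(chunk: Uint8Array): boolean; // false means "buffer full, wait for 'drain'"
  once(event: "drain", listener: () => void): unknown;
}

function writeChunked(
  sink: DrainableSink,
  payload: string,
  chunkBytes: number,
  done: () => void,
): void {
  // Work on bytes so a chunk boundary can never corrupt a multi-byte character:
  // the receiver reassembles the byte stream before decoding.
  const bytes = new TextEncoder().encode(payload);
  let offset = 0;

  const pump = (): void => {
    while (offset < bytes.length) {
      const chunk = bytes.subarray(offset, offset + chunkBytes);
      offset += chunk.length;
      if (!sink.write(chunk)) {
        // Sink is full: resume only after it signals that it has drained.
        sink.once("drain", pump);
        return;
      }
    }
    done();
  };

  pump();
}
```

With a 16 KiB `chunkBytes`, a ~600 KB response becomes a few dozen writes, each of which fits inside a 64 KB pipe buffer.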

What would help, ordered by impact

1. Reaping signal in MCP server stderr. Tell us when the host is about to kill the subprocess, with a reason code: idle timeout, pipe stall, OOM, manual restart, etc. Right now we can only infer from absence-of-traffic that the subprocess was killed.
2. Graceful shutdown handshake. SIGTERM with a small grace window (~3s) before SIGKILL. Lets us flush in-flight telemetry and log the shutdown reason cleanly. Currently it looks like an immediate SIGKILL with no notice.
3. Public docs on host policy. Idle timeout, pipe-stall threshold, response-size limit, reconnect strategy. We can size our response cap and tool envelope around the real limits if we know them, instead of guessing.
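
To make the handshake request concrete, here is a sketch of what a server could do with a grace window if the host sent SIGTERM first. Everything here is an assumption for illustration: the `makeShutdownHandler` helper, the injected `flush` / `log` / `exit` dependencies, the log wording, and the 2s flush budget (chosen to sit inside the ~3s window suggested above). It does not describe any current host behavior.

```typescript
interface ShutdownDeps {
  flush: () => Promise<void>;   // e.g. send buffered telemetry
  log: (line: string) => void;  // e.g. write a line to stderr
  exit: (code: number) => void; // e.g. process.exit
}

function makeShutdownHandler(deps: ShutdownDeps, flushBudgetMs: number = 2000) {
  let started = false;

  return async function onSignal(signal: string): Promise<void> {
    if (started) return; // ignore repeated signals
    started = true;
    deps.log(`shutdown requested · signal=${signal}`);

    // Race the flush against a deadline so a stuck flush cannot outlive the grace window.
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<"timeout">((resolve) => {
      timer = setTimeout(() => resolve("timeout"), flushBudgetMs);
    });
    const flushed = deps.flush().then(
      () => "flushed" as const,
      () => "flush-failed" as const,
    );
    const outcome = await Promise.race([flushed, deadline]);
    if (timer !== undefined) clearTimeout(timer);

    deps.log(`shutdown complete · outcome=${outcome}`);
    deps.exit(0);
  };
}

// Wiring in a real server would look roughly like:
//   const onSignal = makeShutdownHandler({ flush, log: console.error, exit: process.exit });
//   process.once("SIGTERM", () => void onSignal("SIGTERM"));
```

None of this runs on SIGKILL, which cannot be caught. That is why the grace window has to come from the host side.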

What I can offer

- Raw telemetry slice for the time window if useful (no PII; tool name + duration + session id + success only).
- Reproducible Studio setup: WordPress 7.0 + Respira plugin + Claude Desktop or Claude Code, against a local localhost:8882 install. Can hand the whole thing over.
- The four customer bug reports that prompted this investigation, with their consent.

Thanks for the MCP spec — even with this open, the ecosystem it enabled is the reason this product exists.

— Mihai Dragomirescu [email protected] https://www.respira.press/mcp · public MCP server mirror: https://github.com/respira-press/respira-wordpress-mcp · npm: https://www.npmjs.com/package/@respira/wordpress-mcp-server
