openclaw - Bug: MCP child processes survive Gateway restart, accumulating zombie processes

When the Gateway restarts, MCP server child processes spawned via bundle-mcp-runtime are not killed. Each restart creates a new set of MCP processes while the old ones become orphans. This accumulates across restarts and causes SSH connection exhaustion, session lock contention, and Gateway slowdowns.

Root Cause (3 factors)

1. detached: true on macOS/Linux

In pi-bundle-mcp-runtime, MCP child processes are spawned with detached: process.platform !== "win32":

const child = spawn(preparedSpawn.command, preparedSpawn.args, {
    detached: process.platform !== "win32",
    ...
});

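This makes the child the leader of its own process group (and session), so signals delivered to the Gateway's process group on shutdown never reach it. Unless the Gateway terminates the child explicitly, it keeps running after the parent exits.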
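To make the process-group point concrete, here is a minimal sketch of how a detached child can be signalled. The helper name signalChildGroup and its injectable kill parameter are hypothetical and do not come from the OpenClaw codebase; the only real API used is process.kill, which on Unix accepts a negative PID to address a whole process group.

```javascript
// Hypothetical helper: signal a child that was spawned with `detached: true`.
// On Unix the child leads its own process group, so a negative pid addresses
// the whole group (the child plus anything it spawned).
function signalChildGroup(
    child,
    signal = "SIGTERM",
    kill = (pid, sig) => process.kill(pid, sig)   // injectable for testing
) {
    if (!child || !child.pid) return false;       // spawn failed or no pid
    const target = process.platform === "win32" ? child.pid : -child.pid;
    try {
        kill(target, signal);
        return true;
    } catch (err) {
        if (err.code === "ESRCH") return false;   // process/group already gone
        throw err;
    }
}

// Usage sketch (assumes a Unix host and a local mcp-server.mjs):
// const { spawn } = require("node:child_process");
// const child = spawn("node", ["mcp-server.mjs"], {
//     detached: process.platform !== "win32",
//     stdio: ["pipe", "pipe", "pipe"],
// });
// signalChildGroup(child);
```

Signalling child.pid alone would reach only the group leader; the negative PID is what reaches helpers the MCP server may have started itself.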

2. disposeSession() does not call killProcessTree

The killProcessTree utility exists in the codebase (kill-tree-C3duXFqv.js) but is NOT called during normal shutdown. The disposeSession() function only closes transport/client abstractions:

async function disposeSession(session) {
    session.detachStderr?.();
    if (session.transportType === "streamable-http") 
        await session.transport.terminateSession().catch(() => {});
    await session.transport.close().catch(() => {});   // closes pipe, doesn't kill process
    await session.client.close().catch(() => {});       // closes protocol client
    // ❌ killProcessTree(pid) is never called here
}

3. SSH-based MCP servers leak remote processes

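For MCP servers launched over ssh (e.g., ssh jet@host node mcp-server.mjs), killing the local ssh client does not stop the remote Node.js process on the target machine. Without a PTY the remote process typically receives no SIGHUP when the connection drops, and Node.js ignores SIGPIPE on broken stdio, so the server keeps running unless it handles the closed stdin itself.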

Steps to Reproduce

  1. Configure 1+ MCP servers in mcp.servers (especially SSH-based ones)
  2. Start the Gateway
  3. Observe MCP child processes: ps aux | grep mcp
  4. Restart the Gateway
  5. Observe: old MCP processes remain, new ones are spawned alongside them
  6. Repeat the restart → orphaned MCP processes accumulate linearly
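Steps 3 and 5 can be made less error-prone than eyeballing grep output. The sketch below filters ps output for MCP processes that have been reparented. The helper findOrphans is hypothetical, and the check for parent PID 1 is a heuristic: under systemd user sessions or inside containers, orphans may be adopted by a different reaper process, which is why the reaper PID is a parameter.

```javascript
// Hypothetical helper: list processes whose parent is the reaper (PID 1 by
// default) and whose command line matches a pattern. The input is the text
// printed by `ps -A -o pid= -o ppid= -o args=` (one process per line).
function findOrphans(psOutput, pattern = /mcp/i, reaperPid = 1) {
    return psOutput
        .split("\n")
        .map((line) => line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/))
        .filter((m) => m !== null)                 // skips blank or header lines
        .map(([, pid, ppid, command]) => ({
            pid: Number(pid),
            ppid: Number(ppid),
            command,
        }))
        .filter((p) => p.ppid === reaperPid && pattern.test(p.command));
}

// Usage sketch:
// const { execFileSync } = require("node:child_process");
// const out = execFileSync(
//     "ps", ["-A", "-o", "pid=", "-o", "ppid=", "-o", "args="],
//     { encoding: "utf8" }
// );
// console.log(findOrphans(out));
```

Running this before and after each restart should show the list growing by one entry per configured server if the bug is present.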

Impact

  • After 4 restarts, 20+ orphaned MCP processes (5 servers × 4 restarts)
  • SSH connections to remote hosts exhausted
  • Session lock contention from orphan processes writing to session files
  • sessions_list and gateway status timeouts
  • Gateway CPU ↑65%, memory ↑800MB

Suggested Fix

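Call killProcessTree() with the child's PID at the end of disposeSession(), once the transport and client have been closed (the snippet assumes the session keeps a reference to the spawned child as session.process):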

async function disposeSession(session) {
    session.detachStderr?.();
    if (session.transportType === "streamable-http") 
        await session.transport.terminateSession().catch(() => {});
    await session.transport.close().catch(() => {});
    await session.client.close().catch(() => {});
    // ✅ Kill the child process tree
    if (session.process?.pid) {
        killProcessTree(session.process.pid);
    }
}

The killProcessTree function already exists in kill-tree-C3duXFqv.js and handles graceful SIGTERM → SIGKILL with a configurable grace period. It works correctly with detached processes on Unix (uses process.kill(-pid, "SIGTERM") to target the process group).
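For readers unfamiliar with the escalation pattern described above, the following is a minimal sketch of a SIGTERM → SIGKILL sequence with a grace period. It is not the source of killProcessTree; terminateGroup, graceMs, and the injectable kill option are hypothetical names chosen for illustration, and the group probe with signal 0 assumes a Unix host.

```javascript
// A sketch of graceful termination, NOT the actual killProcessTree source.
function terminateGroup(pid, options = {}) {
    const {
        graceMs = 5000,
        kill = (p, s) => process.kill(p, s),      // injectable for testing
    } = options;
    // Negative pid targets the process group of a detached child on Unix.
    const target = process.platform === "win32" ? pid : -pid;

    const send = (signal) => {
        try {
            kill(target, signal);
            return true;
        } catch (err) {
            if (err.code === "ESRCH") return false;   // nothing left to signal
            throw err;
        }
    };

    return new Promise((resolve, reject) => {
        try {
            if (!send("SIGTERM")) {
                resolve("already-exited");
                return;
            }
            setTimeout(() => {
                try {
                    // Signal 0 only probes for existence; escalate if alive.
                    if (send(0)) {
                        send("SIGKILL");
                        resolve("killed");
                    } else {
                        resolve("terminated");
                    }
                } catch (err) {
                    reject(err);
                }
            }, graceMs);
        } catch (err) {
            reject(err);
        }
    });
}
```

The sketch always waits the full grace period before probing. A production version would more likely resolve as soon as the child's exit event fires and keep the timer only as a fallback.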

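For SSH-based MCP servers, additionally consider forcing PTY allocation (ssh -tt, since a single -t is ignored when the local stdin is not a terminal) or a remote-side watchdog, so the remote process exits when the SSH connection drops. A PTY applies terminal processing such as echo and newline translation to the stream, which can corrupt a stdio-based protocol, so the watchdog is likely the safer of the two.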
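One possible shape for the remote-side watchdog is sketched below: the remote server exits once its stdin reaches EOF or errors. The helper exitWhenStdinCloses is a hypothetical name, and the sketch assumes that sshd closes the child's stdin pipe once it notices the connection is gone and that the server is already reading stdin (the end event fires only on a stream that is being consumed).

```javascript
// Hypothetical helper for the remote MCP server entry point: exit when the
// SSH session's stdin goes away, so a dropped connection does not leave the
// process running on the remote host.
function exitWhenStdinCloses(
    stdin = process.stdin,
    exit = (code) => process.exit(code)           // injectable for testing
) {
    let done = false;
    const onGone = () => {
        if (done) return;                         // several events may fire
        done = true;
        exit(0);
    };
    stdin.on("end", onGone);      // peer closed the stream (EOF)
    stdin.on("close", onGone);    // underlying handle was closed
    stdin.on("error", onGone);    // read failed, e.g. connection torn down
    return onGone;
}

// Usage sketch, near the top of mcp-server.mjs on the remote host:
// exitWhenStdinCloses();
```

If the network drops silently, sshd may take a while to notice, so pairing this with server-side keepalive settings would shorten the window in which the remote process lingers.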

Environment

  • OpenClaw version: v2026.5.22 (a374c3a)
  • OS: macOS 12.6 (Darwin 21.6.0) + Ubuntu 22.04 remote
  • Node.js: v22.22.0
  • MCP config: 5 servers (3 SSH to Ubuntu, 2 local Node.js)
