openclaw: openclaw agent / openclaw infer CLI processes don't exit; MCP stdio server children orphaned and accumulating (~66 MB each)

GitHub stats
openclaw/openclaw#71457 · Fetched 2026-04-26 05:12:32
Comments: 1 · Participants: 2 · Timeline events: 5 · Reactions: 0
Timeline (top): referenced ×2, closed ×1, commented ×1, cross-referenced ×1

On a host where the OpenClaw gateway runs continuously and openclaw agent is invoked many times (in our case, once per inbound email via a third-party Microsoft Graph bridge), MCP stdio server children spawned for the local-bridge MCP integration are never reaped. They accumulate at roughly 66 MB RSS each with no upper bound until the gateway is restarted.

The MCP server itself is a textbook stdio implementation using @modelcontextprotocol/sdk@^1.12.0:

import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
const server = new McpServer({ name: 'local-bridge-mcp', version: '0.1.0' });
// ... server.registerTool(...) ×N ...
const transport = new StdioServerTransport();
await server.connect(transport);

The same server.js works correctly under Claude Code and other MCP hosts — the children exit cleanly when the host closes our stdin or sends a shutdown request, and Node exits naturally because nothing is left holding the event loop. So the MCP server code is not the issue.
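The exit-on-EOF behavior described above can be seen in isolation. The following is a minimal sketch, not the reporter's server: the inline child script is a hypothetical stand-in that only listens on stdin, roughly as a stdio transport would, and `runUntilStdinCloses` is an illustrative name.

```javascript
const { spawnSync } = require('node:child_process');

// Stand-in for a stdio server: its only activity is listening on stdin.
const childSource = `
  process.stdin.on('data', () => {});
  process.stdin.on('end', () => console.log('stdin closed'));
`;

// spawnSync writes `input` to the child's stdin and then closes the pipe,
// so the child sees EOF and has nothing left to keep its event loop alive.
function runUntilStdinCloses() {
  return spawnSync(process.execPath, ['-e', childSource], {
    input: '\n',
    encoding: 'utf8',
  });
}

const result = runUntilStdinCloses();
console.log(`exit status ${result.status}, stdout ${JSON.stringify(result.stdout)}`);
```

If the parent never closed the pipe, the same child would keep waiting, which is the situation the rest of this report describes.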

Root cause (hypothesis)

The leak is an OpenClaw-side lifecycle bug, not an MCP-protocol or SDK-side bug. The diagnostic that points there:

$ ps -A -o pid,ppid,etime,command | grep -E 'openclaw-(agent|infer|gateway)'
10828 10826   58:54 openclaw-infer
11260 11257   39:09 openclaw-agent
11463     1   28:58 openclaw-gateway

The openclaw-infer and openclaw-agent entries are one-shot CLI invocations that should have exited within seconds of finishing a single agent turn, but they are still alive many minutes after the work completed (openclaw-gateway is the long-running daemon, PPID 1). Because they don't exit, they don't close the stdio pipes to their MCP server children. Those children keep waiting for input on stdin after server.connect(transport), exactly as the MCP SDK design says they should.

lsof on a leaked MCP child confirms its stdin pipe is still actively connected to the parent's writing end:

$ lsof -p <leaked-mcp-pid>
node    12719 <user>   4     PIPE 0xc6b48af03ed5df52     16384  ->0x799250810c02e91f
node    12719 <user>   5     PIPE 0x799250810c02e91f     16384  ->0xc6b48af03ed5df52

Reproduction

  1. OpenClaw 2026.4.23 (a979721) installed and running as ai.openclaw.gateway LaunchAgent on macOS (Apple Silicon).
  2. An MCP server registered in ~/.openclaw/openclaw.json under mcp.servers.local-bridge using stdio transport. The server's only behavior is to register a few tools that proxy (via httpJson) to a localhost HTTP service.
  3. Repeatedly invoke openclaw agent --message ... --json --timeout 60 (we do this from a third-party bridge, but a shell loop reproduces).
  4. Observe that:
    • Every openclaw agent invocation lingers as an openclaw-agent process for many minutes after returning.
    • Each invocation also leaves behind one node .../mcp-server/server.js child of either the gateway or the agent process.
    • ps | grep mcp-server | wc -l grows monotonically.
    • Total RSS climbs by ~66 MB per turn.
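The shell loop mentioned in step 3 could also be driven from Node. This is a rough sketch under assumptions: the CLI flags are copied from step 3, while the message text, the `RUN_OPENCLAW_REPRO` opt-in variable, and the function names are hypothetical and exist only for illustration.

```javascript
const { spawnSync } = require('node:child_process');

// Flags copied from the reproduction step; the message text is a placeholder.
function agentArgs(message) {
  return ['agent', '--message', message, '--json', '--timeout', '60'];
}

// Count ps lines that mention the MCP server script.
function countMcpChildren(psOutput) {
  return psOutput
    .split('\n')
    .filter((line) => line.includes('mcp-server/server.js')).length;
}

// Run `turns` agent invocations and print the child count after each one.
function runRepro(turns) {
  for (let i = 1; i <= turns; i++) {
    spawnSync('openclaw', agentArgs(`repro turn ${i}`), { stdio: 'ignore' });
    const ps = spawnSync('ps', ['-A', '-o', 'pid,command'], { encoding: 'utf8' });
    console.log(`turn ${i}: ${countMcpChildren(ps.stdout)} mcp-server children`);
  }
}

// Opt-in guard (hypothetical variable) so loading this file never launches the CLI by accident.
if (process.env.RUN_OPENCLAW_REPRO === '1') runRepro(10);
```

On an affected host the printed count would be expected to grow by one per turn; on a fixed host it should stay flat.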

Observed scale

On a host that has processed ~17 inbound emails since the last gateway restart:

$ ps -A -o pid,rss,etime,command | grep '[m]cp-server/server.js' \
    | awk '{rss+=$2; n++} END {printf "%d processes, total RSS: %.1f MB\n", n, rss/1024}'
17 processes, total RSS: 1130.2 MB

15 of the 17 children are direct children of the gateway daemon; the other 2 are children of the abandoned openclaw-agent/openclaw-infer parents shown above.
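The parent breakdown above can be computed rather than read off by hand. The sketch below assumes the input comes from `ps -A -o pid,ppid,rss,command` (RSS in 1024-byte units, as on macOS); `summarizeByParent` and its default `needle` are illustrative names, not part of any OpenClaw tooling.

```javascript
// Group matching processes by parent PID and total their RSS.
// Expects one process per line: PID, PPID, RSS (KiB), then the command.
function summarizeByParent(psOutput, needle = 'mcp-server/server.js') {
  const byParent = new Map();
  let count = 0;
  let rssKb = 0;
  for (const line of psOutput.split('\n')) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(\d+)\s+(.*)$/);
    // Skips the header row, blank lines, and unrelated commands.
    if (!m || !m[4].includes(needle)) continue;
    count += 1;
    rssKb += Number(m[3]);
    byParent.set(m[2], (byParent.get(m[2]) || 0) + 1);
  }
  return { count, totalMb: rssKb / 1024, byParent: Object.fromEntries(byParent) };
}
```

Comparing the keys of `byParent` against the PIDs of the gateway and of any lingering openclaw-agent / openclaw-infer processes shows which parent is holding each child.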

Expected behavior

After an agent turn completes:

  1. Any one-shot openclaw agent / openclaw infer CLI invocation exits, returning its result.
  2. Any spawned MCP server children have their stdin closed (or receive a shutdown request), see EOF, and exit naturally.
  3. The parent reaps the child via wait() so it doesn't become a zombie.
  4. The long-running openclaw gateway daemon does the same per turn: spawn, use, shut down, reap.

This is what other MCP hosts (Claude Code, Cursor, etc.) do with the same server.js, and it's the contract the MCP SDK assumes.
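The expected per-turn lifecycle can be sketched on the host side with Node's child_process module. This is not OpenClaw's code, which is not shown in this issue: `shutdownStdioChild`, `runTurn`, the grace period, and the `command`/`args` parameters are all hypothetical, and error handling for a failed spawn is omitted.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical helper: close the child's stdin, then escalate to SIGTERM and
// SIGKILL if it has not exited after each grace period.
function shutdownStdioChild(child, graceMs = 2000) {
  return new Promise((resolve) => {
    if (child.exitCode !== null || child.signalCode !== null) {
      resolve({ stage: 'already-exited', code: child.exitCode, signal: child.signalCode });
      return;
    }
    let stage = 'stdin-closed';
    const timers = [
      setTimeout(() => { stage = 'sigterm'; child.kill('SIGTERM'); }, graceMs),
      setTimeout(() => { stage = 'sigkill'; child.kill('SIGKILL'); }, graceMs * 2),
    ];
    // Node emits 'exit' once the child has terminated and been reaped.
    child.once('exit', (code, signal) => {
      timers.forEach(clearTimeout);
      resolve({ stage, code, signal });
    });
    child.stdin.end(); // the child sees EOF on its stdin
  });
}

// Per-turn lifecycle: spawn, use, shut down. `command` and `args` are placeholders.
async function runTurn(command, args, useChild) {
  const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });
  try {
    return await useChild(child);
  } finally {
    await shutdownStdioChild(child);
  }
}
```

Putting the shutdown in a `finally` block means the child is closed and waited on even when the turn itself throws, which is the property that appears to be missing in the behavior reported here.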

Workaround we deployed (defensive, not a fix)

We added a hard max-lifetime + parent-watch in our MCP server to cap the leak:

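// Hard cap on process lifetime; defaults to 10 minutes, override with MCP_MAX_LIFETIME_MS.
const MAX_LIFETIME_MS = parseInt(process.env.MCP_MAX_LIFETIME_MS || '600000', 10);
setTimeout(() => {
  console.error(`[mcp] max lifetime ${MAX_LIFETIME_MS}ms reached, exiting`);
  process.exit(0);
}, MAX_LIFETIME_MS).unref();

// Every 30 s, probe the parent with signal 0 (nothing is delivered); exit if the probe throws.
setInterval(() => {
  try { process.kill(process.ppid, 0); }
  catch { console.error('[mcp] parent gone, exiting'); process.exit(0); }
}, 30_000).unref();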

Both timers call .unref() so they don't keep the event loop alive on their own. This bounds the leak per child but doesn't address the underlying lifecycle issue in OpenClaw; that has to be fixed upstream.
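A slightly more defensive variant of the same idea is sketched below. It is not what was deployed: the function names are hypothetical. It assumes `process.ppid` reflects the current parent each time it is read, so a child that has been reparented can be detected by comparing against the PID recorded at startup. It also treats an invalid `MCP_MAX_LIFETIME_MS` as "use the default" instead of passing NaN to a timer.

```javascript
// Parse a positive integer of milliseconds; fall back when missing or invalid.
function parseLifetimeMs(raw, fallbackMs = 600000) {
  const n = Number.parseInt(raw ?? '', 10);
  return Number.isFinite(n) && n > 0 ? n : fallbackMs;
}

// Signal 0 only probes. ESRCH means "no such process"; EPERM means the
// process exists but we may not signal it, so it still counts as alive.
function isProcessAlive(pid, kill = (p, s) => process.kill(p, s)) {
  try {
    kill(pid, 0);
    return true;
  } catch (err) {
    return err.code === 'EPERM';
  }
}

// The parent counts as gone if we were reparented or its PID no longer exists.
function parentIsGone(initialPpid, currentPpid, alive = isProcessAlive) {
  return currentPpid !== initialPpid || !alive(initialPpid);
}

// Install the periodic check; unref() keeps it from holding the event loop open.
function installParentWatch(intervalMs = 30000) {
  const initialPpid = process.ppid;
  return setInterval(() => {
    if (parentIsGone(initialPpid, process.ppid)) {
      console.error('[mcp] parent gone, exiting');
      process.exit(0);
    }
  }, intervalMs).unref();
}
```

Like the deployed workaround, this only helps once the parent has actually exited; children whose parents linger are still bounded only by the max-lifetime timer.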

Environment

  • OpenClaw 2026.4.23 (a979721) installed via npm i -g openclaw
  • macOS (Apple Silicon Mac mini)
  • Node.js >=20
  • @modelcontextprotocol/sdk@^1.12.0
  • Gateway running as ai.openclaw.gateway LaunchAgent on port 18789

Why this matters

For an agent that runs continuously (e.g., embedded in any kind of always-on automation such as email, calendar, or cron-like flows), the leak is unbounded. At ~66 MB per turn and steady traffic of even tens of turns per day, the host runs out of RAM in a week or two without external intervention. The workaround above is an acceptable belt-and-suspenders measure, but the right fix has to be on the host (OpenClaw) side: shut the child down the way the MCP spec describes for stdio (close its stdin, then escalate to SIGTERM and SIGKILL if it does not exit) and reap it after each turn.
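The "week or two" estimate can be checked with simple arithmetic. In the sketch below, the 1130.2 MB / 17 processes figures come from the measurement in this report, while the 16 GiB memory budget and the 20 turns per day are assumptions chosen only to illustrate the calculation.

```javascript
// Average RSS per leaked child, from measured totals.
function mbPerChild(totalMb, count) {
  return totalMb / count;
}

// Days until `budgetMb` is consumed if every turn leaks one child.
function daysUntilExhausted(budgetMb, mbPerTurn, turnsPerDay) {
  return budgetMb / (mbPerTurn * turnsPerDay);
}

console.log(mbPerChild(1130.2, 17).toFixed(1));                // 66.5
console.log(daysUntilExhausted(16 * 1024, 66, 20).toFixed(1)); // 12.4
```

Under those assumptions the budget lasts roughly twelve days; higher traffic or less free memory shortens that proportionally.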

extent analysis

TL;DR

The most likely fix is to modify the OpenClaw gateway to properly close the stdio pipes to the MCP server children after each agent turn, allowing them to exit cleanly.

Guidance

  1. Verify the hypothesis: Confirm that the issue is indeed caused by the OpenClaw gateway not closing the stdio pipes to the MCP server children.
  2. Implement a fix: Modify the OpenClaw gateway to send a shutdown request to the MCP server children after each agent turn, or close the stdio pipes and reap the child processes.
  3. Test the fix: Verify that the MCP server children are properly exiting after each agent turn, and that the memory leak is resolved.
  4. Consider a temporary workaround: If a fix is not immediately available, consider implementing a temporary workaround, such as the one described in the issue, to bound the memory leak.

Example

No code example is provided, as the fix requires modifications to the OpenClaw gateway code, which is not included in the issue.

Notes

The issue is specific to the OpenClaw gateway and its interaction with the MCP server children. The provided workaround is a defensive measure to bound the memory leak, but a proper fix is required to resolve the issue.

Recommendation

Apply a workaround, such as the one described in the issue, to bound the memory leak until a proper fix is available. This will prevent the host from running out of RAM due to the unbounded memory leak.
