openclaw: [Bug] Native-binary stdio MCP servers respawned per agent run, accumulate before lazy reap

Summary

Stdio MCP servers defined via mcp.servers.<name>.command (native binary, not Docker-wrapped or npm-wrapped) are spawned fresh on each agent-run rather than pooled across runs for a given agent. They are eventually reaped, but the dispose lag means multiple idle instances coexist whenever sessions are spawned faster than the dispose cadence.

This is not a hard leak, since the load eventually drains, but it wastes spawn overhead and causes a transient memory/PID hit. Reporting in case this is unintentional rather than by-design per-session isolation.

Environment

  • OpenClaw 2026.5.12-beta.1 source / openclaw-2026.4.27-f53b52ad6d21 runtime
  • Node v24.14.0, Linux 6.8.0-106 x86_64, Docker compose
  • MCP server: dialpad-messaging configured as {"command": "/usr/local/bin/dialpad-messaging"} (native ELF binary, no wrapper)

Observation

Snapshot of ps -eo pid,ppid,pcpu,pmem,etime,cmd inside the gateway container, taken after three agent runs had been dispatched in the preceding ~10 min (output trimmed to the PID, PPID, ELAPSED, and CMD columns):

    PID    PPID     ELAPSED CMD
      7       1    19:03:17 node dist/index.js gateway --bind lan --port 18789
  33304       7       08:42 /usr/local/bin/dialpad-messaging
  33424       7       04:47 /usr/local/bin/dialpad-messaging
  33576       7       00:32 /usr/local/bin/dialpad-messaging

Each child has PPID=7 (the node gateway), is in state Ssl (sleeping session leader, multi-threaded), and uses 0% CPU. Each instance holds ~9 MB RSS and ~280 MB virtual memory.

Spawn timing correlates with new agent-run dispatches (every ~4-5 min during active use). When session activity quiets, the population drains; 10 min later only one process remains.
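To make the spawn cadence easier to check, here is a minimal sketch that parses a trimmed ps snapshot and reports how many children of the gateway share a command line and how far apart they were started. The helper names (parsePs, childInstances, spawnGaps) are hypothetical and not part of OpenClaw; the input is the snapshot from this report.

```typescript
// One parsed row of `ps -eo pid,ppid,etime,cmd` output.
interface PsRow {
  pid: number;
  ppid: number;
  elapsedSec: number;
  cmd: string;
}

// Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
function etimeToSeconds(etime: string): number {
  const dash = etime.indexOf("-");
  const days = dash >= 0 ? Number(etime.slice(0, dash)) : 0;
  const clock = dash >= 0 ? etime.slice(dash + 1) : etime;
  const parts = clock.split(":").map(Number);
  while (parts.length < 3) parts.unshift(0); // pad missing hours
  return days * 86400 + parts[0] * 3600 + parts[1] * 60 + parts[2];
}

// Parse data rows; the header and blank lines do not match and are skipped.
function parsePs(output: string): PsRow[] {
  const rows: PsRow[] = [];
  for (const raw of output.split("\n")) {
    const m = /^(\d+)\s+(\d+)\s+(\S+)\s+(.+)$/.exec(raw.trim());
    if (!m) continue;
    rows.push({
      pid: Number(m[1]),
      ppid: Number(m[2]),
      elapsedSec: etimeToSeconds(m[3]),
      cmd: m[4],
    });
  }
  return rows;
}

// Children of `parentPid` whose command line equals `binary`, oldest first.
function childInstances(rows: PsRow[], parentPid: number, binary: string): PsRow[] {
  return rows
    .filter((r) => r.ppid === parentPid && r.cmd === binary)
    .sort((a, b) => b.elapsedSec - a.elapsedSec);
}

// Seconds between consecutive spawns, given instances sorted oldest first.
function spawnGaps(instances: PsRow[]): number[] {
  const gaps: number[] = [];
  for (let i = 1; i < instances.length; i++) {
    gaps.push(instances[i - 1].elapsedSec - instances[i].elapsedSec);
  }
  return gaps;
}

const snapshot = `
    PID    PPID     ELAPSED CMD
      7       1    19:03:17 node dist/index.js gateway --bind lan --port 18789
  33304       7       08:42 /usr/local/bin/dialpad-messaging
  33424       7       04:47 /usr/local/bin/dialpad-messaging
  33576       7       00:32 /usr/local/bin/dialpad-messaging
`;

const children = childInstances(parsePs(snapshot), 7, "/usr/local/bin/dialpad-messaging");
console.log(children.length, spawnGaps(children)); // 3 instances, spawned 235 s and 255 s apart
```

Running the same parse on snapshots taken a few minutes apart would show whether the count tracks agent-run dispatches and how long the oldest instance survives before it is reaped.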

Why this is suboptimal

  1. Spawn cost per run: native binary cold-start, MCP handshake — paid on every run instead of once.
  2. Memory cost: N × (~9 MB RSS, ~280 MB virtual) during peak overlap.
  3. Race window with dispose path: a new spawn can interleave with an in-flight killProcessTree on a prior instance.
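The memory cost in item 2 is simple multiplication, and a rough peak value for N can be estimated under a stated model. The sketch below assumes runs start at a fixed interval and each instance lives for a fixed time before it is reaped; that model and the function names are illustrative assumptions, not measured OpenClaw behaviour. Only the per-instance figures (~9 MB RSS, ~280 MB virtual) and N = 3 come from the report.

```typescript
interface Footprint {
  rssMb: number;
  virtualMb: number;
}

// Total footprint of `instances` coexisting processes.
function overlapFootprint(instances: number, perInstance: Footprint): Footprint {
  return {
    rssMb: instances * perInstance.rssMb,
    virtualMb: instances * perInstance.virtualMb,
  };
}

// Assumed model: one spawn every `spawnIntervalSec`, each instance alive for
// `lifetimeSec` before the lazy reap. The peak number alive at once is then
// ceil(lifetime / interval).
function peakOverlap(spawnIntervalSec: number, lifetimeSec: number): number {
  return Math.ceil(lifetimeSec / spawnIntervalSec);
}

const perInstance: Footprint = { rssMb: 9, virtualMb: 280 }; // figures from the report

// Three coexisting instances, as in the snapshot.
console.log(overlapFootprint(3, perInstance)); // { rssMb: 27, virtualMb: 840 }

// Hypothetical inputs: a spawn every 4.5 min and a 10 min lifetime.
console.log(peakOverlap(270, 600)); // 3
```

The resident cost stays small at this scale; the estimate mainly shows that the overlap grows as dispatches get more frequent or as the dispose lag gets longer.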

Related issues

  • #75323: Docker-wrapped MCP leak (different trigger; npm wrapper -> Docker container; the dispose signal path differs from the native binary case)
  • #80665: Hermes-MCP fleet respawned per Hermes-as-tool invocation (Hermes-specific; a similar pattern at a different scope)
  • #70808 (closed): Gateway never disposes stdio MCP runtimes on session end (in the present report the runtimes ARE disposed, just lazily; this might be a regression in the same lifecycle code path)

Question for maintainers

Is per-run respawn intended (for isolation), or is it an artifact of dispose-on-session-end ordering? If it is intentional, please ignore this report; if not, the expected fix is to pool stdio MCP runtimes by (agentId, serverName) and tie their lifecycle to the agent rather than to the run.
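For discussion, here is a minimal sketch of what pooling by (agentId, serverName) could look like. It is not OpenClaw's code: RuntimePool, Runtime, acquire, and disposeAgent are hypothetical names, and the factory is injected so the sketch runs without spawning anything. A real factory would presumably wrap the child process for the configured command.

```typescript
// Hypothetical handle for a running stdio MCP server.
interface Runtime {
  dispose(): Promise<void>;
}

type RuntimeFactory<R extends Runtime> = (serverName: string) => Promise<R>;

class RuntimePool<R extends Runtime> {
  // Stores the spawn promise so concurrent callers share a single spawn.
  private readonly entries = new Map<string, Promise<R>>();
  private readonly factory: RuntimeFactory<R>;

  constructor(factory: RuntimeFactory<R>) {
    this.factory = factory;
  }

  private key(agentId: string, serverName: string): string {
    return agentId + "\u0000" + serverName;
  }

  // Reuse the runtime for (agentId, serverName); spawn only on first use.
  acquire(agentId: string, serverName: string): Promise<R> {
    const k = this.key(agentId, serverName);
    const existing = this.entries.get(k);
    if (existing) return existing;
    const created = this.factory(serverName);
    this.entries.set(k, created);
    // Drop a failed spawn so the next acquire retries.
    created.catch(() => {
      if (this.entries.get(k) === created) this.entries.delete(k);
    });
    return created;
  }

  // Dispose every runtime owned by an agent when the agent (not the run) ends.
  async disposeAgent(agentId: string): Promise<void> {
    const prefix = agentId + "\u0000";
    const pending: Promise<void>[] = [];
    this.entries.forEach((entry, k) => {
      if (k.indexOf(prefix) !== 0) return;
      this.entries.delete(k); // later acquires get a fresh runtime
      pending.push(entry.then((r) => r.dispose(), () => undefined));
    });
    await Promise.all(pending);
  }
}

// Demo with a fake factory that only counts spawns.
let demoSpawns = 0;
const demoPool = new RuntimePool(async (_serverName: string) => {
  demoSpawns++;
  return { dispose: async () => {} };
});

void (async () => {
  await demoPool.acquire("agent-1", "dialpad-messaging"); // run 1 spawns
  await demoPool.acquire("agent-1", "dialpad-messaging"); // run 2 reuses
  console.log("spawns:", demoSpawns);
  await demoPool.disposeAgent("agent-1");
})();
```

Removing the entry before awaiting dispose means a later acquire never receives a runtime that is being torn down. Whether a fresh spawn may safely overlap an in-flight teardown (item 3 above) remains a separate question for the real dispose path.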
