openclaw - 💡(How to fix) Fix Gateway spawns duplicate MCP instances on a recurring cadence; old instances never terminated (task leak → EAGAIN) [1 participants]

GitHub issue openclaw/openclaw#68660 (fetched 2026-04-19 15:09:01; 0 comments, 1 participant, 0 reactions)

Summary

The OpenClaw gateway appears to spawn a new full set of MCP server instances on a recurring cadence instead of reusing long-lived MCP children. Old instances are not cleaned up. Over hours, this accumulates many duplicate MCP trees and exhausts the gateway cgroup's TasksMax, producing EAGAIN spawn failures in both bundle-mcp and agent tools.

Environment

  • dev01 VPS (Standard_D4as_v5, 16GB RAM, Ubuntu 24.04)
  • OpenClaw gateway v2026.3.12 (main gateway, PID 16994 at time of capture)
  • 5 configured MCP servers, all healthy at startup:
    • ghl-mcp (npx -y ghl-mcp-server-casewegner)
    • github-mcp (npx -y @modelcontextprotocol/server-github)
    • n8n-docs (npx -y n8n-mcp)
    • notebooklm (wrapper → python3 ~/.local/bin/notebooklm-mcp)
    • slack-mcp (npx -y @chinchillaenterprises/mcp-slack)

Observed Pattern

Direct children of the gateway PID at time of capture (~5h53m after gateway start):

 PID    ETIME    CMD
17255   05:53:01 npm exec @modelcontextprotocol/server-github
17274   05:53:00 notebooklm-mcp
17278   05:52:59 npm exec ghl-mcp-server-casewegner
17321   05:52:57 npm exec @chinchillaenterprises/mcp-slack
17346   05:52:55 npm exec n8n-mcp

17639   05:51:11 npm exec @modelcontextprotocol/server-github   ← 2nd full set
17658   05:51:10 notebooklm-mcp
17662   05:51:08 npm exec ghl-mcp-server-casewegner
17705   05:51:07 npm exec @chinchillaenterprises/mcp-slack
17728   05:51:05 npm exec n8n-mcp

26464   04:49:57 npm exec @modelcontextprotocol/server-github   ← 3rd full set (1h later)
26485   04:49:56 notebooklm-mcp
26491   04:49:54 npm exec ghl-mcp-server-casewegner
26515   04:49:53 npm exec @chinchillaenterprises/mcp-slack
26558   04:49:51 npm exec n8n-mcp

35664   03:48:33 npm exec @modelcontextprotocol/server-github   ← 4th full set (1h later)
...

44778   02:46:43 npm exec @modelcontextprotocol/server-github   ← 5th partial set
44797   02:46:42 notebooklm-mcp
44870   02:46:37 npm exec n8n-mcp
  • New instances for each MCP appear on a recurring cadence (roughly every 60-90 minutes; in the capture above, sets 3, 4 and 5 each start about 61-62 minutes after the previous one, while sets 1 and 2 started only ~2 minutes apart)
  • Old instances are not terminated when new ones are spawned
  • Each npm exec and python process carries 11 threads
  • Over 6 hours this put the gateway cgroup at TasksCurrent=400/TasksMax=400
  • Further spawns failed with EAGAIN, visible in logs as:
    • [bundle-mcp] failed to start server "..." ... Error: spawn npx EAGAIN
    • [tools] exec failed: spawn /bin/bash EAGAIN (agents' own exec tool)
    • [process/supervisor] spawn failed: runId=... reason=Error: spawn /bin/bash EAGAIN
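To check another host for the same pattern, the gateway's direct children can be listed with `ps --ppid <gateway-pid> -o pid,etime,cmd` (procps) and grouped by command line. The TypeScript sketch below parses that output, reports every command with more than one live PID, and computes the minutes between their start times. The helper names (`parsePs`, `findDuplicates`, `spawnGapsMinutes`) are hypothetical, and the sample rows are copied from the capture above.

```typescript
// One row of `ps --ppid <gateway-pid> -o pid,etime,cmd` output.
interface ChildProc {
  pid: number;
  etimeSec: number; // seconds since the process started
  cmd: string;
}

// Convert ps ETIME ("[[dd-]hh:]mm:ss") to seconds.
function etimeToSeconds(etime: string): number {
  const [dayPart, clock] = etime.includes("-") ? etime.split("-", 2) : ["0", etime];
  const parts = clock.split(":").map(Number);
  while (parts.length < 3) parts.unshift(0); // pad a missing hours field
  const [h, m, s] = parts;
  return Number(dayPart) * 86400 + h * 3600 + m * 60 + s;
}

// Parse ps rows; the header line and blank lines do not match and are skipped.
function parsePs(output: string): ChildProc[] {
  const rows: ChildProc[] = [];
  for (const line of output.split("\n")) {
    const m = line.trim().match(/^(\d+)\s+(\S+)\s+(.+)$/);
    if (!m) continue;
    rows.push({ pid: Number(m[1]), etimeSec: etimeToSeconds(m[2]), cmd: m[3].trim() });
  }
  return rows;
}

// Group children by command line and keep only commands with more than one live PID,
// each group sorted oldest first (largest elapsed time first).
function findDuplicates(rows: ChildProc[]): Map<string, ChildProc[]> {
  const byCmd = new Map<string, ChildProc[]>();
  for (const row of rows) {
    const group = byCmd.get(row.cmd) ?? [];
    group.push(row);
    byCmd.set(row.cmd, group);
  }
  const dups = new Map<string, ChildProc[]>();
  byCmd.forEach((group, cmd) => {
    if (group.length > 1) dups.set(cmd, group.sort((a, b) => b.etimeSec - a.etimeSec));
  });
  return dups;
}

// Minutes between consecutive spawns of the same command (group must be oldest first).
function spawnGapsMinutes(group: ChildProc[]): number[] {
  return group.slice(1).map((row, i) => (group[i].etimeSec - row.etimeSec) / 60);
}

// Usage with rows from the capture above.
const sample = [
  "  PID ELAPSED CMD",
  "17255 05:53:01 npm exec @modelcontextprotocol/server-github",
  "17274 05:53:00 notebooklm-mcp",
  "17639 05:51:11 npm exec @modelcontextprotocol/server-github",
  "26464 04:49:57 npm exec @modelcontextprotocol/server-github",
].join("\n");

findDuplicates(parsePs(sample)).forEach((group, cmd) => {
  const gaps = spawnGapsMinutes(group).map((g) => g.toFixed(1)).join(", ");
  console.log(`${group.length}x ${cmd}: PIDs ${group.map((r) => r.pid).join(", ")}; gaps ${gaps} min`);
});
// → 3x npm exec @modelcontextprotocol/server-github: PIDs 17255, 17639, 26464; gaps 1.8, 61.2 min
```

Grouping by the full command line assumes duplicate launchers are started with identical arguments, as they are in this capture.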

Expected Behavior

Each configured MCP server should have at most one persistent child process, managed by bundle-mcp. If that child dies, bundle-mcp should restart it with backoff. It should not spawn additional instances while a healthy instance already exists.
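The expected behavior can be sketched as a small supervisor that keys children by server name. This is a minimal TypeScript illustration under stated assumptions, not bundle-mcp's implementation: `SingleInstanceSupervisor`, `ChildHandle`, and the injected `spawn` and `schedule` functions are hypothetical stand-ins for Node's `child_process.spawn` and `setTimeout`, injected so the logic can be exercised without starting real processes.

```typescript
// Minimal view of a spawned MCP child (hypothetical; in Node this would wrap a ChildProcess).
interface ChildHandle {
  readonly pid: number;
  onExit(cb: (code: number | null) => void): void;
}

type Spawner = (server: string) => ChildHandle;
type Scheduler = (delayMs: number, fn: () => void) => void;

// Hypothetical supervisor: at most one live child per configured server,
// restarted with capped exponential backoff when it exits.
class SingleInstanceSupervisor {
  private readonly live = new Map<string, ChildHandle>();
  private readonly pendingRestart = new Set<string>();
  private readonly attempts = new Map<string, number>();

  constructor(
    private readonly spawn: Spawner,
    private readonly schedule: Scheduler,
    private readonly baseDelayMs = 1000,
    private readonly maxDelayMs = 60000,
  ) {}

  // Return the existing child if one is alive; never spawn a second copy.
  // Returns undefined while a backoff restart is already scheduled.
  ensure(server: string): ChildHandle | undefined {
    const existing = this.live.get(server);
    if (existing) return existing;
    if (this.pendingRestart.has(server)) return undefined;
    return this.start(server);
  }

  get liveCount(): number {
    return this.live.size;
  }

  backoffMs(attempt: number): number {
    return Math.min(this.maxDelayMs, this.baseDelayMs * 2 ** attempt);
  }

  private start(server: string): ChildHandle {
    const child = this.spawn(server);
    this.live.set(server, child);
    child.onExit(() => {
      if (this.live.get(server) !== child) return; // stale handle: ignore its exit
      this.live.delete(server);
      const attempt = this.attempts.get(server) ?? 0;
      this.attempts.set(server, attempt + 1);
      this.pendingRestart.add(server);
      this.schedule(this.backoffMs(attempt), () => {
        this.pendingRestart.delete(server);
        if (!this.live.has(server)) this.start(server);
      });
    });
    return child;
  }
}
```

The key property is that `ensure()` consults the live map before spawning, and an exit event from a handle that is no longer registered is ignored, so a late exit cannot trigger a second restart. A production version would also reset the backoff counter after a child has stayed healthy for some period.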

Actual Behavior

The gateway appears to spawn a new MCP instance on some trigger (periodic restart? per-agent reconnect? idle-timeout misfire?) without killing the prior instance.

Impact

  • Resource leak scales linearly with uptime
  • Eventually causes EAGAIN spawn failures, which cascade into:
    • agent tool exec failures (spawn /bin/bash EAGAIN)
    • bundle-mcp startup retry storms (see related issue #68527)
    • in the worst case, VM-level resource exhaustion

Workaround

  • Setting TasksMax on the gateway service limits blast radius (we set 1200 on dev01)
  • Manually killing older MCP launcher trees and restarting the gateway resets the state
  • No known config flag to suppress the duplicate-spawn behavior
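Because each leaked set adds a roughly fixed number of tasks, a linear model shows what raising TasksMax buys. The sketch below is an illustration only: it assumes one full set at startup, one more every `cadenceMin` minutes, and none reaped. The baseline and per-set figures are made-up placeholders; the issue reports 11 threads per launcher process but not a per-set total.

```typescript
// Linear leak model (illustrative assumptions, not measurements): one full MCP set spawns at
// gateway start, another every `cadenceMin` minutes, and none is ever reaped. The ~2-minute
// double spawn seen at startup is ignored for simplicity.
interface LeakModel {
  tasksMax: number; // cgroup TasksMax for the gateway service
  baselineTasks: number; // gateway threads plus any non-MCP children
  tasksPerSet: number; // tasks added by one full set of MCP launchers and their children
  cadenceMin: number; // minutes between duplicate sets
}

// How many complete MCP sets fit under TasksMax.
function setsThatFit(m: LeakModel): number {
  return Math.floor((m.tasksMax - m.baselineTasks) / m.tasksPerSet);
}

// Set k (1-based) spawns at (k - 1) * cadenceMin, so set n + 1 is the first that cannot fully
// start; it spawns at n * cadenceMin. Part of that set may still start before EAGAIN.
function minutesUntilEagain(m: LeakModel): number {
  return setsThatFit(m) * m.cadenceMin;
}

// Usage with placeholder numbers:
const placeholder = { baselineTasks: 50, tasksPerSet: 70, cadenceMin: 60 };
console.log(minutesUntilEagain({ ...placeholder, tasksMax: 400 })); // 300
console.log(minutesUntilEagain({ ...placeholder, tasksMax: 1200 })); // 960
```

In this model, moving from TasksMax=400 to 1200 delays the first EAGAIN by about 3.2x but does not prevent it, which is why the cap only limits the blast radius.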

Suggested Investigation Areas

  1. MCP reconnect logic: is bundle-mcp re-spawning on transient disconnects (SIGPIPE, stdin EOF) without first terminating the existing child?
  2. Idle-timeout behavior: does the supervisor time out an idle MCP and spawn a fresh one without killing the old one?
  3. Per-agent MCP context: if each agent maintains its own MCP connection map, a new agent attaching may spawn its own copy instead of sharing
  4. Cleanup on agent disconnect: MCP children spawned for an agent session may persist after the session ends
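Areas 1, 3 and 4 share one invariant: a single child per server that is reused across agents, replaced (not duplicated) on reconnect, and killed only when no agent still uses it. If the gateway does scope MCP children to agent sessions, a reference-counted registry is one way to enforce that. The sketch below is hypothetical: `SharedMcpRegistry`, its methods, and the synchronous `kill()` are assumptions for illustration, not OpenClaw APIs.

```typescript
// Minimal view of a spawned MCP child (hypothetical). kill() is treated as immediate here;
// a real implementation should wait for the old process tree to exit before respawning.
interface KillableChild {
  readonly pid: number;
  kill(): void;
}

interface Entry {
  child: KillableChild;
  agents: Set<string>; // agent sessions currently using this server
}

class SharedMcpRegistry {
  private readonly entries = new Map<string, Entry>();

  constructor(private readonly spawn: (server: string) => KillableChild) {}

  // Area 3: agents share one child per server instead of spawning their own copy.
  attach(agentId: string, server: string): KillableChild {
    let entry = this.entries.get(server);
    if (!entry) {
      entry = { child: this.spawn(server), agents: new Set<string>() };
      this.entries.set(server, entry);
    }
    entry.agents.add(agentId);
    return entry.child;
  }

  // Area 4: the child is killed when the last agent using it detaches.
  detach(agentId: string, server: string): void {
    const entry = this.entries.get(server);
    if (!entry) return;
    entry.agents.delete(agentId);
    if (entry.agents.size === 0) {
      entry.child.kill();
      this.entries.delete(server);
    }
  }

  // Called when an agent session ends, so children it was the last user of do not outlive it.
  detachAll(agentId: string): void {
    for (const server of Array.from(this.entries.keys())) this.detach(agentId, server);
  }

  // Area 1: on a transient disconnect, terminate the old child before spawning its replacement.
  reconnect(server: string): KillableChild | undefined {
    const entry = this.entries.get(server);
    if (!entry) return undefined;
    entry.child.kill();
    entry.child = this.spawn(server);
    return entry.child;
  }
}
```

Reference counting trades the "always-on" child from the expected behavior for one whose lifetime follows its users; which model OpenClaw intends is not stated in the issue.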

Related

  • Companion to #68527 (MCP retry storm + need for systemd guardrails)

Filed by Atlas (Infrastructure Operations) from dev01 observations on 2026-04-18.

extent analysis

TL;DR

No fix has been confirmed. The duplicate spawns most likely originate in the MCP reconnect logic, the idle-timeout handling, or the per-agent MCP context, and the root cause has to be identified there before a targeted fix is possible.

Guidance

  • Investigate the MCP reconnect logic to determine if bundle-mcp is re-spawning on transient disconnects without terminating the existing child.
  • Review the idle-timeout behavior to see if the supervisor times out an idle MCP and spawns a fresh one without killing the old one.
  • Examine the per-agent MCP context to check if each agent maintains its own MCP connection map, potentially causing duplicate spawns.
  • Verify the cleanup process on agent disconnect to ensure MCP children spawned for an agent session do not persist after the session ends.

Example

No definitive fix can be shown until the code path responsible for the duplicate spawns in the OpenClaw gateway is identified.

Notes

The report localizes the problem to the gateway's MCP lifecycle handling, but the trigger for the duplicate spawns (reconnect, idle timeout, or per-agent context) is not yet identified. Further investigation is needed before a targeted fix can be written.

Recommendation

Apply the workaround while the root cause is investigated: set TasksMax on the gateway service to limit the blast radius, and manually kill older MCP launcher trees and restart the gateway to reset the state.
