openclaw - Gateway spawns duplicate MCP instances on a recurring cadence; old instances never terminated (task leak → EAGAIN)

openclaw · 2026-04-18 18:03:21

openclaw/openclaw#68660•Fetched 2026-04-19 15:09:01

Author

powerMovesDev

The OpenClaw gateway appears to spawn a new full set of MCP server instances on a recurring cadence instead of reusing long-lived MCP children. Old instances are not cleaned up. Over hours, this accumulates many duplicate MCP trees and exhausts the gateway cgroup's TasksMax, producing EAGAIN spawn failures in both bundle-mcp and agent tools.

Error Message

[bundle-mcp] failed to start server "..." ... Error: spawn npx EAGAIN
[process/supervisor] spawn failed: runId=... reason=Error: spawn /bin/bash EAGAIN

Environment

dev01 VPS (Standard_D4as_v5, 16GB RAM, Ubuntu 24.04)
OpenClaw gateway v2026.3.12 (main gateway, PID 16994 at time of capture)
5 configured MCP servers, all healthy at startup:
- ghl-mcp (npx -y ghl-mcp-server-casewegner)
- github-mcp (npx -y @modelcontextprotocol/server-github)
- n8n-docs (npx -y n8n-mcp)
- notebooklm (wrapper → python3 ~/.local/bin/notebooklm-mcp)
- slack-mcp (npx -y @chinchillaenterprises/mcp-slack)

Observed Pattern

Direct children of the gateway PID at time of capture (~5h53m after gateway start):

 PID    ETIME    CMD
17255   05:53:01 npm exec @modelcontextprotocol/server-github
17274   05:53:00 notebooklm-mcp
17278   05:52:59 npm exec ghl-mcp-server-casewegner
17321   05:52:57 npm exec @chinchillaenterprises/mcp-slack
17346   05:52:55 npm exec n8n-mcp

17639   05:51:11 npm exec @modelcontextprotocol/server-github   ← 2nd full set
17658   05:51:10 notebooklm-mcp
17662   05:51:08 npm exec ghl-mcp-server-casewegner
17705   05:51:07 npm exec @chinchillaenterprises/mcp-slack
17728   05:51:05 npm exec n8n-mcp

26464   04:49:57 npm exec @modelcontextprotocol/server-github   ← 3rd full set (1h later)
26485   04:49:56 notebooklm-mcp
26491   04:49:54 npm exec ghl-mcp-server-casewegner
26515   04:49:53 npm exec @chinchillaenterprises/mcp-slack
26558   04:49:51 npm exec n8n-mcp

35664   03:48:33 npm exec @modelcontextprotocol/server-github   ← 4th full set (1h later)
...

44778   02:46:43 npm exec @modelcontextprotocol/server-github   ← 5th partial set
44797   02:46:42 notebooklm-mcp
44870   02:46:37 npm exec n8n-mcp

New instances for each MCP appear on a recurring cadence (here roughly every 60-90 minutes, with some pairs only ~2 minutes apart)
Old instances are not terminated when new ones are spawned
Each npm exec and python process carries 11 threads
Over 6 hours this put the gateway cgroup at TasksCurrent=400/TasksMax=400
Further spawns failed with EAGAIN, visible in logs as:
- [bundle-mcp] failed to start server "..." ... Error: spawn npx EAGAIN
- [tools] exec failed: spawn /bin/bash EAGAIN (agents' own exec tool)
- [process/supervisor] spawn failed: runId=... reason=Error: spawn /bin/bash EAGAIN
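The duplicate sets and their spawn cadence can be read off a process listing mechanically. The following is a minimal Python sketch, not part of OpenClaw: it assumes a listing with PID, ETIME and CMD columns like the capture above (for example from `ps --ppid <gateway-pid> -o pid,etime,args`), and the helper names `etime_to_seconds` and `group_instances` are made up for illustration. The sample rows are the github-mcp rows from the capture.

```python
from collections import defaultdict


def etime_to_seconds(etime: str) -> int:
    """Convert a ps ETIME value ([[DD-]HH:]MM:SS) to seconds."""
    days = 0
    if "-" in etime:
        day_part, etime = etime.split("-", 1)
        days = int(day_part)
    parts = [int(p) for p in etime.split(":")]
    while len(parts) < 3:
        parts.insert(0, 0)  # pad missing hours
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def group_instances(listing: str) -> dict:
    """Map each command line to its (pid, age_seconds) instances, oldest first."""
    groups = defaultdict(list)
    for line in listing.strip().splitlines():
        fields = line.split(None, 2)
        if len(fields) < 3 or not fields[0].isdigit():
            continue  # skip the header row and blank lines
        pid, etime, cmd = fields
        groups[cmd].append((int(pid), etime_to_seconds(etime)))
    for instances in groups.values():
        instances.sort(key=lambda item: item[1], reverse=True)
    return dict(groups)


SAMPLE = """
PID    ETIME    CMD
17255   05:53:01 npm exec @modelcontextprotocol/server-github
17639   05:51:11 npm exec @modelcontextprotocol/server-github
26464   04:49:57 npm exec @modelcontextprotocol/server-github
35664   03:48:33 npm exec @modelcontextprotocol/server-github
44778   02:46:43 npm exec @modelcontextprotocol/server-github
"""

groups = group_instances(SAMPLE)
for cmd, instances in groups.items():
    ages = [age for _, age in instances]
    # Gap between consecutive spawns, in whole minutes.
    gaps_min = [round((older - newer) / 60) for older, newer in zip(ages, ages[1:])]
    print(f"{cmd}: {len(instances)} instances, spawn gaps (min): {gaps_min}")
# prints: npm exec @modelcontextprotocol/server-github: 5 instances, spawn gaps (min): [2, 61, 61, 62]
```

Any command with more than one instance under the gateway PID is a candidate leak; the gaps show one pair about 2 minutes apart followed by roughly hourly respawns.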

Expected Behavior

Each configured MCP server should have at most one persistent child process per server managed by bundle-mcp. If it dies, bundle-mcp should restart it with backoff. It should not spawn additional instances while a healthy instance already exists.
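The expected behavior can be stated as an invariant: one live child per server name, with a delay that grows on each restart. The sketch below illustrates that invariant in Python under stated assumptions; it is not bundle-mcp's actual code, and `SingleInstanceRegistry`, `backoff_delay` and `FakeChild` are hypothetical names. The only interface assumed of a child is a `poll()` method that returns None while the process is running, as `subprocess.Popen` does.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff: base * 2**attempt seconds, capped at `cap`."""
    return min(cap, base * (2 ** attempt))


@dataclass
class SingleInstanceRegistry:
    spawn: Callable[[str], object]              # starts a child for a server name
    sleep: Callable[[float], None] = time.sleep
    children: Dict[str, object] = field(default_factory=dict)
    restarts: Dict[str, int] = field(default_factory=dict)

    def ensure(self, name: str):
        """Return the live child for `name`, spawning only if none is alive."""
        child = self.children.get(name)
        if child is not None and child.poll() is None:
            return child  # healthy instance exists: never spawn a second copy
        if child is not None:
            # The previous instance died: wait before replacing it.
            attempt = self.restarts.get(name, 0)
            self.sleep(backoff_delay(attempt))
            self.restarts[name] = attempt + 1
        child = self.spawn(name)
        self.children[name] = child
        return child


class FakeChild:
    """Stand-in for a process handle; exit_code None means still running."""

    def __init__(self) -> None:
        self.exit_code: Optional[int] = None

    def poll(self) -> Optional[int]:
        return self.exit_code


spawned = []
delays = []


def fake_spawn(name: str) -> FakeChild:
    spawned.append(name)
    return FakeChild()


registry = SingleInstanceRegistry(spawn=fake_spawn, sleep=delays.append)
first = registry.ensure("github-mcp")
again = registry.ensure("github-mcp")        # reused, no new spawn
first.exit_code = 1                           # simulate a crash
replacement = registry.ensure("github-mcp")  # one respawn after a 1.0 s delay
print(len(spawned), delays)                   # prints: 2 [1.0]
```

Keying the registry by server name rather than by agent or session is the design choice that rules out the accumulation described in this report.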

Actual Behavior

The gateway appears to spawn a new MCP instance on some trigger (periodic restart? per-agent reconnect? idle-timeout misfire?) without killing the prior instance.

Impact

Resource leak scales linearly with uptime
Eventually causes EAGAIN spawn failures, which cascade into:
- agent tool exec failures (spawn /bin/bash EAGAIN)
- bundle-mcp startup retry storms (see related issue #68527)
- in the worst case, VM-level resource exhaustion
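A rough estimate makes the linear growth concrete. The sketch below assumes about one new set per hour (read from the ETIME gaps in the capture) and at least 5 × 11 = 55 tasks per set (five direct children with 11 threads each). Both figures are assumptions derived from this report, and the per-set count is a lower bound because it ignores descendants of each launcher and the gateway's own tasks. `hours_until_exhaustion` is a hypothetical helper.

```python
def hours_until_exhaustion(
    tasks_max: int,
    tasks_per_set: int,
    sets_per_hour: float,
    baseline_tasks: int = 0,
) -> float:
    """Hours until a linearly growing task count reaches tasks_max."""
    headroom = tasks_max - baseline_tasks
    if headroom <= 0:
        return 0.0
    return headroom / (tasks_per_set * sets_per_hour)


TASKS_PER_SET = 5 * 11  # five direct children, 11 threads each (lower bound)

for limit in (400, 1200):
    hours = hours_until_exhaustion(limit, TASKS_PER_SET, sets_per_hour=1.0)
    print(f"TasksMax={limit}: about {hours:.1f} h at most")
# prints:
# TasksMax=400: about 7.3 h at most
# TasksMax=1200: about 21.8 h at most
```

Under these assumptions the 400-task limit lasts a little over 7 hours at best, which is in the same range as the roughly 6 hours observed. A limit of 1200 delays exhaustion to under a day but does not prevent it.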

Workaround

Setting TasksMax on the gateway service limits blast radius (we set 1200 on dev01)
Manually killing older MCP launcher trees and restarting the gateway resets the state
No known config flag to suppress the duplicate-spawn behavior
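While the workaround is in place, it helps to watch how close the service is to its limit so that the manual cleanup happens before spawns start failing. The sketch below parses the output of `systemctl show <unit> --property=TasksCurrent,TasksMax`. The unit name in the comment is an assumption (this report does not name the gateway's unit), and the 80% threshold is arbitrary.

```python
import subprocess
from typing import Optional, Tuple


def parse_task_usage(show_output: str) -> Optional[Tuple[int, int]]:
    """Extract (TasksCurrent, TasksMax) from `systemctl show` output.

    Returns None when either value is missing or non-numeric
    (for example TasksMax=infinity).
    """
    props = dict(
        line.split("=", 1) for line in show_output.splitlines() if "=" in line
    )
    current = props.get("TasksCurrent", "")
    limit = props.get("TasksMax", "")
    if not current.isdigit() or not limit.isdigit():
        return None
    return int(current), int(limit)


def task_usage(unit: str) -> Optional[Tuple[int, int]]:
    """Query systemd for a unit's task count and limit."""
    result = subprocess.run(
        ["systemctl", "show", unit, "--property=TasksCurrent,TasksMax"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_task_usage(result.stdout)


def is_near_limit(current: int, limit: int, threshold: float = 0.8) -> bool:
    """True once usage reaches the given fraction of the limit."""
    return current >= threshold * limit


# Values from this report: the cgroup was at 400 of 400 tasks.
usage = parse_task_usage("TasksCurrent=400\nTasksMax=400\n")
print(usage, is_near_limit(*usage))  # prints: (400, 400) True

# On a live host (unit name is hypothetical):
# task_usage("openclaw-gateway.service")
```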

Suggested Investigation Areas

MCP reconnect logic: is bundle-mcp re-spawning on transient disconnects (SIGPIPE, stdin EOF) without first terminating the existing child?
Idle-timeout behavior: does the supervisor time out an idle MCP and spawn a fresh one without killing the old one?
Per-agent MCP context: if each agent maintains its own MCP connection map, a new agent attaching may spawn its own copy instead of sharing
Cleanup on agent disconnect: MCP children spawned for an agent session may persist after the session ends
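Whichever trigger turns out to be responsible, the property to check in the reconnect, idle-timeout and session-cleanup paths is the same: the old child and its whole process tree are stopped before a replacement is started. The sketch below shows that ordering in Python on POSIX. It is an illustration only, not the gateway's code (the gateway spawns through Node, per the error messages above); `stop_tree` and `respawn` are hypothetical names, and a sleeping Python process stands in for an MCP launcher.

```python
import os
import signal
import subprocess
import sys
from typing import List, Optional


def stop_tree(child: subprocess.Popen, grace: float = 5.0) -> None:
    """Stop a child started with start_new_session=True and its process group.

    Descendants that outlive an already-exited leader are not handled here.
    """
    if child.poll() is not None:
        return  # leader already exited and was reaped
    try:
        os.killpg(child.pid, signal.SIGTERM)  # pgid equals pid for a session leader
    except ProcessLookupError:
        return
    try:
        child.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        os.killpg(child.pid, signal.SIGKILL)  # escalate after the grace period
        child.wait()


def respawn(old: Optional[subprocess.Popen], argv: List[str]) -> subprocess.Popen:
    """Terminate the previous instance (if any) before starting a new one."""
    if old is not None:
        stop_tree(old)
    # A new session gives the child its own process group, so launcher
    # descendants can be signalled together.
    return subprocess.Popen(argv, start_new_session=True)


sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
first = respawn(None, sleeper)
second = respawn(first, sleeper)
print(first.poll() is not None, second.poll() is None)  # prints: True True
stop_tree(second)
```

Signalling the process group rather than the single PID matters here because the launchers in the capture are `npm exec` wrappers, and stopping only the wrapper may leave the underlying server process behind.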

Related

Companion to #68527 (MCP retry storm + need for systemd guardrails)

Filed by Atlas (Infrastructure Operations) from dev01 observations on 2026-04-18.

extent analysis

TL;DR

No confirmed fix exists yet. The gateway repeatedly spawns full sets of MCP servers without terminating the previous ones, and the root cause is still unknown; the leading suspects are the MCP reconnect logic, idle-timeout handling, and per-agent MCP context.

Guidance

Investigate the MCP reconnect logic to determine if bundle-mcp is re-spawning on transient disconnects without terminating the existing child.
Review the idle-timeout behavior to see if the supervisor times out an idle MCP and spawns a fresh one without killing the old one.
Examine the per-agent MCP context to check if each agent maintains its own MCP connection map, potentially causing duplicate spawns.
Verify the cleanup process on agent disconnect to ensure MCP children spawned for an agent session do not persist after the session ends.

Example

No fix snippet is available, because the root cause has not been identified; a fix depends on investigating how the gateway manages MCP server lifecycles.

Notes

The provided information suggests that the issue is related to the OpenClaw gateway's behavior, but the exact root cause is unclear. Further investigation is needed to determine the specific cause and develop a targeted solution.

Recommendation

Until the root cause is found, set TasksMax on the gateway service to limit the blast radius. When duplicates accumulate, kill the older MCP launcher trees and restart the gateway to reset the state.
