openclaw: Fix MCP servers not cleaned up on agent restart (causes OOM)

GitHub stats
openclaw/openclaw#59941 (fetched 2026-04-08 02:38:32)
Comments: 0 · Participants: 1 · Timeline events: 2 · Reactions: 0
Timeline (top): closed ×1, locked ×1


Bug

When the health monitor restarts a Discord agent because of a stale-socket condition, the gateway spawns new MCP server processes (e.g. helius-mcp) without killing the previous instances. Each restart therefore leaves one more set of orphaned processes behind; they accumulate and consume all available RAM until the Linux OOM killer terminates the gateway.

Evidence (2026-04-02)

  • 21 orphaned helius-mcp processes found consuming ~1 GB on a 3.8 GB server
  • Gateway was OOM-killed at 20:24 UTC
  • Health monitor restarts occurred every ~5-15 min (stale-socket), each spawning new MCP processes
  • commands.log shows frequent session resets
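
The issue does not say how the process count and memory figure were collected. One way to get comparable numbers is to sum the resident set size of matching processes from `ps` output. The sketch below is an assumption, not the reporter's method; `summarize_rss` is a hypothetical helper name, and RSS overcounts memory shared between processes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count processes whose command line contains a
# substring and total their resident memory.
#
# summarize_rss SUBSTRING
# Reads lines of the form "<rss_kb> <command line>" on stdin, as printed
# by `ps -eo rss=,args=`.
summarize_rss() {
  # The substring is passed through the environment rather than argv so
  # that this awk process does not match itself in the ps listing.
  PAT="$1" awk '
    index($0, ENVIRON["PAT"]) { n++; kb += $1 }
    END { printf "%d processes, %.0f MiB RSS\n", n, kb / 1024 }
  '
}
```

On the affected host this could be run as `ps -eo rss=,args= | summarize_rss helius-mcp`.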

Root Cause

MCP process lifecycle is not managed during agent restarts. The gateway spawns child processes for MCP servers but does not track or terminate them when the parent agent is restarted.

Workaround

  • Removed duplicate agent-level .mcp.json (was defined in both openclaw.json and agents/main/agent/.mcp.json)
  • Created scripts/mcp-watchdog.sh cron (every 15 min) to kill orphaned processes when count exceeds threshold
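
The contents of scripts/mcp-watchdog.sh are not shown in the issue. A minimal sketch of a threshold-based cleanup of this kind might look like the following, assuming bash on Linux with `pgrep` (procps) installed. The function name `kill_if_over`, the pattern, and the threshold are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a threshold-based watchdog; the real
# scripts/mcp-watchdog.sh is not shown in the issue.

# kill_if_over PATTERN THRESHOLD
# Sends SIGTERM to every process whose command line matches PATTERN,
# but only when more than THRESHOLD such processes exist.
kill_if_over() {
  local pattern=$1 threshold=$2
  local pids count
  # pgrep -f matches against the full command line and never reports itself.
  pids=$(pgrep -f -- "$pattern" || true)
  # Count non-empty lines; grep -c prints 0 (and exits 1) when there are none.
  count=$(printf '%s\n' "$pids" | grep -c . || true)
  if [ "$count" -gt "$threshold" ]; then
    echo "found $count processes matching '$pattern', terminating"
    # Word splitting on $pids is intentional: one PID per argument.
    kill $pids 2>/dev/null || true
  fi
}
```

A cron entry could call a script that ends with, for example, `kill_if_over 'helius-mcp' 4`. Because `pgrep -f` inspects whole command lines, the pattern should be hard-coded in the script rather than passed on the script's own command line, or the watchdog would match itself. Note that this approach kills live servers along with orphans once the threshold is crossed.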

Expected Behavior

When an agent restarts, the gateway should:

  1. Terminate all MCP server processes associated with that agent
  2. Spawn fresh MCP servers for the restarted agent
  3. Track MCP child process PIDs for cleanup on shutdown/restart
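
The three steps above can be sketched with plain PID files, one per agent. This only illustrates the bookkeeping; it is not openclaw's implementation, and it ignores how the gateway actually communicates with the servers. `MCP_PID_DIR` and all function names are hypothetical, and bash is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: per-agent PID tracking with plain files.
# MCP_PID_DIR and the function names are illustrative, not openclaw APIs.
MCP_PID_DIR=${MCP_PID_DIR:-/tmp/mcp-pids}

# start_mcp AGENT CMD [ARGS...]
# Launch CMD in the background and append its PID to the agent's PID file.
start_mcp() {
  local agent=$1; shift
  mkdir -p "$MCP_PID_DIR"
  "$@" >/dev/null 2>&1 &
  echo "$!" >> "$MCP_PID_DIR/$agent.pids"
}

# stop_mcps AGENT
# Send SIGTERM to every tracked process for AGENT, then forget the PIDs.
stop_mcps() {
  local agent=$1 file="$MCP_PID_DIR/$1.pids" pid
  [ -f "$file" ] || return 0
  while read -r pid; do
    kill "$pid" 2>/dev/null || true   # a process that is already gone is fine
  done < "$file"
  rm -f "$file"
}

# restart_agent_mcps AGENT CMD [ARGS...]
# Terminate the old servers first, then spawn a fresh one.
restart_agent_mcps() {
  local agent=$1; shift
  stop_mcps "$agent"
  start_mcp "$agent" "$@"
}
```

For example, `restart_agent_mcps main npx helius-mcp@latest` would stop whatever was recorded for the agent `main` before starting a new server. PID files share the usual weakness that a recorded PID can be reused by an unrelated process after the original exits, so a production version would want a stronger identity check.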

Environment

  • Server: 3.8 GB RAM, no swap
  • Gateway: openclaw-gateway systemd service
  • MCP server: npx helius-mcp@latest

extent analysis

TL;DR

Implement a mechanism to track and terminate MCP server processes when an agent restarts to prevent orphaned processes from accumulating.

Guidance

  • Modify the gateway to track the PIDs of spawned MCP server processes, allowing for proper termination during agent restarts.
  • Implement a process management system to ensure that MCP server processes are cleaned up when the parent agent is restarted or shut down.
  • Consider integrating the mcp-watchdog.sh script into the gateway's restart process to automatically kill orphaned processes.
  • Review the agent restart logic to ensure that it properly terminates all associated MCP server processes before spawning new ones.

Example

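# Example of tracking a PID in a bash script.
# Start the server in the background; "$!" holds the PID of the last background job.
# (Wrapping the launch in $(...) would block until the server closes its stdout.)
npx helius-mcp@latest &
# Store the PID for later use
echo "$!" > mcp_pid.txt
# Later, when restarting the agent, stop the recorded process and drop the stale file
kill "$(cat mcp_pid.txt)" && rm -f mcp_pid.txt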
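
A single recorded PID only covers the process that was started directly. If the launcher (for example npx) runs the server as a child process, signalling the launcher may leave that child behind. One possible refinement, sketched here under the assumption of bash, is to start each server in its own process group and signal the whole group, escalating to SIGKILL after a grace period. `spawn_group`, `stop_group`, and `SPAWNED_PGID` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run a server in its own process group so that the
# launcher and its children can be signalled together.

# spawn_group CMD [ARGS...]
# Start CMD as a process-group leader and store the group ID (equal to
# the leader's PID) in SPAWNED_PGID.
spawn_group() {
  set -m                      # job control: each background job gets its own group
  "$@" >/dev/null 2>&1 &
  SPAWNED_PGID=$!
  set +m
}

# stop_group PGID [GRACE_SECONDS]
# SIGTERM the whole group, wait up to GRACE_SECONDS (default 5), then
# SIGKILL whatever is left.
stop_group() {
  local pgid=$1 grace=${2:-5} waited=0
  kill -TERM -- "-$pgid" 2>/dev/null || return 0   # group already gone
  while kill -0 -- "-$pgid" 2>/dev/null; do
    if [ "$waited" -ge "$grace" ]; then
      kill -KILL -- "-$pgid" 2>/dev/null || true
      break
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

Usage would be `spawn_group npx helius-mcp@latest`, saving `$SPAWNED_PGID`, and later `stop_group "$pgid"`. The negative argument to `kill` addresses the process group rather than a single process.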

Notes

The mcp-watchdog.sh workaround is a stopgap, not a permanent fix: it runs from cron every 15 minutes, while the health monitor was restarting the agent every ~5-15 minutes, so orphaned processes can still pile up between runs. A more robust solution is to build process management into the gateway itself.

Recommendation

Until the gateway manages MCP process lifecycles itself, run the mcp-watchdog.sh cleanup as part of the gateway's restart process instead of relying only on cron. The issue reports the script as the workaround already in use for killing orphaned processes, and invoking it on every restart should keep them from accumulating in the first place.
