openclaw - 💡(How to fix) Fix MCP servers not cleaned up on agent restart — causes OOM [1 participants]

jarvisclaudenelson · 2026-04-02T22:54:48Z


GitHub issue: openclaw/openclaw#59941 • Fetched 2026-04-08 02:38:32


Timeline: closed ×1, locked ×1


Bug

When the health monitor restarts a Discord agent due to stale-socket, the gateway spawns new MCP server processes (e.g. helius-mcp) without killing the previous instances. Over time, orphaned processes accumulate and consume all available RAM until the Linux OOM killer terminates the gateway.

Evidence (2026-04-02)

- **21 orphaned `helius-mcp` processes** found consuming ~1 GB on a 3.8 GB server
- Gateway was OOM-killed at 20:24 UTC
- Health monitor restarts occurred every ~5-15 min (`stale-socket`), each spawning new MCP processes
- `commands.log` shows frequent session resets
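The figures above can be re-measured with a small diagnostic. The following is a minimal sketch, assuming a Linux host with procps `pgrep` and `ps`; the function names, the `--run` switch, and the `MCP_PATTERN` variable are illustrative and not part of openclaw.

```shell
#!/bin/sh
# Hypothetical diagnostic, not part of openclaw: count matching processes
# and sum their resident memory (RSS).

# Read one RSS value (KiB) per line from stdin, skip blank lines,
# and print "<count> <total_kib>".
sum_rss() {
  awk 'NF { n++; total += $1 } END { printf "%d %d\n", n, total }'
}

# Print "<count> <total_kib>" for all processes whose command line matches $1.
mcp_usage() {
  pids=$(pgrep -f -- "$1" | paste -sd, -)
  if [ -z "$pids" ]; then
    echo "0 0"
    return 0
  fi
  # ps accepts a comma-separated PID list; "rss=" prints RSS without a header.
  ps -o rss= -p "$pids" | sum_rss
}

# Run only when explicitly requested, e.g.: sh mcp-usage.sh --run
if [ "${1:-}" = "--run" ]; then
  mcp_usage "${MCP_PATTERN:-helius-mcp}"
fi
```

The pattern is read from an environment variable rather than a command-line argument so that the script's own command line does not match `pgrep -f`.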

Root Cause

MCP process lifecycle is not managed during agent restarts. The gateway spawns child processes for MCP servers but does not track or terminate them when the parent agent is restarted.

Workaround

- Removed duplicate agent-level `.mcp.json` (was defined in both `openclaw.json` and `agents/main/agent/.mcp.json`)
- Created `scripts/mcp-watchdog.sh` cron (every 15 min) to kill orphaned processes when count exceeds threshold
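The issue does not include the contents of `scripts/mcp-watchdog.sh`. A watchdog in that spirit might look like the sketch below, assuming procps `pgrep` and `ps` (with the `etimes` column) are available. The pattern, the threshold value, and the function names are assumptions. Note the limitation: a watchdog cannot tell a live server from an orphan, so this version simply keeps the newest instances and kills the rest.

```shell
#!/bin/sh
# Hypothetical sketch in the spirit of scripts/mcp-watchdog.sh.
# The real script is not shown in the issue; pattern and threshold are assumed.

PATTERN="${MCP_PATTERN:-helius-mcp}"
THRESHOLD="${MCP_THRESHOLD:-2}"

# Read PIDs from stdin (oldest first, one per line) and print the ones to
# kill so that only the newest $1 remain. Prints nothing within the limit.
select_excess() {
  awk -v keep="$1" 'NF { pid[++n] = $1 }
    END { for (i = 1; i <= n - keep; i++) print pid[i] }'
}

# Print PIDs whose command line matches $1, oldest first.
list_oldest_first() {
  pids=$(pgrep -f -- "$1" | paste -sd, -)
  [ -n "$pids" ] || return 0
  # etimes = seconds since the process started; largest value = oldest.
  ps -o etimes=,pid= -p "$pids" | sort -rn | awk '{ print $2 }'
}

# Kill every matching process beyond the threshold, oldest first.
watchdog() {
  list_oldest_first "$PATTERN" | select_excess "$THRESHOLD" |
    while read -r pid; do
      echo "killing surplus MCP process $pid"
      kill "$pid" 2>/dev/null || true
    done
}

# Run only when explicitly requested, e.g. from cron with --run.
if [ "${1:-}" = "--run" ]; then
  watchdog
fi
```

A matching cron entry for the 15-minute schedule would be along the lines of `*/15 * * * * /path/to/scripts/mcp-watchdog.sh --run`, where the path is a placeholder.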

Expected Behavior

When an agent restarts, the gateway should:

1. Terminate all MCP server processes associated with that agent
2. Spawn fresh MCP servers for the restarted agent
3. Track MCP child process PIDs for cleanup on shutdown/restart
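The three steps above amount to a small piece of bookkeeping. The gateway is not a shell script, so the following only illustrates the idea under stated assumptions: the directory `/tmp/mcp-pids`, the `MCP_PID_DIR` variable, and all function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of per-agent PID tracking; names and paths are
# illustrative and do not come from openclaw.

PID_DIR="${MCP_PID_DIR:-/tmp/mcp-pids}"

# Step 3: record a PID under an agent name.
track_pid() {   # usage: track_pid <agent> <pid>
  mkdir -p "$PID_DIR"
  echo "$2" >> "$PID_DIR/$1.pids"
}

# Step 1: terminate every PID recorded for an agent, then forget them.
terminate_agent_mcp() {   # usage: terminate_agent_mcp <agent>
  file="$PID_DIR/$1.pids"
  [ -f "$file" ] || return 0
  while read -r pid; do
    kill "$pid" 2>/dev/null || true
  done < "$file"
  rm -f "$file"
}

# Steps 1-3 together: stop the old servers, spawn a fresh one, track it.
restart_agent_mcp() {   # usage: restart_agent_mcp <agent> <command> [args...]
  agent=$1
  shift
  terminate_agent_mcp "$agent"
  "$@" &
  track_pid "$agent" "$!"
}
```

For example, `restart_agent_mcp main npx helius-mcp@latest` would stop whatever was recorded for the agent `main` before starting a new server. If the launched command starts the server in a further child process, signalling only the recorded PID may not be enough, so a real implementation would likely need to handle the whole process tree.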

Environment

- Server: 3.8 GB RAM, no swap
- Gateway: openclaw-gateway systemd service
- MCP server: `npx helius-mcp@latest`

extent analysis

TL;DR

Implement a mechanism to track and terminate MCP server processes when an agent restarts to prevent orphaned processes from accumulating.

Guidance

- Modify the gateway to track the PIDs of spawned MCP server processes, allowing for proper termination during agent restarts.
- Implement a process management system to ensure that MCP server processes are cleaned up when the parent agent is restarted or shut down.
- Consider integrating the mcp-watchdog.sh script into the gateway's restart process to automatically kill orphaned processes.
- Review the agent restart logic to ensure that it properly terminates all associated MCP server processes before spawning new ones.

Example

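# Example of how to track PIDs in a bash script
# Start the server in the background; $! holds the PID of the last background job.
# (Wrapping this in $(...) would block until the server closes its stdout.)
npx helius-mcp@latest &
pid=$!
# Store the PID for later use
echo "$pid" > mcp_pid.txt
# Later, when restarting the agent, stop the old instance before spawning a new one
kill "$(cat mcp_pid.txt)" && rm -f mcp_pid.txt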
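A plain `kill` sends SIGTERM and returns immediately, so a restart that spawns a new server right away can briefly run both, and a process that ignores SIGTERM is never stopped. One possible refinement is to wait for the old process to exit and escalate to SIGKILL after a grace period. This is a sketch only; `stop_pid` and its default of 5 seconds are assumptions, not openclaw behavior.

```shell
#!/bin/sh
# Hypothetical helper: stop a process by PID, escalating from SIGTERM
# to SIGKILL if it has not exited within the grace period.

stop_pid() {   # usage: stop_pid <pid> [grace_seconds]
  pid=$1
  grace=${2:-5}
  # If the signal cannot be delivered, the process is already gone.
  kill -TERM "$pid" 2>/dev/null || return 0
  while [ "$grace" -gt 0 ]; do
    # kill -0 only checks whether the PID can still be signalled.
    kill -0 "$pid" 2>/dev/null || return 0
    sleep 1
    grace=$((grace - 1))
  done
  # Still present after the grace period: force termination.
  kill -KILL "$pid" 2>/dev/null || true
}
```

With the PID file from the example above, the restart step would become `stop_pid "$(cat mcp_pid.txt)" 5` followed by spawning the new server.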

Notes

The mcp-watchdog.sh workaround is a stopgap rather than a fix: it runs from cron only every 15 minutes, so orphaned processes can still accumulate between runs, and it does nothing to stop the gateway from leaking them in the first place. A more robust solution is to build process management into the gateway itself.

Recommendation

As an interim mitigation, run the mcp-watchdog.sh cleanup as part of the gateway's restart process instead of waiting for the next cron run, so that orphaned processes are killed before new MCP servers are spawned. The script already exists in the reporter's setup, which makes this a low-effort step until the gateway tracks and terminates its own MCP child processes.
