openclaw - 💡(How to fix) MCP stdio server processes become orphans after gateway restart [1 comment, 2 participants]

qixk · 2026-04-26T10:40:33Z




GitHub stats

openclaw/openclaw#72112 · fetched 2026-04-27 05:34:41


Author: qixk

Participants: clawsweeper[bot], qixk

Timeline (top): closed ×1, commented ×1

Error Message

[bundle-mcp] failed to start server "gbrain" (gbrain serve): Error: MCP server connection timed out after 30000ms

Root Cause

The gateway does not terminate its stdio MCP server processes when it restarts. Over a day of normal operation (with periodic gateway restarts for config changes), the reporter accumulated 41 orphan `gbrain serve` processes, each holding a PGLite database lock. New MCP connections then timed out after 30 s because they could not acquire that lock.

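Fix Action

Workaround: run a cron job every 5 minutes that kills stale `gbrain serve` processes (`*/5 * * * * /path/to/gbrain-guardian.sh`). A proper fix requires the gateway to terminate its MCP child processes on shutdown or restart.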


Original issue

Describe the bug

When the gateway restarts (via openclaw gateway restart), MCP servers configured in mcp.servers that use stdio transport are left running as orphan processes. The new gateway instance forks fresh MCP server processes, but the old ones from the previous gateway instance are never cleaned up.

Config

```json
{
  "mcp": {
    "servers": {
      "gbrain": {
        "command": "gbrain",
        "args": ["serve"]
      }
    }
  }
}
```

Expected behavior

When the gateway shuts down or restarts:

1. All child MCP server processes should be terminated (SIGTERM → SIGKILL after timeout).
2. On startup, the gateway should detect and clean up orphan processes from a previous instance (e.g., by PID file or process group).
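A minimal TypeScript sketch of the first expectation above (OpenClaw runs on Node). It assumes the gateway spawns each stdio server itself with Node's `child_process.spawn`; the helper names `spawnMcpServer` and `stopMcpServer` and the 5-second default grace period are hypothetical, not OpenClaw's actual code. Spawning with `detached: true` makes the server the leader of a new process group on POSIX systems, so signalling the negative PID also reaches anything the server forks.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical helper: start a stdio MCP server in its own process group.
function spawnMcpServer(command: string, args: string[]): ChildProcess {
  return spawn(command, args, {
    stdio: ["pipe", "pipe", "inherit"], // MCP traffic flows over stdin/stdout
    detached: true, // POSIX: the child leads a new process group
  });
}

// Signal the whole process group; ignore ESRCH (group already gone).
function signalGroup(pid: number, signal: NodeJS.Signals): void {
  try {
    process.kill(-pid, signal);
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code !== "ESRCH") throw err;
  }
}

// Hypothetical helper: SIGTERM the group, escalate to SIGKILL after graceMs,
// and resolve once the server process itself has exited.
function stopMcpServer(child: ChildProcess, graceMs = 5000): Promise<void> {
  return new Promise((resolve) => {
    const pid = child.pid;
    if (pid === undefined || child.exitCode !== null || child.signalCode !== null) {
      resolve(); // never started or already exited
      return;
    }
    const killTimer = setTimeout(() => signalGroup(pid, "SIGKILL"), graceMs);
    child.once("exit", () => {
      clearTimeout(killTimer);
      resolve();
    });
    signalGroup(pid, "SIGTERM");
  });
}
```

On shutdown or restart, the gateway would await `stopMcpServer` for every server it started before exiting. Note that the SIGKILL timer is cleared as soon as the server process itself exits.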

Additionally, it would be great to support an idleTimeoutMs option per MCP server:

```json
{
  "mcp": {
    "servers": {
      "gbrain": {
        "command": "gbrain",
        "args": ["serve"],
        "idleTimeoutMs": 300000
      }
    }
  }
}
```

This way, MCP servers that haven't received requests for N minutes would be automatically terminated, and re-spawned on demand.
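The proposed `idleTimeoutMs` option boils down to a countdown that every request resets. A minimal TypeScript sketch, assuming a hypothetical `IdleReaper` helper: the gateway would call `touch()` for each request routed to a server, and `onIdle` would stop that server so the next request spawns a fresh one. `idleTimeoutMs` is proposed in this issue; the sketch does not assume OpenClaw already supports it.

```typescript
// Hypothetical helper for the proposed per-server idleTimeoutMs option:
// invokes onIdle once no request has been seen for idleTimeoutMs.
class IdleReaper {
  private timer: ReturnType<typeof setTimeout> | undefined = undefined;
  private readonly idleTimeoutMs: number;
  private readonly onIdle: () => void;

  constructor(idleTimeoutMs: number, onIdle: () => void) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.onIdle = onIdle;
  }

  // Call on every request routed to the server; restarts the countdown.
  touch(): void {
    this.cancel();
    this.timer = setTimeout(() => {
      this.timer = undefined;
      this.onIdle(); // e.g. stop the server; respawn it on the next request
    }, this.idleTimeoutMs);
  }

  // Call when the server stops for any other reason.
  cancel(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
  }

  // True while a countdown is pending.
  get armed(): boolean {
    return this.timer !== undefined;
  }
}
```

With `idleTimeoutMs: 300000`, a server that receives no requests for five minutes would be stopped.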

Logs

```
[bundle-mcp] failed to start server "gbrain" (gbrain serve): Error: MCP server connection timed out after 30000ms
```

This error repeats for every agent session trying to connect, because the PGLite database is locked by orphan processes.

Environment

- OpenClaw 2026.4.22
- macOS 15.x (arm64)
- Node v24.13.1
- 14 Feishu bot accounts, each bound to a different agent
- gbrain 0.16.4 as MCP server (PGLite backend)

Workaround

Cron job that kills stale gbrain serve processes every 5 minutes:

```bash
*/5 * * * * /path/to/gbrain-guardian.sh
```
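The reporter's `gbrain-guardian.sh` is not shown in the issue. As a hedged alternative, here is how a guardian could tell stale processes apart from live ones, written in TypeScript rather than shell. When a parent exits, macOS and most Linux setups re-parent its children to PID 1 (launchd or init), so a `gbrain serve` process whose parent PID is 1 has lost its gateway. Linux hosts with a subreaper (for example a systemd user manager) re-parent to that process instead, so the `ppid === 1` test is an assumption, as are the function names.

```typescript
import { execFileSync } from "node:child_process";

interface ProcInfo {
  pid: number;
  ppid: number;
  command: string;
}

// Parse `ps -Ao pid=,ppid=,command=` output (the "=" suppresses headers).
function parsePs(output: string): ProcInfo[] {
  const procs: ProcInfo[] = [];
  for (const line of output.split("\n")) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(.*)$/);
    if (m) procs.push({ pid: Number(m[1]), ppid: Number(m[2]), command: m[3] });
  }
  return procs;
}

// A `gbrain serve` whose parent is PID 1 was re-parented after its gateway
// exited; the live gateway's own child still has the gateway as its parent.
function findOrphans(procs: ProcInfo[]): ProcInfo[] {
  const pattern = /(^|[\s/])gbrain serve\b/;
  return procs.filter((p) => p.ppid === 1 && pattern.test(p.command));
}

// Hypothetical guardian entry point: SIGTERM each orphan, return their PIDs.
function killOrphans(): number[] {
  const out = execFileSync("ps", ["-Ao", "pid=,ppid=,command="], { encoding: "utf8" });
  const orphans = findOrphans(parsePs(out));
  for (const p of orphans) {
    try {
      process.kill(p.pid, "SIGTERM");
    } catch {
      // already exited between `ps` and the signal, or not ours to signal
    }
  }
  return orphans.map((p) => p.pid);
}
```

A cron entry could call `killOrphans()` instead of killing every `gbrain serve`, which avoids signalling the server the running gateway is using.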

extent analysis

TL;DR

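Make the gateway terminate the stdio MCP server processes it spawned whenever it shuts down or restarts, and clean up orphans left by a previous instance on startup, for example by recording their PIDs in a file or running each server in its own process group.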
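A sketch of the PID-file idea from the TL;DR, with a hypothetical file path and helper names. PIDs can be reused, especially after a reboot, so a real implementation should confirm that a recorded PID still belongs to an MCP server (for example by checking its command line) before signalling it; this sketch skips that check. If servers run in their own process groups, `process.kill(-pid, ...)` would signal the whole group instead.

```typescript
import { existsSync, readFileSync, writeFileSync } from "node:fs";

// Hypothetical: after spawning servers, record their PIDs.
function recordPids(pidFile: string, pids: number[]): void {
  writeFileSync(pidFile, JSON.stringify(pids));
}

// Hypothetical: on startup, SIGTERM every PID left by the previous instance,
// then reset the file. Returns the PIDs that were signalled.
function reapPreviousInstance(pidFile: string): number[] {
  if (!existsSync(pidFile)) return [];
  let recorded: unknown;
  try {
    recorded = JSON.parse(readFileSync(pidFile, "utf8"));
  } catch {
    recorded = []; // corrupt file: trust nothing in it
  }
  const signalled: number[] = [];
  for (const pid of Array.isArray(recorded) ? recorded : []) {
    if (!Number.isInteger(pid) || pid <= 1) continue; // skip junk and init
    try {
      process.kill(pid, "SIGTERM");
      signalled.push(pid);
    } catch {
      // ESRCH: already gone; EPERM: belongs to someone else, leave it
    }
  }
  writeFileSync(pidFile, "[]");
  return signalled;
}
```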

Guidance

- Review the gateway's process-management code to confirm that shutdown and restart terminate every child process, including stdio-transport MCP servers.
- Consider implementing a per-server idleTimeoutMs option that stops idle servers and respawns them on demand.
- Investigate a more robust process-management approach for MCP server processes, such as a process manager like PM2.
- Verify that the cron workaround actually removes the orphans and clears the database-lock timeouts.

Example

The issue does not describe how the gateway manages its child processes, so no definitive code fix can be given.

Notes

The cron workaround is a stopgap with real limitations: orphans can keep holding the PGLite lock for up to 5 minutes between runs, and a script that kills every `gbrain serve` process can also kill the server the running gateway is currently using.

Recommendation

Use the cron workaround as a stopgap to keep orphaned processes from holding the PGLite lock, and pursue a permanent fix in the gateway's process management.


