openclaw - 💡(How to fix) Fix MCP stdio server processes become orphans after gateway restart [1 comment, 2 participants]

GitHub stats: openclaw/openclaw#72112 (fetched 2026-04-27 05:34:41) · Comments: 1 · Participants: 2 · Timeline events: 2 (closed ×1, commented ×1) · Reactions: 0

Error Message

[bundle-mcp] failed to start server "gbrain" (gbrain serve): Error: MCP server connection timed out after 30000ms

Root Cause

Over a day of normal operation (with periodic gateway restarts for config changes), I accumulated 41 orphan gbrain serve processes, each holding a PGLite database lock. This caused new MCP connections to time out (30s) because they couldn't acquire the database lock.

Describe the bug

When the gateway restarts (via openclaw gateway restart), MCP servers configured in mcp.servers that use stdio transport are left running as orphan processes. The new gateway instance forks fresh MCP server processes, but the old ones from the previous gateway instance are never cleaned up.

Over a day of normal operation (with periodic gateway restarts for config changes), I accumulated 41 orphan gbrain serve processes, each holding a PGLite database lock. This caused new MCP connections to time out (30s) because they couldn't acquire the database lock.

Config

{
  "mcp": {
    "servers": {
      "gbrain": {
        "command": "gbrain",
        "args": ["serve"]
      }
    }
  }
}

Expected behavior

When the gateway shuts down or restarts:

  1. All child MCP server processes should be terminated (SIGTERM → SIGKILL after timeout)
  2. On startup, the gateway should detect and clean up orphan processes from a previous instance (e.g., by PID file or process group)
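The gateway's spawning code is not shown in the issue. The sketch below illustrates item 1 and assumes that each stdio server is started with Node's child_process.spawn. The names spawnMcpServer and terminateGroup are hypothetical, not OpenClaw APIs.

On POSIX, spawning with detached: true makes the child the leader of a new process group. As a result, process.kill(-pid, ...) reaches the server and anything it forks. SIGTERM is sent first, and SIGKILL follows only if the group is still alive after the grace period.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Hypothetical helper: start a stdio MCP server as the leader of a new process
// group (POSIX: detached => new session/group), so the group can be signalled at once.
function spawnMcpServer(command: string, args: string[]): ChildProcess {
  return spawn(command, args, {
    stdio: ["pipe", "pipe", "inherit"], // stdin/stdout carry the MCP stdio transport
    detached: true,
  });
}

// SIGTERM the server's process group, escalating to SIGKILL after graceMs.
function terminateGroup(child: ChildProcess, graceMs = 5000): Promise<void> {
  return new Promise((resolve) => {
    const pgid = child.pid;
    if (pgid === undefined || child.exitCode !== null || child.signalCode !== null) {
      resolve(); // never started or already exited
      return;
    }
    const escalate = setTimeout(() => {
      try {
        process.kill(-pgid, "SIGKILL");
      } catch {
        // group already gone
      }
    }, graceMs);
    child.once("exit", () => {
      clearTimeout(escalate);
      resolve();
    });
    try {
      process.kill(-pgid, "SIGTERM"); // a negative PID targets the whole process group
    } catch {
      clearTimeout(escalate);
      resolve(); // ESRCH: nothing left to signal
    }
  });
}
```

This design has a trade-off. A detached child no longer receives signals aimed at the gateway's own process group, such as Ctrl+C in a terminal. The gateway therefore has to call something like terminateGroup on every shutdown path. The startup cleanup in item 2 is still needed for crashes and SIGKILL.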

Additionally, it would be great to support an idleTimeoutMs option per MCP server:

{
  "mcp": {
    "servers": {
      "gbrain": {
        "command": "gbrain",
        "args": ["serve"],
        "idleTimeoutMs": 300000
      }
    }
  }
}

This way, an MCP server that has received no requests for idleTimeoutMs milliseconds (300000 ms, or 5 minutes, in the example above) would be terminated automatically and re-spawned on demand.
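The proposed idleTimeoutMs option is not an existing OpenClaw setting, so the sketch below only illustrates the idea. IdleManagedServer and ServerHandle are hypothetical names. The class behaves as follows:

- It starts the server lazily on the first request.
- It pauses the idle clock while requests are in flight.
- It calls the handle's stop function once nothing has arrived for idleTimeoutMs.

```typescript
// Hypothetical handle for a running MCP server; in a real gateway, stop()
// would wrap process termination (e.g. SIGTERM, then SIGKILL).
interface ServerHandle {
  stop: () => void;
}

// Sketch of the proposed (not yet existing) per-server idleTimeoutMs option.
class IdleManagedServer {
  private handle: ServerHandle | null = null;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private inFlight = 0;
  private readonly start: () => ServerHandle;
  private readonly idleTimeoutMs: number;

  constructor(start: () => ServerHandle, idleTimeoutMs: number) {
    this.start = start;
    this.idleTimeoutMs = idleTimeoutMs;
  }

  // A request is starting: spawn on demand and stop the idle clock.
  beginRequest(): ServerHandle {
    this.clearTimer();
    if (this.handle === null) this.handle = this.start();
    this.inFlight++;
    return this.handle;
  }

  // A request finished: once nothing is in flight, arm the idle timer.
  endRequest(): void {
    this.inFlight = Math.max(0, this.inFlight - 1);
    if (this.inFlight === 0 && this.handle !== null) {
      this.clearTimer();
      this.timer = setTimeout(() => this.shutdown(), this.idleTimeoutMs);
    }
  }

  // Stop the server now (idle expiry or gateway shutdown).
  shutdown(): void {
    this.clearTimer();
    this.handle?.stop();
    this.handle = null;
    this.inFlight = 0;
  }

  get running(): boolean {
    return this.handle !== null;
  }

  private clearTimer(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Pausing the timer during in-flight requests keeps a long-running tool call from being cut off just because it outlasted the idle window.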

Logs

[bundle-mcp] failed to start server "gbrain" (gbrain serve): Error: MCP server connection timed out after 30000ms

This error repeats for every agent session trying to connect, because the PGLite database is locked by orphan processes.

Environment

  • OpenClaw 2026.4.22
  • macOS 15.x (arm64)
  • Node v24.13.1
  • 14 Feishu bot accounts, each bound to a different agent
  • gbrain 0.16.4 as MCP server (PGLite backend)

Workaround

Cron job that kills stale gbrain serve processes every 5 minutes:

*/5 * * * * /path/to/gbrain-guardian.sh
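The contents of gbrain-guardian.sh are not shown in the issue. As a hypothetical alternative, a guardian can avoid killing the running gateway's own server by targeting only processes whose parent is PID 1. When a gateway exits, its children are re-parented to PID 1 (launchd on macOS).

The sketch assumes ps accepts -A with separate -o pid= -o ppid= -o command= columns, as it does on macOS and procps-based Linux. The functions parsePs, findOrphans and killOrphans are illustrative names.

```typescript
import { execFileSync } from "node:child_process";

interface ProcInfo {
  pid: number;
  ppid: number;
  command: string;
}

// Parse headerless `ps -A -o pid= -o ppid= -o command=` output.
function parsePs(output: string): ProcInfo[] {
  const procs: ProcInfo[] = [];
  for (const line of output.split("\n")) {
    const m = line.trim().match(/^(\d+)\s+(\d+)\s+(.+)$/);
    if (m) procs.push({ pid: Number(m[1]), ppid: Number(m[2]), command: m[3] });
  }
  return procs;
}

// Orphans are children whose gateway exited and that were re-parented to PID 1.
// Servers still owned by the running gateway have a different PPID and are skipped.
function findOrphans(procs: ProcInfo[], pattern: RegExp): ProcInfo[] {
  return procs.filter((p) => p.ppid === 1 && pattern.test(p.command));
}

// Hypothetical guardian entry point: SIGTERM matching orphans, return their PIDs.
function killOrphans(pattern: RegExp): number[] {
  const out = execFileSync("ps", ["-A", "-o", "pid=", "-o", "ppid=", "-o", "command="], {
    encoding: "utf8",
  });
  const killed: number[] = [];
  for (const p of findOrphans(parsePs(out), pattern)) {
    try {
      process.kill(p.pid, "SIGTERM");
      killed.push(p.pid);
    } catch {
      // already exited
    }
  }
  return killed;
}
```

On Linux systems where a subreaper such as systemd --user adopts orphans, the orphans' parent PID is not 1, so the PPID test would need adjusting there.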

Extent analysis

TL;DR

Make the gateway terminate its stdio MCP server processes when it shuts down, and clean up any orphans left by a previous instance when it starts. For example, it could track child PIDs in a PID file or spawn each server in its own process group.

Guidance

  • Review the gateway's process management code to ensure it correctly handles termination of child processes, including those using stdio transport.
  • Consider implementing an idleTimeoutMs option per MCP server to automatically terminate idle servers and re-spawn them on demand.
  • Consider a more robust process-management approach, such as handing MCP server lifecycles to a process manager like PM2.
  • Verify that the cron job workaround is effectively cleaning up orphan processes and preventing database lock issues.

Example

The issue does not include the gateway's process-management code, so no fix can be shown against the actual implementation.
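The sketch below illustrates the PID-file approach from "Expected behavior". It assumes the gateway appends each spawned MCP server's PID to a file; the default path is hypothetical. On startup, the gateway signals every PID from that file that is still alive, then resets the file. The names recordPid, isAlive and reapPreviousInstance are illustrative, not OpenClaw APIs.

```typescript
import { appendFileSync, existsSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical location; a real gateway would keep this under its own state directory.
const DEFAULT_PID_FILE = join(tmpdir(), "openclaw-mcp-children.pid");

// Called right after each MCP server is spawned: one PID per line.
function recordPid(pid: number, pidFile: string = DEFAULT_PID_FILE): void {
  appendFileSync(pidFile, `${pid}\n`);
}

// Signal 0 performs the existence/permission check without delivering a signal.
function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but belongs to another user.
    return (err as NodeJS.ErrnoException).code === "EPERM";
  }
}

// Called once at gateway startup, before any new MCP servers are spawned.
function reapPreviousInstance(
  pidFile: string = DEFAULT_PID_FILE,
  kill: (pid: number) => void = (pid) => { process.kill(pid, "SIGTERM"); },
): number[] {
  if (!existsSync(pidFile)) return [];
  const pids = readFileSync(pidFile, "utf8")
    .split("\n")
    .map((line) => Number(line.trim()))
    .filter((pid) => Number.isInteger(pid) && pid > 0);
  const reaped: number[] = [];
  for (const pid of pids) {
    if (!isAlive(pid)) continue;
    try {
      kill(pid);
      reaped.push(pid);
    } catch {
      // The process exited between the liveness check and the kill.
    }
  }
  writeFileSync(pidFile, ""); // start the new instance with an empty list
  return reaped;
}
```

The OS can reuse a PID after a process exits. A production version should therefore confirm that each PID still belongs to the expected command before signalling it, for example via ps -p PID -o command=.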

Notes

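The cron-job workaround is a stopgap with clear limits. It runs only every 5 minutes, so agent sessions can still hit the 30-second connection timeout between runs. A guardian that cannot tell stale processes from live ones may also race with the current gateway and terminate its active server along with the orphans.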

Recommendation

Apply the cron-job workaround as a temporary measure against the database-lock timeouts, while a permanent fix to the gateway's process management is investigated.
