openclaw: Per-agent file watcher fan-out on shared /skills/master-mcp/ causes EMFILE → bundle-mcp dispose → stuck sessions


Summary

The OpenClaw gateway creates a separate inotify file watcher per agent on the same directory (/root/.openclaw/skills/master-mcp/). At ~28 agents, this opens enough file descriptors that EMFILE errors start cascading through the gateway, killing bundle-mcp subprocesses mid-session and leaving agent sessions stuck in status=running.

Environment

  • OpenClaw 2026.5.20-beta.1 (also reproduced on 2026.5.19)
  • Linux (Ubuntu 24.04, Node 22.22.2)
  • 28 agents on Node A, 2 agents on Node B (multi-node gateway federation)
  • systemd LimitNOFILE=65536 (already raised from the usual default soft limit of 1024)

Reproduction

  1. Run a multi-agent OpenClaw gateway with agents.list[] containing ~25+ entries
  2. Each agent registers its skills allowlist including the shared /root/.openclaw/skills/master-mcp/ directory
  3. Watch gateway logs:
{"subsystem":"gateway/skills","msg":"skills watcher error (/root/.openclaw/workspace/spa-mobile-accountant): Error: EMFILE: too many open files, watch '/root/.openclaw/skills/master-mcp'"}
{"subsystem":"gateway/skills","msg":"skills watcher error (/root/.openclaw/workspace/govt-subventions): Error: EMFILE: too many open files, watch '/root/.openclaw/skills/master-mcp'"}
{"subsystem":"bundle-mcp","msg":"failed to start server \"master-mcp-customer-empathy-ux\" (node /root/.openclaw/mcp-servers/master-mcp/src/server.js): McpError: MCP error -32000: Connection closed"}
{"subsystem":"diagnostic","msg":"lane task error: lane=main durationMs=29403 error=\"Error: bundle-mcp runtime disposed for session e1f2d1b5-00d9-48a4-bf0e-c701c90263e2\""}
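The cascade in these log lines can be spotted mechanically. The following is a minimal sketch, assuming the gateway writes one JSON object per line with the `subsystem` and `msg` fields shown above; the function name `summarize` and the sample constant are illustrative and not part of OpenClaw.

```python
import json
import re

# Matches the session id in the "runtime disposed" message shown in the logs.
DISPOSED_RE = re.compile(r"bundle-mcp runtime disposed for session ([0-9a-f-]{36})")


def summarize(lines):
    """Count EMFILE watcher errors and collect sessions whose runtime was disposed."""
    emfile_watchers = 0
    disposed_sessions = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines instead of failing
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        if entry.get("subsystem") == "gateway/skills" and "EMFILE" in msg:
            emfile_watchers += 1
        match = DISPOSED_RE.search(msg)
        if match:
            disposed_sessions.append(match.group(1))
    return emfile_watchers, disposed_sessions


# Two of the log lines from this report, kept verbatim in a raw string.
SAMPLE = r'''
{"subsystem":"gateway/skills","msg":"skills watcher error (/root/.openclaw/workspace/spa-mobile-accountant): Error: EMFILE: too many open files, watch '/root/.openclaw/skills/master-mcp'"}
{"subsystem":"diagnostic","msg":"lane task error: lane=main durationMs=29403 error=\"Error: bundle-mcp runtime disposed for session e1f2d1b5-00d9-48a4-bf0e-c701c90263e2\""}
'''

if __name__ == "__main__":
    print(summarize(SAMPLE.splitlines()))
```

A non-zero watcher count together with a non-empty session list suggests the stuck sessions come from fd exhaustion rather than from the agents themselves.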

Impact

  • Agent sessions get stuck in status=running indefinitely (related to #84931)
  • Internal stuck-session recovery does not fire because the underlying tool runtime is disposed, not "stalled"
  • Operators see this as "agent not responding to messages", but the agent itself is fine; its bundle-mcp / master-mcp child process is dead
  • Recovery requires gateway restart, which kills ALL in-flight sessions on the node

Workarounds

  • Raise LimitNOFILE to 1048576 via a systemd drop-in (gives breathing room but does not solve the scaling problem)
  • Run an external watchdog that detects sessions stuck in status=running and restarts the gateway automatically
  • We have both deployed today; sharing config if useful
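A drop-in change to LimitNOFILE takes effect only when the service is next started. For a gateway that is already running, the limit can be raised in place with prlimit. The sketch below is one way to do that from Python using `resource.prlimit` (Linux only); the function names and the default target are illustrative, and this is not the reporter's actual script.

```python
import resource


def plan_nofile(current, target):
    """Return the (soft, hard) pair to apply, never lowering either value."""
    soft, hard = current
    inf = resource.RLIM_INFINITY
    new_soft = soft if soft == inf or soft >= target else target
    new_hard = hard if hard == inf or hard >= target else target
    return new_soft, new_hard


def bump_nofile(pid, target=1048576):
    """Raise RLIMIT_NOFILE of a running process.

    Raising the hard limit of another process requires root or
    CAP_SYS_RESOURCE; without it the call raises an error.
    """
    current = resource.prlimit(pid, resource.RLIMIT_NOFILE)
    wanted = plan_nofile(current, target)
    if wanted != current:
        resource.prlimit(pid, resource.RLIMIT_NOFILE, wanted)
    return wanted
```

Resource limits are inherited at fork time, so child processes that are already running keep their old limit, and the change is lost when the gateway restarts. The drop-in is still needed for persistence.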

Proposed fix

When multiple agents watch the same skills directory (/root/.openclaw/skills/<bundle-name>/), the gateway should do one of the following:

  1. De-duplicate watchers per (gateway, directory) pair: a single watcher receives events and the gateway fans them out to subscribed agents internally
  2. Document this as a known limit and recommend LimitNOFILE=1048576 in the gateway docs
  3. Expose a config option to disable per-agent watchers for shared skill directories
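The first option can be illustrated with a small reference-counted registry. This is a language-neutral sketch written in Python, not OpenClaw code: `SharedWatchRegistry`, `open_watch`, and the callback signature are all hypothetical, and the real gateway would plug in its own watcher library where `open_watch` is injected.

```python
import posixpath
from typing import Callable, Dict

EventCallback = Callable[[str, str], None]  # (directory, changed_path)


class SharedWatchRegistry:
    """Keeps at most one underlying watch per directory and fans events out."""

    def __init__(self, open_watch: Callable[[str], Callable[[], None]]):
        # open_watch(directory) starts one OS-level watch and returns a
        # function that closes it (hypothetical hook, injected by the caller).
        self._open_watch = open_watch
        self._closers: Dict[str, Callable[[], None]] = {}
        self._subscribers: Dict[str, Dict[str, EventCallback]] = {}

    @staticmethod
    def _key(directory: str) -> str:
        # Treat "/a/b" and "/a/b/" as the same directory.
        return posixpath.normpath(directory)

    def subscribe(self, agent_id: str, directory: str, callback: EventCallback) -> None:
        key = self._key(directory)
        if key not in self._closers:
            self._closers[key] = self._open_watch(key)  # first subscriber only
        self._subscribers.setdefault(key, {})[agent_id] = callback

    def unsubscribe(self, agent_id: str, directory: str) -> None:
        key = self._key(directory)
        subs = self._subscribers.get(key)
        if subs is None:
            return
        subs.pop(agent_id, None)
        if not subs:
            # Last subscriber left: release the single underlying watch.
            del self._subscribers[key]
            self._closers.pop(key)()

    def dispatch(self, directory: str, changed_path: str) -> None:
        """Called by the single underlying watch; fans out to every agent."""
        key = self._key(directory)
        for callback in list(self._subscribers.get(key, {}).values()):
            callback(key, changed_path)

    def open_watch_count(self) -> int:
        return len(self._closers)
```

With this shape, 28 agents subscribing to the same directory cost one watch instead of 28, so watcher-related descriptor usage grows with the number of distinct directories rather than with the number of agents.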

Why we think this matters

  • Linear scaling with agent count makes this a wall, not a slow degradation
  • We hit it at 28 agents; teams running 50+ will see it as a hard outage with no obvious cause
  • The cascade (EMFILE → bundle-mcp dispose → stuck session) makes diagnosis hard — the symptom is "agent stuck" not "fd exhausted"
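Because the visible symptom is a stuck agent rather than a descriptor error, it can help to inspect the gateway process directly. The sketch below, which assumes a Linux /proc filesystem, counts a process's open descriptors by kind and reads the kernel's inotify limits. The helper names are illustrative. On Linux, EMFILE from inotify can also mean the per-user instance limit (fs.inotify.max_user_instances) was reached, not only the per-process descriptor limit, so both are worth checking; the report does not establish which one applies here.

```python
import os
from collections import Counter


def classify_fd_targets(targets):
    """Count descriptors by kind, given the readlink targets from /proc/<pid>/fd."""
    counts = Counter()
    for target in targets:
        if target == "anon_inode:inotify":
            counts["inotify"] += 1
        elif target.startswith("socket:"):
            counts["socket"] += 1
        elif target.startswith("pipe:"):
            counts["pipe"] += 1
        else:
            counts["other"] += 1
    return counts


def fd_targets(pid):
    """Return the symlink target of every open descriptor of a process."""
    fd_dir = "/proc/%d/fd" % pid
    targets = []
    for name in os.listdir(fd_dir):
        try:
            targets.append(os.readlink(os.path.join(fd_dir, name)))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
    return targets


def inotify_limits():
    """Read the per-user inotify limits from the kernel."""
    limits = {}
    for name in ("max_user_instances", "max_user_watches"):
        with open("/proc/sys/fs/inotify/" + name) as handle:
            limits[name] = int(handle.read().strip())
    return limits


if __name__ == "__main__":
    pid = os.getpid()  # replace with the gateway's PID
    print(classify_fd_targets(fd_targets(pid)))
    print(inotify_limits())
```

Reading another user's /proc/<pid>/fd requires root or the same user as the target process.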

Happy to share watchdog scripts, the systemd drop-in, and the prlimit live-bump pattern if useful for docs.
