openclaw - ✅(Solved) Fix memorySearch sync.watch leaks ~16k file descriptors with large extraPaths trees [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#78224 · Fetched 2026-05-07 03:39:27
Comments: 1 · Participants: 2 · Timeline: 8 · Reactions: 2
Timeline (top): mentioned ×3, subscribed ×3, commented ×1, cross-referenced ×1

PR fix notes

PR #78231: fix: avoid memory watcher write polling

Description (problem / solution / changelog)

Summary

  • stop passing awaitWriteFinish to the memorySearch chokidar watcher
  • rely on the existing dirty/scheduleWatchSync debounce instead of per-file write-stability polling
  • update the watcher config regression test to assert write polling stays disabled

Fixes #78224.
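
The first bullet above can be pictured with a minimal sketch. The interface and the function name `buildMemoryWatchOptions` are hypothetical and heavily simplified; only the option names `ignoreInitial`, `ignored`, and `awaitWriteFinish` come from the snippet quoted in the issue. This is an illustration of the idea, not the code in the PR.

```typescript
// Hypothetical, simplified option shape; the real openclaw types are not shown
// in the issue or the PR notes.
interface MemoryWatchOptions {
  ignoreInitial: boolean;
  ignored: (watchPath: string) => boolean;
  awaitWriteFinish?: { stabilityThreshold: number; pollInterval: number };
}

// Build the options that would be handed to the watcher factory.
// awaitWriteFinish is deliberately left out, so the watcher does no
// write-stability polling; batching is left to the downstream debounce.
function buildMemoryWatchOptions(
  shouldIgnore: (watchPath: string) => boolean,
): MemoryWatchOptions {
  return {
    ignoreInitial: true,
    ignored: shouldIgnore,
  };
}

const watchOptions = buildMemoryWatchOptions((p) => p.endsWith(".tmp"));
console.log("awaitWriteFinish" in watchOptions); // false
```

A regression test in the spirit of the third bullet would assert exactly that the `awaitWriteFinish` key stays absent from the options.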

Tests

  • PATH="/tmp/openclaw-pnpm-shim:$PATH" pnpm exec oxfmt --check extensions/memory-core/src/memory/manager-sync-ops.ts extensions/memory-core/src/memory/manager.watcher-config.test.ts
  • git diff --check
  • attempted: PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/run-vitest.mjs run extensions/memory-core/src/memory/manager.watcher-config.test.ts — blocked before tests by missing local @openclaw/fs-safe/config
  • attempted: PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/check-changed.mjs — early lanes passed; blocked by unrelated existing extension/core typecheck diagnostics including missing @openclaw/fs-safe/* and strictness errors outside this patch

Changed files

  • extensions/memory-core/src/memory/manager-sync-ops.ts (modified, +0/-4)
  • extensions/memory-core/src/memory/manager.watcher-config.test.ts (modified, +1/-1)

Summary

memorySearch.sync.watch: true accumulates ~16,000 REG file descriptors over 1–3 hours when extraPaths contains a directory with thousands of .md files, leading to ulimit saturation and spawn EBADF for any subsequent child-process spawn from the gateway. Disabling the watcher (sync.watch: false) immediately stops the leak.

Environment

  • openclaw 2026.5.4 (npm global, Node v24.14.0)
  • macOS Darwin 25.3.0 (arm64)
  • ~/.openclaw/rag-index/lcm-summaries/ contains ~15,000 markdown summaries (heavy LCM user via @martian-engineering/lossless-claw 0.9.4)
  • Gateway launched via launchd (gui/$UID/ai.openclaw.gateway)

Symptom

$ lsof -p $(pgrep -f openclaw-gateway) | awk '{print $5}' | sort | uniq -c
  ~16000 REG     # ~95% in ~/.openclaw/rag-index/lcm-summaries/, rest in workspace docs/
   ~few KQUEUE
   ...

After ~1–3h of normal session activity, fds saturate the soft ulimit (raised to 65536 here; default 256 saturates much faster). First externally visible failure: spawn EBADF when the gateway tries to spawn any helper process. /readyz reports degraded.

Repro

  1. Configure a main agent with:
    {
      "agents": {
        "list": [{
          "name": "main",
          "memorySearch": {
            "enabled": true,
            "sync": {"watch": true, "onSessionStart": true, "onSearch": true}
          }
        }],
        "defaults": {
          "memorySearch": {
            "extraPaths": ["~/.openclaw/rag-index/lcm-summaries", "~/some/docs/tree"]
          }
        }
      }
    }
  2. Ensure one of the extraPaths directories contains 10k+ .md files.
  3. Restart the gateway and sample its fd count over time; it grows from ~50 to 16k+ within a few hours.
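
For step 3, the `lsof` command from the Symptom section is the way to inspect the gateway from outside. As a complement, here is a minimal sketch of sampling the descriptor count from inside a Node process. It assumes `/dev/fd` is listable, which should hold on macOS and Linux; `countOpenFds` and `startFdSampler` are hypothetical helper names, not openclaw APIs.

```typescript
import { readdirSync } from "node:fs";

// Count this process's open file descriptors by listing /dev/fd.
// The listing itself briefly opens one descriptor, so the number is approximate.
function countOpenFds(fdDir: string = "/dev/fd"): number {
  return readdirSync(fdDir).length;
}

// Log the fd count at a fixed interval. Call the returned function to stop
// sampling; until then the interval keeps the process alive.
function startFdSampler(
  intervalMs: number,
  log: (line: string) => void = console.log,
): () => void {
  const timer = setInterval(() => {
    log(`${new Date().toISOString()} open_fds=${countOpenFds()}`);
  }, intervalMs);
  return () => clearInterval(timer);
}

// One-off sample of the current process.
console.log(`open_fds=${countOpenFds()}`);
```

Calling `startFdSampler(60_000)` would produce one line per minute, which is enough resolution to see growth from ~50 toward 16k+ over a few hours.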

Workaround (verified)

Set agents.list[N].memorySearch.sync.watch: false and leave onSessionStart and onSearch enabled, so recall still works (one-shot scans release fds correctly); only the long-lived watcher is removed.

After applying, fd count stabilizes at 46–69 over 10+ minutes (vs. growth to 16k+ before).
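
The workaround can be expressed as a small config transform. This is a sketch under assumptions: the types below are hypothetical and cover only the keys quoted in this issue, and `withWatchDisabled` is an illustrative helper, not part of openclaw.

```typescript
// Hypothetical, minimal config types covering only the keys quoted in the issue.
interface SyncSettings {
  watch?: boolean;
  onSessionStart?: boolean;
  onSearch?: boolean;
}
interface AgentEntry {
  name: string;
  memorySearch?: { enabled?: boolean; sync?: SyncSettings };
}
interface GatewayConfig {
  agents: { list: AgentEntry[] };
}

// Return a copy of the config with sync.watch forced to false for every agent
// that has a memorySearch block; all other sync flags are left untouched.
function withWatchDisabled(config: GatewayConfig): GatewayConfig {
  return {
    ...config,
    agents: {
      ...config.agents,
      list: config.agents.list.map((agent) =>
        agent.memorySearch
          ? {
              ...agent,
              memorySearch: {
                ...agent.memorySearch,
                sync: { ...agent.memorySearch.sync, watch: false },
              },
            }
          : agent,
      ),
    },
  };
}

const before: GatewayConfig = {
  agents: {
    list: [
      {
        name: "main",
        memorySearch: {
          enabled: true,
          sync: { watch: true, onSessionStart: true, onSearch: true },
        },
      },
    ],
  },
};
const after = withWatchDisabled(before);
console.log(JSON.stringify(after.agents.list[0]?.memorySearch?.sync));
// {"watch":false,"onSessionStart":true,"onSearch":true}
```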

Suspected location

dist/manager-CYeCmxMa.js:1067 (MemoryManagerSyncOps.ensureWatcher):

this.watcher = resolveMemoryWatchFactory()(Array.from(watchPaths), {
  ignoreInitial: true,
  ignored: (watchPath, stats) => shouldIgnoreMemoryWatchPath(watchPath, stats, this.settings.multimodal),
  awaitWriteFinish: {
    stabilityThreshold: this.settings.sync.watchDebounceMs,
    pollInterval: 100
  }
});

watchPaths adds whole directories (e.g. lcm-summaries/), and chokidar walks recursively. With awaitWriteFinish enabled, chokidar polls each tracked file every 100ms to determine write stability. On a 15k-file tree that's a lot of stat/open work, and any close-on-error path that's not airtight will leak fds at scale.

The downstream markDirty → scheduleWatchSync → listMemoryFiles + buildFileEntry re-walks extraPaths on each event. buildFileEntry uses fs.promises.readFile (which auto-closes), so I don't think the indexer itself is the leak — it's the watcher infrastructure.
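
To illustrate why a downstream debounce can stand in for per-file write polling, here is a minimal trailing-edge debounce sketch. The class name `DebouncedSync` and its internals are hypothetical; the issue does not show how markDirty and scheduleWatchSync are actually implemented, so treat this as a generic model of the technique.

```typescript
// Coalesce bursts of watcher events into a single sync run.
class DebouncedSync {
  private timer: ReturnType<typeof setTimeout> | undefined;
  private dirty = false;
  private readonly runSync: () => void;
  private readonly debounceMs: number;

  constructor(runSync: () => void, debounceMs: number) {
    this.runSync = runSync;
    this.debounceMs = debounceMs;
  }

  // Called for every watcher event. Restarting the timer means a burst of
  // events results in one sync after the burst goes quiet.
  markDirty(): void {
    this.dirty = true;
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.debounceMs);
  }

  // Run the pending sync now, if any. Also used as the timer callback.
  flush(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
    if (!this.dirty) return;
    this.dirty = false;
    this.runSync();
  }
}

let syncRuns = 0;
const debounced = new DebouncedSync(() => {
  syncRuns += 1;
}, 1500);
for (let i = 0; i < 1000; i += 1) debounced.markDirty(); // burst of 1000 events
debounced.flush(); // force the pending sync instead of waiting 1500 ms
console.log(syncRuns); // 1
```

The point of the sketch is that the cost of a burst is one timer and one sync run, independent of how many files changed, whereas write-stability polling does stat work per file.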

I cannot pinpoint a single line without dtrace on open/close syscalls, but the lsof evidence (REG fds, all in watched paths, leak gated by sync.watch) puts it firmly in the watcher's lifecycle.

Proposed fixes (in order of preference)

  1. Per-path watch opt-out. Allow extraPaths: [{path, watch: false}] (or a sibling sync.watch.ignorePaths array). LCM summaries are append-only and don't benefit from live re-indexing; a session-start scan is sufficient. Today there's no way to watch some extraPaths but not others — it's all-or-nothing per agent.
  2. Per-tree file-count cap. If walkDir under an extraPath exceeds a threshold (e.g. 5,000 files), log a warning and skip adding that subtree to the watcher. Users who want it can override.
  3. Drop awaitWriteFinish from the memory watcher. The existing scheduleWatchSync debounce is sufficient for index freshness; awaitWriteFinish's polling is the most fd-expensive chokidar mode.
  4. Self-defense. Periodic fd-count check in MemoryManagerSyncOps; if growth is monotonic, close and recreate the watcher (mitigation, not root-cause).
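
Proposal (2) above could look roughly like the following sketch. `exceedsFileCap` and `selectWatchablePaths` are hypothetical names, and the walk here uses Node's `fs` directly rather than openclaw's walkDir, whose signature is not shown in the issue.

```typescript
import { mkdtempSync, readdirSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Report whether `root` holds more than `maxFiles` regular files, returning as
// soon as the cap is crossed instead of walking the whole tree.
function exceedsFileCap(root: string, maxFiles: number): boolean {
  let seen = 0;
  const pending: string[] = [root];
  while (pending.length > 0) {
    const dir = pending.pop() as string;
    for (const entry of readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) {
        pending.push(join(dir, entry.name));
      } else if (entry.isFile()) {
        seen += 1;
        if (seen > maxFiles) return true;
      }
    }
  }
  return false;
}

// Keep only the paths under the cap; warn about the ones that are skipped.
function selectWatchablePaths(paths: readonly string[], maxFiles: number): string[] {
  return paths.filter((p) => {
    if (!exceedsFileCap(p, maxFiles)) return true;
    console.warn(`skipping watcher for ${p}: more than ${maxFiles} files`);
    return false;
  });
}

// Demo on a throwaway directory containing three files.
const demoDir = mkdtempSync(join(tmpdir(), "watch-cap-demo-"));
for (const name of ["a.md", "b.md", "c.md"]) writeFileSync(join(demoDir, name), "");
console.log(selectWatchablePaths([demoDir], 2).length); // 0 (over the cap)
console.log(selectWatchablePaths([demoDir], 5).length); // 1 (under the cap)
rmSync(demoDir, { recursive: true, force: true });
```

The early return keeps the check cheap on exactly the trees it is meant to exclude: a 15k-file directory is abandoned after the threshold is crossed rather than walked in full.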

I'm happy to PR (1) — the config-shape change is small and backward-compatible.
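
A sketch of how the config-shape change in (1) might stay backward-compatible: bare strings keep today's behavior, and only object entries can opt out. The type `ExtraPathEntry` and the function `resolveExtraPaths` are hypothetical; the shape `{path, watch: false}` is the one proposed above, not an existing openclaw option.

```typescript
// Hypothetical shape for proposal (1): a bare string behaves as it does today
// (watched), while an object entry may opt out of watching.
type ExtraPathEntry = string | { path: string; watch?: boolean };

interface ResolvedExtraPath {
  path: string;
  watch: boolean;
}

// Normalize both entry forms; watch defaults to true so existing configs
// that list plain strings keep their current behavior.
function resolveExtraPaths(entries: readonly ExtraPathEntry[]): ResolvedExtraPath[] {
  return entries.map((entry) =>
    typeof entry === "string"
      ? { path: entry, watch: true }
      : { path: entry.path, watch: entry.watch ?? true },
  );
}

const resolved = resolveExtraPaths([
  { path: "~/.openclaw/rag-index/lcm-summaries", watch: false },
  "~/some/docs/tree",
]);

// Every resolved path is still indexed; only the watched subset goes to the watcher.
const watchedPaths = resolved.filter((e) => e.watch).map((e) => e.path);
console.log(watchedPaths.length); // 1
console.log(watchedPaths[0]); // ~/some/docs/tree
```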

Why this matters

memorySearch is one of the more compelling features of the agent runtime, and sync.watch: false as a workaround means losing live re-indexing. For users with large LCM corpora (which is exactly the audience that benefits most from semantic recall), the current default cascades into gateway crashes within hours.
