openclaw: MemoryIndexManager.close() races with in-flight sync (provider/DB closed before sync settles)

MemoryIndexManager.close() can race with an in-flight sync() such that the embedding provider and SQLite DB are closed before the in-flight sync settles. The in-flight sync then runs against closed resources, which can leave the index stale or mislabeled (e.g. fts-only metadata/rows written for a provider that should have stayed live), or surface as errors bordering on a crash loop during manager replacement or process shutdown.

Two related sub-races were identified by Codex review against extensions/memory-core/src/memory/manager.ts on current main (HEAD 01c5ab8d13):

  1. pendingSync snapshot taken too early (manager.ts:940-941). When close() races with a first sync() that's still inside ensureProviderInitialized(), this.syncing is null at the snapshot. After provider init resolves, that sync continues and assigns this.syncing, but close() uses the stale pendingSync = null value and proceeds to close/null the provider and close the DB. The in-flight sync then writes against closed resources.
  2. Provider closed before pendingSync settles (manager.ts:988-989). When close() runs while a background memory sync is still active, closeCurrentProvider() clears this.provider before pendingSync resolves. The in-flight sync reads this.provider throughout indexing and meta writes, so it can suddenly switch to FTS-only behavior, write fts-only metadata, or trip the local provider's closed checks mid-embedding.
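The first sub-race can be illustrated with a toy model. This is a minimal sketch, not the code in manager.ts: ToyManager, deferred(), runSync() and the event strings are hypothetical, and only the names syncing, provider, pendingSync, sync() and close() are borrowed from the description above. It assumes close() snapshots this.syncing before waiting for provider init, as the issue describes.

```typescript
// Toy model only: names mirror the issue text, bodies are hypothetical.
type Deferred = { promise: Promise<void>; resolve: () => void };

function deferred(): Deferred {
  let resolve: () => void = () => {};
  // The executor runs synchronously, so resolve is assigned before return.
  const promise = new Promise<void>((r) => { resolve = r; });
  return { promise, resolve };
}

class ToyManager {
  syncing: Promise<void> | null = null;
  provider: { closed: boolean } | null = { closed: false };
  readonly events: string[] = [];
  private readonly providerInit: Promise<void>;

  constructor(providerInit: Promise<void>) {
    this.providerInit = providerInit;
  }

  async sync(): Promise<void> {
    // First sync: still blocked in provider init, so this.syncing is null.
    await this.providerInit;
    this.syncing = this.runSync();
    await this.syncing;
  }

  private async runSync(): Promise<void> {
    await Promise.resolve(); // stand-in for indexing work
    this.events.push(
      this.provider === null ? "sync wrote without provider" : "sync wrote with provider",
    );
    this.syncing = null;
  }

  async close(): Promise<void> {
    const pendingSync = this.syncing; // snapshot: null while sync() awaits init
    await this.providerInit; // close() also waits for init to settle
    if (pendingSync) await pendingSync; // stale null, so nothing is awaited
    this.provider = null;
    this.events.push("close released provider");
  }
}

async function demoStaleSnapshot(): Promise<string[]> {
  const init = deferred();
  const manager = new ToyManager(init.promise);
  const syncDone = manager.sync(); // parks inside provider init
  const closeDone = manager.close(); // snapshots syncing === null
  init.resolve();
  await Promise.all([syncDone, closeDone]);
  return manager.events;
}
```

In this model the provider is released before the sync performs its write, so the sync sees a null provider. The second sub-race has the same outcome even when the snapshot is non-null, if the provider is cleared before the snapshot is awaited.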

Both sub-races already exist in main (verified with git merge-base --is-ancestor 01c5ab8d13 against current). They were surfaced by Codex review on PR #86701 (the FD-leak fix for #86613); that PR's diff touches a disjoint region (close() lines 958-967, native watcher teardown) and does not introduce these races.

Bug type

Bug (race condition)

Beta release blocker

No

Steps to reproduce

NOT_ENOUGH_INFO: the race is timing-dependent, hinging on how close() and sync() interleave around ensureProviderInitialized() and pendingSync resolution. No deterministic local repro is available; the race was identified by source review of the close() lifecycle.

A stress-style reproduction would involve:

  1. Triggering closeMemoryIndexManagersForAgent() or closeAllMemoryIndexManagers() concurrently with workspace memory traffic that drives sync() invocations.
  2. Sampling for: (a) fts-only rows or metadata appearing where embedding-backed rows are expected; (b) provider-closed errors during sync flush; (c) DB-closed errors during meta write.
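A stress loop along these lines could be sketched as follows. Everything here is hypothetical scaffolding: SyncCloseTarget, StressReport and stressCloseDuringSync are illustrative names, and the harness assumes only that the object under test exposes promise-returning sync() and close() methods. It collects error messages from both sides; inspecting the index for fts-only rows (signal (a) above) is not covered and would need access to the real DB.

```typescript
// Hypothetical minimal surface; the real manager exposes more than this.
interface SyncCloseTarget {
  sync(): Promise<void>;
  close(): Promise<void>;
}

interface StressReport {
  rounds: number;
  syncFailures: string[];
  closeFailures: string[];
}

function messageOf(reason: unknown): string {
  return reason instanceof Error ? reason.message : String(reason);
}

// Converts a promise into one that never rejects: null on success,
// the error message on failure.
function settle(work: Promise<void>): Promise<string | null> {
  return work.then(
    () => null,
    (reason: unknown) => messageOf(reason),
  );
}

async function stressCloseDuringSync(
  createTarget: () => SyncCloseTarget | Promise<SyncCloseTarget>,
  rounds: number,
  syncsPerRound = 4,
): Promise<StressReport> {
  const report: StressReport = { rounds, syncFailures: [], closeFailures: [] };
  for (let round = 0; round < rounds; round++) {
    const target = await createTarget();
    // Start the syncs first, then close without awaiting them, so close()
    // overlaps whatever stage each sync has reached.
    const syncs: Promise<string | null>[] = [];
    for (let i = 0; i < syncsPerRound; i++) {
      syncs.push(settle(target.sync()));
    }
    const closing = settle(target.close());
    for (const message of await Promise.all(syncs)) {
      if (message !== null) report.syncFailures.push(message);
    }
    const closeMessage = await closing;
    if (closeMessage !== null) report.closeFailures.push(closeMessage);
  }
  return report;
}
```

Because the race depends on interleaving, a clean report from such a loop would not show the race is absent; a non-empty syncFailures list containing provider-closed or DB-closed messages would be the signal of interest.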

The original #86613 captures showed unrelated FD growth but no specific sub-agent EBADF was traced to this close-time race; that doesn't rule out the race, only confirms it wasn't the dominant signal there.

Expected behavior

close() should wait for any in-flight sync (including syncs blocked in ensureProviderInitialized()) to fully settle before closing or clearing provider / DB resources. Alternatively, in-flight syncs should snapshot the provider reference they need and not read shared mutable state across await points spanning a possible close().

Actual behavior

The in-flight sync continues against a closed provider / DB. Possible manifestations:

  • Mislabeled index entries (fts-only rows for what should be an embedded source).
  • Silent embedding errors swallowed by the existing retry path.
  • "closed" errors surfacing on DB statement preparation during meta writes at the end of sync.
  • A possible contribution to occasional EBADF-class failures during gateway shutdown or hot restart cycles (not isolated).

OpenClaw version

Present in main at 01c5ab8d13 (identified by source review; see Steps to reproduce). The relevant close() code has been stable in this shape for several commits; not a recent regression.

Operating system

Platform-independent (logic-level race in close() ordering); identified on macOS 14.7.6 / Node >=22.19.

Install method

Any (race is in extensions/memory-core/src/memory/manager.ts, not in install-specific code).

Suggested fix shape (for discussion, not prescriptive)

Two viable shapes:

  1. Re-read this.syncing after each await in close():
    • After await awaitPendingManagerWork({ pendingProviderInit }), re-read this.syncing and await again before proceeding to close the provider / DB.
    • Apply the same pattern to any future await steps that could allow a new sync() to attach.
  2. Per-sync provider snapshot:
    • Have sync() capture a local reference to this.provider once at the top and use that reference (rather than this.provider) for all subsequent reads across awaits.
    • close() can then close the shared this.provider, knowing the in-flight sync holds its own reference and will complete via that.
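The first shape (re-reading this.syncing after each await) might look roughly like the sketch below. It is a toy under stated assumptions: drainInFlight, RereadManager, runSync and the event strings are hypothetical, and awaiting providerInit stands in for the awaitPendingManagerWork({ pendingProviderInit }) call mentioned above, whose real signature is not shown in this issue.

```typescript
type Deferred = { promise: Promise<void>; resolve: () => void };

function deferred(): Deferred {
  let resolve: () => void = () => {};
  const promise = new Promise<void>((r) => { resolve = r; });
  return { promise, resolve };
}

// Hypothetical helper: keeps re-reading the current in-flight promise until
// none is attached, instead of trusting a single early snapshot.
async function drainInFlight(
  readCurrent: () => Promise<unknown> | null,
  maxPasses = 16,
): Promise<void> {
  for (let pass = 0; pass < maxPasses; pass++) {
    const current = readCurrent();
    if (current === null) return;
    // A failed sync has still settled, which is all close() needs here.
    await current.then(() => undefined, () => undefined);
  }
  throw new Error("in-flight work kept reattaching during close");
}

class RereadManager {
  syncing: Promise<void> | null = null;
  provider: { closed: boolean } | null = { closed: false };
  readonly events: string[] = [];
  private readonly providerInit: Promise<void>;

  constructor(providerInit: Promise<void>) {
    this.providerInit = providerInit;
  }

  async sync(): Promise<void> {
    await this.providerInit;
    this.syncing = this.runSync();
    await this.syncing;
  }

  private async runSync(): Promise<void> {
    await Promise.resolve(); // stand-in for indexing work
    this.events.push(this.provider ? "sync wrote with provider" : "sync wrote without provider");
    this.syncing = null;
  }

  async close(): Promise<void> {
    await this.providerInit; // stand-in for awaiting pending manager work
    await drainInFlight(() => this.syncing); // re-read after the await
    this.provider = null;
    this.events.push("close released provider");
  }
}

async function demoReread(): Promise<string[]> {
  const init = deferred();
  const manager = new RereadManager(init.promise);
  const syncDone = manager.sync();
  const closeDone = manager.close();
  init.resolve();
  await Promise.all([syncDone, closeDone]);
  return manager.events;
}
```

The bounded pass count is a design choice of this sketch so that close() cannot spin forever if syncs keep attaching. A sync that attaches only after the final re-read is not covered here and would still need a guard of its own.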

Either shape needs careful interaction with the existing awaitPendingManagerWork helper and the closed flag check that already exists at the top of close().
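The second shape (a per-sync provider snapshot) can be contrasted with the shared-state read in a small sketch. All names here are hypothetical (EmbeddingProvider, SnapshotManager, the two sync variants, the row modes), and close() is modeled as only clearing the shared reference; disposing the underlying provider object while a snapshot is still in use is a separate question this sketch does not address.

```typescript
// Hypothetical provider surface, for illustration only.
interface EmbeddingProvider {
  embed(text: string): Promise<number[]>;
}

type RowMode = "embedded" | "fts-only";

class SnapshotManager {
  provider: EmbeddingProvider | null;
  readonly rows: RowMode[] = [];

  constructor(provider: EmbeddingProvider | null) {
    this.provider = provider;
  }

  // Reads this.provider on every step, so a concurrent close() can flip
  // the sync into fts-only behavior partway through.
  async syncReadingSharedState(texts: string[]): Promise<void> {
    for (const text of texts) {
      await Promise.resolve(); // stand-in for an await point in indexing
      await this.writeRow(this.provider, text);
    }
  }

  // Captures the provider once and uses only the local reference afterwards.
  async syncWithSnapshot(texts: string[]): Promise<void> {
    const provider = this.provider;
    for (const text of texts) {
      await Promise.resolve();
      await this.writeRow(provider, text);
    }
  }

  private async writeRow(provider: EmbeddingProvider | null, text: string): Promise<void> {
    if (provider) {
      await provider.embed(text);
      this.rows.push("embedded");
    } else {
      this.rows.push("fts-only");
    }
  }

  // Modeled as clearing the shared reference only.
  close(): void {
    this.provider = null;
  }
}

async function demoSnapshot(): Promise<{ shared: RowMode[]; snapshot: RowMode[] }> {
  const provider: EmbeddingProvider = {
    embed: async (text: string) => [text.length],
  };

  const a = new SnapshotManager(provider);
  const runA = a.syncReadingSharedState(["one", "two"]);
  a.close(); // lands while the sync is parked at its first await
  await runA;

  const b = new SnapshotManager(provider);
  const runB = b.syncWithSnapshot(["one", "two"]);
  b.close();
  await runB;

  return { shared: a.rows, snapshot: b.rows };
}
```

In this model the shared-state variant labels every row fts-only once close() lands, while the snapshot variant keeps writing embedded rows. The snapshot removes the mislabeling but, on its own, does nothing about a provider whose closed checks trip mid-embedding.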

Related

  • PR #86701 (the FD-leak fix for #86613) does not touch this race; this is a separate issue.
  • #86345 (INDEX_CACHE lifetime bounding) overlaps the same close() lifecycle area but addresses cache eviction policy, not the provider/DB close ordering. Worth coordinating fixes if both land.

Notes

Identified via codex review --base origin/main runs on 2026-05-26 against the PR #86701 diff. The codex tool flagged this in two separate review passes (first pass on initial patch, second pass on the post-revision patch), each call-out targeting one of the two sub-races above. AI-assisted issue authorship; technical findings sourced from Codex review output.
