openclaw - ✅(Solved) Fix builtin memory_search can return empty results until builtin index is forced to reindex [2 pull requests, 1 comment, 2 participants]

CadilarcZhang · 2026-03-29T07:43:14Z



GitHub stats

openclaw/openclaw#56862 · fetched 2026-04-08 01:46:47
Author: CadilarcZhang
Participants: CadilarcZhang, swmeyer1979
Timeline: commented ×1, cross-referenced ×1

Built-in memory_search can return empty results even when the builtin sqlite memory index already exists and manual FTS queries match the expected content.

In our case, forcing a builtin full reindex immediately fixed the issue, without changing provider/model/endpoint.

This suggests the problem is likely in builtin memory index sync/reindex state rather than missing memory data.


Fix Action

Workaround

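A controlled builtin full reindex worked for us. We triggered it by temporarily changing agents.defaults.memorySearch.chunking.tokens from 400 to 401: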

{
  "agents": {
    "defaults": {
      "memorySearch": {
        "chunking": {
          "tokens": 401
        }
      }
    }
  }
}

After verification, we switched back to the stable config, and the queries kept working.
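The issue does not explain why a one-token chunking change forces a rebuild. One plausible mechanism, sketched here as an assumption rather than openclaw's actual code, is that the index records a fingerprint of the settings it was built with and performs a full reindex when the current settings no longer match. index_fingerprint and needs_full_reindex are hypothetical names.

```python
import hashlib
import json
from typing import Optional


def index_fingerprint(provider: str, model: str, chunk_tokens: int) -> str:
    # Hypothetical: hash the settings that decide chunk boundaries and vectors.
    payload = json.dumps(
        {"provider": provider, "model": model, "chunkTokens": chunk_tokens},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def needs_full_reindex(stored: Optional[str], current: str) -> bool:
    # A fresh DB (no stored fingerprint) or any settings change forces a rebuild.
    return stored is None or stored != current


built_with = index_fingerprint("openai", "embedding-3", 400)
print(needs_full_reindex(built_with, index_fingerprint("openai", "embedding-3", 400)))  # False
print(needs_full_reindex(built_with, index_fingerprint("openai", "embedding-3", 401)))  # True
```

Under this assumption, any chunking change invalidates the stored fingerprint, which would explain why bumping tokens from 400 to 401 rebuilt the index without touching provider, model, or endpoint.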

PR fix notes

PR #59137: fix(memory): preserve retry state and embedding cache across reindex rollback

Repository: openclaw/openclaw
Author: TSHOGX
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/59137

Description (problem / solution / changelog)

Summary

Problem: the current `extensions/memory-core` reindex path preserves the last committed index on rollback, but still loses important recovery state. A failed full session rebuild can forget that the next retry must sweep all sessions, and successful embedding batches can still be discarded when a later batch fails during a safe temp-DB reindex.
Why it matters: memory reindex should remain atomic for the committed index while still preserving enough retry and cache progress to converge efficiently after transient failures.
Why this PR is still worth landing even without directly closing one issue: the open issues in this area are broader transport/provider availability bugs. This PR does not claim to stop `fetch failed` from happening, but it does materially reduce the damage when those failures do happen. In manual verification, the same offline force-reindex failure left `Embedding cache` at `0` on `main`, while this branch preserved `3540` cache entries after rollback. That means fewer repeated remote embedding calls, less wasted progress, and a faster path to recovery on the next retry even before the larger transport-layer issues are fixed.
What changed: this PR snapshots and restores sync state across full reindex rollback, adds an explicit `sessionFullRetryPending` state for failed full session rebuilds, mirrors successful embedding-cache writes back to the original DB during safe reindex, and adds focused regression coverage for rollback recovery and cache preservation.
What did NOT change (scope boundary): no new config knobs, no provider API changes, no query behavior changes, and no broader memory pipeline redesign outside rollback/retry correctness and cache durability.
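The snapshot-and-restore idea from "What changed" can be modeled in a few lines. This is a hypothetical Python model, not the TypeScript in manager-sync-ops.ts: SyncState and run_full_reindex are illustrative names, and only the flag names (sessionsDirty, sessionFullRetryPending) come from the PR text.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SyncState:
    # Hypothetical stand-in for the manager's in-memory sync state; field names
    # follow the PR text (sessionsDirty, sessionFullRetryPending).
    sessions_dirty: bool = False
    dirty_session_files: Set[str] = field(default_factory=set)
    session_full_retry_pending: bool = False


def run_full_reindex(state: SyncState, rebuild: Callable[[SyncState], None]) -> bool:
    """Full reindex that restores retry state if the rebuild fails part-way."""
    snapshot = copy.deepcopy(state)
    try:
        rebuild(state)  # may consume dirty markers as it goes, then fail
    except Exception:
        # The committed DB is kept elsewhere; here we also put back the in-memory
        # markers and record that a full session pass is still owed.
        state.sessions_dirty = True
        state.dirty_session_files = set(snapshot.dirty_session_files)
        state.session_full_retry_pending = True
        return False
    state.sessions_dirty = False
    state.dirty_session_files.clear()
    state.session_full_retry_pending = False
    return True


def flaky_rebuild(state: SyncState) -> None:
    state.dirty_session_files.clear()  # progress that would be lost on failure
    raise ConnectionError("fetch failed")


st = SyncState(sessions_dirty=True, dirty_session_files={"a.jsonl"})
print(run_full_reindex(st, flaky_rebuild))  # False
print(st.session_full_retry_pending, st.dirty_session_files)  # True {'a.jsonl'}
```

The snapshot is taken before any work starts, so a failure at any point restores the same markers; only a successful rebuild clears the retry flag.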

Change Type: Bug fix

Scope: Memory / storage

Linked Issue/PR

Closes: none
Related: #56901 Bug: Memory Index Fails for Main Agent with 'fetch failed' Error
Related: #56427 memory index --force fails with 'fetch failed' on large workspace (1400+ .md files)
Related: #44166 memory reindex aborts on transient embedding transport errors instead of retrying or splitting the batch
Related: #45981 [Bug]: Session memory index silently dropped on every gateway restart (shouldSync Sessions priority bug)
Related: #56862 builtin memory_search can return empty results until builtin index is forced to reindex
Supersedes: #55497 fix(memory): harden builtin sync and reindex semantics
This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

Root cause: the memory-core reindex flow was correctly preserving the committed SQLite index on rollback, but it was not preserving all of the in-memory state needed for the next retry. In particular, a failed full session rebuild could leave the manager with no dirty-file filter and no explicit signal that the next normal sync still needed a full session pass. Separately, successful embedding batches during safe temp-DB reindex were written only to the temp DB, so a later failure threw away useful cache work.
Missing detection / guardrail: existing atomic reindex coverage protected "keep the previous index on failure", but it did not cover the follow-up retry semantics after rollback, post-compaction targeted refresh behavior when a full retry was still pending, or cache preservation across temp-DB rollback.
Prior context (git blame, prior PR, issue, or refactor if known): this is the current-main continuation of the reliability work attempted in #55497, adapted to the moved memory-core plugin architecture under extensions/memory-core/src/memory/* instead of the old src/memory/* paths.
Why this regressed now: after the memory engine moved behind the plugin/runtime split, the old fix no longer applied cleanly, and the rebuilt atomic reindex path preserved the database contents but still had gaps in retry-state restoration and cache durability.
If unknown, what was ruled out: this is not just "main moved on" or a pure merge/rebase issue. The remaining behavior gap exists in current origin/main logic and reproduces in focused local tests without needing the old branch structure.
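The root cause hinges on how the next normal sync decides between a full session pass and an incremental one. The sketch below is a hypothetical Python model of that decision (plan_session_sync and targeted_refresh are illustrative names, not openclaw APIs): with an empty dirty-file filter and no explicit flag, the sync would see "nothing to do".

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class SessionSyncState:
    # Hypothetical model of the state described in the root-cause notes.
    sessions_dirty: bool = False
    dirty_session_files: Set[str] = field(default_factory=set)
    session_full_retry_pending: bool = False


def plan_session_sync(state: SessionSyncState) -> Optional[Set[str]]:
    """None means "sweep every session"; a set means "refresh only these files"."""
    if state.session_full_retry_pending:
        return None
    if state.sessions_dirty:
        return set(state.dirty_session_files)
    return set()


def targeted_refresh(state: SessionSyncState, path: str) -> None:
    # Post-compaction refresh of one transcript: drop only that file's marker
    # and leave a pending full retry untouched.
    state.dirty_session_files.discard(path)


# After a failed full rebuild: empty filter, but the flag still demands a full pass.
st = SessionSyncState(sessions_dirty=True, session_full_retry_pending=True)
targeted_refresh(st, "compacted.jsonl")
print(plan_session_sync(st))  # None -> full session pass
```

Without the flag, SessionSyncState(sessions_dirty=True) plans an empty refresh set, which is the silent downgrade to an incomplete incremental pass described in the PR.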

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file:
- extensions/memory-core/src/memory/manager.atomic-reindex.test.ts
- extensions/memory-core/src/memory/manager.embedding-batches.test.ts
- extensions/memory-core/src/memory/embedding-manager.test-harness.ts
Scenario the test should lock in:
- a failed full session rebuild keeps sessionsDirty set and records that the next normal sync still needs a full session retry
- a targeted post-compaction refresh can update one transcript without accidentally clearing the pending full-session retry
- successful embedding batches are cached before a later batch failure and are reused on the next retry
- enabling cache on an existing index created before the cache table existed still works cleanly
Why this is the smallest reliable guardrail: the bug is in local sync-state transitions and SQLite-backed cache persistence, so focused memory-core tests exercise the relevant contract directly without needing live providers or gateway E2E setup.
Existing test that already covers this (if any):
- the existing atomic reindex test already covered preserving the last committed index on failure
If no new test is added, why not:
- N/A
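The third scenario above (successful batches cached before a later failure and reused on retry) can be illustrated with a fake provider and a dict cache. This is a hypothetical Python model: the real cache lives in SQLite and is presumably keyed by more than the raw text, and FlakyEmbedder and embed_all are illustrative names.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


class FlakyEmbedder:
    """Fake provider: fails once on one batch index and counts texts it embeds."""

    def __init__(self, fail_on_batch: int) -> None:
        self.fail_on_batch = fail_on_batch
        self.already_failed = False
        self.embedded = 0

    def embed(self, batch_index: int, texts: Sequence[str]) -> List[Vector]:
        if batch_index == self.fail_on_batch and not self.already_failed:
            self.already_failed = True
            raise ConnectionError("fetch failed")
        self.embedded += len(texts)
        return [(float(len(t)),) for t in texts]


def embed_all(texts: Sequence[str], batch_size: int, embedder: FlakyEmbedder,
              cache: Dict[str, Vector]) -> List[Vector]:
    """Embed in batches, writing each successful batch to the cache immediately."""
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        missing = [t for t in batch if t not in cache]
        if missing:
            vectors = embedder.embed(start // batch_size, missing)
            cache.update(zip(missing, vectors))  # survives a later batch failure
    return [cache[t] for t in texts]


texts = ["a", "bb", "ccc", "dddd"]
cache: Dict[str, Vector] = {}
provider = FlakyEmbedder(fail_on_batch=1)
try:
    embed_all(texts, 2, provider, cache)
except ConnectionError:
    pass  # first attempt dies on the second batch
print(embed_all(texts, 2, provider, cache))  # [(1.0,), (2.0,), (3.0,), (4.0,)]
print(provider.embedded)  # 4, not 6: batch 0 came from the cache on retry
```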

User-visible / Behavior Changes

Full memory reindex keeps the previous committed index on failure and now also preserves enough dirty/retry state for the next sync to recover correctly.
If a full session rebuild fails mid-run, the next normal sync still knows it must perform a full session retry instead of silently downgrading to an incomplete incremental pass.
Successful embedding batches can now survive a later safe-reindex rollback through cache mirroring, which reduces repeated provider work on retry.
Existing indexes created before the embedding cache table existed can enable cache without failing reindex.

Diagram (if applicable)

Before:
[force reindex] -> [temp DB builds some state] -> [later failure]
               -> [committed DB restored]
               -> [full-session retry intent / temp-cache progress partly lost]

After:
[force reindex] -> [temp DB builds some state]
               -> [successful cache batches mirrored to committed DB]
               -> [later failure]
               -> [committed DB restored + retry snapshot restored + full-session retry flagged]
               -> [next normal sync converges]
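The "successful cache batches mirrored to committed DB" step in the After diagram can be modeled with two SQLite connections. This is a hypothetical sketch: in-memory databases stand in for the committed and temp files, and the two-table schema is simplified because the PR does not show the real embedding_cache columns.

```python
import sqlite3
from typing import Callable, List


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in schema; the text itself stands in for a content hash.
    conn.execute("CREATE TABLE embedding_cache (hash TEXT PRIMARY KEY, embedding BLOB)")
    conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, text TEXT)")
    return conn


def safe_reindex(original: sqlite3.Connection, temp: sqlite3.Connection,
                 batches: List[List[str]], embed: Callable[[str], bytes]) -> None:
    """Build the new index in `temp`, mirroring each finished cache batch to `original`."""
    for batch in batches:
        rows = [(text, embed(text)) for text in batch]  # may raise part-way
        temp.executemany("INSERT OR REPLACE INTO embedding_cache VALUES (?, ?)", rows)
        temp.executemany("INSERT OR REPLACE INTO chunks VALUES (?, ?)",
                         [(text, text) for text in batch])
        # Mirror cache rows only; searchable chunks stay in `temp` until the swap.
        original.executemany("INSERT OR REPLACE INTO embedding_cache VALUES (?, ?)", rows)
        original.commit()
    temp.commit()  # a real implementation would now swap `temp` in atomically


def embed(text: str) -> bytes:
    if text == "c":
        raise ConnectionError("fetch failed")
    return text.encode("utf-8")


def count(table: str) -> int:
    return original.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


original, temp = open_db(), open_db()
try:
    safe_reindex(original, temp, [["a", "b"], ["c"]], embed)
except ConnectionError:
    temp.close()  # roll back: the uncommitted temp DB is discarded
print(count("embedding_cache"), count("chunks"))  # 2 0
```

The committed index gains no chunk rows from the failed run, so search stays atomic, while the two cache rows from the first batch remain available to the next retry.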

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 24.13.0 + pnpm + Vitest
Model/provider: OpenAI-compatible remote embeddings (provider: openai, model: BAAI/bge-m3) for manual verification; mocked provider in targeted tests
Integration/channel (if any): CLI / builtin memory index
Relevant config (redacted):
- memorySearch.provider: "openai"
- memorySearch.model: "BAAI/bge-m3"
- memorySearch.extraPaths: [~/Applications/Notes/Notebook]
- memorySearch.cache.enabled: true
- targeted tests also cover experimental.sessionMemory: true

Steps

Start from an existing successful memory index with Embedding cache populated.
Clear only embedding_cache from ~/.openclaw/memory/main.sqlite, leaving the committed index rows intact.
Disable networking and run openclaw memory index --force on main.
Record openclaw memory status and sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'.
Switch to fix/memory-reindex-recovery and repeat the same offline openclaw memory index --force and openclaw memory status.
Run pnpm test -- manager.atomic-reindex.test.ts.
Run pnpm test -- manager.embedding-batches.test.ts.
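Steps 2 and 4 can be scripted with Python's built-in sqlite3 module. This is a sketch that assumes a plain sqlite3 connection can read embedding_cache, as the sqlite3 CLI query in step 4 suggests; clear_cache_only and count_cache_entries are hypothetical helper names, not openclaw commands, and it is safer to point them at a backup copy of the store.

```python
import os
import sqlite3
from contextlib import closing

# Store path from the steps above; override db_path to target a backup copy.
DEFAULT_DB = os.path.expanduser("~/.openclaw/memory/main.sqlite")


def count_cache_entries(db_path: str = DEFAULT_DB) -> int:
    # Same query as step 4: SELECT COUNT(*) FROM embedding_cache;
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute("SELECT COUNT(*) FROM embedding_cache").fetchone()[0]


def clear_cache_only(db_path: str = DEFAULT_DB) -> None:
    # Step 2: empty embedding_cache while leaving the committed index rows intact.
    with closing(sqlite3.connect(db_path)) as conn:
        conn.execute("DELETE FROM embedding_cache")
        conn.commit()
```

Call clear_cache_only() before the offline openclaw memory index --force and count_cache_entries() after it. In the PR's manual run, the count stayed at 0 on main and reached 3540 on fix/memory-reindex-recovery.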

Expected

A failed full reindex should preserve the committed index and restore enough retry state for the next sync to finish the rolled-back work.
A failed full session rebuild should leave an explicit signal that a full session retry is still required.
Successful embedding batches should be reused after a later batch failure during safe reindex.
Turning on cache for an older index should not fail because the previous DB lacked embedding_cache.

Actual

On main, the offline force-reindex failed, the previously committed index remained intact, and Embedding cache remained at 0.
On fix/memory-reindex-recovery, the offline force-reindex also failed, the previously committed index remained intact, and Embedding cache increased to 3540, showing that successful embedding batches were preserved across the failed safe reindex.
The targeted rollback and cache regression tests also pass with the new logic.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Manual CLI evidence on `main`

openclaw memory index --force

◇
Memory index failed (main): fetch failed | other side closed

openclaw memory status

Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (0 entries)

sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'

Manual CLI evidence on `fix/memory-reindex-recovery`

openclaw memory index --force

◇
Memory index failed (main): fetch failed | Client network socket disconnected before secure TLS connection was established

openclaw memory status

Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (3540 entries)

Local targeted tests

pnpm test -- manager.atomic-reindex.test.ts
pnpm test -- manager.embedding-batches.test.ts

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- reviewed the current extensions/memory-core code paths against the original PR intent and confirmed the remaining gaps were retry-state restoration and cache durability, not just branch drift
- cleared embedding_cache, disabled networking, and ran openclaw memory index --force on main
- verified that main preserved the previously committed index but left Embedding cache at 0
- switched to fix/memory-reindex-recovery, repeated the same offline force-reindex, and verified that the previously committed index was still preserved while Embedding cache increased to 3540
- ran pnpm test -- manager.atomic-reindex.test.ts
- ran pnpm test -- manager.embedding-batches.test.ts
Edge cases checked:
- safe reindex rollback after some embedding batches already succeeded
- cache table absent in the original DB
- targeted session refresh after a failed full rebuild
What you did not verify:
- a live CLI repro for the session full-retry state machine
- batch API fallback flows beyond the local regression tests
- multi-process concurrent indexing

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Risks and Mitigations

Risk: sessionFullRetryPending adds another state bit to the session sync state machine.
- Mitigation: focused regression coverage now checks that targeted refreshes do not accidentally clear a pending full retry and that successful full retries clear the flag.
Risk: cache mirroring during safe reindex writes successful embedding batches into the original DB before the temp DB is committed.
- Mitigation: the mirrored data is intentionally limited to reusable embedding-cache rows, not index rows; the committed searchable index still remains atomic, and the new tests lock in the intended rollback behavior.
Risk: older indexes may not have embedding_cache.
- Mitigation: the code now probes for table existence before seeding or mirroring cache rows, and the regression test covers enabling cache on a pre-cache index.
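The table-existence probe from the last mitigation might look like this in Python. has_table and seed_cache are hypothetical helpers, and the two-column cache schema is a stand-in for whatever the real table uses.

```python
import sqlite3

CACHE_DDL = "CREATE TABLE IF NOT EXISTS embedding_cache (hash TEXT PRIMARY KEY, embedding BLOB)"


def has_table(conn: sqlite3.Connection, name: str) -> bool:
    # sqlite_master lists every table, including ones created by older versions.
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def seed_cache(src: sqlite3.Connection, dst: sqlite3.Connection) -> int:
    """Copy cache rows from an existing index into a new one; return rows copied."""
    dst.execute(CACHE_DDL)  # the new DB always gets the table
    if not has_table(src, "embedding_cache"):
        return 0  # index predates the cache: nothing to seed, and no error
    rows = src.execute("SELECT hash, embedding FROM embedding_cache").fetchall()
    dst.executemany("INSERT OR REPLACE INTO embedding_cache VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)


old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE chunks (id TEXT)")  # pre-cache index
new = sqlite3.connect(":memory:")
print(seed_cache(old, new))  # 0
```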

Changed files

extensions/memory-core/src/memory/embedding-manager.test-harness.ts (modified, +19/-3)
extensions/memory-core/src/memory/manager-embedding-ops.ts (modified, +33/-16)
extensions/memory-core/src/memory/manager-sync-ops.ts (modified, +193/-37)
extensions/memory-core/src/memory/manager.atomic-reindex.test.ts (modified, +478/-6)
extensions/memory-core/src/memory/manager.embedding-batches.test.ts (modified, +33/-0)


Original issue report


Environment

OpenClaw runtime around 2026.3.24
Checked release notes up to v2026.3.28
Backend in use: builtin (not qmd)
memory_search config:
- provider: openai
- model: embedding-3
- remote.baseUrl: https://open.bigmodel.cn/api/paas/v4

Symptoms

Queries like:

What is the user's documentation habit?
What is the gateway port?
How is Feishu configured?

would often return:

{ "results": [] }

But the local builtin sqlite index already contained the relevant memory data.

What we verified

builtin sqlite database existed at ~/.openclaw/memory/main.sqlite
tables like chunks, chunks_fts, chunks_vec, embedding_cache, meta were present and populated
manual sqlite FTS queries matched expected MEMORY.md / memory/*.md content
actual backend was builtin, not qmd

So this does not look like missing ingest or broken sqlite FTS.

Strong signal that this is a builtin reindex/sync issue

We forced a builtin full reindex by temporarily changing:

agents.defaults.memorySearch.chunking.tokens: 400 -> 401

After restart, the previously empty memory_search queries started returning correct results immediately.

We did not change:

provider
model
endpoint

That strongly suggests builtin search was operating on a stale / transitional index state before reindex.

Suspected root cause

Builtin memory_search appears to be able to run while the builtin index is dirty / needs reindex, instead of waiting for sync/reindex completion.

So the current query may execute against a stale or transitional state and return empty results.

Suggested fixes

Do not run builtin memory_search against transitional index state.
Await sync/reindex completion before returning empty results.
Expose builtin memory status (dirty / needsFullReindex / indexed content count / last sync).
Add a first-class builtin memory reindex command.
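The second suggested fix, waiting for an in-flight sync before returning an empty result, can be sketched as a small wrapper. This is a hypothetical Python model of the control flow, not openclaw's TypeScript API; GuardedSearch, begin_sync and end_sync are illustrative names.

```python
import threading
from typing import Callable, List


class GuardedSearch:
    """Hypothetical wrapper: an empty result during a sync waits for that sync
    (bounded) and queries once more instead of returning [] straight away."""

    def __init__(self, query_fn: Callable[[str], List[str]], timeout_s: float = 30.0) -> None:
        self.query_fn = query_fn
        self.timeout_s = timeout_s
        self._idle = threading.Event()
        self._idle.set()  # no sync in flight yet

    def begin_sync(self) -> None:
        self._idle.clear()

    def end_sync(self) -> None:
        self._idle.set()

    def search(self, query: str) -> List[str]:
        results = self.query_fn(query)
        if results or self._idle.is_set():
            return results  # real hits, or a genuinely empty settled index
        if not self._idle.wait(self.timeout_s):
            raise TimeoutError("memory sync still running")
        return self.query_fn(query)


index: List[str] = []
search = GuardedSearch(lambda q: [doc for doc in index if q in doc], timeout_s=5.0)
search.begin_sync()


def finish_sync() -> None:
    index.append("notes about the gateway port")
    search.end_sync()


threading.Timer(0.1, finish_sync).start()
print(search.search("gateway"))  # ['notes about the gateway port']
```

Waiting only when the first query comes back empty keeps the common path fast: non-empty results return immediately even while a sync is running, and the timeout turns a stuck sync into an explicit error instead of a silent empty list.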


Release note check

We reviewed release notes up to v2026.3.28 and did not find an obvious fix for this builtin memory_search / reindex timing issue.

extent analysis

Fix Plan

To address the issue of memory_search returning empty results due to a stale or transitional index state, we will implement the following steps:

Await sync/reindex completion: Modify the memory_search function to wait for the sync/reindex process to complete before executing queries.
Expose builtin memory status: Add functionality to expose the builtin memory status, including dirty state, need for full reindex, indexed content count, and last sync timestamp.
Add a first-class builtin memory reindex command: Introduce a command to manually trigger a full reindex of the builtin memory.

Example Code

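Here is a sketch of how a modified memory_search could work when it reads the builtin SQLite store directly. The meta "dirty" column and the chunks_fts layout are assumptions, not a documented openclaw schema:

import os
import re
import sqlite3
import time
from contextlib import closing

# Builtin store path from the issue; sqlite3 does not expand "~" by itself.
DB_PATH = os.path.expanduser("~/.openclaw/memory/main.sqlite")


def is_index_dirty(db_path=DB_PATH):
    # Hypothetical: assumes meta has a "dirty" column. Inspect the real schema
    # first with: sqlite3 ~/.openclaw/memory/main.sqlite '.schema meta'
    with closing(sqlite3.connect(db_path)) as conn:
        row = conn.execute("SELECT dirty FROM meta").fetchone()
    return bool(row and row[0])


def to_fts_query(text):
    # FTS rejects bare punctuation such as "?" or "'", so quote each word and OR them.
    return " OR ".join(f'"{word}"' for word in re.findall(r"\w+", text))


def memory_search(query, db_path=DB_PATH, timeout_s=30.0, poll_s=1.0):
    # Wait (bounded) while the index is dirty instead of querying a transitional state.
    deadline = time.monotonic() + timeout_s
    while is_index_dirty(db_path):
        if time.monotonic() >= deadline:
            raise TimeoutError("memory index still syncing; refusing to return stale results")
        time.sleep(poll_s)

    fts_query = to_fts_query(query)
    if not fts_query:
        return []
    # MATCH against the FTS table name searches all of its indexed columns.
    with closing(sqlite3.connect(db_path)) as conn:
        return conn.execute(
            "SELECT * FROM chunks_fts WHERE chunks_fts MATCH ?", (fts_query,)
        ).fetchall()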

Verification

To verify that the fix worked, you can test the memory_search function with queries that previously returned empty results. The function should now return the expected results after waiting for the sync/reindex process to complete.

Extra Tips

Make sure to review the release notes for any updates that may address this issue.
Consider adding logging to track the index status and sync/reindex process to help with debugging.
You can use the is_index_dirty function to expose the builtin memory status and provide a more informative error message when the index is in a dirty or transitional state.
