openclaw - ✅(Solved) Fix builtin memory_search can return empty results until builtin index is forced to reindex [2 pull requests, 1 comment, 2 participants]

openclaw/openclaw#56862 · Fetched 2026-04-08 01:46:47

Built-in memory_search can return empty results even when the builtin sqlite memory index already exists and manual FTS queries match the expected content.

In our case, forcing a builtin full reindex immediately fixed the issue, without changing provider/model/endpoint.

This suggests the problem is likely in builtin memory index sync/reindex state rather than missing memory data.


Root Cause

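Suspected root cause: builtin memory_search appears to be able to run while the builtin index is dirty or needs a reindex, instead of waiting for sync/reindex completion.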

Fix Action

Workaround

A controlled builtin full reindex worked for us:

{
  "agents": {
    "defaults": {
      "memorySearch": {
        "chunking": {
          "tokens": 401
        }
      }
    }
  }
}

After verification, we converged back to stable config and the queries kept working.


PR fix notes

PR #59137: fix(memory): preserve retry state and embedding cache across reindex rollback

Description (problem / solution / changelog)

Summary

  • Problem: the current extensions/memory-core reindex path preserves the last committed index on rollback, but still loses important recovery state. A failed full session rebuild can forget that the next retry must sweep all sessions, and successful embedding batches can still be discarded when a later batch fails during a safe temp-DB reindex.
  • Why it matters: memory reindex should remain atomic for the committed index while still preserving enough retry and cache progress to converge efficiently after transient failures.
  • Why this PR is still worth landing even without directly closing one issue: the open issues in this area are broader transport/provider availability bugs. This PR does not claim to stop fetch failed from happening, but it does materially reduce the damage when those failures do happen. In manual verification, the same offline force-reindex failure left Embedding cache at 0 on main, while this branch preserved 3540 cache entries after rollback. That means fewer repeated remote embedding calls, less wasted progress, and a faster path to recovery on the next retry even before the larger transport-layer issues are fixed.
  • What changed: this PR snapshots and restores sync state across full reindex rollback, adds an explicit sessionFullRetryPending state for failed full session rebuilds, mirrors successful embedding-cache writes back to the original DB during safe reindex, and adds focused regression coverage for rollback recovery and cache preservation.
  • What did NOT change (scope boundary): no new config knobs, no provider API changes, no query behavior changes, and no broader memory pipeline redesign outside rollback/retry correctness and cache durability.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

  • Root cause: the memory-core reindex flow was correctly preserving the committed SQLite index on rollback, but it was not preserving all of the in-memory state needed for the next retry. In particular, a failed full session rebuild could leave the manager with no dirty-file filter and no explicit signal that the next normal sync still needed a full session pass. Separately, successful embedding batches during safe temp-DB reindex were written only to the temp DB, so a later failure threw away useful cache work.
  • Missing detection / guardrail: existing atomic reindex coverage protected "keep the previous index on failure", but it did not cover the follow-up retry semantics after rollback, post-compaction targeted refresh behavior when a full retry was still pending, or cache preservation across temp-DB rollback.
  • Prior context (git blame, prior PR, issue, or refactor if known): this is the current-main continuation of the reliability work attempted in #55497, adapted to the moved memory-core plugin architecture under extensions/memory-core/src/memory/* instead of the old src/memory/* paths.
  • Why this regressed now: after the memory engine moved behind the plugin/runtime split, the old fix no longer applied cleanly, and the rebuilt atomic reindex path preserved the database contents but still had gaps in retry-state restoration and cache durability.
  • If unknown, what was ruled out: this is not just "main moved on" or a pure merge/rebase issue. The remaining behavior gap exists in current origin/main logic and reproduces in focused local tests without needing the old branch structure.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • extensions/memory-core/src/memory/manager.atomic-reindex.test.ts
    • extensions/memory-core/src/memory/manager.embedding-batches.test.ts
    • extensions/memory-core/src/memory/embedding-manager.test-harness.ts
  • Scenario the test should lock in:
    • a failed full session rebuild keeps sessionsDirty set and records that the next normal sync still needs a full session retry
    • a targeted post-compaction refresh can update one transcript without accidentally clearing the pending full-session retry
    • successful embedding batches are cached before a later batch failure and are reused on the next retry
    • enabling cache on an existing index created before the cache table existed still works cleanly
  • Why this is the smallest reliable guardrail: the bug is in local sync-state transitions and SQLite-backed cache persistence, so focused memory-core tests exercise the relevant contract directly without needing live providers or gateway E2E setup.
  • Existing test that already covers this (if any):
    • the existing atomic reindex test already covered preserving the last committed index on failure
  • If no new test is added, why not:
    • N/A

User-visible / Behavior Changes

  • Full memory reindex keeps the previous committed index on failure and now also preserves enough dirty/retry state for the next sync to recover correctly.
  • If a full session rebuild fails mid-run, the next normal sync still knows it must perform a full session retry instead of silently downgrading to an incomplete incremental pass.
  • Successful embedding batches can now survive a later safe-reindex rollback through cache mirroring, which reduces repeated provider work on retry.
  • Existing indexes created before the embedding cache table existed can enable cache without failing reindex.

Diagram (if applicable)

Before:
[force reindex] -> [temp DB builds some state] -> [later failure]
               -> [committed DB restored]
               -> [full-session retry intent / temp-cache progress partly lost]

After:
[force reindex] -> [temp DB builds some state]
               -> [successful cache batches mirrored to committed DB]
               -> [later failure]
               -> [committed DB restored + retry snapshot restored + full-session retry flagged]
               -> [next normal sync converges]
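
The "After" flow can be sketched as a small state machine. This is a minimal Python illustration under stated assumptions, not the PR's TypeScript code: SyncState, run_full_reindex, targeted_refresh, and the rebuild callback are hypothetical names, and only the sessionsDirty and sessionFullRetryPending flags are taken from the PR description.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SyncState:
    # Hypothetical container; the flag names loosely mirror those in the PR text.
    sessions_dirty: bool = False
    session_full_retry_pending: bool = False
    dirty_files: Set[str] = field(default_factory=set)


def run_full_reindex(state: SyncState, rebuild: Callable[[SyncState], None]) -> bool:
    """Run a full rebuild; on failure restore the snapshot and flag a full retry."""
    snapshot = copy.deepcopy(state)  # taken before the rebuild mutates anything
    try:
        rebuild(state)
    except Exception:
        # Restore in place so callers holding `state` see the rolled-back values.
        state.sessions_dirty = snapshot.sessions_dirty
        state.dirty_files = snapshot.dirty_files
        # Explicit signal that the next normal sync still needs a full session pass.
        state.session_full_retry_pending = True
        return False
    # Success: nothing is left to retry.
    state.sessions_dirty = False
    state.dirty_files = set()
    state.session_full_retry_pending = False
    return True


def targeted_refresh(state: SyncState, path: str) -> None:
    """Refresh one transcript without clearing a pending full-session retry."""
    state.dirty_files.discard(path)
    # session_full_retry_pending is deliberately left untouched here.
```

The point of the sketch is that rollback restores two things: the data (handled elsewhere by the temp-DB swap) and the intent to retry, which is what the extra flag records.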

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 24.13.0 + pnpm + Vitest
  • Model/provider: OpenAI-compatible remote embeddings (provider: openai, model: BAAI/bge-m3) for manual verification; mocked provider in targeted tests
  • Integration/channel (if any): CLI / builtin memory index
  • Relevant config (redacted):
    • memorySearch.provider: "openai"
    • memorySearch.model: "BAAI/bge-m3"
    • memorySearch.extraPaths: [~/Applications/Notes/Notebook]
    • memorySearch.cache.enabled: true
    • targeted tests also cover experimental.sessionMemory: true

Steps

  1. Start from an existing successful memory index with Embedding cache populated.
  2. Clear only embedding_cache from ~/.openclaw/memory/main.sqlite, leaving the committed index rows intact.
  3. Disable networking and run openclaw memory index --force on main.
  4. Record openclaw memory status and sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'.
  5. Switch to fix/memory-reindex-recovery and repeat the same offline openclaw memory index --force and openclaw memory status.
  6. Run pnpm test -- manager.atomic-reindex.test.ts.
  7. Run pnpm test -- manager.embedding-batches.test.ts.

Expected

  • A failed full reindex should preserve the committed index and restore enough retry state for the next sync to finish the rolled-back work.
  • A failed full session rebuild should leave an explicit signal that a full session retry is still required.
  • Successful embedding batches should be reused after a later batch failure during safe reindex.
  • Turning on cache for an older index should not fail because the previous DB lacked embedding_cache.

Actual

  • On main, the offline force-reindex failed, the previously committed index remained intact, and Embedding cache remained at 0.
  • On fix/memory-reindex-recovery, the offline force-reindex also failed, the previously committed index remained intact, and Embedding cache increased to 3540, showing that successful embedding batches were preserved across the failed safe reindex.
  • The targeted rollback and cache regression tests also pass with the new logic.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Manual CLI evidence on main

openclaw memory index --force
Memory index failed (main): fetch failed | other side closed
openclaw memory status
Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (0 entries)
sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'
0

Manual CLI evidence on fix/memory-reindex-recovery

openclaw memory index --force
Memory index failed (main): fetch failed | Client network socket disconnected before secure TLS connection was established
openclaw memory status
Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (3540 entries)

Local targeted tests

pnpm test -- manager.atomic-reindex.test.ts
pnpm test -- manager.embedding-batches.test.ts

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • reviewed the current extensions/memory-core code paths against the original PR intent and confirmed the remaining gaps were retry-state restoration and cache durability, not just branch drift
    • cleared embedding_cache, disabled networking, and ran openclaw memory index --force on main
    • verified that main preserved the previously committed index but left Embedding cache at 0
    • switched to fix/memory-reindex-recovery, repeated the same offline force-reindex, and verified that the previously committed index was still preserved while Embedding cache increased to 3540
    • ran pnpm test -- manager.atomic-reindex.test.ts
    • ran pnpm test -- manager.embedding-batches.test.ts
  • Edge cases checked:
    • safe reindex rollback after some embedding batches already succeeded
    • cache table absent in the original DB
    • targeted session refresh after a failed full rebuild
  • What you did not verify:
    • a live CLI repro for the session full-retry state machine
    • batch API fallback flows beyond the local regression tests
    • multi-process concurrent indexing

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: sessionFullRetryPending adds another state bit to the session sync state machine.
    • Mitigation: focused regression coverage now checks that targeted refreshes do not accidentally clear a pending full retry and that successful full retries clear the flag.
  • Risk: cache mirroring during safe reindex writes successful embedding batches into the original DB before the temp DB is committed.
    • Mitigation: the mirrored data is intentionally limited to reusable embedding-cache rows, not index rows; the committed searchable index still remains atomic, and the new tests lock in the intended rollback behavior.
  • Risk: older indexes may not have embedding_cache.
    • Mitigation: the code now probes for table existence before seeding or mirroring cache rows, and the regression test covers enabling cache on a pre-cache index.
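
The cache-mirroring and table-probe mitigations can be illustrated with Python's sqlite3 module. This is a sketch under assumptions, not the PR's implementation: the two-column embedding_cache schema, the helper names, and the choice to create a missing table are all hypothetical; only the table name and the idea of probing before mirroring come from the PR description.

```python
import sqlite3
from typing import Iterable, Tuple


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    """Probe sqlite_master so older databases without the table are handled."""
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


def ensure_cache_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real embedding_cache layout may differ.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS embedding_cache "
        "(hash TEXT PRIMARY KEY, embedding TEXT)"
    )


def store_batch(
    temp_db: sqlite3.Connection,
    original_db: sqlite3.Connection,
    batch: Iterable[Tuple[str, str]],
) -> None:
    """Write a successful batch to the temp DB and mirror it to the original DB."""
    rows = list(batch)
    for conn in (temp_db, original_db):
        if not table_exists(conn, "embedding_cache"):
            ensure_cache_table(conn)
        conn.executemany(
            "INSERT OR REPLACE INTO embedding_cache (hash, embedding) VALUES (?, ?)",
            rows,
        )
        conn.commit()
```

Because only cache rows are mirrored, discarding the temp database after a later failure leaves the committed index untouched while the cached embeddings remain available for the next retry.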

Changed files

  • extensions/memory-core/src/memory/embedding-manager.test-harness.ts (modified, +19/-3)
  • extensions/memory-core/src/memory/manager-embedding-ops.ts (modified, +33/-16)
  • extensions/memory-core/src/memory/manager-sync-ops.ts (modified, +193/-37)
  • extensions/memory-core/src/memory/manager.atomic-reindex.test.ts (modified, +478/-6)
  • extensions/memory-core/src/memory/manager.embedding-batches.test.ts (modified, +33/-0)

Issue body (raw)

Summary

Built-in memory_search can return empty results even when the builtin sqlite memory index already exists and manual FTS queries match the expected content.

In our case, forcing a builtin full reindex immediately fixed the issue, without changing provider/model/endpoint.

This suggests the problem is likely in builtin memory index sync/reindex state rather than missing memory data.


Environment

  • OpenClaw runtime around 2026.3.24
  • Checked release notes up to v2026.3.28
  • Backend in use: builtin (not qmd)
  • memory_search config:
    • provider: openai
    • model: embedding-3
    • remote.baseUrl: https://open.bigmodel.cn/api/paas/v4

Symptoms

Queries like:

  • What is the user's documentation habit?
  • What is the gateway port?
  • How is Feishu configured?

would often return:

{ "results": [] }

But the local builtin sqlite index already contained the relevant memory data.


What we verified

  • builtin sqlite database existed at ~/.openclaw/memory/main.sqlite
  • tables like chunks, chunks_fts, chunks_vec, embedding_cache, meta were present and populated
  • manual sqlite FTS queries matched expected MEMORY.md / memory/*.md content
  • actual backend was builtin, not qmd

So this does not look like missing ingest or broken sqlite FTS.
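
The checks above can be repeated with a short Python script. This is a sketch, not an OpenClaw tool: the helper names are hypothetical, it assumes chunks_fts is an SQLite FTS table that can be queried with the table-name MATCH form, and virtual tables such as chunks_vec may need their extension loaded before they can be counted.

```python
import sqlite3
from typing import Dict, Iterable, Optional

# Table names as listed in the issue report.
EXPECTED_TABLES = ("chunks", "chunks_fts", "chunks_vec", "embedding_cache", "meta")


def table_row_counts(
    conn: sqlite3.Connection, names: Iterable[str] = EXPECTED_TABLES
) -> Dict[str, Optional[int]]:
    """Return the row count per table, or None for a table that does not exist."""
    counts: Dict[str, Optional[int]] = {}
    for name in names:
        found = conn.execute(
            "SELECT 1 FROM sqlite_master WHERE name = ?", (name,)
        ).fetchone()
        if found is None:
            counts[name] = None
            continue
        # Table names cannot be bound as parameters; only pass trusted names here.
        counts[name] = conn.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]
    return counts


def fts_hits(conn: sqlite3.Connection, query: str, table: str = "chunks_fts") -> int:
    """Count rows of the FTS table that match the query."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {table} MATCH ?", (query,)
    ).fetchone()[0]
```

If the row counts are non-zero and fts_hits finds matches while memory_search still returns an empty list, the data is present and the problem lies in the search path or index state, which is the conclusion the report draws.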


Strong signal that this is a builtin reindex/sync issue

We forced a builtin full reindex by temporarily changing:

agents.defaults.memorySearch.chunking.tokens: 400 -> 401

After restart, the previously empty memory_search queries started returning correct results immediately.

We did not change:

  • provider
  • model
  • endpoint

That strongly suggests builtin search was operating on a stale / transitional index state before reindex.


Suspected root cause

Builtin memory_search appears to be able to run while the builtin index is dirty / needs reindex, instead of waiting for sync/reindex completion.

So the current query may execute against a stale or transitional state and return empty results.


Suggested fixes

  1. Do not run builtin memory_search against transitional index state.
  2. Await sync/reindex completion before returning empty results.
  3. Expose builtin memory status (dirty / needsFullReindex / indexed content count / last sync).
  4. Add a first-class builtin memory reindex command.
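
Suggested fixes 1 to 3 could be combined so that an empty result is never returned without context. The sketch below is a hypothetical Python illustration: MemoryIndexStatus, guarded_search, and the indexReady key are invented names, and the fields simply follow the status items listed in fix 3.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class MemoryIndexStatus:
    dirty: bool
    needs_full_reindex: bool
    indexed_chunks: int
    last_sync: Optional[str] = None  # for example an ISO 8601 timestamp

    def is_searchable(self) -> bool:
        """The index is safe to query only when no sync or reindex is outstanding."""
        return not (self.dirty or self.needs_full_reindex)


def guarded_search(
    status: MemoryIndexStatus,
    run_query: Callable[[str], List[Any]],
    query: str,
) -> Dict[str, Any]:
    """Return results together with index status instead of a bare empty list."""
    if not status.is_searchable():
        # Do not query a transitional index; tell the caller why nothing came back.
        return {"results": [], "indexReady": False, "status": status}
    return {"results": run_query(query), "indexReady": True, "status": status}
```

With a response shaped like this, a caller can tell "no matching memory" apart from "index not ready yet" and decide whether to wait and retry.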

Workaround

A controlled builtin full reindex worked for us:

{
  "agents": {
    "defaults": {
      "memorySearch": {
        "chunking": {
          "tokens": 401
        }
      }
    }
  }
}

After verification, we converged back to stable config and the queries kept working.
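
The apply-then-revert workaround can be scripted against the configuration data. This is a minimal Python sketch operating on an in-memory dict: the helper name is hypothetical, and where the configuration file lives and how it is reloaded are not stated in the issue, so loading and saving are left out.

```python
import copy
import json


def bump_chunk_tokens(config: dict, delta: int = 1) -> dict:
    """Return a copy of config with memorySearch.chunking.tokens shifted by delta."""
    updated = copy.deepcopy(config)
    chunking = (
        updated.setdefault("agents", {})
        .setdefault("defaults", {})
        .setdefault("memorySearch", {})
        .setdefault("chunking", {})
    )
    # 400 is the starting value reported in the issue, used here as a fallback.
    chunking["tokens"] = chunking.get("tokens", 400) + delta
    return updated


stable = {"agents": {"defaults": {"memorySearch": {"chunking": {"tokens": 400}}}}}
forced = bump_chunk_tokens(stable)        # apply, restart, verify memory_search
restored = bump_chunk_tokens(forced, -1)  # then converge back to the stable value
print(json.dumps(forced, indent=2))
```

Any change to the chunking size should have the same effect as the 400 -> 401 bump described above, since what triggered the full reindex was the configuration change itself.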


Release note check

We reviewed release notes up to v2026.3.28 and did not find an obvious fix for this builtin memory_search / reindex timing issue.

extent analysis

Fix Plan

To address the issue of memory_search returning empty results due to a stale or transitional index state, we will implement the following steps:

  • Await sync/reindex completion: Modify the memory_search function to wait for the sync/reindex process to complete before executing queries.
  • Expose builtin memory status: Add functionality to expose the builtin memory status, including dirty state, need for full reindex, indexed content count, and last sync timestamp.
  • Add a first-class builtin memory reindex command: Introduce a command to manually trigger a full reindex of the builtin memory.

Example Code

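Here's an illustrative Python sketch of how a memory_search that waits for the index could look. It is not OpenClaw's implementation: the dirty column in the meta table, the timeout, and the polling interval are assumptions, and the real schema may differ.

import os
import sqlite3
import time

# sqlite3.connect() does not expand "~", so resolve the path first.
DB_PATH = os.path.expanduser('~/.openclaw/memory/main.sqlite')

def memory_search(query, db_path=DB_PATH, timeout=30.0, poll_interval=1.0):
    # Wait, with an upper bound, for sync/reindex to finish
    # instead of querying a dirty or transitional index.
    deadline = time.monotonic() + timeout
    while is_index_dirty(db_path):
        if time.monotonic() >= deadline:
            raise TimeoutError('memory index is still dirty')
        time.sleep(poll_interval)

    # Execute the query against the up-to-date index.
    conn = sqlite3.connect(db_path)
    try:
        # The table-name MATCH form searches all indexed columns.
        cursor = conn.execute(
            'SELECT * FROM chunks_fts WHERE chunks_fts MATCH ?', (query,))
        return cursor.fetchall()
    finally:
        conn.close()

def is_index_dirty(db_path=DB_PATH):
    # Assumption: the 'meta' table exposes a 'dirty' column.
    conn = sqlite3.connect(db_path)
    try:
        row = conn.execute('SELECT dirty FROM meta').fetchone()
    finally:
        conn.close()
    # A missing row is treated as "not dirty" rather than raising.
    return bool(row and row[0])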

Verification

To verify that the fix worked, you can test the memory_search function with queries that previously returned empty results. The function should now return the expected results after waiting for the sync/reindex process to complete.

Extra Tips

  • Make sure to review the release notes for any updates that may address this issue.
  • Consider adding logging to track the index status and sync/reindex process to help with debugging.
  • You can use the is_index_dirty function to expose the builtin memory status and provide a more informative error message when the index is in a dirty or transitional state.
