openclaw - [Feature]: batched memory embedding should batch over files

Root Cause

Running "openclaw memory search foo" blocks until memory indexing is complete.

I have a custom provider that uses the free tier of Mistral embeddings. It exposes an OpenAI-compatible batch API.

It seems that openclaw doesn't batch across files.

As the evidence below shows, batching only happens within file boundaries: each file appears to be submitted as its own batch job.

This wastes batching capacity: most files contain only one or two items, while most batch API providers support up to 50k requests per batch job.
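To illustrate why one batch job can carry chunks from many files: in the OpenAI Batch API the input file is JSONL, and every line is an independent request identified by its own custom_id. The sketch below builds such lines for embedding requests. The helper name toBatchJsonl and the "path#index" id scheme are made up for illustration and are not openclaw code; a custom provider may also expect a slightly different shape.

```typescript
// One line of an OpenAI-style batch input file (JSONL) for an embedding request.
interface BatchRequestLine {
  custom_id: string;
  method: "POST";
  url: "/v1/embeddings";
  body: { model: string; input: string };
}

// Hypothetical helper: serializes chunks from any number of files into one JSONL payload.
// The caller chooses the ids; they only need to be unique within the batch.
function toBatchJsonl(model: string, chunks: { id: string; text: string }[]): string {
  return chunks
    .map((chunk): BatchRequestLine => ({
      custom_id: chunk.id,
      method: "POST",
      url: "/v1/embeddings",
      body: { model, input: chunk.text },
    }))
    .map((line) => JSON.stringify(line))
    .join("\n");
}

// Two different files end up in the same payload, distinguishable by custom_id.
const exampleJsonl = toBatchJsonl("mistral/mistral-embed", [
  { id: "MEMORY.md#0", text: "first chunk of MEMORY.md" },
  { id: "memory/notes.md#0", text: "first chunk of another file" },
]);
console.log(exampleJsonl);
```

Since results are returned keyed by custom_id, nothing in this format ties a batch job to a single file.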

The hardcoded 2 s polling interval is also rather aggressive, since batching is by its nature slower, low priority, and low cost at most providers. An exponential backoff up to a check interval of perhaps 5 minutes would suffice.
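A minimal sketch of such a backoff, assuming a 2 s starting interval, a doubling factor, and a 5 minute cap. The function names, the fetchStatus callback, and the list of terminal statuses are assumptions for illustration, not openclaw internals.

```typescript
// Hypothetical helper: wait time before poll number `attempt` (0-based).
// Starts at baseMs, multiplies by `factor` each attempt, and never exceeds capMs.
function pollDelayMs(attempt: number, baseMs = 2000, factor = 2, capMs = 300000): number {
  return Math.min(baseMs * Math.pow(factor, attempt), capMs);
}

// Statuses after which polling can stop (assumed to follow the OpenAI Batch API naming).
const TERMINAL_STATUSES = new Set(["completed", "failed", "expired", "cancelled"]);

// Hypothetical polling loop. `fetchStatus` stands in for whatever call reads the batch state.
async function waitForBatch(
  fetchStatus: () => Promise<string>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  for (let attempt = 0; ; attempt++) {
    const status = await fetchStatus();
    if (TERMINAL_STATUSES.has(status)) return status;
    await sleep(pollDelayMs(attempt));
  }
}
```

With these defaults the waits are 2, 4, 8, 16, 32, 64, 128, and 256 seconds, then 300 seconds for every later poll. Small jobs are still picked up quickly, while a long-running job is checked only once every five minutes.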

Problem to solve

Memory search is not available until embedding is finished, and the current batching wastes batch capacity.

Proposed solution

Identify where chunks are split into batches and lift that step up to the loop that iterates over files, so that a single batch can contain chunks from many files.
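A rough sketch of the idea, not based on the actual openclaw code: collect the chunks of all files first, then cut the flat list into batches limited only by the provider's per-job maximum. The types FileChunks and BatchItem, the function planBatches, and the "fileIndex:chunkIndex" id scheme are all hypothetical.

```typescript
// Hypothetical input: one entry per file, already split into chunks.
interface FileChunks {
  path: string;
  chunks: string[];
}

// Hypothetical batch entry: customId lets results be mapped back to file and chunk.
interface BatchItem {
  customId: string;
  path: string;
  chunkIndex: number;
  text: string;
}

// Flatten chunks across all files, then slice into batches of at most maxPerBatch items.
function planBatches(files: FileChunks[], maxPerBatch: number): BatchItem[][] {
  if (!Number.isInteger(maxPerBatch) || maxPerBatch < 1) {
    throw new RangeError("maxPerBatch must be a positive integer");
  }
  const items: BatchItem[] = [];
  files.forEach((file, fileIndex) => {
    file.chunks.forEach((text, chunkIndex) => {
      items.push({ customId: `${fileIndex}:${chunkIndex}`, path: file.path, chunkIndex, text });
    });
  });
  const batches: BatchItem[][] = [];
  for (let start = 0; start < items.length; start += maxPerBatch) {
    batches.push(items.slice(start, start + maxPerBatch));
  }
  return batches;
}

// Three small files fit into a single batch job instead of three.
const exampleFiles: FileChunks[] = [
  { path: "MEMORY.md", chunks: ["m0"] },
  { path: "memory/a.md", chunks: ["a0", "a1"] },
  { path: "memory/b.md", chunks: ["b0"] },
];
console.log(planBatches(exampleFiles, 50000).length); // 1
```

The number of batch jobs then depends on the total chunk count rather than the file count, so a batch boundary falls in the middle of a file only when the per-job limit is reached.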

Alternatives considered

No response

Impact

Indexing costs more than necessary if the provider charges per batch job.

Evidence/examples

openclaw memory status:

Memory Search (main)
Provider: openai (requested: openai)
Model: mistral/mistral-embed
Sources: memory, sessions
Indexed: 0/1441 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
Dreaming: off
Embeddings: ready
By source:
  memory · 0/832 files · 0 chunks
  sessions · 0/609 files · 0 chunks
Vector store: ready
Semantic vectors: ready
Vector path: ~/.npm-global/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Cache cap: 50000
Batch: enabled (failures 0/2)
Recall store: 4184 entries · 2 promoted · 4183 concept-tagged · 132 spaced · scripts=4180 latin, 3 mixed
Recall path: ~/.openclaw/workspace/memory/.dreams/short-term-recall.json
Recall updated: 2026-05-11T03:00:09.767Z
Dreaming artifacts: diary present · 6 corpus files · ingestion state present
Dream corpus: ~/.openclaw/workspace/memory/.dreams/session-corpus
Dream ingestion: ~/.openclaw/workspace/memory/.dreams/session-ingestion.json
Dream diary: ~/.openclaw/workspace/DREAMS.md

openclaw memory index --force --verbose:

...
memory Index (main)
Provider: openai (requested: openai)
Model: mistral/mistral-embed
Sources: memory (MEMORY.md + ~/.openclaw/workspace/memory/*.md), sessions (~/.openclaw/agents/main/sessions/*.jsonl)

21:28:40 [memory] sync: indexing memory files
21:28:40 [memory] embeddings: openai batch submit
21:28:40 [memory] embeddings: openai batch created
21:28:40 [memory] openai batch batch_745ebeb0e7b84478a4bbf652 validating; waiting 2000ms
21:28:42 [memory] openai batch batch_745ebeb0e7b84478a4bbf652 validating; waiting 2000ms
21:28:44 [memory] openai batch batch_745ebeb0e7b84478a4bbf652 in_progress; waiting 2000ms
21:28:47 [memory] embeddings: openai batch submit
21:28:47 [memory] embeddings: openai batch created
21:28:47 [memory] openai batch batch_e5dec2ec999d4c44b4a96102 validating; waiting 2000ms
21:28:49 [memory] openai batch batch_e5dec2ec999d4c44b4a96102 validating; waiting 2000ms
21:28:51 [memory] openai batch batch_e5dec2ec999d4c44b4a96102 validating; waiting 2000ms
21:28:53 [memory] openai batch batch_e5dec2ec999d4c44b4a96102 validating; waiting 2000ms
21:28:55 [memory] openai batch batch_e5dec2ec999d4c44b4a96102 in_progress; waiting 2000ms
21:28:57 [memory] embeddings: openai batch submit
21:28:57 [memory] embeddings: openai batch created
21:28:57 [memory] openai batch batch_1f08116ef8634c52bf317bc0 validating; waiting 2000ms
Indexing memory files (batch)... 2/831 · elapsed 0:18 · eta 124:57 0%^C

Additional information

I don't have access to any premium models. I tried to implement this feature myself using gemma4 and opencode, but did not succeed.

That should be a hint that the code is overly complex and should be simplified.
