openclaw - ✅(Solved) Fix [Bug]: Builtin memory indexing ignores configured remote embedding batch timeout [4 pull requests, 8 comments, 5 participants]

unixwzrd · 2026-03-18T17:11:19Z

[openclaw] Builtin memory indexing appears to ignore the configured remote embedding batch timeout and still fails at a hardcoded 120 seconds.

openclaw/openclaw#49933 • Fetched 2026-04-08 01:01:06

Builtin memory indexing appears to ignore the configured remote embedding batch timeout and still fails at a hardcoded 120 seconds.

Error Message

Primary user-visible error: Memory index failed (main): memory embeddings batch timed out after 120s

Evidence that this is specifically a timeout-handling bug in OpenClaw:

increasing the configured timeout alone did not change the failure point
a local patch to builtin memory timeout handling changed runtime behavior immediately
after the patch, indexing moved past the old 120s failure boundary
batch resizing
silent fail incomplete index build

Consequence

builtin memory indexing fails before completion
users cannot complete archive ingestion without patching source or changing implementation
debugging is expensive because each rerun can take a long time

Fix / Workaround

Memory index failed (main): memory embeddings batch timed out after 120s

After patching the timeout handling locally to use the configured value, indexing progressed much farther and eventually completed. Later failures were different issues in the embedding server stack, not the fixed 120 second ceiling.

2026.3.13 tested as stable baseline, with a local patch applied for verification

PR fix notes

PR #49937: Fix memory timeout on 2026 03 13 1

Repository: openclaw/openclaw
Author: unixwzrd
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/49937

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

Problem: builtin memory indexing ignored the configured remote embedding batch timeout and still failed at a hardcoded 120 seconds.
Why it matters: large or slower remote embedding batches could not complete, so memory indexing failed before the corpus finished ingesting.
What changed: builtin memory embedding batch operations now use the configured batch timeout for remote embedding requests, and regression coverage was added.
What did NOT change (scope boundary): this does not change embedding model behavior, llama.cpp batch sizing, context-window limits, or downstream transport/server errors.

Change Type (select all)

[x] Bug fix

Scope (select all touched areas)

[x] Memory / storage

Linked Issue/PR

Closes #49933
Related #49933

User-visible / Behavior Changes

Builtin memory indexing no longer stops at a hardcoded 120 second timeout when remote embedding batches are slower than that.
Remote embedding batch timeout now follows the configured batch timeout value.

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation:

Repro + Verification

Environment

OS: macOS 15.6.1
Runtime/container: local npm-installed OpenClaw build from stable 2026-03-13-1
Model/provider: builtin memory indexing with remote embeddings via local llama.cpp-compatible embedding server (bge-m3)
Integration/channel (if any): N/A
Relevant config (redacted): builtin memory indexing enabled, remote embedding provider configured, batch timeout set above 120s

Steps

Configure builtin memory indexing to use a remote embedding provider.
Use a corpus and embedding setup where one or more embedding batches take longer than 120 seconds.
Set the configured embedding batch timeout above 120 seconds.
Run openclaw memory index.

Expected

Builtin memory indexing should honor the configured remote embedding batch timeout.

Actual

Before this fix, indexing still failed at 120 seconds with: Memory index failed (main): memory embeddings batch timed out after 120s
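
The error text above is what a fixed deadline produces no matter what the user configures. As a minimal sketch of that mechanism (not OpenClaw's actual code), the helper below rejects with a message in the same format once a timer fires. The names `timeoutMessage` and `withTimeout` are hypothetical; only the message wording is taken from the logged error.

```typescript
// Hypothetical helpers; only the message format is copied from the logged error.
function timeoutMessage(label: string, timeoutMs: number): string {
  return `${label} timed out after ${Math.round(timeoutMs / 1000)}s`;
}

// Settles with the work's result, or rejects once timeoutMs has elapsed.
function withTimeout<T>(work: Promise<T>, timeoutMs: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(timeoutMessage(label, timeoutMs))),
      timeoutMs,
    );
    work.then(
      (value) => {
        clearTimeout(timer); // work finished first, so cancel the deadline
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// If the caller always passes 120_000, the message always reports 120s.
console.log(timeoutMessage("memory embeddings batch", 120_000));
// memory embeddings batch timed out after 120s
```

With a wrapper like this, the reported duration is whatever value the caller passes in, so a caller that ignores configuration will always report the same number.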

Evidence

Attach at least one:

[x] Failing test/log before + passing after
[x] Trace/log snippets
[ ] Screenshot/recording
[ ] Perf numbers (if relevant)

First timeout issue

23:52:23 [memory] embeddings: batch start
23:52:23 [memory] embeddings: batch start
23:52:24 [memory] embeddings: batch start
◇
Memory index failed (main): memory embeddings batch timed out after 120s

Next failure after above fixed

08:06:27 [memory] embeddings: batch start
08:06:27 [memory] embeddings: batch start
08:06:28 [memory] embeddings: batch start
◇
Memory index failed (main): fetch failed

Success

10:15:23 [memory] embeddings: batch start
10:15:24 [memory] embeddings: batch start
◇
Memory index updated (main).

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios:
- Reproduced builtin memory indexing failure at a fixed 120 seconds despite increasing configured timeout.
- Applied this fix locally and confirmed indexing progressed beyond the old 120 second failure boundary.
- Confirmed later failures changed class to downstream embedding-server limits rather than the fixed timeout.
Edge cases checked:
- Regression coverage added for configured timeout usage in builtin memory batch handling.
What you did not verify:
- I did not verify every embedding backend.
- I did not verify unrelated transport/server failures beyond confirming they were separate from the 120s timeout bug.

Review Conversations

[ ] I replied to or resolved every bot review conversation I addressed in this PR.
[ ] I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)
If yes, exact upgrade steps:

Failure Recovery (if this breaks)

How to disable/revert this change quickly: revert the patch to builtin memory embedding batch timeout handling
Files/config to restore:
- src/memory/manager-embedding-ops.ts
- src/memory/manager.embedding-batches.test.ts
Known bad symptoms reviewers should watch for:
- builtin memory indexing still failing at exactly 120 seconds for remote embedding batches
- timeout config no longer affecting remote embedding batch behavior

Risks and Mitigations

Risk:
- Longer remote embedding waits could surface slower downstream failures later in the run instead of failing at 120 seconds.
Mitigation:
- This is the intended behavior because the configured timeout should be honored; downstream server limits remain separate failures.

Changed files

.gitignore (modified, +1/-2)
src/memory/manager-embedding-ops.ts (modified, +12/-4)
src/memory/manager.embedding-batches.test.ts (modified, +20/-0)
src/memory/post-json.ts (modified, +29/-21)

PR #49947: fix(memory): honor configured remote batch timeout for builtin indexing

Repository: openclaw/openclaw
Author: cgdusek
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/49947

Description (problem / solution / changelog)

Summary

The issue and initial fix were authored by @unixwzrd in #49933 / #49937. This PR narrows scope to just the timeout bug fix with surrounding guardrails (default-path preservation, cache-key correctness).
Fixes builtin memory indexing so remote embedding batches honor the configured timeout instead of a hardcoded 120s ceiling.
Root cause: resolveBatchConfig() correctly computed batch.timeoutMs from agents.*.memorySearch.remote.batch.timeoutMinutes, but resolveEmbeddingTimeout("batch") ignored it and returned fixed constants.
Scope boundary: this PR only updates timeout resolution and adds one regression test; no retry policy, transport, or error-format behavior changes.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #49933
Related #49937

User-visible / Behavior Changes

Builtin memory indexing no longer times out at a fixed 120 seconds for remote embedding batches.
Remote batch timeout now follows agents.*.memorySearch.remote.batch.timeoutMinutes in this code path.
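
To make the key path concrete, here is a hedged sketch of reading `timeoutMinutes` from a config object and converting it to milliseconds. The interface is inferred only from the path `agents.*.memorySearch.remote.batch.timeoutMinutes`; the type name, the helper `batchTimeoutMs`, and the agent key `main` are assumptions, and the real config schema may differ.

```typescript
// Assumed config shape, inferred only from the key path quoted in the PR text.
interface OpenClawConfigSketch {
  agents: Record<
    string,
    { memorySearch?: { remote?: { batch?: { timeoutMinutes?: number } } } }
  >;
}

// "main" is an assumed agent key; the PR only writes `agents.*`.
const config: OpenClawConfigSketch = {
  agents: {
    main: { memorySearch: { remote: { batch: { timeoutMinutes: 15 } } } },
  },
};

// Returns the configured batch timeout in milliseconds, or undefined if unset.
function batchTimeoutMs(cfg: OpenClawConfigSketch, agentId: string): number | undefined {
  const minutes = cfg.agents[agentId]?.memorySearch?.remote?.batch?.timeoutMinutes;
  return minutes === undefined ? undefined : minutes * 60_000;
}

console.log(batchTimeoutMs(config, "main")); // 900000
console.log(batchTimeoutMs(config, "other")); // undefined
```

Returning `undefined` for an unset value leaves the choice of default to the caller, which matches the PR's stated goal of preserving the default path.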

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (No)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)

Repro + Verification

Symptom evidence

Issue repro from #49933: Memory index failed (main): memory embeddings batch timed out after 120s despite configured higher timeout.

Root-cause evidence

Timeout config is resolved into this.batch.timeoutMs in resolveBatchConfig().
Before fix, resolveEmbeddingTimeout("batch") in src/memory/manager-embedding-ops.ts returned fixed EMBEDDING_BATCH_TIMEOUT_REMOTE_MS for remote provider batch embeds.
After fix, remote batch path reads this.batch.timeoutOverrideMs when valid.
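
The before/after behavior described in this evidence can be reduced to a small sketch. It is an illustration under assumptions, not the OpenClaw source: `resolveBatchConfig`, `EMBEDDING_BATCH_TIMEOUT_REMOTE_MS`, and the 120s value come from the PR text, while the class name, method names, and constructor shape are hypothetical. The PR text mentions both `timeoutMs` and `timeoutOverrideMs`; the sketch uses a single `timeoutMs` field.

```typescript
// Hypothetical reduction of the bug; class and field layout are assumed.
const EMBEDDING_BATCH_TIMEOUT_REMOTE_MS = 2 * 60_000; // 120s, as quoted in PR #49963

interface BatchConfig {
  timeoutMs: number;
}

// Mirrors the described role of resolveBatchConfig(): minutes to milliseconds.
function resolveBatchConfig(timeoutMinutes: number): BatchConfig {
  return { timeoutMs: timeoutMinutes * 60_000 };
}

class ManagerSketch {
  private readonly batch: BatchConfig;

  constructor(batch: BatchConfig) {
    this.batch = batch;
  }

  // Before the fix: the configured value exists but is never consulted.
  resolveBatchTimeoutBefore(): number {
    return EMBEDDING_BATCH_TIMEOUT_REMOTE_MS;
  }

  // After the fix: use the configured value when valid, else the old default.
  resolveBatchTimeoutAfter(): number {
    const configured = this.batch.timeoutMs;
    return Number.isFinite(configured) && configured > 0
      ? configured
      : EMBEDDING_BATCH_TIMEOUT_REMOTE_MS;
  }
}

const manager = new ManagerSketch(resolveBatchConfig(10));
console.log(manager.resolveBatchTimeoutBefore()); // 120000
console.log(manager.resolveBatchTimeoutAfter()); // 600000
```

The validity check keeps a zero, negative, or non-finite value from disabling the timeout, which is one way to read "when valid" in the PR description.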

Local verification

pnpm test -- src/memory/manager.embedding-batches.test.ts
pnpm test -- src/memory/manager.batch.test.ts
pnpm build

Added regression test:

uses configured remote batch timeout for builtin embedding batches

Human Verification (required)

Verified: timeout resolution now returns configured remote batch timeout in builtin embedding path.
Verified: targeted memory test suites pass.
Verified: repo build passes locally.
Not verified: live provider matrix across all embedding backends.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (No)
Migration needed? (No)

Failure Recovery (if this breaks)

Revert this commit.
Files:
- src/memory/manager-embedding-ops.ts
- src/memory/manager.embedding-batches.test.ts

Risks and Mitigations

Risk: minimal; behavior change is constrained to remote builtin batch-timeout resolution.
Mitigation: regression test added and existing memory batch tests kept green.

Co-Authored-By: M S [email protected]

Changed files

src/memory/manager-embedding-ops.ts (modified, +9/-0)
src/memory/manager-sync-ops.ts (modified, +16/-1)
src/memory/manager.embedding-batches.test.ts (modified, +20/-0)
src/memory/manager.get-concurrency.test.ts (modified, +69/-2)
src/memory/manager.ts (modified, +58/-7)

PR #49963: fix(memory): respect configured batch timeout for direct embedding calls

Repository: openclaw/openclaw
Author: qiuyuemartin-max
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/49963

Description (problem / solution / changelog)

fix(memory): respect configured batch timeout for direct embedding calls

Fixes #49933

Problem

The memory indexing system has two embedding execution paths:

Batch API path (OpenAI/Gemini/Voyage): correctly uses memory.remote.batch.timeoutMinutes config
Direct embedding path (local/custom providers, or fallback): hardcoded to 120 seconds, ignoring config

Users with slower embedding servers (e.g., local llama.cpp) cannot increase the timeout beyond 120s, causing indexing failures even when they've configured timeoutMinutes to a higher value.

Root Cause

In src/memory/manager-embedding-ops.ts:

Line 39: EMBEDDING_BATCH_TIMEOUT_REMOTE_MS = 2 * 60_000 (hardcoded 120s)
Line 931-937: resolveEmbeddingTimeout("batch") returns hardcoded value, never reads this.batch.timeoutMs

Solution

Make resolveEmbeddingTimeout("batch") respect the configured timeout:

private resolveEmbeddingTimeout(kind: "query" | "batch"): number {
  const isLocal = this.provider?.id === "local";
  
  if (kind === "query") {
    return isLocal ? EMBEDDING_QUERY_TIMEOUT_LOCAL_MS : EMBEDDING_QUERY_TIMEOUT_REMOTE_MS;
  }
  
  // For batch embeddings, use configured timeout if available
  if (this.batch.timeoutMs > 0) {
    return this.batch.timeoutMs;
  }
  
  // Fallback to conservative defaults when not configured
  return isLocal ? 10 * 60_000 : 2 * 60_000;
}

Changes

File: src/memory/manager-embedding-ops.ts
Lines changed: +5 -2 (net: +3)
Removed: Hardcoded EMBEDDING_BATCH_TIMEOUT_REMOTE_MS and EMBEDDING_BATCH_TIMEOUT_LOCAL_MS constants
Modified: resolveEmbeddingTimeout to read from config

Testing

User @Hung2124 confirmed in #49933:

✅ Local patch with configured timeout > 120s: indexing succeeded beyond 120s
✅ Subsequent failures were different (embedding server constraints, not timeout)

Backward Compatibility

✅ When timeoutMinutes is not configured: uses same defaults as before (120s remote, 600s local)
✅ No behavior change for batch API path (already uses config correctly)
✅ No breaking changes to public API
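
The compatibility claims above can be checked against a standalone restatement of the batch branch of the PR's snippet. This is a sketch for illustration: the function name `resolveBatchTimeoutMs`, its signature, and the provider id `openai` are assumptions; the `local` id and the 120s/600s defaults are taken from the PR text.

```typescript
// Standalone restatement of the batch branch; name and signature are hypothetical.
const DEFAULT_BATCH_TIMEOUT_LOCAL_MS = 10 * 60_000; // 600s, per the PR text
const DEFAULT_BATCH_TIMEOUT_REMOTE_MS = 2 * 60_000; // 120s, per the PR text

function resolveBatchTimeoutMs(providerId: string, configuredMs?: number): number {
  // A configured positive value always wins.
  if (configuredMs !== undefined && configuredMs > 0) {
    return configuredMs;
  }
  // Otherwise fall back to the defaults that applied before the change.
  return providerId === "local"
    ? DEFAULT_BATCH_TIMEOUT_LOCAL_MS
    : DEFAULT_BATCH_TIMEOUT_REMOTE_MS;
}

// "openai" is an assumed provider id used only for this demonstration.
const cases: Array<[string, number | undefined]> = [
  ["openai", undefined],
  ["local", undefined],
  ["openai", 15 * 60_000],
];
for (const [providerId, configuredMs] of cases) {
  console.log(providerId, configuredMs, resolveBatchTimeoutMs(providerId, configuredMs));
}
```

Unconfigured callers get 120000 for a remote provider and 600000 for `local`, while a configured 15 minutes yields 900000.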

Impact

Fixes timeout for all direct embedding calls (local/custom providers)
Fixes timeout for batch API fallback scenarios
Allows users to configure timeout for large corpus indexing

Changed files

src/memory/manager-embedding-ops.ts (modified, +25/-6)

PR #49981: memory: thread timeoutMs through remote embedding batch HTTP calls

Repository: openclaw/openclaw
Author: yintamaa
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/49981

Description (problem / solution / changelog)

withRemoteHttpResponse was calling fetchWithSsrFGuard without a timeoutMs, so all HTTP requests for remote embedding batch operations (file upload, batch submit, status poll, result download) had no per-request timeout — even when remote.batch.timeoutMinutes was configured.

Thread the timeoutMs from EmbeddingBatchExecutionParams through the full call chain: withRemoteHttpResponse → postJson → postJsonWithRetry → uploadBatchJsonlFile, and all internal helpers in batch-openai, batch-voyage, and batch-gemini. All new params are optional so existing callers are unaffected.

Summary

Fixes #49933

withRemoteHttpResponse was calling fetchWithSsrFGuard without a timeoutMs, so all HTTP requests for remote embedding batch operations had no per-request timeout — even when remote.batch.timeoutMinutes was configured by the user.

Add timeoutMs?: number to withRemoteHttpResponse, postJson, postJsonWithRetry, and uploadBatchJsonlFile
Thread timeoutMs from EmbeddingBatchExecutionParams through all internal helpers in batch-openai, batch-voyage, and batch-gemini
All new params are optional — existing callers are unaffected
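
The threading described in this summary can be sketched as an optional field passed unchanged through each layer. The shapes below are hypothetical stand-ins: only the function names being imitated and the `timeoutMs` parameter come from the PR text, and the injected `Transport` replaces the real HTTP layer (which, per the PR, hands the value to `fetchWithSsrFGuard`).

```typescript
// Hypothetical shapes; only `timeoutMs` and the imitated function names come from the PR.
interface HttpRequest {
  url: string;
  body: unknown;
  timeoutMs?: number; // optional, so existing callers are unaffected
}

type Transport = (req: HttpRequest) => Promise<unknown>;

// Lowest layer: stands in for withRemoteHttpResponse.
function withRemoteHttpResponseSketch(req: HttpRequest, transport: Transport): Promise<unknown> {
  return transport(req);
}

// Middle layer: stands in for postJson and forwards timeoutMs unchanged.
function postJsonSketch(req: HttpRequest, transport: Transport): Promise<unknown> {
  return withRemoteHttpResponseSketch({ ...req }, transport);
}

// A recording transport shows what the lowest layer actually receives.
const seen: Array<number | undefined> = [];
const recordingTransport: Transport = (req) => {
  seen.push(req.timeoutMs);
  return Promise.resolve({ ok: true });
};

void postJsonSketch(
  { url: "http://embeddings.invalid/v1/embeddings", body: {}, timeoutMs: 300_000 },
  recordingTransport,
);
console.log(seen); // [ 300000 ]
```

Injecting the transport keeps the sketch testable without network access, in the same spirit as the PR's unit test that checks `timeoutMs` is forwarded.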

Test plan

Added a unit test to post-json.test.ts verifying timeoutMs is forwarded to withRemoteHttpResponse
pnpm test -- src/memory/post-json.test.ts passes

Changed files

src/infra/outbound/channel-selection.ts (modified, +2/-2)
src/memory/batch-gemini.ts (modified, +13/-1)
src/memory/batch-http.ts (modified, +2/-0)
src/memory/batch-openai.ts (modified, +18/-1)
src/memory/batch-upload.ts (modified, +2/-0)
src/memory/batch-voyage.ts (modified, +16/-1)
src/memory/manager-embedding-ops.ts (modified, +4/-1)
src/memory/post-json.test.ts (modified, +17/-0)
src/memory/post-json.ts (modified, +2/-0)
src/memory/remote-http.ts (modified, +2/-0)
src/plugins/bundle-mcp.ts (modified, +1/-0)
test/helpers/extensions/discord-provider.test-support.ts (modified, +10/-1)

Bug type

Behavior bug (incorrect output/state without crash)

Summary

Builtin memory indexing appears to ignore the configured remote embedding batch timeout and still fails at a hardcoded 120 seconds.

Steps to reproduce

Configure OpenClaw builtin memory indexing to use a remote embedding provider.
Use a corpus large enough that one or more embedding batches take longer than 120 seconds.
Increase the configured batch timeout above 120 seconds.
Run openclaw memory index.
Observe that indexing still fails at 120 seconds.

Expected behavior

Builtin memory indexing should honor the configured remote embedding batch timeout for embedding batches.

Actual behavior

Indexing failed with a hard 120 second timeout even after increasing the configured timeout.

Observed error:

Memory index failed (main): memory embeddings batch timed out after 120s

OpenClaw version

2026.3.13 tested as stable baseline, with a local patch applied for verification

Operating system

macOS 15.7.4

Install method

Global npm install from locally built forked repo

Model

Qwen3.5 35B A3B GGUF (thinking w/tool calling) / bge-m3-Q8_0-GGUF (embedding)

Provider / routing chain

OpenClaw builtin memory indexing -> LAN HTTP embedding -> local llama.cpp-compatible embedding server

Config file / key location

~/.openclaw/openclaw.json, ~/.env

Additional provider/model setup details

Redacted summary:

builtin memory indexing enabled
remote embedding provider configured
batch timeout increased above 120 seconds

The debugging path also exposed separate downstream constraints in the embedding server:

earlier context-window pressure with a smaller embedder
later physical batch-size limits (BATCH_SIZE / UBATCH_SIZE)
one later transport-level fetch failed

Logs, screenshots, and evidence

Primary user-visible error:
Memory index failed (main): memory embeddings batch timed out after 120s

---
Evidence that this is specifically a timeout-handling bug in OpenClaw:

- increasing the configured timeout alone did not change the failure point
- a local patch to builtin memory timeout handling changed runtime behavior immediately
- after the patch, indexing moved past the old 120s failure boundary
- batch resizing
- silent fail incomplete index build

Impact and severity

Affected users/systems/channels

users running builtin memory indexing against slower remote embedding providers
especially local/self-hosted embedding servers processing large archives

Severity

blocks workflow

Frequency

reproducible under slower or larger embedding batches

Consequence

builtin memory indexing fails before completion
users cannot complete archive ingestion without patching source or changing implementation
debugging is expensive because each rerun can take a long time

Additional information

This was found while indexing a moderately large archive corpus. Once the timeout bug was patched locally, the remaining failures changed class:

embedding context/window constraints
physical batch-size constraints
later one transport-level fetch failed

That change in failure mode strongly suggests the original 120 second limit was an OpenClaw-side bug rather than the underlying embedding server simply being too slow.

extent analysis

Fix Plan

To fix the issue, the timeout handling in OpenClaw's builtin memory indexing must use the configured remote embedding batch timeout instead of a fixed value.

Here are the steps:

Set the desired batch timeout in openclaw.json (PR #49947 names the key agents.*.memorySearch.remote.batch.timeoutMinutes).
Modify the OpenClaw source so that batch embedding calls use the configured timeout instead of the hardcoded 120 seconds (the PRs above change resolveEmbeddingTimeout("batch") in src/memory/manager-embedding-ops.ts).

Example code snippet (illustrative Python pseudocode; OpenClaw itself is TypeScript, and the provider interface here is hypothetical):

import logging

def memory_indexing(embedding_provider, batches, timeout_minutes):
    """Embed each batch, passing the configured timeout to the provider."""
    # Convert the configured minutes to seconds for the provider call
    batch_timeout = timeout_minutes * 60
    results = []
    for batch in batches:
        try:
            # Call the embedding provider with the configured batch timeout
            results.append(embedding_provider.call(batch, timeout=batch_timeout))
        except TimeoutError:
            # Report the timeout that was actually applied, then re-raise
            logging.error(f"Memory index failed: memory embeddings batch timed out after {batch_timeout}s")
            raise
    return results

Verification

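To verify that the fix worked, run openclaw memory index with a large corpus and a configured batch timeout above 120 seconds. The run should no longer fail at the 120-second mark. Any remaining failures, such as the fetch failed error or embedding-server batch-size limits reported in the issue, are separate downstream problems.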

Extra Tips

Make sure to update the OpenClaw version to include the fix.
If you're using a local patch, ensure that it's properly applied and tested.
Consider adding additional logging and error handling to help diagnose any future issues.
