openclaw - ✅(Solved) Fix [Bug]: memory index --force consistently fails with `fetch failed` while direct node fetch to OpenRouter /v1/embeddings succeeds (Linux, 2026.4.21 & 2026.4.23) [1 pull requests, 2 comments, 1 participants]

Q: Expected behavior

`openclaw memory index --force` should successfully reach the configured OpenAI-compatible endpoint (OpenRouter in this case), generate embeddings for all 83 markdown files, and report `Embeddings: ready` with `Batch: enabled`. If a transient error does occur, the `Batch: disabled (failures 0/2)` latch should reset on gateway restart so retries are possible without manual state surgery. In short: parity with what direct `node fetch` already achieves against the same endpoint, with the same key, from the same Node runtime, in ~1.8 seconds.

Inhum · 2026-04-25T09:57:35Z

openclaw/openclaw#71522•Fetched 2026-04-26 05:11:57

openclaw memory index --force consistently fails with Memory index failed (main): fetch failed on a Linux host, while direct node fetch() calls to the same OpenRouter /v1/embeddings endpoint from the same Node runtime and with the same API key succeed in ~1.8 seconds and return a valid 1536-dim vector. The same is observed for [model-pricing] background fetches inside the gateway.

This reproduces on OpenClaw 2026.4.21 and 2026.4.23 (latest), Node 22.22.2, Debian 13. It survives a clean reboot, a config rewrite, a full network/DNS reset, and config-level batch concurrency tuning.

It looks similar in symptoms to #56427, #56901, #58255, but none of those issues' workarounds resolve it.

Error Message

Provider: openai (requested: openai)
Model: text-embedding-3-small
Indexed: 81/83 files · 1293 chunks
Dirty: yes
Embeddings: unavailable
Embeddings error: fetch failed
Embedding cache: enabled (1292 entries)
Batch: disabled (failures 0/2)

PR fix notes

PR #71678: Fix: Issue 71522 memory embeddings

Repository: openclaw/openclaw
Author: sahilsatralkar
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/71678

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

If this PR fixes a plugin beta-release blocker, title it fix(<plugin-id>): beta blocker - <summary> and link the matching Beta blocker: <plugin-name> - <summary> issue labeled beta-blocker. Contributors cannot label PRs, so the title is the PR-side signal for maintainers and automation.

- Problem: non-batch remote memory embedding indexing ignored `memorySearch.remote.batch.concurrency`, batch status rendered `disabled (failures 0/2)` even when batch was configured off, and transport failures from guarded fetch surfaced as opaque `fetch failed` errors.
- Why it matters: users tuning remote embedding load could not affect normal `/embeddings` indexing, status output suggested a false failure state, and remote endpoint failures were hard to diagnose without safe request context.
- What changed: added `memorySearch.remote.concurrency` for non-batch indexing, added `status.batch.disabledReason` plus clearer CLI rendering, wrapped pre-response remote transport failures with sanitized diagnostics, and introduced shared remote URL joining/normalization for memory embedding endpoints.
- What did NOT change (scope boundary): no SSRF/DNS pinning changes, no forced HTTP/1.1 transport changes, no new CLI command, no broad provider URL refactor beyond memory embedding endpoint joining.

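Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor required for the fix
- [x] Docs
- [x] Security hardening
- [ ] Chore/infra

Scope (select all touched areas)

- [ ] Gateway / orchestration
- [ ] Skills / tool execution
- [ ] Auth / tokens
- [x] Memory / storage
- [x] Integrations
- [x] API / contracts
- [ ] UI / DX
- [ ] CI/CD / infra

Linked Issue/PR

- Closes #71522
- Related #
- [ ] This PR fixes a bug or regression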
- Root cause: memory indexing used a hardcoded non-batch concurrency path (4) unless provider batch mode was enabled, so remote.batch.concurrency never applied to normal remote embedding requests; batch status also lacked a reason field, so the CLI always fell back to failure-count wording; remote guarded-fetch transport errors were rethrown without memory-specific sanitized request context.
- Missing detection / guardrail: no config-resolution test or indexing test asserted separate non-batch vs batch concurrency controls; no status test covered configured-off batch output; no remote HTTP unit test asserted sanitized transport diagnostics.
- Contributing context (if known): shared memory host helpers handled HTTP transport and batch URL normalization separately, but direct embedding URL construction and transport error wrapping were still ad hoc.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- [x] Unit test
- [x] Seam / integration test
- [ ] End-to-end test
- [ ] Existing coverage already sufficient
- Target test or file: src/agents/memory-search.test.ts, extensions/memory-core/src/memory/index.test.ts, extensions/memory-core/src/cli.test.ts, src/memory-host-sdk/host/remote-http.test.ts, src/memory-host-sdk/host/remote-url.test.ts
- Scenario the test should lock in: non-batch remote indexing honors remote.concurrency; batch indexing still uses remote.batch.concurrency; configured-off batch status prints reason instead of failure count; transport failures include sanitized origin/path plus cause code; remote URL joining never emits accidental double slashes.
- Why this is the smallest reliable guardrail: the bug spans config resolution, manager runtime behavior, CLI presentation, and remote transport helpers; these focused tests cover each seam directly without needing live provider calls.
- Existing test that already covers this (if any): existing memory config, manager, and CLI suites covered nearby behavior but not these exact failure modes.
- If no new test is added, why not: N/A

User-visible / Behavior Changes

New config: memorySearch.remote.concurrency with default 4 for normal non-batch remote embedding requests.
memorySearch.remote.batch.concurrency remains batch-only.
openclaw memory status --deep batch output now distinguishes configured-off and provider-unavailable/unsupported states from real failure-limit disablement.
Remote memory transport failures now include sanitized request context like origin/path and low-level cause codes without leaking query strings or bearer tokens.
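
The split between the two concurrency settings can be sketched as follows. This is a minimal illustration, not the PR's code: `resolveEmbeddingConcurrency`, `clampConcurrency`, the `RemoteConfig` shape, the ceiling of 16, and the batch fallback value are all assumptions. Only the key names and the non-batch default of 4 come from the PR description.

```typescript
// Hypothetical config shape mirroring memorySearch.remote from the PR description.
interface RemoteConfig {
  concurrency?: number; // non-batch /embeddings requests (new key)
  batch?: { enabled?: boolean; concurrency?: number };
}

const DEFAULT_CONCURRENCY = 4; // non-batch default stated in the PR description
const MAX_CONCURRENCY = 16; // assumed ceiling, for illustration only

// Clamp to an integer within [1, MAX_CONCURRENCY]; fall back on missing or bad input.
function clampConcurrency(value: number | undefined, fallback: number): number {
  if (value === undefined || !Number.isFinite(value)) return fallback;
  return Math.min(MAX_CONCURRENCY, Math.max(1, Math.floor(value)));
}

// Batch mode reads remote.batch.concurrency; every other path reads remote.concurrency.
// Reusing DEFAULT_CONCURRENCY as the batch fallback is an assumption.
function resolveEmbeddingConcurrency(remote: RemoteConfig, batchActive: boolean): number {
  if (batchActive) {
    return clampConcurrency(remote.batch?.concurrency, DEFAULT_CONCURRENCY);
  }
  return clampConcurrency(remote.concurrency, DEFAULT_CONCURRENCY);
}
```

Under this sketch, a configuration like the reporter's (`batch.enabled: false`, `batch.concurrency: 1`, no `remote.concurrency`) still resolves to 4 for normal indexing, which matches the "fixed concurrency 4" behavior described in the PR.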

Diagram (if applicable)

For UI changes or non-trivial logic flows, include a small ASCII diagram reviewers can scan quickly. Otherwise write N/A.

  Before:
  [memory index] -> [non-batch remote embeddings] -> [fixed concurrency 4]
  [memory status] -> [batch disabled] -> ["failures 0/2"]
  [guarded fetch transport error] -> ["fetch failed"]

  After:
  [memory index] -> [remote.concurrency] -> [controlled non-batch request fanout]
  [memory status] -> [disabledReason] -> ["configured off" / "provider unsupported" / "failures 2/2"]
  [guarded fetch transport error] -> [sanitized origin/path + cause] -> [actionable diagnostics]
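
The status branch of the diagram can be sketched like this. The `BatchStatus` shape, the reason string values, and `renderBatchStatus` are hypothetical; the PR description names only the optional `disabledReason` field and the three rendered states.

```typescript
// Hypothetical reason values; the PR names the field but not its exact values.
type BatchDisabledReason = "configured-off" | "provider-unsupported" | "failure-limit";

interface BatchStatus {
  enabled: boolean;
  failures: number;
  limit: number;
  disabledReason?: BatchDisabledReason; // optional and additive, per the PR
}

function renderBatchStatus(status: BatchStatus): string {
  if (status.enabled) return "Batch: enabled";
  switch (status.disabledReason) {
    case "configured-off":
      return "Batch: disabled (configured off)";
    case "provider-unsupported":
      return "Batch: disabled (provider unsupported)";
    default:
      // Failure-count wording: used for a real failure limit and as the
      // fallback when no reason is reported.
      return `Batch: disabled (failures ${status.failures}/${status.limit})`;
  }
}
```

Without a reason field, the only wording available is the failure count, which is how a batch that was simply configured off ended up printed as `disabled (failures 0/2)`.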

Security Impact (required)

New permissions/capabilities? (No)
Secrets/tokens handling changed? (Yes)
New/changed network calls? (No)
Command/tool execution surface changed? (No)
Data access scope changed? (No)
If any Yes, explain risk + mitigation: transport error reporting now changes how remote request failures are surfaced, but only by adding sanitized origin/path and cause details. Query strings, bearer tokens, headers, and request bodies are explicitly excluded from diagnostics.
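
A minimal sketch of that kind of sanitized diagnostic, assuming a hypothetical helper named `describeTransportFailure` (the PR's actual implementation in `remote-http.ts` is not shown on this page). It relies on the standard `URL` class and on the fact that Node's built-in fetch attaches the low-level error as `cause`.

```typescript
// Hypothetical helper: describe a failed request without leaking secrets.
function describeTransportFailure(requestUrl: string, err: unknown): string {
  const url = new URL(requestUrl);
  // origin + pathname only: query string, fragment, and userinfo are dropped.
  const target = `${url.origin}${url.pathname}`;
  // Node's fetch (undici) throws TypeError("fetch failed") and attaches the
  // underlying socket/DNS/TLS error as `cause`.
  const cause = (err as { cause?: { code?: unknown } } | null)?.cause;
  const rawCode = cause?.code;
  const code = typeof rawCode === "string" ? rawCode : "unknown";
  const message = err instanceof Error ? err.message : String(err);
  return `${message} (${target}, cause: ${code})`;
}
```

Headers and request bodies never enter the function, so they cannot appear in the message.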

Repro + Verification

Environment

OS: macOS
Runtime/container: Node v25.5.0, pnpm 10.33.0
Model/provider: mocked memory embedding providers in unit/seam tests
Integration/channel (if any): N/A
Relevant config (redacted): memorySearch.provider=openai|gemini|voyage, memorySearch.remote.baseUrl=<redacted>, memorySearch.remote.concurrency=<n>, optional memorySearch.remote.batch.*

Steps

Configure remote memory embeddings with a custom remote endpoint and set only memorySearch.remote.batch.concurrency.
Run openclaw memory index --force --agent main and inspect concurrency behavior / status output.
Trigger a guarded-fetch transport failure to the remote embeddings endpoint and inspect the surfaced error.

Expected

Non-batch indexing should have a dedicated concurrency control.
Batch status should explain why batch is disabled.
Transport failures should include safe endpoint context.

Actual

Non-batch indexing used fixed concurrency 4.
Batch disabled output implied failures even when configured off.
Transport failures were opaque.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: ran focused tests for config resolution, memory manager indexing behavior, CLI batch rendering, remote HTTP diagnostics, remote URL joining; ran repeated pnpm build; ran codex review --base origin/main.
Edge cases checked: remote.concurrency defaulting/clamping, batch-concurrency isolation, configured-off vs failure-limit batch rendering, query-string/token redaction in transport errors, trailing/multiple slash URL normalization.
What you did not verify: live reproduction against a real OpenRouter/OpenAI-compatible remote endpoint; provider batch implementations were not separately exercised because this PR did not directly change extensions/openai/embedding-batch.ts or extensions/voyage/embedding-batch.ts.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes)
Config/env changes? (Yes)
Migration needed? (No)
If yes, exact upgrade steps: optional only; users can set memorySearch.remote.concurrency if they need to tune non-batch remote embedding fanout.

Risks and Mitigations

List only real risks for this PR. Add/remove entries as needed. If none, write None.

Risk: new public memory status field (batch.disabledReason) changes the SDK contract surface.
- Mitigation: field is optional and additive only; existing fields remain unchanged.
Risk: stricter remote URL normalization rejects base URLs with query strings/fragments.
- Mitigation: this is intentional to avoid malformed endpoints and accidental credential leakage; standard provider base URLs are unaffected.
Risk: local validation gate noise from pnpm check:changed.
- Mitigation: focused tests and full builds passed; the remaining gate failure was an unrelated local heavy-check lock recursion in the repo lint path, not a code failure from this PR.
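
The URL normalization risk listed above concerns a join step along these lines. `joinRemoteUrl` is a hypothetical name and this is only a sketch of the stated behavior (no accidental double slash at the join, base URLs with query strings or fragments rejected). The contents of the PR's `remote-url.ts` are not reproduced on this page.

```typescript
// Hypothetical helper illustrating the normalization rules from the PR description.
function joinRemoteUrl(baseUrl: string, path: string): string {
  const base = new URL(baseUrl);
  if (base.search || base.hash) {
    throw new Error("base URL must not contain a query string or fragment");
  }
  // Strip trailing slashes from the base path and leading slashes from the
  // suffix, then join with exactly one slash.
  const basePath = base.pathname.replace(/\/+$/, "");
  const suffix = path.replace(/^\/+/, "");
  return `${base.origin}${basePath}/${suffix}`;
}
```

With the reporter's `baseUrl` of `https://openrouter.ai/api/v1`, joining `/embeddings` this way yields `https://openrouter.ai/api/v1/embeddings` rather than a URL with a doubled slash.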

Built with Codex

Changed files

docs/concepts/memory-search.md (modified, +4/-0)
docs/reference/memory-config.md (modified, +11/-5)
extensions/memory-core/src/cli.runtime.ts (modified, +32/-1)
extensions/memory-core/src/cli.test.ts (modified, +83/-0)
extensions/memory-core/src/memory/index.test.ts (modified, +106/-10)
extensions/memory-core/src/memory/manager-embedding-ops.ts (modified, +7/-1)
extensions/memory-core/src/memory/manager.ts (modified, +32/-4)
extensions/voyage/embedding-provider.ts (modified, +2/-1)
packages/memory-host-sdk/src/host/batch-utils.ts (modified, +3/-1)
packages/memory-host-sdk/src/host/embeddings-remote-provider.ts (modified, +2/-1)
packages/memory-host-sdk/src/host/remote-url.test.ts (added, +34/-0)
packages/memory-host-sdk/src/host/remote-url.ts (added, +16/-0)
packages/memory-host-sdk/src/host/types.ts (modified, +7/-0)
src/agents/memory-search.test.ts (modified, +64/-0)
src/agents/memory-search.ts (modified, +2/-0)
src/config/config.schema-regressions.test.ts (modified, +25/-0)
src/config/schema.base.generated.ts (modified, +18/-0)
src/config/schema.help.quality.test.ts (modified, +1/-0)
src/config/schema.help.ts (modified, +2/-0)
src/config/schema.labels.ts (modified, +1/-0)
src/config/types.tools.ts (modified, +2/-0)
src/config/zod-schema.agent-runtime.ts (modified, +15/-2)
src/memory-host-sdk/engine-embeddings.ts (modified, +1/-0)
src/memory-host-sdk/engine-storage.ts (modified, +1/-0)
src/memory-host-sdk/host/batch-utils.ts (modified, +3/-1)
src/memory-host-sdk/host/embeddings-remote-provider.ts (modified, +2/-1)
src/memory-host-sdk/host/remote-http.test.ts (modified, +53/-0)
src/memory-host-sdk/host/remote-http.ts (modified, +70/-10)
src/memory-host-sdk/host/remote-url.test.ts (added, +34/-0)
src/memory-host-sdk/host/remote-url.ts (added, +16/-0)
src/memory-host-sdk/host/types.ts (modified, +7/-0)
src/plugins/cli-registry-loader.ts (modified, +27/-1)
src/plugins/cli.test.ts (modified, +87/-0)

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

It looks similar in symptoms to #56427, #56901, #58255, but none of those issues' workarounds resolve it.

Steps to reproduce

1. Linux host with no Ollama listening on 127.0.0.1:11434
2. memorySearch configured against OpenRouter, OpenAI-compatible, with a valid OPENROUTER_API_KEY
3. Workspace with ~80 .md files
4. Run `openclaw memory index --force`
5. Observe `Memory index failed (main): fetch failed`
6. `openclaw memory status --deep` reports `Batch: disabled (failures 0/2)`, and this latch persists across gateway/system restarts

Expected behavior

openclaw memory index --force should successfully reach the configured OpenAI-compatible endpoint (OpenRouter in this case), generate embeddings for all 83 markdown files, and report Embeddings: ready with Batch: enabled. If a transient error does occur, the Batch: disabled (failures 0/2) latch should reset on gateway restart so retries are possible without manual state surgery.

In short: parity with what direct node fetch already achieves against the same endpoint, with the same key, from the same Node runtime, in ~1.8 seconds.

Actual behavior

What works

Direct node fetch to the exact same endpoint, same Node binary, same API key (from ~/.openclaw/.env):

status: 200 in 1882 ms; dims: 1536; err: none
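
A probe of the kind that produces a line like the one above might look as follows. The reporter's actual diag script is not included in the issue, so `probeEmbeddings`, `FetchLike`, and the `"ping"` input are assumptions. The request and response shapes follow the OpenAI-compatible `/embeddings` contract.

```typescript
// Minimal fetch-like signature so the probe can be exercised with a stub.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ status: number; json(): Promise<unknown> }>;

interface ProbeResult {
  status: number;
  ms: number;
  dims: number | null;
}

// POST one input to an OpenAI-compatible /embeddings endpoint and report
// the HTTP status, elapsed time, and embedding dimension.
async function probeEmbeddings(
  fetchImpl: FetchLike,
  baseUrl: string,
  apiKey: string,
  model: string,
): Promise<ProbeResult> {
  const started = Date.now();
  const res = await fetchImpl(`${baseUrl}/embeddings`, {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify({ model, input: "ping" }),
  });
  const body = (await res.json()) as { data?: { embedding?: number[] }[] };
  const dims = body.data?.[0]?.embedding?.length ?? null;
  return { status: res.status, ms: Date.now() - started, dims };
}
```

On Node 18 or later the global `fetch` has this shape at runtime, so it can be passed as `fetchImpl` together with the configured `baseUrl`, key, and model.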

curl through both the default route and --interface tailscale0:

default: 200 in 1.45s
tailscale: 200 in 1.42s

OpenClaw gateway in foreground (openclaw gateway run --verbose) successfully reaches OpenRouter (/v1/models returns HTTP 200) and Telegram, etc. Visible in undici debug:

UNDICI: connecting to openrouter.ai using https:undefined
UNDICI: connected to openrouter.ai using https:h1
UNDICI: sending request to GET https://openrouter.ai//api/v1/models
UNDICI: received response to GET https://openrouter.ai//api/v1/models - HTTP 200

(Note the doubled // in the URL — appears for every OpenClaw outbound request; presumably harmless because Cloudflare normalizes, but mentioning in case it's a clue.)

What fails

openclaw memory index --force, [model-pricing] OpenRouter pricing fetch failed, and any other CLI/gateway path that hits OpenRouter for embeddings:

UNDICI: connecting to 127.0.0.1:11434:11434 using http:undefined
UNDICI: connection to 127.0.0.1:11434:11434 ... errored - connect ECONNREFUSED
UNDICI: request to GET http://127.0.0.1:11434//api/tags errored - connect ECONNREFUSED
UNDICI: connecting to openrouter.ai using https:undefined
UNDICI: connecting to openrouter.ai using https:undefined
UNDICI: connection to openrouter.ai using https:undefined errored -
UNDICI: request to POST https://openrouter.ai//api/v1/embeddings errored -
UNDICI: connection to openrouter.ai using https:undefined errored -
UNDICI: request to POST https://openrouter.ai//api/v1/embeddings errored -
Memory index failed (main): fetch failed

The 127.0.0.1:11434 (Ollama) probe is fast (immediate ECONNREFUSED) and doesn't block. After it, OpenClaw opens two parallel sockets to openrouter.ai, both abort before TLS handshake completes. The errored - reason is empty (no message), which is consistent with an internal AbortController.abort() without a reason argument.
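
For context, this is what `AbortController.abort()` exposes with and without a reason argument. It is a standalone illustration of the standard API, not OpenClaw code. The timeout message is hypothetical, and whether undici's debug output prints the default reason as empty is not confirmed here.

```typescript
// abort() with no argument: signal.reason defaults to a generic AbortError
// DOMException that says nothing about who aborted or why.
const bare = new AbortController();
bare.abort();

// abort(reason): the supplied value becomes signal.reason, so it can be
// surfaced in logs and error messages.
const explained = new AbortController();
explained.abort(new Error("embedding request timed out after 10000 ms"));

// Render a signal's abort reason as "<name>: <message>".
function reasonName(signal: AbortSignal): string {
  const reason: unknown = signal.reason;
  return reason instanceof Error ? `${reason.name}: ${reason.message}` : String(reason);
}
```

`reasonName(explained.signal)` carries the caller's message, while `reasonName(bare.signal)` carries only the generic `AbortError`.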

strace -e trace=network confirms two sockets to 104.18.x.x:443 reach afterConnect (TCP established) and are then destroy()-ed before TLS _start produces meaningful traffic. No EHOSTUNREACH / ECONNREFUSED / ETIMEDOUT from the kernel — the abort is fully client-side.

memory status --deep reports:

Provider: openai (requested: openai)
Model: text-embedding-3-small
Indexed: 81/83 files · 1293 chunks
Dirty: yes
Embeddings: unavailable
Embeddings error: fetch failed
Embedding cache: enabled (1292 entries)
Batch: disabled (failures 0/2)

The Batch: disabled (failures 0/2) latch persists across systemctl --user restart openclaw-gateway and across full host reboots. It survives memory status --fix. It survives rm -rf ~/.openclaw/memory/*.

OpenClaw version

2026.4.23

Operating system

Debian 13 (trixie), kernel 6.12.74+deb13+1-amd64, x86_64

Install method

npm global

Model

Text Embedding 3 Small

Provider / routing chain

OpenAI-compatible pointed at OpenRouter

Additional provider/model setup details

"memorySearch": {
  "enabled": true,
  "provider": "openai",
  "model": "text-embedding-3-small",
  "remote": {
    "baseUrl": "https://openrouter.ai/api/v1",
    "apiKey": {
      "source": "env",
      "provider": "default",
      "id": "OPENROUTER_API_KEY"
    },
    "headers": {
      "HTTP-Referer": "https://github.com/openclaw/openclaw",
      "X-Title": "OpenClaw Jimmy"
    },
    "batch": {
      "enabled": false,
      "concurrency": 1,
      "timeoutMinutes": 10
    }
  },
  "query": {
    "hybrid": { "enabled": true, "vectorWeight": 0.7, "textWeight": 0.3 }
  },
  "extraPaths": [
    "/home/user/.openclaw/workspace/claude-history",
    "/home/user/.openclaw/workspace/chatgpt-history"
  ]
}

openclaw config validate reports the file as valid.

version-and-env.log.txt direct-fetch-vs_cli.log.txt doctor.log diag-bug-report-20260425-1433.log

Logs, screenshots, and evidence

## What was already attempted

In rough chronological order, all confirmed via diag script and undici
debug. None changed CLI behaviour:

1. **DNS**: ISP resolver returns junk (`8.6.112.0`, `8.47.69.0`) for
   `openrouter.ai`. Switched to Tailscale MagicDNS (`100.100.100.100`),
   `getent ahosts openrouter.ai` now returns the correct
   `104.18.2.115 / 104.18.3.115`. Did not fix OpenClaw.

2. **IPv6**: kernel-disabled. AAAA records still appear in
   `dns.lookup` results but kernel rejects them with `EADDRNOTAVAIL`.
   Tested `NODE_OPTIONS=--dns-result-order=ipv4first`. Did not fix OpenClaw.

3. **`NODE_OPTIONS=--no-network-family-autoselection`** had been
   recommended somewhere in OpenClaw setup docs/output for "low-resource
   hosts". Removed. Did not fix OpenClaw.

4. **Tailscale exit node**: confirmed active and routing correctly
   (`ip route get 104.18.3.115 → dev tailscale0`). External IP visible as
   the AWS Lightsail address.

5. **batch config**: added
   `remote.batch = {enabled: false, concurrency: 1, timeoutMinutes: 10}`
   per docs. Did not fix.

6. **Model name**: changed from `openai/text-embedding-3-small` (with
   the OpenRouter-style provider prefix) to `text-embedding-3-small` per
   docs and the recommendation here:
   <https://www.answeroverflow.com/m/1476607155758039050>. Did not fix.

7. **Headers**: added `HTTP-Referer` and `X-Title`. Did not fix.

8. **CA / TLS**: no `NODE_EXTRA_CA_CERTS`, no `NODE_TLS_REJECT_UNAUTHORIZED`
   in env. `/proc/$GW_PID/environ` clean (sanitized output in diag log).

9. **Reinstall**: `npm install -g openclaw@<version>`. Did not fix.

10. **Clean reboot**: did not fix. The `Batch: disabled (failures 0/2)`
    latch survives reboots.

Impact and severity

Severity: Medium.

Semantic memory search — a core advertised feature of OpenClaw — is effectively unusable for users on this configuration. Net impact:

New memory files are not indexed. Any markdown added after the failure point gets full-text search only; semantic recall silently degrades to BM25.
Existing index keeps working from cache (1292 chunks already cached), so the failure is not loud — users may not notice their memory is going stale until much later, when recall quality drops on newer topics.
Batch: disabled (failures 0/2) latch is sticky across gateway restart, system reboot, config change, and memory status --fix. No documented way to reset it without source-level intervention.
Recovery requires a workaround (manual grep-based search per issue #56901, or switching to local embeddings per issue #70577), neither of which is a real fix.
Other gateway features that hit OpenRouter are also affected — e.g. [model-pricing] background fetch fails with the same error, suggesting the bug is in a shared HTTP path, not embedding-specific.

The agent itself remains functional (Telegram bot keeps responding, chat completions to OpenRouter work via a different code path), so this is not a hard outage — but the feature is broken for a meaningful slice of users (anyone on OpenAI-compatible custom endpoints), and the silent staleness makes it a quality-of-results bug rather than a visible failure.

Additional information


Hypotheses (unverified)

Things that look suspicious in the traces, but I could not confirm:

- The CLI process opens two parallel sockets to `openrouter.ai` before the first one finishes TLS. If both are tied to a shared `AbortController` and one of them never makes progress (e.g., due to a cancelled fallback path), the abort cancels both, including the one that would have succeeded.
- The Ollama probe (`127.0.0.1:11434/api/tags`) runs unconditionally even though `memorySearch.provider = "openai"`. The probe itself is fast (immediate `ECONNREFUSED`), but its existence suggests the embedding-resolver still walks a "discovery" path that may hold a shared abort token.
- The `Batch: disabled (failures 0/2)` latch reads as if it's persisted to disk somewhere outside `~/.openclaw/memory/main.sqlite` (which I cleared). If so, the failure path may be self-perpetuating.
- The doubled-slash URL pattern (`https://openrouter.ai//api/v1/embeddings`) appears in every OpenClaw outbound request. It's accepted by Cloudflare, but suggests `baseUrl` and path are joined naively, possibly by code that also has a corner case in error/abort handling.
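
The doubled slash in the last hypothesis is what plain string concatenation produces when the base URL ends with `/` and the path starts with `/`. A minimal sketch of a join that normalizes both sides follows; `joinUrl` is a hypothetical helper written for illustration, not a function from the OpenClaw source, and the base URL and path values are assumptions about how the endpoint might be configured.

```javascript
// Hypothetical helper for illustration only; not taken from the OpenClaw source.
// Joins a base URL and a path with exactly one slash between them.
function joinUrl(baseUrl, path) {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes from the base
  const suffix = path.replace(/^\/+/, "");  // drop leading slashes from the path
  return `${base}/${suffix}`;
}

// Assumed configuration values, chosen to reproduce the pattern in the report.
const baseUrl = "https://openrouter.ai/";
const path = "/api/v1/embeddings";

// Naive concatenation keeps both slashes.
console.log(`${baseUrl}${path}`);
// Normalized join keeps one.
console.log(joinUrl(baseUrl, path));
```

Trimming the strings rather than calling `new URL(path, baseUrl)` is a deliberate choice here: the `URL` constructor resolves a path with a leading slash against the origin, which would discard any path prefix (such as `/api`) carried by the base URL.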

extent analysis

TL;DR

The issue may be resolved by investigating the two parallel sockets opened to `openrouter.ai` and the shared `AbortController` that may be cancelling the fetch that would otherwise succeed. Neither cause has been confirmed.

Guidance

- Investigate the code that opens parallel sockets to `openrouter.ai` and determine if it's necessary to open two sockets simultaneously.
- Check the `AbortController` usage and ensure that it's not aborting the successful socket due to a cancelled fallback path.
- Verify if the Ollama probe is necessary when `memorySearch.provider` is set to `"openai"` and if it's contributing to the issue.
- Look into the persistence of the `Batch: disabled (failures 0/2)` latch and determine how to reset it without source-level intervention.
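
On the last point, the PR description at the top of this page reports that the status line rendered `disabled (failures 0/2)` even when batch was configured off, which would make the "latch" a display problem rather than persisted state. A minimal sketch of a status formatter that keeps the two cases apart is shown below. The function name, its parameters, and the exact output strings are all hypothetical and are not taken from the OpenClaw source.

```javascript
// Hypothetical sketch; names and output strings are illustrative assumptions.
// Distinguishes "batch turned off in config" from "batch disabled after failures".
function formatBatchStatus({ configured, failures, limit }) {
  if (!configured) {
    // Batch was never enabled, so a failure counter would be misleading here.
    return "Batch: disabled (by config)";
  }
  if (failures >= limit) {
    // Batch was enabled but tripped the failure limit.
    return `Batch: disabled (failures ${failures}/${limit})`;
  }
  return `Batch: enabled (failures ${failures}/${limit})`;
}

console.log(formatBatchStatus({ configured: false, failures: 0, limit: 2 }));
console.log(formatBatchStatus({ configured: true, failures: 2, limit: 2 }));
console.log(formatBatchStatus({ configured: true, failures: 0, limit: 2 }));
```

With this split, a `0/2` counter can only appear next to `enabled`, so a status of "disabled" always states its own reason.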

Example

No definitive fix can be shown here; the issue requires further investigation into the OpenClaw codebase.

Notes

The issue seems to be related to the networking and abort handling in the OpenClaw codebase. Further investigation is required to determine the root cause and provide a definitive fix.

Recommendation

As a stopgap until a permanent fix is found, avoid opening the two sockets in parallel, or give each request its own `AbortController` so that a cancelled fallback path cannot abort the request that would have succeeded. Both changes require modifying the OpenClaw source, and both rest on the unverified hypotheses listed above.
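
To illustrate the shared-abort hypothesis, the sketch below simulates two in-flight requests without touching the network. It assumes nothing about OpenClaw's internals: `startRequest`, `runWithSharedController`, and `runWithIsolatedControllers` are hypothetical names, and the only real API used is the standard `AbortController` / `AbortSignal` available as globals in Node.

```javascript
// Hypothetical sketch; none of these names come from the OpenClaw source.
// Simulates an in-flight request that is cancelled when its AbortSignal fires.
function startRequest(name, signal) {
  const request = { name, state: "pending" };
  signal.addEventListener("abort", () => { request.state = "aborted"; }, { once: true });
  return request;
}

// One controller shared by both attempts: cancelling the stalled attempt
// also cancels the healthy one.
function runWithSharedController() {
  const shared = new AbortController();
  const stalled = startRequest("stalled", shared.signal);
  const healthy = startRequest("healthy", shared.signal);
  shared.abort(); // meant to cancel only the stalled attempt
  return [stalled.state, healthy.state];
}

// One controller per attempt: cancelling the stalled attempt leaves the
// healthy one untouched.
function runWithIsolatedControllers() {
  const stalledController = new AbortController();
  const healthyController = new AbortController();
  const stalled = startRequest("stalled", stalledController.signal);
  const healthy = startRequest("healthy", healthyController.signal);
  stalledController.abort();
  return [stalled.state, healthy.state];
}

console.log("shared:", runWithSharedController());
console.log("isolated:", runWithIsolatedControllers());
```

In the shared case both requests end up aborted; in the isolated case the healthy request stays pending. If the traces do show a single abort token spanning both sockets, this is the behavior that would turn one stalled connection into a `fetch failed` for the whole call.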


