openclaw - ✅(Solved) Fix Bug: Memory Index Fails for Main Agent with 'fetch failed' Error [2 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#56901 · Fetched 2026-04-08 01:46:19
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): cross-referenced ×1

Error Message

```
Memory index failed (main): fetch failed
```

Root Cause

Issue Summary

OpenClaw main agent memory indexing consistently fails with a `fetch failed` error when attempting to generate embeddings via the Gemini API. Other agents appear to "succeed" only because they have no memory files to index.

Fix Action

Fix / Workaround

Current Workaround

```json
{
  "memorySearch": {
    "enabled": false
  }
}
```

Manual Workaround Created

A simple grep-based search script has been created:

```bash
~/.openclaw/workspace/scripts/memory-search.sh "keyword"
```

Priority

Medium - Workaround exists (manual search script) but core functionality (semantic memory search) is lost.

PR fix notes

PR #59137: fix(memory): preserve retry state and embedding cache across reindex rollback

Description (problem / solution / changelog)

Summary

  • Problem: the current extensions/memory-core reindex path preserves the last committed index on rollback, but still loses important recovery state. A failed full session rebuild can forget that the next retry must sweep all sessions, and successful embedding batches can still be discarded when a later batch fails during a safe temp-DB reindex.
  • Why it matters: memory reindex should remain atomic for the committed index while still preserving enough retry and cache progress to converge efficiently after transient failures.
  • Why this PR is still worth landing even without directly closing one issue: the open issues in this area are broader transport/provider availability bugs. This PR does not claim to stop fetch failed from happening, but it does materially reduce the damage when those failures do happen. In manual verification, the same offline force-reindex failure left Embedding cache at 0 on main, while this branch preserved 3540 cache entries after rollback. That means fewer repeated remote embedding calls, less wasted progress, and a faster path to recovery on the next retry even before the larger transport-layer issues are fixed.
  • What changed: this PR snapshots and restores sync state across full reindex rollback, adds an explicit sessionFullRetryPending state for failed full session rebuilds, mirrors successful embedding-cache writes back to the original DB during safe reindex, and adds focused regression coverage for rollback recovery and cache preservation.
  • What did NOT change (scope boundary): no new config knobs, no provider API changes, no query behavior changes, and no broader memory pipeline redesign outside rollback/retry correctness and cache durability.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

Root Cause / Regression History (if applicable)

  • Root cause: the memory-core reindex flow was correctly preserving the committed SQLite index on rollback, but it was not preserving all of the in-memory state needed for the next retry. In particular, a failed full session rebuild could leave the manager with no dirty-file filter and no explicit signal that the next normal sync still needed a full session pass. Separately, successful embedding batches during safe temp-DB reindex were written only to the temp DB, so a later failure threw away useful cache work.
  • Missing detection / guardrail: existing atomic reindex coverage protected "keep the previous index on failure", but it did not cover the follow-up retry semantics after rollback, post-compaction targeted refresh behavior when a full retry was still pending, or cache preservation across temp-DB rollback.
  • Prior context (git blame, prior PR, issue, or refactor if known): this is the current-main continuation of the reliability work attempted in #55497, adapted to the moved memory-core plugin architecture under extensions/memory-core/src/memory/* instead of the old src/memory/* paths.
  • Why this regressed now: after the memory engine moved behind the plugin/runtime split, the old fix no longer applied cleanly, and the rebuilt atomic reindex path preserved the database contents but still had gaps in retry-state restoration and cache durability.
  • If unknown, what was ruled out: this is not just "main moved on" or a pure merge/rebase issue. The remaining behavior gap exists in current origin/main logic and reproduces in focused local tests without needing the old branch structure.
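The retry-state gap described in the root cause can be illustrated with a minimal sketch. Only `sessionsDirty` and `sessionFullRetryPending` are named in the PR description; the `SyncState` shape, `snapshotSyncState`, and `runFullReindex` are hypothetical names, and the rebuild is shown as synchronous for brevity. The actual logic in `manager-sync-ops.ts` may differ.

```typescript
// Hypothetical sync-state shape; only sessionsDirty and sessionFullRetryPending
// are named in the PR description.
interface SyncState {
  dirty: boolean;
  sessionsDirty: boolean;
  sessionsDirtyFiles: Set<string>;
  sessionFullRetryPending: boolean;
}

// Copy the state so mutations made during the reindex cannot leak into the snapshot.
function snapshotSyncState(state: SyncState): SyncState {
  return { ...state, sessionsDirtyFiles: new Set(state.sessionsDirtyFiles) };
}

// Run a full reindex. On failure, restore the snapshot and flag a full session retry.
function runFullReindex(state: SyncState, rebuild: (state: SyncState) => void): boolean {
  const snapshot = snapshotSyncState(state);
  try {
    rebuild(state);
    state.dirty = false;
    state.sessionsDirty = false;
    state.sessionsDirtyFiles.clear();
    state.sessionFullRetryPending = false;
    return true;
  } catch {
    Object.assign(state, snapshot); // roll back whatever the rebuild changed
    state.sessionsDirty = true;
    state.sessionFullRetryPending = true; // the next normal sync must sweep all sessions
    return false;
  }
}
```

Taking the snapshot before the rebuild starts is what lets the rollback restore the dirty-file filter even if the rebuild cleared it partway through.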

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file:
    • extensions/memory-core/src/memory/manager.atomic-reindex.test.ts
    • extensions/memory-core/src/memory/manager.embedding-batches.test.ts
    • extensions/memory-core/src/memory/embedding-manager.test-harness.ts
  • Scenario the test should lock in:
    • a failed full session rebuild keeps sessionsDirty set and records that the next normal sync still needs a full session retry
    • a targeted post-compaction refresh can update one transcript without accidentally clearing the pending full-session retry
    • successful embedding batches are cached before a later batch failure and are reused on the next retry
    • enabling cache on an existing index created before the cache table existed still works cleanly
  • Why this is the smallest reliable guardrail: the bug is in local sync-state transitions and SQLite-backed cache persistence, so focused memory-core tests exercise the relevant contract directly without needing live providers or gateway E2E setup.
  • Existing test that already covers this (if any):
    • the existing atomic reindex test already covered preserving the last committed index on failure
  • If no new test is added, why not:
    • N/A
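The second scenario in the plan (a targeted refresh must not clear a pending full retry) can be sketched as a tiny state machine. The flag name comes from the PR; `SessionSyncState`, `refreshTranscript`, and `syncSessions` are hypothetical and are not the functions exercised by the real tests.

```typescript
// Hypothetical session sync state; the flag name comes from the PR, the rest is illustrative.
interface SessionSyncState {
  sessionFullRetryPending: boolean;
  indexedTranscripts: Set<string>;
}

// Targeted refresh: re-index one transcript and leave the pending-retry flag untouched.
function refreshTranscript(state: SessionSyncState, path: string): void {
  state.indexedTranscripts.add(path);
}

// Normal sync: a pending full retry forces a pass over every known session file.
function syncSessions(
  state: SessionSyncState,
  allPaths: string[],
  dirtyPaths: string[],
): string[] {
  const targets = state.sessionFullRetryPending ? allPaths : dirtyPaths;
  for (const path of targets) {
    state.indexedTranscripts.add(path);
  }
  state.sessionFullRetryPending = false; // cleared only after the pass completes
  return targets;
}
```

In this sketch the flag is cleared in exactly one place, so a targeted refresh has no way to downgrade the next sync to an incremental pass.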

User-visible / Behavior Changes

  • Full memory reindex keeps the previous committed index on failure and now also preserves enough dirty/retry state for the next sync to recover correctly.
  • If a full session rebuild fails mid-run, the next normal sync still knows it must perform a full session retry instead of silently downgrading to an incomplete incremental pass.
  • Successful embedding batches can now survive a later safe-reindex rollback through cache mirroring, which reduces repeated provider work on retry.
  • Existing indexes created before the embedding cache table existed can enable cache without failing reindex.

Diagram (if applicable)

Before:
[force reindex] -> [temp DB builds some state] -> [later failure]
               -> [committed DB restored]
               -> [full-session retry intent / temp-cache progress partly lost]

After:
[force reindex] -> [temp DB builds some state]
               -> [successful cache batches mirrored to committed DB]
               -> [later failure]
               -> [committed DB restored + retry snapshot restored + full-session retry flagged]
               -> [next normal sync converges]
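The "After" flow can be approximated with two in-memory maps standing in for the temp and committed embedding caches. This is a sketch under assumptions: `EmbeddingCache`, `embedWithMirroring`, and the synchronous `embedBatch` callback are hypothetical, and the real code writes to SQLite rather than to a `Map`.

```typescript
type EmbeddingCache = Map<string, number[]>; // content hash -> embedding vector

// Embed batches for a temp-DB reindex, mirroring each successful batch into the
// committed cache so the work survives a later failure. All names are hypothetical.
function embedWithMirroring(
  batches: string[][],
  embedBatch: (hashes: string[]) => number[][],
  tempCache: EmbeddingCache,
  committedCache: EmbeddingCache,
): void {
  for (const batch of batches) {
    const missing = batch.filter((hash) => !committedCache.has(hash));
    if (missing.length > 0) {
      const vectors = embedBatch(missing); // may throw, e.g. on "fetch failed"
      // Mirror only reusable cache rows; index rows stay in the temp DB.
      missing.forEach((hash, i) => committedCache.set(hash, vectors[i]));
    }
    for (const hash of batch) {
      tempCache.set(hash, committedCache.get(hash) as number[]);
    }
  }
}
```

If a later batch throws, the temp cache is discarded with the temp DB, but the committed cache keeps every batch that succeeded, so a retry only requests the hashes that are still missing.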

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 24.13.0 + pnpm + Vitest
  • Model/provider: OpenAI-compatible remote embeddings (provider: openai, model: BAAI/bge-m3) for manual verification; mocked provider in targeted tests
  • Integration/channel (if any): CLI / builtin memory index
  • Relevant config (redacted):
    • memorySearch.provider: "openai"
    • memorySearch.model: "BAAI/bge-m3"
    • memorySearch.extraPaths: [~/Applications/Notes/Notebook]
    • memorySearch.cache.enabled: true
    • targeted tests also cover experimental.sessionMemory: true

Steps

  1. Start from an existing successful memory index with `Embedding cache` populated.
  2. Clear only `embedding_cache` from `~/.openclaw/memory/main.sqlite`, leaving the committed index rows intact.
  3. Disable networking and run `openclaw memory index --force` on `main`.
  4. Record `openclaw memory status` and `sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'`.
  5. Switch to `fix/memory-reindex-recovery` and repeat the same offline `openclaw memory index --force` and `openclaw memory status`.
  6. Run `pnpm test -- manager.atomic-reindex.test.ts`.
  7. Run `pnpm test -- manager.embedding-batches.test.ts`.

Expected

  • A failed full reindex should preserve the committed index and restore enough retry state for the next sync to finish the rolled-back work.
  • A failed full session rebuild should leave an explicit signal that a full session retry is still required.
  • Successful embedding batches should be reused after a later batch failure during safe reindex.
  • Turning on cache for an older index should not fail because the previous DB lacked embedding_cache.

Actual

  • On main, the offline force-reindex failed, the previously committed index remained intact, and Embedding cache remained at 0.
  • On fix/memory-reindex-recovery, the offline force-reindex also failed, the previously committed index remained intact, and Embedding cache increased to 3540, showing that successful embedding batches were preserved across the failed safe reindex.
  • The targeted rollback and cache regression tests also pass with the new logic.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Manual CLI evidence on main

```
openclaw memory index --force
Memory index failed (main): fetch failed | other side closed
openclaw memory status
Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (0 entries)
sqlite3 ~/.openclaw/memory/main.sqlite 'SELECT COUNT(*) FROM embedding_cache;'
0
```

Manual CLI evidence on fix/memory-reindex-recovery

```
openclaw memory index --force
Memory index failed (main): fetch failed | Client network socket disconnected before secure TLS connection was established
openclaw memory status
Memory Search (main)
Provider: openai (requested: openai)
Model: BAAI/bge-m3
Sources: memory
Extra paths: ~/Applications/Notes/Notebook
Indexed: 2361/2361 files · 14147 chunks
Dirty: no
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 2361/2361 files · 14147 chunks
Vector: ready
Vector dims: 1024
FTS: ready
Embedding cache: enabled (3540 entries)
```

Local targeted tests

```
pnpm test -- manager.atomic-reindex.test.ts
pnpm test -- manager.embedding-batches.test.ts
```

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios:
    • reviewed the current extensions/memory-core code paths against the original PR intent and confirmed the remaining gaps were retry-state restoration and cache durability, not just branch drift
    • cleared embedding_cache, disabled networking, and ran openclaw memory index --force on main
    • verified that main preserved the previously committed index but left Embedding cache at 0
    • switched to fix/memory-reindex-recovery, repeated the same offline force-reindex, and verified that the previously committed index was still preserved while Embedding cache increased to 3540
    • ran pnpm test -- manager.atomic-reindex.test.ts
    • ran pnpm test -- manager.embedding-batches.test.ts
  • Edge cases checked:
    • safe reindex rollback after some embedding batches already succeeded
    • cache table absent in the original DB
    • targeted session refresh after a failed full rebuild
  • What you did not verify:
    • a live CLI repro for the session full-retry state machine
    • batch API fallback flows beyond the local regression tests
    • multi-process concurrent indexing

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: sessionFullRetryPending adds another state bit to the session sync state machine.
    • Mitigation: focused regression coverage now checks that targeted refreshes do not accidentally clear a pending full retry and that successful full retries clear the flag.
  • Risk: cache mirroring during safe reindex writes successful embedding batches into the original DB before the temp DB is committed.
    • Mitigation: the mirrored data is intentionally limited to reusable embedding-cache rows, not index rows; the committed searchable index still remains atomic, and the new tests lock in the intended rollback behavior.
  • Risk: older indexes may not have embedding_cache.
    • Mitigation: the code now probes for table existence before seeding or mirroring cache rows, and the regression test covers enabling cache on a pre-cache index.
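A table-existence probe of the kind the last mitigation mentions could look like the sketch below. The `sqlite_master` query is standard SQLite; `QueryFn`, `tableExists`, and `shouldSeedCache` are hypothetical, and the query function is injected so the sketch does not assume which SQLite driver OpenClaw uses.

```typescript
// Minimal query abstraction so the sketch does not depend on a specific SQLite driver.
type QueryFn = (sql: string, params: string[]) => Array<Record<string, unknown>>;

// Returns true when a table with the given name exists in the database.
function tableExists(query: QueryFn, tableName: string): boolean {
  const rows = query(
    "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
    [tableName],
  );
  return rows.length > 0;
}

// Seed or mirror cache rows only when the source DB actually has the cache table.
function shouldSeedCache(query: QueryFn): boolean {
  return tableExists(query, "embedding_cache");
}
```

Probing first means an index created before the cache table existed is treated as "nothing to seed" rather than as an error.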

Changed files

  • extensions/memory-core/src/memory/embedding-manager.test-harness.ts (modified, +19/-3)
  • extensions/memory-core/src/memory/manager-embedding-ops.ts (modified, +33/-16)
  • extensions/memory-core/src/memory/manager-sync-ops.ts (modified, +193/-37)
  • extensions/memory-core/src/memory/manager.atomic-reindex.test.ts (modified, +478/-6)
  • extensions/memory-core/src/memory/manager.embedding-batches.test.ts (modified, +33/-0)

Bug Report: Memory Index Fails for Main Agent

Issue Summary

OpenClaw main agent memory indexing consistently fails with a `fetch failed` error when attempting to generate embeddings via the Gemini API. Other agents appear to "succeed" only because they have no memory files to index.

Environment

  • OpenClaw Version: 2026.3.28 (latest as of 2026-03-29)
  • Previous Version: 2026.3.24 (issue persists across versions)
  • Operating System: Linux 6.4.0-150600.23.87-default (x64)
  • Node.js Version: v24.14.1
  • Shell: bash
  • Workspace: ~/.openclaw/workspace
  • Memory Directory: ~/.openclaw/workspace/memory/

Error Details

Error Message

```
Memory index failed (main): fetch failed
```

Full Output from `openclaw memory status`

```
Memory Search (main)
Provider: gemini (requested: gemini)
Model: gemini-embedding-2-preview
Sources: memory
Indexed: 0/11 files · 0 chunks
Dirty: yes
Store: ~/.openclaw/memory/main.sqlite
Workspace: ~/.openclaw/workspace
By source:
  memory · 0/11 files · 0 chunks
Vector: ready
Vector path: ~/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
FTS: ready
Embedding cache: enabled (0 entries)
Batch: disabled (failures 0/2)
```

Reproduction Steps

  1. Ensure memory files exist in `~/.openclaw/workspace/memory/` (11 files)
  2. Run: `openclaw memory index --force`
  3. Observe error: `Memory index failed (main): fetch failed`
  4. Note that other agents (product-designer, system-architect, etc.) report "success" but have 0/0 files indexed

Verification Tests

✅ Gemini API Works Correctly

```bash
# Test single embedding request
curl -s -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-embedding-2-preview:embedContent?key=<REDACTED_API_KEY>" \
  -H "Content-Type: application/json" \
  -d '{"content":{"parts":[{"text":"test"}]}}' | jq '.embedding.values | length'
# Result: 3072 (successful)
```

✅ File Permissions Are Correct

```bash
ls -la ~/.openclaw/workspace/memory/*.md
# All files have 644 permissions (readable)
```

✅ SQLite Database Is Functional

```bash
sqlite3 ~/.openclaw/memory/main.sqlite "PRAGMA table_info(files);"
# Returns: path, source, hash, mtime, size (correct schema)
```

✅ Vector Extension Is Loaded

```
Vector: ready
Vector path: ~/.nvm/versions/node/v24.14.1/lib/node_modules/openclaw/node_modules/sqlite-vec-linux-x64/vec0.so
```

❌ Embedding Generation Fails

The `fetch failed` error occurs when OpenClaw's internal code attempts to generate embeddings, despite direct API calls working fine.
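Because `fetch failed` on its own says little, one way to narrow this down is to print the nested cause. Node's built-in `fetch` reports network-level failures as an error with the message `fetch failed` and the underlying reason attached as `cause`. The helper below is a hypothetical sketch that walks that chain; the ` | ` separator mirrors CLI output such as `fetch failed | other side closed`, but whether OpenClaw builds its message this way is an assumption.

```typescript
// Collect the message of an error and of every nested `cause`, outermost first.
function describeFetchError(err: unknown, separator = " | "): string {
  const messages: string[] = [];
  const seen = new Set<unknown>(); // guards against cyclic cause chains
  let current: unknown = err;
  while (current instanceof Error && !seen.has(current)) {
    seen.add(current);
    if (current.message) {
      messages.push(current.message);
    }
    current = (current as { cause?: unknown }).cause;
  }
  return messages.length > 0 ? messages.join(separator) : String(err);
}

// Usage: simulate the error shape that a failed network request produces.
const failure = Object.assign(new Error("fetch failed"), {
  cause: new Error("other side closed"),
});
console.log(describeFetchError(failure)); // fetch failed | other side closed
```

Logging the cause (for example a DNS, TLS, proxy, or socket error) would help distinguish a transport problem in the runtime from a malformed request.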

Attempted Solutions (All Failed)

| Solution | Result |
|---|---|
| Update OpenClaw 2026.3.24 → 2026.3.28 | ❌ No change - error persists |
| Switch embedding model: gemini-embedding-2-preview → gemini-embedding-001 | ❌ No change - error persists |
| Reset SQLite database (delete and recreate) | ❌ No change - error persists |
| Fix file permissions (600 → 644) | ❌ No change - error persists |
| Verify API key and configuration | ❌ API is not the issue (curl works) |
| Enable fallback to FTS | ❌ No change - error persists (FTS also not indexed) |
| Disable and re-enable memorySearch | ❌ No change - error persists |

Root Cause Analysis

Key Finding

The issue is in OpenClaw's internal embedding generation code, not in:

  • The Gemini API (proven functional)
  • File system (permissions, disk space all normal)
  • Database (SQLite schema correct)
  • Vector extension (sqlite-vec loaded correctly)

Possible Causes

  1. Batch processing logic - May have a bug specific to how main agent processes memory files
  2. File reading/parsing - May fail specifically for main agent's memory file format
  3. Configuration difference - Main agent may have different internal config than other agents
  4. Network request implementation - OpenClaw's fetch wrapper may have issues specific to main agent
  5. API request formatting - May generate malformed requests that fail despite direct API calls working

Why Other Agents "Succeed"

Other agents (product-designer, system-architect, frontend-dev, backend-dev, qa-engineer, devops-engineer) report `Memory index updated` but have:

  • 0/0 files indexed - They have no memory files to process
  • No actual embedding generation occurs
  • This is a false positive - they don't fail because they don't try
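The false positive can be made explicit by reading the `Indexed:` line of `openclaw memory status` instead of trusting the success message. The sketch below assumes the line format shown in the status output in this report; `parseIndexedLine` and `classifyIndex` are hypothetical helpers, not part of OpenClaw.

```typescript
interface IndexedCounts {
  indexed: number;
  total: number;
  chunks: number;
}

// Parse a status line such as "Indexed: 0/11 files · 0 chunks".
function parseIndexedLine(line: string): IndexedCounts | null {
  const match = /Indexed:\s*(\d+)\/(\d+) files\s+\S+\s+(\d+) chunks/.exec(line);
  if (!match) {
    return null;
  }
  return {
    indexed: Number(match[1]),
    total: Number(match[2]),
    chunks: Number(match[3]),
  };
}

type IndexVerdict = "indexed" | "nothing-to-index" | "incomplete";

// Separate "nothing to do" from a real success or a partial index.
function classifyIndex(counts: IndexedCounts): IndexVerdict {
  if (counts.total === 0) {
    return "nothing-to-index";
  }
  return counts.indexed === counts.total ? "indexed" : "incomplete";
}
```

With this classification, an agent reporting 0/0 files is labeled as having nothing to index, while the main agent's 0/11 is labeled incomplete.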

Impact

Current Workaround

```json
{
  "memorySearch": {
    "enabled": false
  }
}
```

This prevents the error but also:

  • ❌ Disables vector search functionality
  • ❌ Loses semantic memory search
  • ❌ Reduces OpenClaw's effectiveness

Manual Workaround Created

A simple grep-based search script has been created:

```bash
~/.openclaw/workspace/scripts/memory-search.sh "keyword"
```

This provides basic keyword search but lacks:

  • Semantic similarity
  • Vector-based ranking
  • Contextual understanding
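The contents of `memory-search.sh` are not shown in the report. As a rough illustration of what a grep-style keyword search provides (and why it cannot rank by meaning), here is a hypothetical in-memory equivalent; `KeywordHit` and `keywordSearch` are illustrative names only.

```typescript
interface KeywordHit {
  file: string;
  line: number; // 1-based line number, as grep -n would report
  text: string;
}

// Case-insensitive substring search over file contents keyed by file name.
function keywordSearch(files: Record<string, string>, keyword: string): KeywordHit[] {
  const needle = keyword.toLowerCase();
  const hits: KeywordHit[] = [];
  for (const file of Object.keys(files)) {
    files[file].split("\n").forEach((text, index) => {
      if (text.toLowerCase().indexOf(needle) !== -1) {
        hits.push({ file, line: index + 1, text });
      }
    });
  }
  return hits;
}
```

Every hit is an exact substring match in file order, so a query phrased differently from the stored note returns nothing.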

Additional Information

Memory Files Requiring Indexing (11 files)

  • `MEMORY.md` (1.2K - long-term memory)
  • `2026-03-21-1759.md` (151 lines)
  • `2026-03-24-0826.md` (5 lines)
  • `2026-03-24-400-error-code-400-message-ple.md` (91 lines)
  • `2026-03-26-0330.md` (140 lines)
  • `2026-03-26-new-session.md` (17 lines)
  • `2026-03-26-prd-generation.md` (245 lines)
  • `2026-03-26-telegram-auth-fix.md` (115 lines)
  • `2026-03-27-model-test.md` (37 lines)
  • `2026-03-29-session-startup.md` (64 lines)
  • `test-index.md` (11 lines)

Configuration

```json
{
  "memorySearch": {
    "enabled": true,
    "provider": "gemini",
    "model": "gemini-embedding-2-preview",
    "fallback": "fts",
    "remote": {
      "apiKey": "<REDACTED>",
      "batch": { "enabled": false }
    }
  }
}
```

Expected Behavior

Memory files should be indexed successfully with embeddings generated and stored in SQLite database, enabling semantic search via `memory_search` function.

Actual Behavior

OpenClaw fails with `fetch failed` error when attempting to generate embeddings for memory files. No embeddings are stored, and vector search is unavailable.

Priority

Medium - Workaround exists (manual search script) but core functionality (semantic memory search) is lost.

Labels

bug, extensions: memory-core

Additional Notes

  • Full diagnostic report available at `/tmp/memory-index-debug-report.md`
  • Manual search script working at `~/.openclaw/workspace/scripts/memory-search.sh`
  • Issue affects only main agent - other agents with 0 memory files appear to succeed
  • This suggests a code path specific to processing multiple memory files


Fix Plan

To resolve the `fetch failed` error when generating embeddings for memory files, the internal embedding generation code in OpenClaw likely needs changes. The steps below are suggestions, not a confirmed fix.

  1. Update the gemini provider: Check how the gemini provider handles batch requests. This may involve setting `memorySearch.remote.batch.enabled` (currently `false` in the reported configuration) to `true` and implementing a retry mechanism for failed requests.
  2. Implement error handling: Add try-catch blocks around embedding generation so that a single `fetch failed` error is logged with its cause instead of aborting the whole indexing run.
  3. Validate API requests: Verify that the requests sent to the Gemini API are correctly formatted and contain the required parameters, using the working curl request as the reference.

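Example code changes:

```javascript
// Illustrative sketch only; these helpers are not OpenClaw internals.
// Batch mode is governed by memorySearch.remote.batch.enabled in the reported config.

// Validate API requests: build the body in the shape the working curl call used
function buildEmbedRequest(text) {
  return {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      content: {
        parts: [{ text }],
      },
    }),
  };
}

// Implement error handling when generating embeddings
async function generateEmbedding(memoryFile, url) {
  try {
    const response = await fetch(url, buildEmbedRequest(memoryFile.content));
    if (!response.ok) {
      throw new Error(`Embedding request failed: HTTP ${response.status}`);
    }
    return (await response.json()).embedding.values;
  } catch (error) {
    console.error('Error generating embeddings:', error);
    throw error; // rethrow so the caller can retry or roll back
  }
}
```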

Verification

To verify that the fix worked, run the following steps:

  1. Update the OpenClaw configuration to enable the gemini provider with batch requests.
  2. Run the openclaw memory index --force command to re-index the memory files.
  3. Check the OpenClaw logs for any errors or warnings.
  4. Verify that the embeddings are being generated correctly by checking the SQLite database.

Extra Tips

  • Make sure to test the updated code with a small set of memory files to ensure that the fix works as expected.
  • Consider adding additional logging and monitoring to detect any issues with the embedding generation process.
  • Review the OpenClaw documentation and GitHub repository for any updates or issues related to the gemini provider and embedding generation.
