openclaw - Feature Request: Two-Stage Memory Retrieval (Embedding + Rerank) with Local/Cloud Flexibility + 4 Practical Optimizations

Summary

Request for OFFICIAL INTEGRATION of a two-stage memory retrieval architecture into the memory-lancedb plugin, with 4 practical, low-effort optimizations included.

Support BOTH local models AND cloud APIs for maximum flexibility.

The current "DIY" approach requires manual changes that have to be redone after every plugin update. Please make this a first-class feature that can be enabled with a single config flag.

The Problem with the Current "DIY" Approach

Right now I have to do all of this by hand:

  1. 🚫 Start a separate embedding service
  2. 🚫 Hardcode endpoints in the plugin source code
  3. 🚫 Re-apply my changes to JS files in node_modules after every update
  4. 🚫 Install both models or configure API keys
  5. 🚫 Manage the service process

This is not maintainable and breaks every time the plugin updates.

Proposed Solution: Official Integration

🎯 Simple Config (Flexible)

Option A: Local Models (For privacy and zero API cost)

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "local", "model": "BAAI/bge-m3" },
      "rerank": { "provider": "local", "model": "BAAI/bge-reranker-v2-m3", "threshold": 0.01 }
    }
  }
}

Option B: Cloud APIs (For users without GPU)

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "openai", "model": "text-embedding-3-small" },
      "rerank": { "provider": "cohere", "model": "rerank-english-v3.0", "apiKey": "xxx" }
    }
  }
}

🚀 Core Architecture

📥 User Query
Step 1: Embedding (Local Model OR Cloud API)
📊 LanceDB Vector Search + BM25 Keyword Search
    • 30 candidates each, deduplicated
Step 2: Rerank (Local Model OR Cloud API)
    • Time decay weighting
    • Importance weighting
📤 Final Results (Top 3-10)
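
The flow above could be sketched roughly as follows. This is only an illustration, not the plugin's actual code: `two_stage_retrieve`, the `Memory` record shape, and the three injected callables are hypothetical names, and the time decay and importance weighting from Step 2 are left out to keep the sketch short.

```python
from typing import Callable, Dict, List

# Hypothetical record shape: {"id": ..., "text": ...}
Memory = Dict[str, object]


def two_stage_retrieve(
    query: str,
    vector_search: Callable[[str, int], List[Memory]],
    keyword_search: Callable[[str, int], List[Memory]],
    rerank: Callable[[str, List[Memory]], List[float]],
    candidates_per_search: int = 30,
    top_k: int = 5,
) -> List[Memory]:
    """Stage 1 gathers candidates from both searches; stage 2 reranks them."""
    seen = set()
    candidates: List[Memory] = []
    # Stage 1: union of vector and keyword hits, deduplicated by id.
    hits = vector_search(query, candidates_per_search) + keyword_search(
        query, candidates_per_search
    )
    for hit in hits:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            candidates.append(hit)
    if not candidates:
        return []
    # Stage 2: score each candidate against the query and keep the best top_k.
    scores = rerank(query, candidates)
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [memory for _, memory in ranked[:top_k]]
```

Passing the searches and the reranker in as callables is one way to keep the local and cloud providers interchangeable behind the same pipeline.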

🤖 What the Plugin Should Do Automatically

  1. Auto-detect best provider (local if models exist, else cloud)
  2. Auto-download local models on first use, if enabled
  3. Run local models internally (no separate HTTP service)
  4. Unified interface for cloud API calls
  5. LRU cache for repeated queries
  6. Auto limit adjustment based on query length
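
As a rough sketch of item 1 (auto-detecting the provider), assuming downloaded models live in a directory named after the model: `resolve_provider`, the directory layout, and the generic `"cloud"` label are all hypothetical, and a real implementation would map that label to a configured provider such as `openai` or `cohere`.

```python
from pathlib import Path
from typing import Optional


def resolve_provider(configured: Optional[str], model_dir: Path, model_name: str) -> str:
    """Return the configured provider, or pick one based on what is on disk."""
    if configured:
        # An explicit choice in the config always wins.
        return configured
    # Hypothetical layout: <model_dir>/<org>/<model>/ holds the downloaded weights.
    local_path = model_dir / model_name
    has_local_model = local_path.is_dir() and any(local_path.iterdir())
    return "local" if has_local_model else "cloud"
```

For example, `resolve_provider(None, Path.home() / ".cache" / "models", "BAAI/bge-m3")` would return `"local"` only if that directory exists and is not empty.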

4 Practical, Low-Effort Optimizations (No Hype)

These are all simple to implement, high ROI, no fancy architecture required:

✅ Optimization 1: Hybrid BM25 + Vector Search (~50 lines)

Problem: Vector search can miss exact keywords such as dates, names, and version numbers
Solution: Run both searches, merge the results, deduplicate them, then rerank
Benefit: +20% recall, zero extra dependencies (SQLite FTS5 built-in)
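
A minimal sketch of the merge-and-deduplicate step, assuming each hit is a dict with an `id` field (a hypothetical shape). It interleaves the two result lists so that neither search dominates the candidate pool before reranking; `merge_candidates` is an illustrative name, not an existing plugin function.

```python
from itertools import zip_longest
from typing import Dict, List

Hit = Dict[str, object]


def merge_candidates(vector_hits: List[Hit], keyword_hits: List[Hit]) -> List[Hit]:
    """Interleave both result lists and drop repeated ids, keeping first occurrence."""
    merged: List[Hit] = []
    seen = set()
    # zip_longest pads the shorter list with None so no hit is dropped.
    for pair in zip_longest(vector_hits, keyword_hits):
        for hit in pair:
            if hit is not None and hit["id"] not in seen:
                seen.add(hit["id"])
                merged.append(hit)
    return merged
```

Because the reranker decides the final order, the merge only needs to produce a deduplicated candidate set; the interleaving matters mainly if the list is truncated before reranking.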

✅ Optimization 2: Time Decay + Importance Weighting (~10 lines)

Problem: Three-month-old memories get the same score as yesterday's
Solution: Simple weighted scoring:

final_score = 0.6 * rerank_score 
            + 0.25 * exp(-days_old / 30)  # 30-day decay constant (half-life ≈ 21 days)
            + 0.15 * importance  # field already exists!

Benefit: Results make sense chronologically, zero cost
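
The scoring formula translates almost directly into Python. The sketch below assumes `rerank_score` and `importance` are both normalized to the range 0 to 1, which the issue does not state. Note that `exp(-days_old / 30)` drops to one half after about 21 days (30 × ln 2); a strict 30-day half-life would use `0.5 ** (days_old / 30)` instead.

```python
import math


def final_score(
    rerank_score: float,
    days_old: float,
    importance: float,
    decay_days: float = 30.0,
) -> float:
    """Blend reranker relevance with recency and stored importance."""
    # Exponential decay: 1.0 for a brand-new memory, approaching 0 as it ages.
    recency = math.exp(-days_old / decay_days)
    return 0.6 * rerank_score + 0.25 * recency + 0.15 * importance


# Same relevance and importance, different age: the newer memory scores higher.
fresh = final_score(rerank_score=0.8, days_old=1, importance=0.5)
stale = final_score(rerank_score=0.8, days_old=90, importance=0.5)
```

Since the weights sum to 1.0, the result stays between 0 and 1 as long as the inputs do.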

✅ Optimization 3: LRU Query Cache (~30 lines)

Problem: Same queries get recomputed every time
Solution: Cache last 1000 queries, TTL 24h
Benefit: 10x faster for common questions, fewer model calls
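
One way to get both the size bound and the TTL is an `OrderedDict` with expiry timestamps. This is a sketch under stated assumptions: `QueryCache` and its parameters are hypothetical names, and the injectable `clock` exists only to make the class easy to test.

```python
import time
from collections import OrderedDict
from typing import Callable, Hashable, Optional


class QueryCache:
    """LRU cache with a per-entry time-to-live."""

    def __init__(
        self,
        max_entries: int = 1000,
        ttl_seconds: float = 24 * 3600,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock

    def get(self, key: Hashable) -> Optional[object]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._entries[key]
            return None
        # Mark as most recently used.
        self._entries.move_to_end(key)
        return value

    def put(self, key: Hashable, value: object) -> None:
        self._entries[key] = (self._clock() + self._ttl, value)
        self._entries.move_to_end(key)
        # Evict least recently used entries once over capacity.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
```

Cached results can go stale when new memories are stored, so a real integration would probably also clear or version the cache on writes.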

✅ Optimization 4: Adaptive Limit (~5 lines)

Problem: Short queries waste time fetching 50 results
Solution: limit = 20 if len(query) < 10 else 50
Benefit: +30% average speed, no recall loss
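
Wrapped in a function, the one-liner above might look like this. The thresholds are the ones from the issue; the function and parameter names are hypothetical.

```python
def adaptive_limit(
    query: str,
    short_limit: int = 20,
    long_limit: int = 50,
    short_query_chars: int = 10,
) -> int:
    """Fetch fewer candidates for very short queries."""
    return short_limit if len(query) < short_query_chars else long_limit
```

Here `adaptive_limit("bge")` returns 20, while a full-sentence query returns 50.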


Why Two Stages Are Better Than One

| Aspect | Single Stage (Embedding Only) | Two Stage (Embedding + Rerank) |
|---|---|---|
| Retrieval Quality | Basic cosine similarity | Deep semantic understanding |
| Relevance | Often returns loosely related results | Highly relevant, context-aware |
| Precision | Medium | High |
| Recall | High | High + Precision Filtered |

Flexibility: Local vs Cloud

| Factor | Local Models | Cloud APIs |
|---|---|---|
| Cost | Zero after download | Per-token pricing |
| Privacy | 100% local, no data leaves | Data sent to 3rd party |
| Hardware | Requires ~4GB RAM | Works on any machine |
| Speed | Depends on hardware | Depends on network |
| Best For | Privacy-conscious users, servers | Desktop users, casual use |

Performance Benchmarks (Local Models Already Working)

| Operation | Performance |
|---|---|
| Embedding Vectorization | 30 texts/sec |
| Rerank (10 docs) | 23 docs/sec |
| Single Query Vectorization | 0.02 sec |
| End-to-end Retrieval | < 0.5 sec |

Benefits of Official Integration

  1. ✅ Zero Config - Just set twoStageRetrieval: true
  2. ✅ Flexible - Choose local or cloud based on your needs
  3. ✅ No Breakage - Survives plugin updates
  4. ✅ Auto Setup - Models download automatically or use configured API keys
  5. ✅ Better Results - The two-stage architecture dramatically improves retrieval quality
  6. ✅ No External Dependencies - Everything built into the plugin
  7. ✅ +20% Recall, 10x Faster - With the 4 simple optimizations above

My Current "Hacky" Implementation Status

✅ Already working in production with:

  • Separate embedding/rerank service (BGE-M3 + CrossEncoder)
  • Manually modified memory-lancedb/dist/index.js
  • Two-stage retrieval fully functional
  • Significantly better memory recall quality

But this is not sustainable. Please integrate this natively so users don't have to hack the plugin every time.


I'm happy to contribute a PR if the OpenClaw team is interested in this feature!
