openclaw - 💡(How to fix) Fix Feature Request: Two-Stage Memory Retrieval (Embedding + Rerank) with Local/Cloud Flexibility + 4 Practical Optimizations

StepCodex · 2026-05-17T03:29:17Z

[openclaw] Request for OFFICIAL INTEGRATION of a two-stage memory retrieval architecture into memory-lancedb plugin, with 4 practical, low-effort optimizations… **Request for OFFICIAL INTEGRATION** of a **two-stage memory retrieval architecture** into `memory-lancedb` plugin, with **4 practical, low-effort optimizations** included. Support **BOTH local models AND cloud APIs** for maximum flexibility. Current "DIY approach" requires manual configuration changes every time. Please make this a **first-class, one-click enabled feature**. ## Summary **Request for OFFICIAL INTEGRATION** of a **two-stage memory retrieval architecture** into `memory-lancedb` plugin, with **4 practical, low-effort optimizations** included. Support **BOTH local models AND cloud APIs** for maximum flexibility. Current "DIY approach" requires manual configuration changes every time. Please make this a **first-class, one-click enabled feature**. ## The Problem with Current "DIY" Approach Right now I have to manually: 1. 🚫 Start a separate embedding service manually 2. 🚫 Hardcode endpoints in the plugin source code 3. 🚫 Modify JS files in `node_modules` after every update 4. 🚫 Manually install both models or configure API keys 5. 🚫 Manually manage the service process This is **not maintainable** and breaks every time the plugin updates. ## Proposed Solution: Official Integration ### 🎯 Simple Config (Flexible) **Option A: Local Models (For privacy and zero API cost)** ```json { "memory-lancedb": { "config": { "twoStageRetrieval": true, "embedding": { "provider": "local", "model": "BAAI/bge-m3" }, "rerank": { "provider": "local", "model": "BAAI/bge-reranker-v2-m3", "threshold": 0.01 } } } } ``` **Option B: Cloud APIs (For users without GPU)** ```json { "memory-lancedb": { "config": { "twoStageRetrieval": true, "embedding": { "provider": "openai", "model": "text-embedding-3-small" }, "rerank": { "provider": "cohere", "model": "rerank-english-v3.0", "apiKey": "xxx" } } } } ``` ### 🚀 Core Architecture ``` 📥 User Query ↓ Step 1: Embedding (Local Model OR Cloud API) ↓ 📊 LanceDB Vector Search + BM25 Keyword Search • 30 candidates each, deduplicated ↓ Step 2: Rerank (Local Model OR Cloud API) • Time decay weighting • Importance weighting ↓ 📤 Final Results (Top 3-10) ``` ### 🤖 What the Plugin Should Do Automatically 1. **Auto-detect best provider** (local if models exist, else cloud) 2. **Auto-download local models** on first use, if enabled 3. **Run local models internally** (no separate HTTP service) 4. **Unified interface** for cloud API calls 5. **LRU cache** for repeated queries 6. **Auto limit adjustment** based on query length --- ## 4 Practical, Low-Effort Optimizations (No Hype) These are all **simple to implement, high ROI**, no fancy architecture required: ### ✅ Optimization 1: Hybrid BM25 + Vector Search (~50 lines) **Problem:** Vector search misses exact keywords like dates, names, version numbers **Solution:** Run both searches, merge results, deduplicate, then rerank **Benefit:** +20% recall, zero extra dependencies (SQLite FTS5 built-in) ### ✅ Optimization 2: Time Decay + Importance Weighting (~10 lines) **Problem:** 3-month old memories get same score as yesterday's **Solution:** Simple weighted scoring: ``` final_score = 0.6 * rerank_score + 0.25 * exp(-days_old / 30) # 30-day half-life + 0.15 * importance # field already exists! ``` **Benefit:** Results make sense chronologically, zero cost ### ✅ Optimization 3: LRU Query Cache (~30 lines) **Problem:** Same queries get recomputed every time **Solution:** Cache last 1000 queries, TTL 24h **Benefit:** 10x faster for common questions, fewer model calls ### ✅ Optimization 4: Adaptive Limit (~5 lines) **Problem:** Short queries waste time fetching 50 results **Solution:** `limit = 20 if len(query) < 10 else 50` **Benefit:** +30% average speed, no recall loss --- ## Why Two Stages Are Better Than One | Aspect | Single Stage (Embedding Only) | Two Stage (Embedding + Rerank) | |--------|-------------------------------|---------------------------------| | **Retrieval Quality** | Basic cosine similarity | Deep semantic understanding | | **Relevance** | Often returns loosely related results | Highly relevant, context-aware | | **Precision** | Medium | High | | **Recall** | High | High + Precision Filtered | ## Flexibility: Local vs Cloud | Factor | Local Models | Cloud APIs | |--------|-------------|------------| | **Cost** | Zero after download | Per-token pricing | | **Privacy** | 100% local, no data leaves | Data sent to 3rd party | | **Hardware** | Requires ~4GB RAM | Works on any machine | | **Speed** | Depends on hardware | Depends on network | | **Best For** | Privacy-conscious users, servers | Desktop users, casual use | ## Performance Benchmarks (Local Models Already Working) | Operation | Performance | |-----------|-------------| | Embedding Vectorization | 30 texts/sec | | Rerank (10 docs) | 23 docs/sec | | Single Query Vectorization | 0.02 sec | | End-to-end Retrieva

Root Cause

Request for OFFICIAL INTEGRATION of a two-stage memory retrieval architecture into memory-lancedb plugin, with 4 practical, low-effort optimizations included.

Support BOTH local models AND cloud APIs for maximum flexibility.

Current "DIY approach" requires manual configuration changes every time. Please make this a first-class, one-click enabled feature.

Code Example

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "local", "model": "BAAI/bge-m3" },
      "rerank": { "provider": "local", "model": "BAAI/bge-reranker-v2-m3", "threshold": 0.01 }
    }
  }
}

---

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "openai", "model": "text-embedding-3-small" },
      "rerank": { "provider": "cohere", "model": "rerank-english-v3.0", "apiKey": "xxx" }
    }
  }
}

---

📥 User Query
    ↓
Step 1: Embedding (Local Model OR Cloud API)
    ↓
📊 LanceDB Vector Search + BM25 Keyword Search
    • 30 candidates each, deduplicated
    ↓
Step 2: Rerank (Local Model OR Cloud API)
    • Time decay weighting
    • Importance weighting
    ↓
📤 Final Results (Top 3-10)

---

final_score = 0.6 * rerank_score 
            + 0.25 * exp(-days_old / 30)  # 30-day half-life
            + 0.15 * importance  # field already exists!

Summary

Request for OFFICIAL INTEGRATION of a two-stage memory retrieval architecture into memory-lancedb plugin, with 4 practical, low-effort optimizations included.

Support BOTH local models AND cloud APIs for maximum flexibility.

Current "DIY approach" requires manual configuration changes every time. Please make this a first-class, one-click enabled feature.

The Problem with Current "DIY" Approach

Right now I have to manually:

🚫 Start a separate embedding service manually
🚫 Hardcode endpoints in the plugin source code
🚫 Modify JS files in node_modules after every update
🚫 Manually install both models or configure API keys
🚫 Manually manage the service process

This is not maintainable and breaks every time the plugin updates.

Proposed Solution: Official Integration

🎯 Simple Config (Flexible)

Option A: Local Models (For privacy and zero API cost)

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "local", "model": "BAAI/bge-m3" },
      "rerank": { "provider": "local", "model": "BAAI/bge-reranker-v2-m3", "threshold": 0.01 }
    }
  }
}

Option B: Cloud APIs (For users without GPU)

{
  "memory-lancedb": {
    "config": {
      "twoStageRetrieval": true,
      "embedding": { "provider": "openai", "model": "text-embedding-3-small" },
      "rerank": { "provider": "cohere", "model": "rerank-english-v3.0", "apiKey": "xxx" }
    }
  }
}

🚀 Core Architecture

📥 User Query
    ↓
Step 1: Embedding (Local Model OR Cloud API)
    ↓
📊 LanceDB Vector Search + BM25 Keyword Search
    • 30 candidates each, deduplicated
    ↓
Step 2: Rerank (Local Model OR Cloud API)
    • Time decay weighting
    • Importance weighting
    ↓
📤 Final Results (Top 3-10)

🤖 What the Plugin Should Do Automatically

Auto-detect best provider (local if models exist, else cloud)
Auto-download local models on first use, if enabled
Run local models internally (no separate HTTP service)
Unified interface for cloud API calls
LRU cache for repeated queries
Auto limit adjustment based on query length

4 Practical, Low-Effort Optimizations (No Hype)

These are all simple to implement, high ROI, no fancy architecture required:

✅ Optimization 1: Hybrid BM25 + Vector Search (~50 lines)

Problem: Vector search misses exact keywords like dates, names, version numbers
Solution: Run both searches, merge results, deduplicate, then rerank
Benefit: +20% recall, zero extra dependencies (SQLite FTS5 built-in)

✅ Optimization 2: Time Decay + Importance Weighting (~10 lines)

Problem: 3-month old memories get same score as yesterday's
Solution: Simple weighted scoring:

final_score = 0.6 * rerank_score 
            + 0.25 * exp(-days_old / 30)  # 30-day half-life
            + 0.15 * importance  # field already exists!

Benefit: Results make sense chronologically, zero cost

✅ Optimization 3: LRU Query Cache (~30 lines)

Problem: Same queries get recomputed every time
Solution: Cache last 1000 queries, TTL 24h
Benefit: 10x faster for common questions, fewer model calls

✅ Optimization 4: Adaptive Limit (~5 lines)

Problem: Short queries waste time fetching 50 results
Solution: limit = 20 if len(query) < 10 else 50
Benefit: +30% average speed, no recall loss

Why Two Stages Are Better Than One

Aspect	Single Stage (Embedding Only)	Two Stage (Embedding + Rerank)
Retrieval Quality	Basic cosine similarity	Deep semantic understanding
Relevance	Often returns loosely related results	Highly relevant, context-aware
Precision	Medium	High
Recall	High	High + Precision Filtered

Flexibility: Local vs Cloud

Factor	Local Models	Cloud APIs
Cost	Zero after download	Per-token pricing
Privacy	100% local, no data leaves	Data sent to 3rd party
Hardware	Requires ~4GB RAM	Works on any machine
Speed	Depends on hardware	Depends on network
Best For	Privacy-conscious users, servers	Desktop users, casual use

Performance Benchmarks (Local Models Already Working)

Operation	Performance
Embedding Vectorization	30 texts/sec
Rerank (10 docs)	23 docs/sec
Single Query Vectorization	0.02 sec
End-to-end Retrieval	< 0.5 sec

Benefits of Official Integration

✅ Zero Config - Just set twoStageRetrieval: true
✅ Flexible - Choose local or cloud based on your needs
✅ No Breakage - Survives plugin updates
✅ Auto Setup - Models download automatically or use configured API keys
✅ Better Results - Two stage architecture dramatically improves retrieval quality
✅ No External Dependencies - Everything built into the plugin
✅ +20% Recall, 10x Faster - With 4 simple optimizations above

My Current "Hacky" Implementation Status

✅ Already working in production with:

Separate embedding/rerank service (BGE-M3 + CrossEncoder)
Manually modified memory-lancedb/dist/index.js
Two-stage retrieval fully functional
Significantly better memory recall quality

But this is not sustainable. Please integrate this natively so users don't have to hack the plugin every time.

I'm happy to contribute a PR if the OpenClaw team is interested in this feature!

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Feature Request: Two-Stage Memory Retrieval (Embedding + Rerank) with Local/Cloud Flexibility + 4 Practical Optimizations

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

The Problem with Current "DIY" Approach

Proposed Solution: Official Integration

🎯 Simple Config (Flexible)

🚀 Core Architecture

🤖 What the Plugin Should Do Automatically

4 Practical, Low-Effort Optimizations (No Hype)

✅ Optimization 1: Hybrid BM25 + Vector Search (~50 lines)

✅ Optimization 2: Time Decay + Importance Weighting (~10 lines)

✅ Optimization 3: LRU Query Cache (~30 lines)

✅ Optimization 4: Adaptive Limit (~5 lines)

Why Two Stages Are Better Than One

Flexibility: Local vs Cloud

Performance Benchmarks (Local Models Already Working)

Benefits of Official Integration

My Current "Hacky" Implementation Status

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Feature Request: Two-Stage Memory Retrieval (Embedding + Rerank) with Local/Cloud Flexibility + 4 Practical Optimizations

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Code Example

Summary

The Problem with Current "DIY" Approach

Proposed Solution: Official Integration

🎯 Simple Config (Flexible)

🚀 Core Architecture

🤖 What the Plugin Should Do Automatically

4 Practical, Low-Effort Optimizations (No Hype)

✅ Optimization 1: Hybrid BM25 + Vector Search (~50 lines)

✅ Optimization 2: Time Decay + Importance Weighting (~10 lines)

✅ Optimization 3: LRU Query Cache (~30 lines)

✅ Optimization 4: Adaptive Limit (~5 lines)

Why Two Stages Are Better Than One

Flexibility: Local vs Cloud

Performance Benchmarks (Local Models Already Working)

Benefits of Official Integration

My Current "Hacky" Implementation Status

Still need to ship something?

RELATED_DISCOVERY

TRENDING