openclaw: BM25 full-text search returns 0 results for Chinese queries with multiple terms


Problem

When searching memory with Chinese queries containing multiple terms, memory_search always returns textScore: 0 for all results. Vector search works fine, but BM25/FTS5 keyword search never matches.

Root Cause

The buildFtsQuery function in extensions/memory-core/src/memory/hybrid.ts joins all query tokens with AND:

function buildFtsQuery(raw) {
  const tokens = raw.match(/[\p{L}\p{N}_]+/gu)?.map((t) => t.trim()).filter(Boolean) ?? [];
  if (tokens.length === 0) return null;
  return tokens.map((t) => `"${t.replaceAll("\"", "")}"`).join(" AND ");
}

For Chinese text, the regex /[\p{L}\p{N}_]+/gu treats consecutive CJK characters as a single token. So a query like 微信读书 书架 阅读偏好 becomes:

"微信读书" AND "书架" AND "阅读偏好"

This requires all three terms to appear in the same FTS5 chunk. In practice, a single chunk of Chinese text rarely contains every keyword, so the query returns 0 matches.
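The tokenization described above can be reproduced in isolation. The following is a minimal sketch that mirrors the quoted buildFtsQuery logic, with type annotations added and `replace(/"/g, "")` standing in for `replaceAll`; the variable names and the English sample query are illustrative assumptions, not part of the OpenClaw source.

```typescript
// Mirrors the buildFtsQuery logic quoted above (types added for clarity).
// Assumption: replace(/"/g, "") is used here in place of replaceAll;
// both strip every double quote from the token.
function buildFtsQuery(raw: string): string | null {
  const tokens =
    raw.match(/[\p{L}\p{N}_]+/gu)?.map((t) => t.trim()).filter(Boolean) ?? [];
  if (tokens.length === 0) return null;
  return tokens.map((t) => `"${t.replace(/"/g, "")}"`).join(" AND ");
}

// CJK characters are Unicode letters (\p{L}), so each whitespace-separated
// run of Chinese characters becomes a single quoted term.
const chineseQuery = buildFtsQuery("微信读书 书架 阅读偏好");

// Illustrative English query: every word becomes its own AND-ed term.
const englishQuery = buildFtsQuery("reading list preferences");

// Input with no letters or digits produces no tokens, hence null.
const emptyQuery = buildFtsQuery("   ");

console.log(chineseQuery);
console.log(englishQuery);
console.log(emptyQuery);
```

In both cases every term is mandatory, so a chunk that lacks any one of them is excluded from the FTS results.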

Evidence

Direct SQLite FTS5 queries confirm the index is healthy:

Query | Results
"微信读书" (single term) | 10 chunks
"书架" (single term) | 2 chunks
"阅读偏好" (single term) | 7 chunks
"微信读书" AND "书架" AND "阅读偏好" (current behavior) | 0
"微信读书" OR "书架" OR "阅读偏好" (proposed fix) | 10

Same pattern for other queries:

  • "基金" AND "023521" AND "博时" AND "持仓" → 0 results
  • "基金" OR "023521" OR "博时" OR "持仓" → 48 results
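The gap between the AND and OR counts follows from the matching semantics alone. Below is a toy model, not the real index: the chunk contents, ids, and helper names (`matchAll`, `matchAny`) are made up for illustration, and each chunk is reduced to the set of query terms it contains.

```typescript
// Toy model of chunk matching. The data below is hypothetical and only
// illustrates why AND can return nothing while OR returns several chunks.
type Chunk = { id: number; terms: Set<string> };

const chunks: Chunk[] = [
  { id: 1, terms: new Set(["微信读书", "书架"]) },
  { id: 2, terms: new Set(["微信读书", "阅读偏好"]) },
  { id: 3, terms: new Set(["微信读书"]) },
  { id: 4, terms: new Set(["基金"]) },
];

// AND semantics: a chunk matches only if it contains every query term.
function matchAll(query: string[], pool: Chunk[]): number[] {
  return pool
    .filter((c) => query.every((t) => c.terms.has(t)))
    .map((c) => c.id);
}

// OR semantics: a chunk matches if it contains at least one query term.
function matchAny(query: string[], pool: Chunk[]): number[] {
  return pool
    .filter((c) => query.some((t) => c.terms.has(t)))
    .map((c) => c.id);
}

const query = ["微信读书", "书架", "阅读偏好"];
const andHits = matchAll(query, chunks); // no chunk holds all three terms
const orHits = matchAny(query, chunks); // chunks 1, 2 and 3 hold at least one

console.log(andHits, orHits);
```

No chunk in this sample holds all three terms, so the AND query is empty even though each term is individually present in the index.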

Environment

  • OpenClaw v2026.5.12
  • FTS5 tokenizer: unicode61 (default)
  • Embedding model: bge-m3 (1024 dim)
  • Chunks indexed: 2692
  • All chunks have text content (verified)

Suggested Fix

Change the FTS query builder to use OR instead of AND for joining tokens, or implement a hybrid approach:

// Option 1: Simple OR
return tokens.map((t) => `"${t.replaceAll("\"", "")}"`).join(" OR ");

// Option 2: Hybrid: AND the first two terms, OR the rest.
// Alternatively, join everything with OR and rely on BM25 ranking to order the results
// (OR is the common default in search engines).

Note: This is a regression for CJK users. The AND logic works for English, where tokens are naturally word-separated, but fails for Chinese, where consecutive CJK characters are grouped into a single token.
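Both options can be written out as one complete function. This is a sketch under stated assumptions rather than the project's actual patch: `buildFtsQueryRelaxed`, `JoinMode`, `tokenize`, and `quoteTerm` are hypothetical names, and the hybrid branch is only one possible reading of "first 2 terms AND, rest OR".

```typescript
// Hypothetical names throughout; only the tokenizing regex comes from the
// original buildFtsQuery.
type JoinMode = "or" | "hybrid";

function tokenize(raw: string): string[] {
  return raw.match(/[\p{L}\p{N}_]+/gu)?.map((t) => t.trim()).filter(Boolean) ?? [];
}

// Wrap a token in double quotes so FTS5 treats it as a string, not syntax.
function quoteTerm(token: string): string {
  return `"${token.replace(/"/g, "")}"`;
}

function buildFtsQueryRelaxed(raw: string, mode: JoinMode = "or"): string | null {
  const terms = tokenize(raw).map(quoteTerm);
  if (terms.length === 0) return null;

  // Option 1: any term may match.
  if (mode === "or") return terms.join(" OR ");

  // Option 2 (one interpretation): require the first two terms together,
  // or accept any of the remaining terms on its own.
  const head = terms.slice(0, 2).join(" AND ");
  const rest = terms.slice(2);
  return rest.length === 0 ? head : [`(${head})`, ...rest].join(" OR ");
}

const orQuery = buildFtsQueryRelaxed("微信读书 书架 阅读偏好");
const hybridQuery = buildFtsQueryRelaxed("微信读书 书架 阅读偏好", "hybrid");

console.log(orQuery);
console.log(hybridQuery);
```

With OR joining, more chunks match, so result quality depends more on ordering by the FTS5 rank; chunks that contain several of the terms should tend to rank above chunks that contain only one.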

Impact

In the cases tested, every Chinese memory search query with two or more terms produces textScore: 0, which effectively disables BM25 keyword search for these queries. The same behavior is likely to affect Chinese, Japanese, and Korean users who search with multi-word queries.
