hermes - ✅(Solved) Fix session_search: FTS5 returns empty results for Chinese/CJK queries [3 pull requests, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#11511 · Fetched 2026-04-18 06:00:37
Comments: 1 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): cross-referenced ×3, referenced ×3, commented ×1

Fix action: Fixed

PR fix notes

PR #11516: fix: FTS5 LIKE fallback for CJK (Chinese/Japanese/Korean) queries

Description (problem / solution / changelog)

Problem

session_search uses SQLite FTS5 for full-text search. FTS5's default tokenizer splits CJK text character-by-character (no spaces between words). Multi-character Chinese queries like "记忆断裂" become 记 AND 忆 AND 断 AND 裂, requiring all 4 characters to match — returning 0 results despite data existing.

This affects all CJK (Chinese, Japanese, Korean) users.

Solution

Add a LIKE fallback in SessionDB.search_messages(): when FTS5 returns no results and the query contains CJK characters, retry with WHERE content LIKE ?.
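The description does not show how the LIKE parameter is built. One detail a fallback of this kind has to handle: % and _ in the user's query are LIKE wildcards, so a query such as 50% would also match 500. A minimal sketch, assuming a hypothetical messages(content, timestamp) table rather than the actual Hermes schema, escapes them and declares the escape character with ESCAPE:

```python
import sqlite3


def like_contains_pattern(query: str, escape: str = "\\") -> str:
    """Return a LIKE pattern matching `query` as a literal substring.

    The escape character itself, % and _ are escaped, so the SQL must
    declare the same character with ESCAPE.
    """
    escaped = (
        query.replace(escape, escape + escape)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )
    return f"%{escaped}%"


def like_search(conn: sqlite3.Connection, query: str, limit: int = 20) -> list:
    """Substring search over a hypothetical messages(content, timestamp) table."""
    rows = conn.execute(
        "SELECT content FROM messages WHERE content LIKE ? ESCAPE '\\' "
        "ORDER BY timestamp DESC LIMIT ?",
        (like_contains_pattern(query), limit),
    ).fetchall()
    return [content for (content,) in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (content TEXT, timestamp REAL)")
conn.executemany(
    "INSERT INTO messages VALUES (?, ?)",
    [("记忆断裂 50% 完成", 1.0), ("记忆断裂 500 完成", 2.0)],
)
print(like_search(conn, "50%"))      # only the row containing a literal "50%"
print(like_search(conn, "记忆断裂"))  # both rows, newest first
```

With the escaping in place, a query's % is matched literally instead of meaning "any sequence of characters".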

Changes

  1. New _contains_cjk() static method: detects characters in CJK Unicode ranges (Chinese, Hiragana, Katakana, Hangul)
  2. LIKE fallback in search_messages(): when FTS5 returns no results and the query contains CJK, retries with a LIKE-based query that preserves all source/role filters
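The PR does not list the exact ranges _contains_cjk() checks. A minimal sketch of such a detector, using a hypothetical contains_cjk() name and the main Unicode blocks for the scripts named above; it deliberately omits rarer blocks such as CJK Extension B, halfwidth katakana, and Hangul Jamo:

```python
import re

# Main blocks only (an assumption, not the PR's list):
#   U+3040-U+309F Hiragana, U+30A0-U+30FF Katakana,
#   U+3400-U+4DBF CJK Unified Ideographs Extension A,
#   U+4E00-U+9FFF CJK Unified Ideographs,
#   U+AC00-U+D7AF Hangul Syllables
_CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def contains_cjk(text: str) -> bool:
    """Return True if `text` has at least one character in the blocks above."""
    return _CJK_RE.search(text) is not None


print(contains_cjk("记忆断裂"), contains_cjk("café"))
```

Unlike a plain non-ASCII test, this keeps accented Latin text such as "café" on the FTS5-only path.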

Design decisions

  • FTS5 is tried first for all queries (preserves English performance)
  • LIKE fallback only triggers when FTS5 returns 0 results AND query contains CJK
  • LIKE results are ordered by timestamp DESC (most recent first), since FTS5's rank is not available to a LIKE query
  • All existing filters (source, exclude_sources, role) are preserved in the LIKE query
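Taken together, these design decisions describe a try-then-fall-back flow. A self-contained sketch against an in-memory database follows; the schema (a messages table plus a messages_fts FTS5 index keyed by rowid), the helper names, and the keyword arguments are assumptions for illustration, not the code in hermes_state.py:

```python
import re
import sqlite3
from typing import Iterable, Optional

# Main blocks for kana, CJK ideographs (incl. Ext. A) and Hangul syllables.
_CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")


def make_demo_db() -> sqlite3.Connection:
    """In-memory stand-in: a messages table plus an FTS5 index keyed by rowid."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT, source TEXT,"
        " role TEXT, content TEXT, timestamp REAL);"
        "CREATE VIRTUAL TABLE messages_fts USING fts5(content);"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?, ?, ?, ?)",
        [
            (1, "s1", "cli", "user", "今天的记忆断裂问题很严重", 1.0),
            (2, "s1", "cli", "assistant", "记忆断裂需要修复", 2.0),
            (3, "s2", "telegram", "user", "memory fracture in English", 3.0),
        ],
    )
    conn.execute("INSERT INTO messages_fts(rowid, content) SELECT id, content FROM messages")
    return conn


def _filter_sql(source: Optional[str], exclude_sources: Iterable[str], role: Optional[str]):
    """Build the ' AND ...' suffix shared by both queries, plus its parameters."""
    clauses, params = [], []
    if source is not None:
        clauses.append("m.source = ?")
        params.append(source)
    excluded = list(exclude_sources)
    if excluded:
        clauses.append(f"m.source NOT IN ({', '.join('?' * len(excluded))})")
        params.extend(excluded)
    if role is not None:
        clauses.append("m.role = ?")
        params.append(role)
    return "".join(f" AND {c}" for c in clauses), params


def search_messages(
    conn: sqlite3.Connection,
    query: str,
    *,
    source: Optional[str] = None,
    exclude_sources: Iterable[str] = (),
    role: Optional[str] = None,
    limit: int = 20,
) -> list:
    where, params = _filter_sql(source, exclude_sources, role)
    # 1) FTS5 first, ordered by relevance via the hidden rank column.
    rows = conn.execute(
        "SELECT m.id, m.content FROM messages_fts"
        " JOIN messages AS m ON m.id = messages_fts.rowid"
        f" WHERE messages_fts MATCH ?{where} ORDER BY rank LIMIT ?",
        (query, *params, limit),
    ).fetchall()
    # 2) LIKE only when FTS5 found nothing and the query contains CJK.
    if not rows and _CJK_RE.search(query):
        rows = conn.execute(
            "SELECT m.id, m.content FROM messages AS m"
            f" WHERE m.content LIKE ?{where} ORDER BY m.timestamp DESC LIMIT ?",
            (f"%{query}%", *params, limit),
        ).fetchall()
    return rows
```

In the demo data, search_messages(conn, "记忆断裂") gets nothing from FTS5, falls back to LIKE, and returns both Chinese rows newest first. Because both paths share one filter suffix, role="user" or exclude_sources=["cli"] narrows the fallback exactly as it would narrow the FTS5 query.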

Fixes #11511

Related: #9135, #9651

Changed files

  • hermes_state.py (modified, +52/-2)

PR #11517: fix: add LIKE fallback for CJK queries in session_search


Summary

Fix session_search returning empty results for Chinese/Japanese/Korean queries.

Problem

FTS5's default tokenizer splits CJK text character-by-character (no word boundaries). Searching "记忆断裂" becomes 记 AND 忆 AND 断 AND 裂 — requiring all 4 individual characters in the same message. Despite data existing (LIKE finds 20+ matches), FTS5 returns 0 results.

This affects all CJK users.

Changes

1. CJK-aware query rewriting (_sanitize_fts5_query)

  • Detect CJK characters in query
  • Strip common Chinese stop-words (的、了、是、在、etc.)
  • Build OR-connected bigram pairs for 3+ character queries (better recall than single-char OR, better precision than full AND)

2. LIKE fallback (search_messages)

  • When FTS5 returns 0 results and query contains CJK characters, retry with WHERE content LIKE ?
  • Preserves all existing filters (source exclusion, role filter)
  • Respects existing limit parameter
  • Groups results by session_id consistent with FTS5 path
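Item 1's bigram rewrite can be pictured as a pure string transformation. A minimal sketch, assuming a hypothetical cjk_bigram_query() helper and an illustrative four-character stop list (the PR's full list is not shown). It strips stop characters wherever they appear, which is cruder than real word segmentation, and it assumes the term contains no double quotes. Whether the resulting bigrams hit depends on how the index tokenized the stored text:

```python
# Illustrative stop-characters only; the PR's full list is not shown.
STOP_CHARS = frozenset("的了是在")


def cjk_bigram_query(term: str) -> str:
    """Rewrite one CJK term as an FTS5 OR-query of overlapping bigrams.

    Assumes `term` contains no double quotes (FTS5 string delimiters).
    """
    cleaned = "".join(ch for ch in term if ch not in STOP_CHARS)
    if not cleaned:
        return ""
    if len(cleaned) < 3:
        # One- and two-character terms are kept whole as a quoted FTS5 string.
        return f'"{cleaned}"'
    # dict.fromkeys drops repeated bigrams but keeps first-seen order.
    bigrams = dict.fromkeys(cleaned[i : i + 2] for i in range(len(cleaned) - 1))
    return " OR ".join(f'"{b}"' for b in bigrams)


print(cjk_bigram_query("记忆的断裂"))
```

For 记忆的断裂 the helper drops 的 and joins the pairs 记忆, 忆断 and 断裂 with OR, so one matching pair is enough instead of the whole phrase.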

Performance

LIKE is slower (full table scan), but for Hermes's data volume (thousands of messages) there's no perceptible difference.
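The full-table-scan claim can be checked on a SQLite database with EXPLAIN QUERY PLAN. SQLite cannot use a B-tree index to look up a LIKE pattern that begins with a wildcard, so even an index on content does not turn the query into an index search. A sketch with a hypothetical table and index name:

```python
import sqlite3


def like_plan(conn: sqlite3.Connection, pattern: str) -> list:
    """Return the detail strings of EXPLAIN QUERY PLAN for a LIKE lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id FROM messages WHERE content LIKE ?", (pattern,)
    ).fetchall()
    # The fourth column of each plan row is the human-readable detail.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE INDEX idx_messages_content ON messages(content)")
# A full scan shows up as a "SCAN ..." detail rather than "SEARCH ... USING INDEX".
print(like_plan(conn, "%记忆断裂%"))
```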

Testing

Before fix:

session_search("记忆断裂") → 0 results
session_search("Friday beetle conatus") → 0 results (mixed CJK+English)

After fix:

session_search("记忆断裂") → 20+ results
session_search("Friday beetle conatus") → 3 results

Fixes #11511

Changed files

  • hermes_state.py (modified, +94/-4)

PR #11541: fix: FTS5 LIKE fallback for CJK queries


Fix: FTS5 LIKE fallback for CJK queries

Problem

session_search uses SQLite FTS5 for full-text search. FTS5's default tokenizer splits Chinese text character-by-character. Multi-character Chinese queries fail because FTS5 requires all characters to match.

Solution

When FTS5 returns 0 results AND the query contains CJK characters, fall back to a LIKE search. This preserves FTS5 performance for English while ensuring CJK queries work.

Changes

  1. Added _is_cjk_query() helper to detect CJK characters
  2. Added LIKE fallback in search_messages() when FTS5 returns empty for CJK

Fixes #11511

Changed files

  • hermes_state.py (modified, +61/-0)

Original issue (#11511)

Problem

session_search uses SQLite FTS5 for full-text search. FTS5's default tokenizer splits Chinese text character-by-character (since there are no spaces between words). This causes multi-character Chinese queries to fail.

Example: searching "记忆断裂" becomes 记 AND 忆 AND 断 AND 裂 — requiring all 4 individual characters to match in the same message. Despite the data existing (LIKE finds 20+ matches), FTS5 returns 0 results.

This affects all CJK (Chinese, Japanese, Korean) users.

Reproduction

from hermes_state import SessionDB
db = SessionDB()

# FTS5 search: returns 0
results = db.search_messages(query="记忆断裂", limit=5)
print(len(results))  # 0

# But data exists
import os
import sqlite3
conn = sqlite3.connect(os.path.expanduser("~/.hermes/state.db"))  # sqlite3 does not expand "~" itself
print(conn.execute("SELECT count(*) FROM messages WHERE content LIKE '%记忆断裂%'").fetchone()[0])
# Prints 20+ on the reporter's database
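The reproduction above needs an existing Hermes database. A self-contained sketch reproduces the same symptom against an in-memory FTS5 table (the table name and sample text are hypothetical). It also shows what the default unicode61 tokenizer does with unspaced CJK text: the whole run is kept as a single token, so only a query equal to the entire run matches through MATCH:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No tokenize= option, so FTS5 uses its default unicode61 tokenizer.
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content)")
conn.execute("INSERT INTO messages_fts(content) VALUES (?)", ("今天的记忆断裂问题很严重",))


def fts_count(fts_query: str) -> int:
    return conn.execute(
        "SELECT count(*) FROM messages_fts WHERE messages_fts MATCH ?", (fts_query,)
    ).fetchone()[0]


def like_count(term: str) -> int:
    return conn.execute(
        "SELECT count(*) FROM messages_fts WHERE content LIKE ?", (f"%{term}%",)
    ).fetchone()[0]


print(fts_count('"记忆断裂"'))                # 0: only part of the indexed token
print(like_count("记忆断裂"))                 # 1: plain substring match
print(fts_count('"今天的记忆断裂问题很严重"'))  # 1: the whole unspaced run is one token
```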

Environment

  • Hermes Agent v0.10.0
  • macOS, Python 3.11
  • SQLite FTS5 with default tokenizer

Suggested fix

Add a LIKE fallback in SessionDB.search_messages(): when FTS5 returns no results and the query contains CJK characters, retry with WHERE content LIKE ?. This preserves FTS5 performance for English while ensuring CJK queries work.

We have a working implementation and can submit a PR.

Related

  • #9135 (multilingual memory extraction)
  • #9651 (FTS5 multi-keyword OR vs AND)

extent analysis

TL;DR

Implement a LIKE fallback in SessionDB.search_messages() for queries containing CJK characters when FTS5 returns no results.

Guidance

  • Identify CJK characters in the query string to determine when to use the LIKE fallback.
  • Modify SessionDB.search_messages() to retry the search with WHERE content LIKE ? when FTS5 returns no results and the query contains CJK characters.
  • Test the modified search_messages() function with various CJK queries to ensure it returns the expected results.
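  • Consider the cost of the LIKE fallback: it runs only for CJK queries that FTS5 leaves empty, so English queries keep FTS5 performance, and the cost to watch is the full table scan on the CJK path.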
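One option for reducing the cost of substring search, which none of the three PRs uses: SQLite 3.34.0 added a trigram tokenizer for FTS5 that treats every three-character sequence as a token, so a MATCH on a term of three or more characters, CJK included, is answered from the index. Terms shorter than three characters, such as most two-character Chinese words, match no rows through MATCH, so they would still need another path such as LIKE. A minimal sketch with a hypothetical messages_tri table:

```python
import sqlite3


def trigram_counts(text: str, queries: list) -> list:
    """Count MATCH hits for each query against one row in a trigram-tokenized table."""
    conn = sqlite3.connect(":memory:")
    # tokenize='trigram' requires SQLite 3.34.0 or later.
    conn.execute("CREATE VIRTUAL TABLE messages_tri USING fts5(content, tokenize='trigram')")
    conn.execute("INSERT INTO messages_tri(content) VALUES (?)", (text,))
    return [
        conn.execute(
            "SELECT count(*) FROM messages_tri WHERE messages_tri MATCH ?", (q,)
        ).fetchone()[0]
        for q in queries
    ]


if sqlite3.sqlite_version_info >= (3, 34, 0):
    # The four-character term hits via the index; the two-character term is too short.
    print(trigram_counts("今天的记忆断裂问题很严重", ['"记忆断裂"', '"记忆"']))
```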

Example

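import re
import sqlite3

# CJK ranges: kana, CJK Unified Ideographs (+ Ext. A), Hangul syllables; `ord(c) > 127` would also catch "é" or emoji
CJK_RE = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")

def search_messages(db: sqlite3.Connection, query: str, limit: int = 5) -> list:
    # FTS5 search; assumes an FTS5 table named messages_fts whose rowid is messages.id
    results = db.execute("SELECT rowid, content FROM messages_fts WHERE messages_fts MATCH ? LIMIT ?", (query, limit)).fetchall()
    if not results and CJK_RE.search(query):
        # LIKE fallback: a substring match that does not depend on the tokenizer
        results = db.execute("SELECT id, content FROM messages WHERE content LIKE ? LIMIT ?", (f"%{query}%", limit)).fetchall()
    return results  # fetchall() returns a list; a bare cursor is always truthy and cannot be sliced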

Notes

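This solution assumes that the presence of CJK characters in the query string is a reliable indicator of when to use the LIKE fallback. That heuristic may not cover every case, so further testing across query types is needed. On the root cause: SQLite's default unicode61 tokenizer does not split CJK text into single characters. It treats ideographs, kana, and Hangul as token characters and splits only at whitespace and punctuation, so an unspaced run of CJK text is indexed as one long token. A query term such as 记忆断裂 therefore matches only a stored token that is exactly 记忆断裂, which is why FTS5 returns nothing while LIKE '%记忆断裂%' finds the rows.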

Recommendation

Apply the suggested fix by implementing the LIKE fallback in SessionDB.search_messages(), as it provides a reliable workaround for CJK queries while preserving FTS5 performance for English queries.
