hermes - ✅(Solved) Fix session_search returns 0 hits for OR-combined ≤2-char CJK queries (trigram picked but can't match) [1 pull requests, 1 participants]

GitHub stats: NousResearch/hermes-agent#20494 (fetched 2026-05-06 06:36:36)
Comments: 0 · Participants: 1 · Timeline events: 4 (labeled ×3, cross-referenced ×1) · Reactions: 0

session_search returns 0 hits when it is called with OR-combined short CJK tokens (≤2 characters each), even though the stored messages plainly contain those terms. The CJK-routing branch in hermes_state.py chooses the FTS5 trigram path based on the total CJK character count of the whole query. Trigram cannot match phrases shorter than 3 CJK characters, so the LIKE fallback that would serve this query is never reached.

Root Cause

hermes_state.py:1748-1753:

is_cjk = self._contains_cjk(query)
if is_cjk:
    raw_query = query.strip('"').strip()
    cjk_count = self._count_cjk(raw_query)  # total CJK chars across the WHOLE query
    if cjk_count >= 3:
        ...  # → trigram FTS5 path
    else:
        ...  # → LIKE fallback

The branch tests the total CJK character count across the whole query string. For 广西 OR 桂林 OR 漓江 OR 旅游 that count is 8 (≥3), so the trigram branch is taken. The trigram tokenizer, however, only matches phrases of at least 3 characters (≥9 UTF-8 bytes for CJK text); each 2-character token here yields no trigrams, so the query returns 0 rows. There is no empty-result fallback to LIKE inside the trigram branch.
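The tokenizer behaviour can be reproduced in isolation with Python's built-in sqlite3 module. This is a minimal sketch, assuming the linked SQLite has FTS5 enabled and is 3.34.0 or newer (the release that added the trigram tokenizer); the table names demo and demo_fts and the sample sentence are made up and are not Hermes's schema or data.

```python
import sqlite3

# Hypothetical tables: a trigram FTS5 index plus a plain table for LIKE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE demo_fts USING fts5(content, tokenize='trigram')")
conn.execute("CREATE TABLE demo (content TEXT)")
sample = "这次去广西桂林，坐船游漓江"
conn.execute("INSERT INTO demo_fts (content) VALUES (?)", (sample,))
conn.execute("INSERT INTO demo (content) VALUES (?)", (sample,))


def match_count(query: str) -> int:
    """Rows returned by an FTS5 MATCH query against the trigram index."""
    return conn.execute(
        "SELECT count(*) FROM demo_fts WHERE demo_fts MATCH ?", (query,)
    ).fetchone()[0]


def like_count(term: str) -> int:
    """Rows returned by a plain substring LIKE (no minimum term length)."""
    return conn.execute(
        "SELECT count(*) FROM demo WHERE content LIKE ?", (f"%{term}%",)
    ).fetchone()[0]


short_or = match_count("广西 OR 桂林 OR 漓江")  # every phrase is 2 chars: no trigrams
long_phrase = match_count("广西桂林")            # 4 chars -> trigrams 广西桂, 西桂林
like_hit = like_count("漓江")                    # LIKE finds the 2-char term
print(short_or, long_phrase, like_hit)
```

The OR query comes back empty even though each term is present in the row, which is the same shape as the 0-hit result reported for session_search.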

Fix Action

Fix / Workaround

The user-visible workaround for now is to instruct the model to use single phrases of at least 3 CJK characters instead of OR-combined short tokens, but that is fragile, per-deployment guidance; fixing the branch makes the tool work the way users naturally expect.

PR fix notes

PR #20499: fix(session-search): route short CJK OR queries to LIKE

Description (problem / solution / changelog)

Summary

Fixes #20494.

search_messages() already avoids trigram FTS for single 1-2 character CJK queries because SQLite trigram cannot match them. Boolean queries like 广西 OR 桂林 OR 漓江 OR 旅游 slipped through because the code checked the total CJK character count across the whole query, selected trigram, and returned no hits.

This PR:

  • checks CJK term length per non-operator token before choosing trigram FTS
  • routes boolean CJK queries with any too-short term through the existing LIKE fallback
  • applies simple OR/AND semantics across LIKE clauses while preserving existing filters
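One way the "simple OR/AND semantics across LIKE clauses" could look is sketched below. It is an illustration, not the code from PR #20499: build_like_where and _escape_like are hypothetical names, adjacent terms default to AND (as in FTS5), and NOT and parentheses are deliberately out of scope. The fragment is parenthesised so it can be ANDed with whatever filters the caller already applies.

```python
import sqlite3


def _escape_like(term: str) -> str:
    """Escape LIKE wildcards so the term is matched literally (used with ESCAPE '\\')."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def build_like_where(query: str, column: str = "content") -> tuple[str, list[str]]:
    """Hypothetical helper: turn '广西 OR 桂林' into a parameterised WHERE fragment.

    Explicit AND/OR between terms is honoured; adjacent terms default to AND,
    as in FTS5. NOT and parentheses are not handled in this sketch.
    """
    clause = f"{column} LIKE ? ESCAPE '\\'"
    parts: list[str] = []
    params: list[str] = []
    pending_op = "AND"  # operator to insert before the next term
    for tok in query.strip('"').split():
        upper = tok.upper()
        if upper in ("AND", "OR"):
            pending_op = upper
            continue
        if upper == "NOT":
            raise ValueError("NOT is not supported in this sketch")
        if parts:
            parts.append(pending_op)
        parts.append(clause)
        params.append(f"%{_escape_like(tok)}%")
        pending_op = "AND"
    if not params:
        raise ValueError("query contains no search terms")
    # Parenthesised so callers can AND it with their existing filters.
    return "(" + " ".join(parts) + ")", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (content TEXT)")
conn.executemany(
    "INSERT INTO messages (content) VALUES (?)",
    [("去桂林看漓江",), ("广西旅游攻略",), ("今天写代码",)],
)

where, params = build_like_where("广西 OR 桂林")
or_rows = [r[0] for r in conn.execute(
    f"SELECT content FROM messages WHERE {where} ORDER BY rowid", params)]

where_and, params_and = build_like_where("广西 旅游")
and_rows = [r[0] for r in conn.execute(
    f"SELECT content FROM messages WHERE {where_and} ORDER BY rowid", params_and)]

print(or_rows)   # messages containing 广西 or 桂林
print(and_rows)  # messages containing both 广西 and 旅游
```

Escaping % and _ matters here because, unlike FTS5 query syntax, LIKE would otherwise treat those characters in user text as wildcards.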

Verification

  • scripts/run_tests.sh tests/test_hermes_state.py -k 'CJK or cjk' -> 19 passed, 4 warnings
  • scripts/run_tests.sh tests/test_hermes_state.py -> 212 passed, 4 warnings
  • git diff --check

Overlap Check

I checked open PRs for 20494, session_search, CJK, and trigram. The active nearby PRs are broader session-search/schema work (#20238, #20239), but I did not find a direct fix for this short-token boolean CJK routing bug. This patch is limited to hermes_state.py search routing and targeted regression tests.

Changed files

  • hermes_state.py (modified, +36/-8)
  • tests/test_hermes_state.py (modified, +28/-1)

Original issue report

Reproduction

Hermes 0.12.0 / commit 7530ce04e (main).

The LLM makes this call (a real example from a Telegram session asking about a past trip):

session_search(query="广西 OR 桂林 OR 漓江 OR 旅游")

→ 0 sessions, 0 messages

But the messages are clearly there:

term  LIKE '%term%' count
广西  126
桂林  135
漓江  104
旅游  24

Single 3+ CJK-char terms work fine via trigram (e.g. 广西旅游 → 6, 漓江游船 → 77, 涠洲岛 → 146).

Suggested fixes (either is sufficient)

A — Per-token length check (more efficient):

non_ops = [t for t in raw_query.split()
           if t.upper() not in ("AND", "OR", "NOT")]
if non_ops and all(self._count_cjk(t) >= 3 for t in non_ops):
    ...  # trigram path: every term is long enough to yield trigrams
else:
    ...  # LIKE path: extend to handle multi-token OR by OR-ing LIKE clauses
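The difference between the current total-count check and fix A can be seen with a small standalone sketch. count_cjk here is an assumption that approximates Hermes's _count_cjk with the main CJK ideograph, kana and Hangul blocks; route_old and route_per_token are hypothetical names that only report which path would be taken.

```python
import re

# Assumption: approximates Hermes's _count_cjk; the real helper may use other ranges.
_CJK_CHAR = re.compile(r"[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]")
_FTS_OPERATORS = {"AND", "OR", "NOT"}


def count_cjk(text: str) -> int:
    """Number of CJK characters in text."""
    return len(_CJK_CHAR.findall(text))


def route_old(query: str) -> str:
    """Current behaviour: total CJK count across the whole query."""
    raw = query.strip('"').strip()
    return "trigram" if count_cjk(raw) >= 3 else "like"


def route_per_token(query: str) -> str:
    """Fix A: every non-operator term must have at least 3 CJK characters."""
    raw = query.strip('"').strip()
    terms = [t for t in raw.split() if t.upper() not in _FTS_OPERATORS]
    if terms and all(count_cjk(t) >= 3 for t in terms):
        return "trigram"
    return "like"


for q in ["广西 OR 桂林 OR 漓江 OR 旅游", "广西旅游", "涠洲岛", "广西旅游 OR 桂林"]:
    print(q, route_old(q), route_per_token(q))
```

Under this rule a term with no CJK characters also sends the whole query to LIKE, exactly as in the snippet it mirrors; whether that is desirable for mixed-script queries is a separate design choice.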

B — Auto-fallback when trigram is empty (simpler, more robust to future tokenizer changes):

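if cjk_count >= 3:
    matches = trigram_search(...)
    if not matches:
        # nothing matched (e.g. every phrase < 3 chars): retry with LIKE
        matches = like_fallback_search(...)
else:
    matches = like_fallback_search(...)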
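Fix B can be exercised end to end with an in-memory database. This sketch assumes an SQLite build with FTS5 and the trigram tokenizer (3.34.0+); the schema is hypothetical, and trigram_search and like_fallback_search are illustrative stand-ins for the placeholder names in the snippet above, with a LIKE fallback that simply ORs every term.

```python
import sqlite3

# Hypothetical schema: plain table for LIKE plus a trigram FTS5 index over the same text.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT)")
conn.execute("CREATE VIRTUAL TABLE messages_fts USING fts5(content, tokenize='trigram')")
for text in ["去桂林看漓江", "广西旅游攻略", "今天写代码"]:
    cur = conn.execute("INSERT INTO messages (content) VALUES (?)", (text,))
    conn.execute(
        "INSERT INTO messages_fts (rowid, content) VALUES (?, ?)", (cur.lastrowid, text)
    )

OPERATORS = {"AND", "OR", "NOT"}


def trigram_search(query: str) -> list[str]:
    """FTS5 MATCH on the trigram index."""
    rows = conn.execute(
        "SELECT content FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [r[0] for r in rows]


def like_fallback_search(query: str) -> list[str]:
    """Sketch only: ORs a substring LIKE for every non-operator token."""
    terms = [t for t in query.split() if t.upper() not in OPERATORS]
    if not terms:
        return []
    where = " OR ".join("content LIKE ?" for _ in terms)
    rows = conn.execute(
        f"SELECT content FROM messages WHERE {where} ORDER BY id",
        [f"%{t}%" for t in terms],
    )
    return [r[0] for r in rows]


def search(query: str) -> list[str]:
    matches = trigram_search(query)
    if not matches:  # e.g. every phrase is shorter than 3 characters
        matches = like_fallback_search(query)
    return matches


short_terms = search("广西 OR 桂林")    # trigram empty -> LIKE fallback
long_term = search("广西旅游")          # served by trigram
mixed = search("广西旅游 OR 桂林")      # trigram non-empty -> no fallback
print(short_terms, long_term, mixed)
```

The last call shows the limit of an empty-result trigger: because 广西旅游 produces a trigram hit, the fallback never runs and the message that only mentions 桂林 is missed.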

Secondary note (not blocking)

While diagnosing, I noticed that the default comment in ~/.hermes/SOUL.md says "This file is loaded fresh each message -- no restart needed", but run_agent.py:4881-4887 shows that _build_system_prompt is called once per session and rebuilt only after context compression. Users who edit SOUL.md mid-conversation won't see their changes until /new or a compression. Either the SOUL.md template comment should be updated, or that one layer could be re-read per turn (bypassing the prefix cache for the persona section only). Happy to file a separate issue if preferred.
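If the persona layer were re-read per turn, a cheap way to avoid reading the file on every message is to reload it only when its modification time changes. This is a hypothetical helper (PersonaFile is not part of Hermes), sketched under the assumption that the persona text is held separately from the cached prompt prefix.

```python
import os
import tempfile
from pathlib import Path
from typing import Optional


class PersonaFile:
    """Hypothetical helper: re-read a persona file (e.g. SOUL.md) only when it changes.

    Change detection compares st_mtime_ns, so two edits that land within the
    filesystem's timestamp granularity could be missed.
    """

    def __init__(self, path: Path) -> None:
        self.path = Path(path)
        self._mtime_ns: Optional[int] = None
        self._text = ""

    def read(self) -> str:
        """Return the current text, touching the disk only if the mtime moved."""
        try:
            mtime_ns = self.path.stat().st_mtime_ns
        except FileNotFoundError:
            self._mtime_ns, self._text = None, ""
            return ""
        if mtime_ns != self._mtime_ns:
            self._text = self.path.read_text(encoding="utf-8")
            self._mtime_ns = mtime_ns
        return self._text


# Usage: an edit between turns is picked up on the next read().
with tempfile.TemporaryDirectory() as tmp:
    soul = Path(tmp) / "SOUL.md"
    soul.write_text("You are terse.", encoding="utf-8")
    persona = PersonaFile(soul)
    first = persona.read()
    soul.write_text("You are verbose.", encoding="utf-8")
    st = soul.stat()
    # Bump the mtime explicitly so the demo does not depend on timestamp resolution.
    os.utime(soul, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    second = persona.read()
print(first, "|", second)
```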

Happy to send a PR for either A or B (your call on style preference).

extent analysis

TL;DR

The issue can be fixed by modifying the hermes_state.py file to either check the length of each token separately or add a fallback to the LIKE search when the trigram search returns no results.

Guidance

  • The current implementation checks the total CJK character count across the whole query string, which leads to the trigram branch being taken even when individual tokens are shorter than 3 characters.
  • To fix this, the code can be modified to check the length of each token separately, as suggested in the "Per-token length check" approach.
  • Alternatively, a fallback to the LIKE search can be added when the trigram search returns no results, as suggested in the "Auto-fallback when trigram is empty" approach.
  • The user-visible workaround is to instruct the model to use single ≥3 CJK-char phrases instead of OR-combined short tokens, but this is fragile and may not work for all users.

Example

The suggested fixes can be implemented as follows:

# Per-token length check
non_ops = [t for t in raw_query.split() if t.upper() not in ("AND", "OR", "NOT")]
if non_ops and all(self._count_cjk(t) >= 3 for t in non_ops):
    ...  # trigram path
else:
    ...  # LIKE path: extend to handle multi-token OR by OR-ing LIKE clauses

# Auto-fallback when trigram is empty
if cjk_count >= 3:
    matches = trigram_search(...)
    if not matches:
        matches = like_fallback_search(...)
else:
    matches = like_fallback_search(...)

Notes

The suggested fixes assume that the hermes_state.py file is the correct location for the modification, and that the trigram_search and like_fallback_search functions are already implemented.

Recommendation

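Apply the "Auto-fallback when trigram is empty" fix, as it is simpler and more robust to future tokenizer changes. Note, however, that it only falls back when the trigram result is completely empty, so a mixed query such as 广西旅游 OR 桂林 still misses the matches for the short term; the per-token check, which is the approach PR #20499 took, does not have this gap.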
