Holographic memory plugin: `_extract_entities` is ASCII-only, breaks compositional retrieval for non-English users

Summary

The holographic memory plugin's entity extractor (plugins/memory/holographic/store.py:_extract_entities) is ASCII-only. For users whose facts are predominantly Chinese, Japanese, Korean, Cyrillic, etc., no entities are ever extracted, which silently degrades probe / related / reason from compositional HRR retrieval to the FTS5 keyword fallback.

The bug is silent: facts are added normally, and search and list work as expected. Only probe / related / reason quietly underperform, and no log line points you to the empty entity table.
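Because nothing logs the empty entity table, a small health check could make the failure visible. This is a hypothetical sketch, not part of the plugin: the function and logger names are made up, and it assumes only the facts and entities table names that appear in the reproduction SQL in this issue.

```python
import logging
import sqlite3

log = logging.getLogger("holographic.healthcheck")  # hypothetical logger name


def entity_graph_is_empty(conn: sqlite3.Connection) -> bool:
    """Return True (and log a warning) if facts exist but no entities were extracted.

    Hypothetical helper: assumes `facts` and `entities` tables as seen in the
    reproduction SQL; the real schema in store.py may have more columns.
    """
    n_facts = conn.execute("SELECT COUNT(*) FROM facts").fetchone()[0]
    n_entities = conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0]
    if n_facts > 0 and n_entities == 0:
        log.warning(
            "%d facts stored but 0 entities extracted; probe/related/reason "
            "will fall back to keyword matching", n_facts,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    conn = sqlite3.connect(":memory:")
    # Toy tables standing in for the plugin's schema.
    conn.execute("CREATE TABLE facts (content TEXT)")
    conn.execute("CREATE TABLE entities (name TEXT)")
    conn.execute("INSERT INTO facts VALUES ('Coco 香港插班项目计划')")
    print(entity_graph_is_empty(conn))  # True, plus a warning on stderr
```

Running a check like this once at store startup is cheap compared with debugging flat probe scores.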

Affected version

Plugin: plugins/memory/holographic/ (version 0.1.0 per plugin.yaml)
File: plugins/memory/holographic/store.py, lines 84–91 (regex defs) and lines 394–427 (_extract_entities body)

Reproduction

```shell
# Fresh memory store, add some Chinese facts
python -c "
import importlib.util
spec = importlib.util.spec_from_file_location('s', 'plugins/memory/holographic/store.py')
m = importlib.util.module_from_spec(spec); spec.loader.exec_module(m)
fs = m.MemoryStore('/tmp/test.db')
fs.add_fact('飞书白兔 App 已于 2026-5-10 接入完成')  # Feishu Baitu App integration completed on 2026-5-10
fs.add_fact('Coco 香港插班项目计划')  # Coco: Hong Kong school-transfer project plan
fs.add_fact('用户公司日常用「白兔」/「白兔控股」，不要用工商执照名「成都抖咖」')  # day to day the company is 白兔 / 白兔控股 (Baitu Holdings); do not use the business-license name 成都抖咖
"

sqlite3 /tmp/test.db "
SELECT COUNT(*) FROM facts;                               -- 3
SELECT COUNT(*) FROM entities;                            -- 0  ← bug
SELECT COUNT(*) FROM fact_entities;                       -- 0  ← bug
SELECT COUNT(*) FROM facts WHERE hrr_vector IS NOT NULL;  -- 3 (but encoded with an empty entity list)"
```

The HRR vectors are computed with entities=[] since the linker found nothing, so the bind(entity, role) → bank → unbind pipeline in retrieval.py:probe() has no structural signal to find.
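To see why an empty entity list leaves probe() with nothing to find, here is a minimal, generic holographic reduced representation sketch: Plate-style circular convolution for binding and circular correlation for unbinding. It is not the plugin's retrieval.py; the encoding (one role vector, content added as a plain vector, dimension 512) is an assumption chosen for illustration.

```python
import math
import random

D = 512  # vector dimension (assumption; the plugin's dimension may differ)
rng = random.Random(0)


def rand_vec() -> list[float]:
    # i.i.d. N(0, 1/D) entries, the usual HRR initialisation
    return [rng.gauss(0.0, 1.0 / math.sqrt(D)) for _ in range(D)]


def bind(a, b):
    # circular convolution: (a ⊛ b)[k] = sum_i a[i] * b[(k - i) mod D]
    return [sum(a[i] * b[(k - i) % D] for i in range(D)) for k in range(D)]


def unbind(c, b):
    # convolve with b's involution b*[i] = b[-i mod D], i.e. circular correlation
    b_inv = [b[(-i) % D] for i in range(D)]
    return bind(c, b_inv)


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def encode_fact(content_vec, entity_vecs, role_vec):
    # fact = content + sum of bind(entity, role); with entities=[] only content remains
    vec = list(content_vec)
    for e in entity_vecs:
        vec = [v + w for v, w in zip(vec, bind(e, role_vec))]
    return vec


coco, role, content = rand_vec(), rand_vec(), rand_vec()
with_entity = encode_fact(content, [coco], role)
without_entity = encode_fact(content, [], role)

# Probing: unbind the role and compare the result with the entity vector.
print(round(cosine(unbind(with_entity, role), coco), 2))     # clearly positive
print(round(cosine(unbind(without_entity, role), coco), 2))  # near zero
```

With entities=[], the stored vector is just the content vector, so unbinding any role yields noise that is uncorrelated with every entity.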

Root cause

The four patterns in store.py:84–91:

```python
_RE_CAPITALIZED  = re.compile(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b')
_RE_DOUBLE_QUOTE = re.compile(r'"([^"]+)"')         # ASCII " only
_RE_SINGLE_QUOTE = re.compile(r"'([^']+)'")          # ASCII ' only
_RE_AKA          = re.compile(r'(\w+(?:\s+\w+)*)\s+(?:aka|also known as)\s+(\w+(?:\s+\w+)*)', re.I)
```

For CJK input:

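_RE_CAPITALIZED: CJK scripts have no letter case (and the classes are ASCII [A-Z][a-z] anyway), so it never fires
_RE_DOUBLE_QUOTE / _RE_SINGLE_QUOTE: CJK users type 「」《》“” ‘’ instead of ASCII quotes, so neither fires
_RE_AKA: relies on the English idiom "aka" / "also known as", which has no counterpart in CJK text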

Result: pure-CJK facts yield zero candidates, the entity graph stays empty, and HRR vectors are effectively content-only.
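The patterns can be checked directly against the reproduction facts. The snippet below copies the four regexes verbatim and adds a made-up English sentence as a control case; the wrapper function ascii_candidates is hypothetical.

```python
import re

# The four patterns as quoted from store.py:84–91.
_RE_CAPITALIZED  = re.compile(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)\b')
_RE_DOUBLE_QUOTE = re.compile(r'"([^"]+)"')
_RE_SINGLE_QUOTE = re.compile(r"'([^']+)'")
_RE_AKA          = re.compile(r'(\w+(?:\s+\w+)*)\s+(?:aka|also known as)\s+(\w+(?:\s+\w+)*)', re.I)

FACTS = [
    '飞书白兔 App 已于 2026-5-10 接入完成',
    'Coco 香港插班项目计划',
    '用户公司日常用「白兔」/「白兔控股」，不要用工商执照名「成都抖咖」',
    'Ada Lovelace wrote the "first program"',  # English control case (illustrative)
]


def ascii_candidates(text: str) -> list:
    """Collect every raw match the four existing patterns produce."""
    hits = []
    for rx in (_RE_CAPITALIZED, _RE_DOUBLE_QUOTE, _RE_SINGLE_QUOTE, _RE_AKA):
        hits.extend(rx.findall(text))
    return hits


for fact in FACTS:
    print(ascii_candidates(fact), '<-', fact)
# [] for the three Chinese facts; ['Ada Lovelace', 'first program'] for the control
```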

Impact

Anyone whose fact base is mostly non-English silently loses compositional retrieval. They still get FTS5 keyword search (which works fine), but the plugin's main selling point, entity-aware HRR algebra, is effectively switched off. Most users probably won't notice until they specifically test probe / reason and wonder why the scores look flat.

Proposed fix

Add three CJK-aware rules to _extract_entities. The important design choice is to prefer explicit-marker rules over bare-character heuristics: without a dictionary, a sliding regex over CJK runs (e.g. [\u4e00-\u9fff]{2,6}) produces too many fragments that straddle word boundaries ("把视觉模型识", "校跟用户口头"). Stay conservative.

Suggested rule set:

Rule 1, CJK brackets/quotes (high signal): 「…」 『…』 《…》 “…” ‘…’
Rule 2, mixed-script identifiers (high signal): [A-Za-z][A-Za-z0-9_.\-]+ with an optional \s+\d+(?:\.\d+)* version suffix. Captures lark-cli, GPT-5.5, Gemini 3.1 Pro, baitugroup.com, IHMS, etc.
Rule 3, bare CJK runs of 2–6 chars (low signal; recommend leaving it off by default or putting it behind a config flag): useful only together with a small stopword list that filters out pronouns and generic terms.

All rules are combined with a tiny _CN_STOPWORDS set (~80 common pronouns, auxiliaries, and generic nouns) that is applied only to pure-CJK candidates, so English entities such as "Project" are never filtered out by accident.
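A minimal sketch of the two high-signal rules plus the pure-CJK stopword filter follows. The function name extract_cjk_entities, the exact bracket regex, the punctuation set, and the seven-word stopword sample are hypothetical stand-ins; the actual patch's changes to _extract_entities and _add are not shown in this issue. Only CJK Unified Ideographs (U+4E00–U+9FFF) count as "pure CJK" here, so kana and Hangul would need extra ranges.

```python
import re

# Rule 1 (hypothetical regex): text inside CJK brackets or curly quotes.
_RE_CJK_QUOTED = re.compile(r'「([^」]+)」|『([^』]+)』|《([^》]+)》|“([^”]+)”|‘([^’]+)’')
# Rule 2: mixed-script identifiers with an optional numeric version suffix,
# following the pattern described in this issue.
_RE_MIXED_ID = re.compile(r'[A-Za-z][A-Za-z0-9_.\-]+(?:\s+\d+(?:\.\d+)*)?')
_RE_PURE_CJK = re.compile(r'[\u4e00-\u9fff]+')

# Tiny illustrative stand-in for _CN_STOPWORDS (the real set has ~80 entries).
_CN_STOPWORDS = {'用户', '我们', '他们', '这个', '那个', '项目', '问题'}
# CJK punctuation plus ideographic and ASCII spaces, stripped from candidate edges.
_CJK_PUNCT = '，。、；：！？…　 '


def extract_cjk_entities(text: str) -> list[str]:
    """Sketch of Rules 1 and 2 plus the pure-CJK stopword filter."""
    # lastindex is the one alternative group that matched.
    candidates = [m.group(m.lastindex) for m in _RE_CJK_QUOTED.finditer(text)]
    candidates += _RE_MIXED_ID.findall(text)
    out: list[str] = []
    for cand in candidates:
        cand = cand.strip(_CJK_PUNCT)
        if not cand or cand in out:
            continue
        # Stopwords apply only to pure-CJK candidates, so Latin names survive.
        if _RE_PURE_CJK.fullmatch(cand) and cand in _CN_STOPWORDS:
            continue
        out.append(cand)
    return out


print(extract_cjk_entities('用户公司日常用「白兔」/「白兔控股」，不要用工商执照名「成都抖咖」'))
# ['白兔', '白兔控股', '成都抖咖']
print(extract_cjk_entities('「项目」和 Project'))
# ['Project']  (项目 is a stopword; the English word is untouched)
```

With the suffix rule exactly as described, "Gemini 3.1 Pro" comes out as two candidates, "Gemini 3.1" and "Pro", because the suffix only admits digits; the actual patch may treat this differently.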

I have a working patch

I patched this locally on my install and verified it on 11 Chinese facts:

| Metric | Before | After |
|---|---|---|
| entities | 0 | 67 |
| fact_entities links | 0 | 77 |
| facts with HRR vector | 11 | 11 (recomputed with real entities) |
| memory_banks | 0 | 5 |
| probe(entity="Coco") signal | flat (FTS fallback) | finds the 3 Coco-linked facts via the graph |
| compositional reason(["Coco","Kanban"]) | n/a | returns the 2 facts linked to both |

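The patch is ~50 lines (regex defs + stopword set + 3 additions to _extract_entities + a small _add tweak that strips CJK punctuation and skips stopwords only for pure-CJK candidates). English behavior is fully unchanged: all four existing rules and the original _add semantics are preserved verbatim.

Happy to send a PR if it would be welcome; let me know the preferred shape: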

(a) The narrow patch as described (additive, no config knobs)
(b) Same patch but with a cjk_run_extraction: bool config flag in plugin.yaml so adventurous users can enable the noisy Rule 3
(c) A more ambitious refactor that makes _extract_entities pluggable (a [extractor] block in plugin.yaml) so future contributors can drop in jieba/spaCy/LLM-based extractors without touching core

I'd default to (a) unless you'd rather start with the more pluggable design.
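For comparison, option (c) with the option (b) flag folded in could be as small as a list of callables. Everything here is hypothetical: a plain dict stands in for plugin.yaml, cjk_run_extraction is the flag name proposed in (b), the bracket regex is trimmed to two bracket types for brevity, and a jieba/spaCy/LLM extractor would simply be another callable appended to the list.

```python
import re
from typing import Callable, Iterable

# Hypothetical extractor interface: any callable from text to candidate names.
Extractor = Callable[[str], Iterable[str]]

_RE_BRACKETED = re.compile(r'「([^」]+)」|《([^》]+)》')
_RE_CJK_RUN = re.compile(r'[\u4e00-\u9fff]{2,6}')


def bracket_extractor(text: str) -> list[str]:
    return [m.group(m.lastindex) for m in _RE_BRACKETED.finditer(text)]


def cjk_run_extractor(text: str) -> list[str]:
    # The noisy Rule 3; only enabled when the config flag is set.
    return _RE_CJK_RUN.findall(text)


def build_pipeline(config: dict) -> list[Extractor]:
    pipeline: list[Extractor] = [bracket_extractor]
    if config.get("cjk_run_extraction", False):
        pipeline.append(cjk_run_extractor)
    return pipeline


def extract_entities(text: str, pipeline: list[Extractor]) -> list[str]:
    seen: dict[str, None] = {}
    for extractor in pipeline:
        for cand in extractor(text):
            seen.setdefault(cand, None)  # dedupe, keep first-seen order
    return list(seen)


text = '不要用工商执照名「成都抖咖」'
print(extract_entities(text, build_pipeline({})))
# ['成都抖咖']
print(extract_entities(text, build_pipeline({"cjk_run_extraction": True})))
# ['成都抖咖', '不要用工商执', '照名']
```

Keeping core extraction as an ordered list means the default stays conservative while noisier or heavier extractors remain opt-in.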

Side note: same root cause affects HRR vector quality

Since _compute_hrr_vector (store.py:470) reads from fact_entities, the bug also means CJK users' HRR vectors are computed with entities=[]. Once entities are extracted, the vectors must be recomputed. The existing add_fact path does this automatically for new facts, but anyone with an existing DB will need a one-shot migration to re-extract entities and recompute vectors. Worth a note in the release notes.
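A one-shot migration might look roughly like the sketch below. It is hypothetical: backfill_entities is a made-up name, the schema is simplified and assumed (the real column names in store.py may differ), and extract / encode are passed in because the real _extract_entities and _compute_hrr_vector signatures are not shown in this issue.

```python
import re
import sqlite3
from typing import Callable, Iterable


def backfill_entities(
    conn: sqlite3.Connection,
    extract: Callable[[str], Iterable[str]],
    encode: Callable[[str, list[str]], bytes],
) -> int:
    """Re-extract entities for every fact, relink them, and recompute vectors.

    Assumed schema: facts(fact_id, content, hrr_vector),
    entities(entity_id, name UNIQUE), fact_entities(fact_id, entity_id PK).
    INSERT OR IGNORE makes the migration safe to re-run.
    """
    with conn:  # single transaction: all or nothing
        rows = conn.execute("SELECT fact_id, content FROM facts").fetchall()
        for fact_id, content in rows:
            names = list(dict.fromkeys(extract(content)))  # dedupe, keep order
            for name in names:
                conn.execute("INSERT OR IGNORE INTO entities (name) VALUES (?)", (name,))
                conn.execute(
                    "INSERT OR IGNORE INTO fact_entities (fact_id, entity_id) "
                    "SELECT ?, entity_id FROM entities WHERE name = ?",
                    (fact_id, name),
                )
            conn.execute(
                "UPDATE facts SET hrr_vector = ? WHERE fact_id = ?",
                (encode(content, names), fact_id),
            )
    return len(rows)


def bracketed(text: str) -> list[str]:
    return re.findall(r"「([^」]+)」", text)


def placeholder_encode(content: str, names: list[str]) -> bytes:
    # Stand-in for the real HRR encoder; just records the entity names.
    return "|".join(names).encode()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE facts (fact_id INTEGER PRIMARY KEY, content TEXT, hrr_vector BLOB);
        CREATE TABLE entities (entity_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE fact_entities (fact_id INTEGER, entity_id INTEGER,
                                    PRIMARY KEY (fact_id, entity_id));
    """)
    conn.executemany("INSERT INTO facts (content) VALUES (?)",
                     [("「白兔」和「白兔控股」",), ("「白兔」接入完成",)])
    backfill_entities(conn, bracketed, placeholder_encode)
    print(conn.execute("SELECT COUNT(*) FROM entities").fetchone()[0])       # 2
    print(conn.execute("SELECT COUNT(*) FROM fact_entities").fetchone()[0])  # 3
```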

Environment:

Debian 13 (trixie), Python 3.13
Hermes-Agent venv at ~/.hermes/hermes-agent/
Plugin path plugins/memory/holographic/
Fact base 100% Chinese / Chinese-English mixed
