hermes - 💡(How to fix) Fix holographic memory `auto_extract` saves raw user messages verbatim instead of extracting preferences


The auto_extract feature on the holographic memory plugin matches user messages against simple "I prefer / I like / I want" regex patterns at session end and writes the entire matched message (truncated to 400 chars) into fact_store as a fact. There's no extraction, summarization, or synthesis — just a pattern match followed by a raw dump of conversational text.

The result is that fact_store accumulates entries that are user messages verbatim, not facts. Subsequent holographic recall surfaces these conversational snippets as if they were learned preferences, polluting downstream context with chat fragments.




Repro

  1. Enable plugins.hermes-memory-store.auto_extract: true in config.yaml with the holographic memory provider configured.
  2. Send any user message containing a phrase that matches one of the auto-extractor's regexes — e.g. "I like the new cleanup approach better, can we just write to /tmp instead?"
  3. Let the session end (so on_session_end fires).
  4. Inspect memory_store.db:
    sqlite3 ~/.hermes/memory_store.db "SELECT fact_id, category, content FROM facts ORDER BY fact_id DESC LIMIT 5"

Expected

Either a fact entry that reflects a synthesized preference (e.g. prefers systemd-tmpfiles over alternative cleanup approaches), or no fact at all if the matched phrase is conversational filler rather than a preference statement.

Actual

The entire user message body is stored verbatim as category=user_pref:

fact #N | user_pref | I like the new cleanup approach better, can we just write to /tmp instead?

These contaminating entries have empty tags and helpful_count=0, but holographic recall still surfaces them as semantically-related "facts" in subsequent sessions.

Entries like the following appeared in one test session after auto_extract: true was enabled (the examples below are synthetic but representative of the failure mode):

  • I like that, sounds good
  • I want you to add tests for the new endpoint
  • i like the approach, would you set it up on the staging server?
  • I always check git status before committing

None of these are facts. They're conversational replies that happen to contain the literal substring "I like" / "I want" / "I always."

Suspected cause

plugins/memory/holographic/__init__.py::_auto_extract_facts:

for pattern in _PREF_PATTERNS:
    if pattern.search(content):
        try:
            self._store.add_fact(content[:400], category="user_pref")
            extracted += 1
        except Exception:
            pass
        break

content[:400] is the unmodified user message, truncated to 400 characters. The regex (.+) capture group is computed but never used, so the call writes the whole message body. There is no extraction step (LLM call, span extraction, or even a simple match.group(1) substitution) between the pattern match and add_fact.

The patterns themselves are also too permissive for a verbatim-dump approach. \bI\s+like\s+(.+) matches every conversational "I like that idea, let's…" reply, which makes every back-and-forth turn a candidate for ingestion.
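
To make the false-positive behavior concrete, here is a minimal sketch that mimics the match-then-dump logic on a few chat replies. The pattern is modeled on the \bI\s+like\s+(.+) example above; the IGNORECASE flag and the stored_fact helper are assumptions for illustration, not the plugin's actual code.

```python
import re
from typing import Optional

# Assumed pattern, modeled on the example in this report. IGNORECASE is a
# guess based on the lowercase "i like ..." entry seen in fact_store.
LIKE_PATTERN = re.compile(r"\bI\s+like\s+(.+)", re.IGNORECASE)

messages = [
    "I like that, sounds good",
    "i like the approach, would you set it up on the staging server?",
    "Can you rerun the failing test?",
]


def stored_fact(message: str) -> Optional[str]:
    """Hypothetical stand-in for the reported behavior: on any match,
    return the message itself (truncated), ignoring the capture group."""
    if LIKE_PATTERN.search(message):
        return message[:400]
    return None


stored = [stored_fact(m) for m in messages]
for message, fact in zip(messages, stored):
    print(f"{message!r} -> {fact!r}")
```

Under these assumptions, both "I like" replies are stored unchanged, and only the message without the trigger phrase is skipped.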

Possible fixes

In rough order of effort:

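  1. Use the regex capture group: store match.group(1) instead of content[:400], so at minimum only the captured remainder is stored. This requires binding the result of pattern.search(content) to a variable, since the current code discards it. It is a small change but doesn't reduce the false-positive rate.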
  2. Tighten the patterns to match clean preference statements only — e.g. require the message to BE a preference statement (start-of-string anchored, no trailing question marks, length cap on the captured span). Reduces noise but still verbatim.
  3. Replace the regex extractor with a small LLM summarization pass (e.g. via the auxiliary compression slot) that produces a synthesized fact like User prefers X for Y from the matched message. Highest cost, highest signal.

Option (1) is the smallest fix and would be a clear improvement; (3) is what the feature was likely intended to be.
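
As a rough illustration of options (1) and (2) combined, the sketch below stores only the captured span and rejects messages that are not a single anchored statement. The STRICT_PREF pattern, the MAX_SPAN cap, and the extract_preference helper are hypothetical names chosen for this example; the real _PREF_PATTERNS list may look different.

```python
import re
from typing import Optional

# Hypothetical tightened pattern (option 2): the whole message must be one
# statement starting with "I prefer/like/want/always", with no question mark.
STRICT_PREF = re.compile(
    r"^\s*I\s+(?:prefer|like|want|always)\s+([^?]+?)\s*[.!]?\s*$",
    re.IGNORECASE,
)
MAX_SPAN = 120  # assumed cap on the captured span, in characters


def extract_preference(content: str) -> Optional[str]:
    """Return the captured span (option 1) or None if the message does not
    look like a standalone preference statement (option 2)."""
    match = STRICT_PREF.match(content)
    if match is None:
        return None
    span = match.group(1).strip()
    if not span or len(span) > MAX_SPAN:
        return None
    return span


print(extract_preference("I prefer tabs over spaces."))
print(extract_preference(
    "I like the new cleanup approach better, can we just write to /tmp instead?"
))
```

In the plugin, the returned span would be passed to add_fact in place of content[:400]. Short replies such as "I like that, sounds good" would still pass this filter, which is why the report treats pattern tightening as noise reduction rather than a full fix.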

Workaround

auto_extract: false (the default) disables the behavior entirely. Manual fact_store add calls from the agent still work and produce clean entries. Existing contamination can be cleaned up with a SQL filter on category IN ('user_pref','project') AND tags='' AND helpful_count=0 plus a regex check for conversational openers.
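
The cleanup described above could be scripted roughly as follows. This is a sketch under assumptions: the facts table is taken to have the fact_id, category, content, tags, and helpful_count columns named in this report, tags is assumed to be an empty string rather than NULL, and the OPENER regex is a hypothetical list of conversational openers. Back up memory_store.db first, since other tables may reference the deleted rows.

```python
import re
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical opener check; tune it against your own stored entries.
OPENER = re.compile(r"^\s*I\s+(?:like|want|prefer|always)\b", re.IGNORECASE)


def find_suspect_facts(conn: sqlite3.Connection) -> List[Tuple[int, str]]:
    """Select untagged, never-helpful entries, then keep only those whose
    content starts with a conversational opener."""
    rows = conn.execute(
        "SELECT fact_id, content FROM facts "
        "WHERE category IN ('user_pref', 'project') "
        "AND tags = '' AND helpful_count = 0"
    ).fetchall()
    return [(fact_id, content) for fact_id, content in rows
            if OPENER.search(content)]


def delete_facts(conn: sqlite3.Connection, fact_ids: Iterable[int]) -> None:
    """Delete the reviewed rows by primary key and commit."""
    conn.executemany(
        "DELETE FROM facts WHERE fact_id = ?",
        [(fact_id,) for fact_id in fact_ids],
    )
    conn.commit()
```

A cautious workflow is to open the database with sqlite3.connect, print the result of find_suspect_facts, review it by hand, and only then pass the chosen IDs to delete_facts.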

Environment

  • Hermes Agent v0.13.0 (v2026.5.7)
  • holographic memory provider, default config except auto_extract: true
  • Reproducible with the in-tree code as of main; no out-of-tree patches required.
