hermes - feat(memory): Add vector/semantic search + memory lifecycle management to built-in memory tool

StepCodex · 2026-06-09T06:22:52Z

Problem

Hermes' built-in memory tool is currently FTS5-only: it matches exact keywords but cannot retrieve semantically similar content when the phrasing differs. This causes two practical pain points:

1. Missed recall. "user prefers concise responses" won't surface when searching "keep answers short" or "be brief".
2. No lifecycle. There is no automated dedup, evolution, TTL, or compaction; every manual memory(action=add) call just appends to a buffer limited to ~2KB.
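Here is a rough illustration of the missed-recall case. It assumes keyword matching behaves like FTS5's default unicode61 tokenizer (case-fold, split on non-alphanumeric characters), with whitespace-separated query terms combined by an implicit AND. This Python sketch only approximates FTS5; it does not query SQLite.

```python
import re


def tokens(text: str) -> set[str]:
    """Rough, ASCII-only stand-in for FTS5's default unicode61 tokenizer:
    case-fold, then split on runs of non-alphanumeric characters."""
    return {t for t in re.split(r"[^0-9a-z]+", text.lower()) if t}


def keyword_match(memory: str, query: str) -> bool:
    """FTS5 treats whitespace-separated query terms as an implicit AND,
    so every query token must appear in the stored text."""
    return tokens(query) <= tokens(memory)


stored = "user prefers concise responses"
for query in ["keep answers short", "be brief", "concise responses"]:
    shared = sorted(tokens(stored) & tokens(query))
    print(f"{query!r}: match={keyword_match(stored, query)}, shared={shared}")
```

Only the third query matches, because it reuses the stored wording. The two paraphrases share no tokens at all with the stored memory, and that is the gap embeddings are meant to close.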

What OMEGA Memory does differently

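omega-memory/omega-memory (https://github.com/omega-memory/omega-memory) is an Apache 2.0-licensed, local-first memory system built on SQLite and ONNX embeddings. Key gaps compared with Hermes' built-in tool: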

| Capability | Hermes built-in (current) | OMEGA |
|---|---|---|
| Semantic search | ❌ FTS5 only | ✅ SQLite-vec + FTS5 hybrid |
| Memory lifecycle | ❌ Manual | ✅ Auto-dedup, TTL, evolution, compaction |
| Cross-session checkpoint | ❌ | ✅ checkpoint/resume_task |
| Session-start briefing | ❌ | ✅ welcome tool |
| Memory compression | ❌ | ✅ cluster-and-summarize |

Proposal

1. Optional vector embeddings (opt-in, non-breaking)

config.yaml: a new memory: section with vector_store: true/false (default: false)
ONNX model: bge-small-en-v1.5 (384-dim, ~30MB; ~300MB RSS once loaded on the first query, following the same caching pattern as existing Hermes tools)
Storage: ~/.hermes/memories/ gets vectors.db (sqlite-vec) alongside existing MEMORY.md/USER.md
Hybrid retrieval: vector cosine + FTS5 BM25 → type-weighted merge → dedup
Memory types: tag entries as decision | lesson | preference | fact | summary
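The retrieval pipeline in the last two bullets (vector cosine + FTS5 BM25, then a type-weighted merge, then dedup) could look roughly like the Python sketch below. It assumes each backend has already returned its candidates as {memory_id: raw_score}. Min-max normalization, the 0.5 blend weight (alpha) and the TYPE_WEIGHTS values are illustrative assumptions, not part of the proposal. One detail needs explicit handling: SQLite's FTS5 bm25() function returns numerically lower values for better matches, so those scores are negated before normalizing.

```python
from dataclasses import dataclass

# Illustrative per-type weights (assumed values, not from the proposal).
TYPE_WEIGHTS = {"decision": 1.2, "lesson": 1.1, "preference": 1.1, "fact": 1.0, "summary": 0.8}


@dataclass
class Hit:
    memory_id: str
    mem_type: str
    score: float


def minmax(scores: dict[str, float]) -> dict[str, float]:
    """Scale scores into [0, 1]; if all scores are equal, each maps to 1.0."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {k: 1.0 for k in scores}
    return {k: (v - lo) / (hi - lo) for k, v in scores.items()}


def hybrid_merge(vector_hits, fts_hits, types, alpha=0.5, top_k=5):
    """Blend vector and keyword results into one ranked, deduplicated list.

    vector_hits: {memory_id: cosine similarity}  (higher is better)
    fts_hits:    {memory_id: FTS5 bm25() value}  (lower is better)
    types:       {memory_id: memory type tag}
    """
    vec = minmax(vector_hits)
    fts = minmax({k: -v for k, v in fts_hits.items()})  # negate: higher = better
    merged = []
    # Iterating over the union of ids is the dedup step: a memory returned
    # by both backends is scored once, combining both signals.
    for mid in vec.keys() | fts.keys():
        mem_type = types.get(mid, "fact")
        blended = alpha * vec.get(mid, 0.0) + (1 - alpha) * fts.get(mid, 0.0)
        merged.append(Hit(mid, mem_type, blended * TYPE_WEIGHTS.get(mem_type, 1.0)))
    merged.sort(key=lambda h: (-h.score, h.memory_id))  # id breaks ties deterministically
    return merged[:top_k]


results = hybrid_merge(
    vector_hits={"m1": 0.82, "m2": 0.40, "m3": 0.61},
    fts_hits={"m2": -7.5, "m4": -2.0},
    types={"m1": "preference", "m2": "fact", "m3": "summary", "m4": "lesson"},
)
for hit in results:
    print(hit.memory_id, hit.mem_type, round(hit.score, 3))
```

Min-max scaling pushes the weakest hit in each list to 0. That is why m4, a genuine keyword hit, ranks last here with a score of 0. Reciprocal Rank Fusion combines ranks instead of raw scores and is a common alternative if calibrating the two score scales proves fragile.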

2. Memory lifecycle automation

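Dedup on write: detect exact duplicates via SHA256 hash and near-duplicates via a vector similarity threshold (configurable, default 0.85)
Evolution: when a new memory's similarity to an existing one falls between 0.55 and 0.85, append it to that entry as a new insight instead of storing a duplicate
TTL: session summaries expire after 1 day; lessons and preferences are permanent
Consolidation tool: memory(action=consolidate) runs dedup, prunes stale entries, and compacts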
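The write-path rules above reduce to a small decision function: an exact SHA256 match, similarity of 0.85 or more treated as a duplicate, 0.55 to 0.85 treated as an evolution, and a TTL per memory type. This sketch assumes embeddings are plain float lists and that similarity means cosine similarity. The names `decide_write`, `WriteDecision`, `content_hash` and `is_expired` are hypothetical. Treating unlisted types such as fact as permanent is also an assumption.

```python
import hashlib
import math
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional

DUP_THRESHOLD = 0.85     # proposal's default near-duplicate threshold
EVOLVE_THRESHOLD = 0.55  # lower bound of the proposal's "evolve" band

# Per-type TTL; None means permanent. Unlisted types are treated as permanent (assumption).
TTL = {"summary": timedelta(days=1), "lesson": None, "preference": None}


class WriteDecision(Enum):
    SKIP_EXACT = "skip_exact"        # identical text already stored
    SKIP_NEAR_DUP = "skip_near_dup"  # semantically the same as an existing entry
    EVOLVE = "evolve"                # related: append as an insight to that entry
    ADD = "add"                      # genuinely new memory


def content_hash(text: str) -> str:
    # Case and whitespace are normalized first so trivial edits hash the same (assumption).
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def decide_write(text, embedding, existing):
    """existing: list of (content_hash, embedding, memory_id) tuples.
    Returns (decision, id of the matching entry or None)."""
    h = content_hash(text)
    best_id, best_sim = None, -1.0
    for ex_hash, ex_emb, ex_id in existing:
        if ex_hash == h:
            return WriteDecision.SKIP_EXACT, ex_id
        sim = cosine(embedding, ex_emb)
        if sim > best_sim:
            best_id, best_sim = ex_id, sim
    if best_sim >= DUP_THRESHOLD:
        return WriteDecision.SKIP_NEAR_DUP, best_id
    if best_sim >= EVOLVE_THRESHOLD:
        return WriteDecision.EVOLVE, best_id
    return WriteDecision.ADD, None


def is_expired(mem_type: str, created_at: datetime, now: Optional[datetime] = None) -> bool:
    ttl = TTL.get(mem_type)
    if ttl is None:
        return False
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at > ttl
```

Returning the id of the closest entry lets the caller either drop the write or, for EVOLVE, append the new text to that entry as an insight.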

3. Cross-session continuity

Checkpoint: memory(action=checkpoint, task_state=...) — save task context
Resume: memory(action=resume_task) — surface saved checkpoints
Welcome: auto-surface N relevant memories on session start (vector-powered)
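Checkpoint and resume_task could be backed by a small SQLite table stored alongside the other memory files. The schema, the JSON encoding of task_state and the helper names below are assumptions made for illustration. The proposal only specifies the two actions.

```python
import json
import sqlite3
import time

SCHEMA = """
CREATE TABLE IF NOT EXISTS checkpoints (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    task_id    TEXT NOT NULL,
    task_state TEXT NOT NULL,  -- JSON-encoded task context
    created_at REAL NOT NULL   -- Unix timestamp, for display or pruning
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def checkpoint(conn: sqlite3.Connection, task_id: str, task_state: dict) -> None:
    """memory(action=checkpoint, task_state=...): persist a snapshot of the task."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO checkpoints (task_id, task_state, created_at) VALUES (?, ?, ?)",
            (task_id, json.dumps(task_state), time.time()),
        )


def resume_task(conn: sqlite3.Connection, limit: int = 3) -> list[tuple[str, dict]]:
    """memory(action=resume_task): the latest checkpoint of each task, newest first."""
    rows = conn.execute(
        """
        SELECT task_id, task_state FROM checkpoints
        WHERE id IN (SELECT MAX(id) FROM checkpoints GROUP BY task_id)
        ORDER BY id DESC
        LIMIT ?
        """,
        (limit,),
    ).fetchall()
    return [(task_id, json.loads(state)) for task_id, state in rows]


conn = open_store()
checkpoint(conn, "refactor", {"step": 1})
checkpoint(conn, "docs", {"step": 1})
checkpoint(conn, "refactor", {"step": 2})
print(resume_task(conn))  # [('refactor', {'step': 2}), ('docs', {'step': 1})]
```

The sketch orders by the autoincrement id rather than created_at. This keeps the ordering stable even when two checkpoints land within the same clock tick.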

4. Memory compression

memory(action=compact) — cluster related memories, summarize each cluster, archive originals
Helps stay within the 2KB system prompt budget
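The cluster-and-summarize step could start as simple greedy clustering by cosine similarity. In this sketch the 0.7 threshold is an assumed value. `summarize` is a hypothetical placeholder that just joins texts; a real implementation would presumably ask the model for a summary. Originals are returned as ids to archive rather than being deleted.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster(memories, threshold=0.7):
    """Greedy single-pass clustering. Each memory joins the first cluster whose
    seed (first member) is at least `threshold` similar, else starts a new one.
    memories: list of (memory_id, text, embedding) tuples."""
    clusters = []
    for mem in memories:
        for group in clusters:
            if cosine(mem[2], group[0][2]) >= threshold:
                group.append(mem)
                break
        else:  # no existing cluster was close enough
            clusters.append([mem])
    return clusters


def summarize(texts: list[str]) -> str:
    # Hypothetical placeholder: a real implementation would ask the LLM to summarize.
    return " / ".join(texts)


def compact(memories, threshold=0.7):
    """Return (entries_to_keep, ids_to_archive). Singleton clusters are kept verbatim;
    multi-member clusters are replaced by one summary and their originals archived."""
    entries, archived = [], []
    for group in cluster(memories, threshold):
        if len(group) == 1:
            entries.append(group[0][1])
        else:
            entries.append(summarize([text for _, text, _ in group]))
            archived.extend(mem_id for mem_id, _, _ in group)
    return entries, archived


memories = [
    ("a", "prefers short answers", [1.0, 0.0]),
    ("b", "dislikes verbose replies", [0.9, 0.1]),
    ("c", "project uses Python 3.12", [0.0, 1.0]),
]
print(compact(memories))
# (['prefers short answers / dislikes verbose replies', 'project uses Python 3.12'], ['a', 'b'])
```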

Why built-in vs plugin

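Eight memory provider plugins already exist under plugins/memory/ (Mem0, Supermemory, Hindsight, Holographic, etc.), but:

Default gap: most users never enable plugins. The built-in memory tool is what they interact with first.
Latency: plugins go through the MemoryProvider abstraction layer, which adds overhead to every memory call.
Dependency tax: plugins bring their own auth, models, and services. A built-in vector layer that follows Hermes' local-first philosophy needs no extra services or accounts; its only additions are local packages (sqlite-vec and an ONNX runtime) plus the model file.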

Implementation sketch

tools/memory_tool.py
  ├── MemoryStore (existing — MEMORY.md/USER.md)
  ├── VectorStore (new — sqlite-vec + ONNX, optional)
  │   ├── embed(text) → 384-dim vector
  │   ├── search(query, top_k=5) → hybrid
  │   └── lifecycle() — dedup, evolve, expire, compact
  └── memory_tool() handler — new action types:
      ├── search (existing) → hybrid if vector enabled
      ├── checkpoint (new)
      ├── resume_task (new)
      ├── consolidate (new)
      └── compact (new)
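On the handler side, the tree above mostly amounts to dispatching on action and choosing the search backend from the vector_store flag. The following Python skeleton shows only that routing. Every class and function name is hypothetical, and the stores are stubs standing in for MemoryStore and the proposed VectorStore. The skeleton uses the resume_task action name from the proposal section.

```python
from typing import Any, Callable, Optional


class KeywordStore:
    """Stub for the existing FTS5-backed search path (hypothetical name)."""

    def search(self, query: str, top_k: int = 5) -> list[str]:
        return [f"fts:{query}"]


class HybridStore(KeywordStore):
    """Stub for the proposed VectorStore hybrid search (hypothetical name)."""

    def search(self, query: str, top_k: int = 5) -> list[str]:
        return [f"hybrid:{query}"]


def make_memory_tool(vector_store: bool) -> Callable[..., Any]:
    """Build a memory_tool-style handler; vector_store mirrors the proposed config flag."""
    store = HybridStore() if vector_store else KeywordStore()
    checkpoints: list[dict] = []

    def search(query: str, top_k: int = 5) -> list[str]:
        # Same action name as today; only the backend changes when vectors are on.
        return store.search(query, top_k)

    def checkpoint(task_state: dict) -> str:
        checkpoints.append(task_state)
        return "saved"

    def resume_task() -> Optional[dict]:
        return checkpoints[-1] if checkpoints else None

    handlers: dict[str, Callable[..., Any]] = {
        "search": search,
        "checkpoint": checkpoint,
        "resume_task": resume_task,
        # "consolidate" and "compact" would register here the same way.
    }

    def memory_tool(action: str, **kwargs: Any) -> Any:
        handler = handlers.get(action)
        if handler is None:
            return {"error": f"unknown action: {action}"}
        return handler(**kwargs)

    return memory_tool


tool = make_memory_tool(vector_store=False)
print(tool("search", query="be brief"))  # ['fts:be brief']
```

The search action name stays the same and only the backend switches. That is what makes the feature opt-in and non-breaking: with vector_store disabled, callers keep the FTS5 path.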

Prior art in Hermes

Eight memory provider plugins already exist with similar architectures
ONNX model caches are already used by other tools
memory_tool.py already performs FTS5-based dedup; this would extend it with hash + vector checks

References

OMEGA: https://github.com/omega-memory/omega-memory
MemoryProvider: agent/memory_provider.py
MemoryStore: tools/memory_tool.py
Session search: tools/session_search_tool.py
