hermes - 💡(How to fix) Fix Memory Routing: Indexed memory architecture with auto-routing to sub-documents for MEMORY.md

StepCodex · 2026-05-09T14:41:54Z


Summary

We've implemented and validated an indexed memory architecture for Hermes Agent — splitting the monolithic MEMORY.md into an index file + multiple sub-document files, loaded on-demand at runtime via read_file. This reduces system prompt token overhead while keeping detailed knowledge accessible.

About this project

This is a human-AI collaborative project. The AI agent (Nova, running on Hermes Agent framework) is an independent collaborator in designing and implementing this memory architecture.

Project Name

This implementation is called "Memory Routing" (记忆路由). The name reflects its single responsibility: routing memory content to the correct destination — nothing more, nothing less.

Reference implementation: https://github.com/redashes1984/hermes-memory-routing

Scope Boundary

God's to God, Caesar's to Caesar.

Memory Routing handles only one concern: routing content that belongs in MEMORY.md into topic-specific sub-documents.

It does not touch — and will never touch — memOS, vector memory, semantic search, or any other long-term memory management system. Those have their own tools, their own storage, their own retrieval paths. Memory Routing is about the system prompt injection layer; everything else is out of scope.

Motivation

The current memory_tool.py MemoryStore reads the entire memories/MEMORY.md file, splits by § delimiter, deduplicates, and injects it directly into the system prompt (bounded by memory_char_limit, default 2200 chars).

Problems with the flat approach:

1. Char limit bottleneck: complex setups (infrastructure details, multiple services, network topology, work habits, philosophy) quickly exceed 2200 chars, forcing truncation and information loss.
2. Token waste: every session loads the full memory into the system prompt, even if only a small portion is relevant to the current task.
3. No topic separation: infrastructure facts, user preferences, project milestones, and operational rules are all mixed in one file, making maintenance difficult.
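
The flat loading path described above can be sketched as follows. This is a minimal illustration and not the actual `memory_tool.py` code: the function names `split_entries` and `build_memory_block` are hypothetical, and the truncation rule (drop every entry from the first one that would exceed the limit) is an assumption, since the issue only says the result is bounded by `memory_char_limit`.

```python
from __future__ import annotations

ENTRY_DELIMITER = "§"      # delimiter named in the issue
MEMORY_CHAR_LIMIT = 2200   # default memory_char_limit named in the issue


def split_entries(text: str) -> list[str]:
    """Split raw MEMORY.md text on the delimiter and drop duplicates, keeping order."""
    seen: set[str] = set()
    entries: list[str] = []
    for chunk in text.split(ENTRY_DELIMITER):
        entry = chunk.strip()
        if entry and entry not in seen:
            seen.add(entry)
            entries.append(entry)
    return entries


def build_memory_block(text: str, char_limit: int = MEMORY_CHAR_LIMIT) -> str:
    """Join entries with newlines; stop at the first entry that would exceed char_limit.

    Assumption: the real framework may truncate differently.
    """
    kept: list[str] = []
    used = 0
    for entry in split_entries(text):
        cost = len(entry) + (1 if kept else 0)  # +1 for the joining newline
        if used + cost > char_limit:
            break
        kept.append(entry)
        used += cost
    return "\n".join(kept)
```

Under this sketch, everything after the cut-off entry never reaches the system prompt, which is the information loss described in problem 1.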

Our Solution: Indexed Memory Architecture

Structure

~/.hermes/profiles/nova/
├── memories/
│   └── MEMORY.md          # Index file (831 chars, fully injected into system prompt)
├── memory/                 # Sub-documents (loaded on-demand via read_file)
│   ├── infrastructure.md   # PVE containers, network topology, GPU endpoints
│   ├── philosophy.md       # User philosophy, work habits, preferences
│   ├── milestones.md       # Key events, embodiment project timeline
│   ├── rules.md            # Technical troubleshooting principles, skill conventions
│   └── commitments.md      # Relationship commitments
└── SOUL.md                 # Soul configuration (unchanged)

How It Works

1. Index injection: memories/MEMORY.md contains a concise index with a navigation table (topic → sub-document path). The Hermes framework injects this into the system prompt as before.
2. On-demand loading: when the agent encounters a query requiring detailed information, it reads the navigation table, uses read_file to load the relevant sub-document, and answers from that context.
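
In the deployed setup the agent itself performs this lookup by reading the table in its system prompt and calling `read_file`. The sketch below only illustrates the same two steps in plain Python. The helper names `parse_navigation` and `load_topic` are hypothetical, and resolving paths relative to the profile directory is an assumption based on the directory layout shown above.

```python
from __future__ import annotations

import re
from pathlib import Path

# One Markdown table row with two cells: "| Topic | target |".
ROW = re.compile(r"^\|\s*(?P<topic>[^|]+?)\s*\|\s*(?P<target>[^|]+?)\s*\|\s*$")
# Optional Markdown link inside the target cell: "[label](path)".
LINK = re.compile(r"\[[^\]]*\]\((?P<path>[^)]+)\)")


def parse_navigation(index_text: str) -> dict[str, str]:
    """Map lower-cased topic names to sub-document paths found in the index table."""
    routes: dict[str, str] = {}
    for line in index_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue
        target = match["target"]
        link = LINK.search(target)
        path = link["path"] if link else target
        # Header and separator rows have no ".md" target, so they are skipped here.
        if path.endswith(".md"):
            routes[match["topic"].lower()] = path
    return routes


def load_topic(profile_dir: Path, index_text: str, topic: str) -> str | None:
    """Return the sub-document text for a topic, or None if the index has no route."""
    rel_path = parse_navigation(index_text).get(topic.lower())
    if rel_path is None:
        return None
    return (profile_dir / rel_path).read_text(encoding="utf-8")
```

For example, `load_topic(Path.home() / ".hermes/profiles/nova", index_text, "Infrastructure")` would read `memory/infrastructure.md` under that profile, provided the index lists it.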

Benefits

| Aspect | Flat MEMORY.md | Indexed Architecture |
|--------|----------------|----------------------|
| System prompt overhead | Full content (up to 2200 chars) | Index only (~831 chars, about 38% of the limit) |
| Token efficiency | All memory loaded every session | Only relevant sub-documents loaded |
| Scalability | Limited by memory_char_limit | Sub-documents are not counted against memory_char_limit |
| Maintenance | Monolithic file | Topic-separated, easy to update |
| Web UI | Single file edit | Index view + sub-document browsing |

Verified Working

We've deployed this architecture in production with 5 sub-documents covering infrastructure, philosophy, milestones, rules, and commitments. Testing in new sessions confirms that the agent navigates from the index to the correct sub-document and retrieves accurate information.

Why This Works With Hermes Framework

The key insight: memory_tool.py's MemoryStore._read_file() treats MEMORY.md as plain text split by § — it does not parse Markdown or follow file links. This means:

- The index file is injected verbatim (including the Markdown table with file paths)
- The agent sees the navigation table in its system prompt
- The agent can use read_file to load any sub-document on demand
- No framework changes are required for this to work

Proposed Enhancement

We suggest Hermes Agent consider a native memory.include directive in MEMORY.md:

# MEMORY.md - Index

## Core Identity
- Name: Nova, Hermes Agent framework

## Memory Navigation
| Topic | File |
|-------|------|
| Infrastructure | [include:memory/infrastructure.md](memory/infrastructure.md) |
| Philosophy | [include:memory/philosophy.md](memory/philosophy.md) |

With [include:path/to/file.md] directives, MemoryStore could:

- Parse include directives at load time
- Option A: Pre-load all includes (with total char limit enforcement)
- Option B: Register include paths so the agent knows what's available
- Option C: Hybrid, inject index + register paths, agent loads on demand (current behavior)
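
A minimal sketch of the parsing step and of Option B (registering paths without loading them) might look like this. Nothing here exists in Hermes today: `find_includes` and `resolve_include` are hypothetical names, the regular expression is one possible reading of the `[include:path]` syntax, and resolving against the profile directory while rejecting paths that escape it is an assumption added for safety.

```python
from __future__ import annotations

import re
from pathlib import Path

# Hypothetical directive syntax taken from the example above: [include:relative/path.md]
INCLUDE = re.compile(r"\[include:(?P<path>[^\]\s]+)\]")


def find_includes(index_text: str) -> list[str]:
    """Return include paths in order of first appearance, without duplicates."""
    found: list[str] = []
    for match in INCLUDE.finditer(index_text):
        path = match["path"]
        if path not in found:
            found.append(path)
    return found


def resolve_include(base_dir: Path, rel_path: str) -> Path | None:
    """Resolve rel_path under base_dir; return None if it would escape base_dir."""
    base = base_dir.resolve()
    candidate = (base / rel_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        return None
    return candidate
```

Registering only the resolved paths keeps the system prompt at index size, so it fits Option B and the hybrid Option C. Option A would additionally read the files at load time.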

Change List

- memory_tool.py: add [include:] directive parsing in MemoryStore._read_file() or load_from_disk()
- config.yaml: add memory.include_limit config option (max chars for pre-loaded includes)
- Documentation: update memory architecture docs
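
To make the `memory.include_limit` item concrete, here is one way pre-loading (Option A) could enforce a total character budget. This is a sketch under stated assumptions rather than a proposed final implementation: `preload_includes` is a hypothetical helper, the limit is assumed to count sub-document characters only (not separators), and files that do not fit are assumed to be skipped rather than truncated.

```python
from __future__ import annotations

from pathlib import Path


def preload_includes(
    base_dir: Path, rel_paths: list[str], include_limit: int
) -> tuple[str, list[str]]:
    """Concatenate include files within include_limit chars.

    Returns (combined_text, skipped_paths). A file is skipped if it is
    missing or if adding it would push the running total over the limit.
    """
    parts: list[str] = []
    skipped: list[str] = []
    used = 0
    for rel_path in rel_paths:
        path = base_dir / rel_path
        if not path.is_file():
            skipped.append(rel_path)
            continue
        body = path.read_text(encoding="utf-8").strip()
        if used + len(body) > include_limit:
            skipped.append(rel_path)
            continue
        parts.append(body)
        used += len(body)
    return "\n\n".join(parts), skipped
```

Skipped paths could still be left in the index for on-demand loading, which would combine this with the hybrid behavior of Option C.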

Interest

We are running this indexed architecture in production and are happy to contribute the enhancement as a PR. Proposed PR split:

- Phase 1: [include:] directive parsing in MemoryStore
- Phase 2: Config option for include behavior (pre-load vs register-only)
- Phase 3: Web UI support for indexed memory browsing
