hermes - 💡(How to fix) Fix feat(delegation): Infinite Context Buffer — offload bulk memory to free large-context subagents [1 participants]

Root Cause

Use free large-context models (e.g. Gemini CLI, 1M tokens) as persistent context buffer subagents via delegate_task, pre-loaded with daily logs, session history, system state, and other bulk information. The main session queries this buffer on-demand and receives only a concise summary — achieving zero-information-loss recall at zero marginal cost.

Code Example

┌──────────────────────────┐
│  Main Context (L1 Cache) │  ← Current task only, ~200k
│  Cost: Paid tokens       │
└──────────┬───────────────┘
           │ delegate_task (on-demand query)
           ▼
┌──────────────────────────┐
│  Context Buffer (RAM)    │  ← Full history loaded, up to 600k
│  Model: Free (Gemini 1M) │
│  Cost: $0 effective      │
└──────────┬───────────────┘
           │ Periodic dump
           ▼
┌──────────────────────────┐
│  Hindsight (HDD)         │  ← Vector search, long-term
│  Cost: Embedding only    │
└──────────────────────────┘

Motivation

Current Hermes memory hierarchy:

Layer        | Analogy   | Access Pattern           | Limitation
Hindsight    | HDD       | Vector search (recall)   | Search quality dependent, can miss relevant info
Memory.md    | Registers | Auto-injected every turn | 2,200 char hard cap
Main context | L1 Cache  | Current task only        | 200k token limit

There is no RAM-equivalent layer — a middle ground where you can say "what did we work on over the past 2 weeks?" and get a comprehensive, non-lossy answer. RAG-based search always risks missing context.

Free large-context models (Gemini 1M, etc.) make it possible to maintain a full context buffer at $0 effective cost. Even if token usage doubles, the user-perceived cost is zero.

Proposed Behavior

1. Buffer Profile for delegate_task

Add a buffer_profile option to delegate_task that automatically:

Selects a free large-context model (Gemini CLI, etc.)
Loads the pre-built context file
Returns only concise summaries to the parent session
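
One way to picture such a profile is as a small configuration record. The sketch below is illustrative only: the `BufferProfile` class, its field names, the `gemini-cli` label, and the file path are assumptions made for this example, not part of the existing delegate_task API. The 1M window and 600k cap are the figures quoted in this proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferProfile:
    """Hypothetical settings for a context-buffer subagent."""
    model: str = "gemini-cli"                 # assumed label for a free large-context model
    context_file: str = "buffer/context.md"   # assumed path to the pre-built context file
    context_window: int = 1_000_000           # tokens the buffer model accepts
    context_cap: int = 600_000                # hard cap on the loaded context
    max_summary_chars: int = 2_000            # assumed limit on what returns to the parent

    def headroom(self) -> int:
        """Tokens left for the query and the generated answer."""
        return self.context_window - self.context_cap


profile = BufferProfile()
print(profile.headroom())  # 400000
```

Keeping the cap and the window in one record makes the headroom explicit, so a different free model with a smaller window would only need a different profile.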

2. Context Accumulation (Cron Job)

A periodic cron job merges bulk information into a single context file:

Daily logs
Hindsight dump (or filtered recall results)
System state (active projects, configs, decisions)
Memory.md snapshot

Hard cap: 600k tokens (leaving ~400k headroom within a 1M context for queries + generation).
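
The merge step can be sketched as follows. This is a minimal illustration under stated assumptions: `estimate_tokens` uses a rough four-characters-per-token heuristic rather than a real tokenizer, and `build_context` and its section format are hypothetical names chosen for this example.

```python
from typing import Iterable


def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def build_context(
    sources: Iterable[tuple[str, str]],
    cap_tokens: int = 600_000,
) -> tuple[str, list[str]]:
    """Concatenate (name, text) sources in order, skipping any that would
    push the running total past the cap. Returns the merged text and the
    names of the sources that did not fit."""
    parts: list[str] = []
    skipped: list[str] = []
    used = 0
    for name, text in sources:
        section = f"## {name}\n{text}\n"
        cost = estimate_tokens(section)
        if used + cost > cap_tokens:
            skipped.append(name)
            continue
        parts.append(section)
        used += cost
    return "".join(parts), skipped


# Tiny demonstration with a deliberately small cap.
demo_sources = [
    ("Memory.md snapshot", "m" * 40),
    ("Hindsight dump", "h" * 400),
    ("Daily logs", "d" * 40),
]
merged, skipped = build_context(demo_sources, cap_tokens=40)
print(skipped)  # ['Hindsight dump']
```

Because sources are considered in the order given, listing the highest-priority material first decides what survives when the cap is tight.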

3. Overflow Management

When accumulated context exceeds 600k tokens:

A cleanup cron summarizes/compresses the oldest entries
Priority weighting by recency and relevance
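
A recency-only version of this cleanup can be sketched as below. The `Entry` record, the `enforce_cap` function, and the character-based token estimate are assumptions for illustration. Relevance weighting is left out, and dropping the oldest entries when compression alone is not enough is an added fallback, not something the proposal specifies. The `summarize` argument stands in for whatever LLM call the cron job would make.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class Entry:
    day: date
    text: str
    compressed: bool = False


def tokens(text: str) -> int:
    """Crude estimate (assumption): about 4 characters per token."""
    return max(1, len(text) // 4)


def enforce_cap(
    entries: list[Entry],
    cap_tokens: int,
    summarize: Callable[[str], str],
) -> list[Entry]:
    """Compress the oldest uncompressed entries first until the total fits.
    Fallback (assumption): drop the oldest entries if it still does not fit."""
    ordered = sorted(entries, key=lambda e: e.day)  # oldest first
    total = sum(tokens(e.text) for e in ordered)
    for i, entry in enumerate(ordered):
        if total <= cap_tokens:
            break
        if entry.compressed:
            continue
        short = summarize(entry.text)
        total += tokens(short) - tokens(entry.text)
        ordered[i] = replace(entry, text=short, compressed=True)
    while ordered and total > cap_tokens:
        total -= tokens(ordered[0].text)
        ordered.pop(0)
    return ordered


# Demonstration with a stand-in summarizer that keeps the first 40 characters.
log = [
    Entry(date(2025, 1, 3), "c" * 400),
    Entry(date(2025, 1, 1), "a" * 400),
    Entry(date(2025, 1, 2), "b" * 400),
]
kept = enforce_cap(log, cap_tokens=250, summarize=lambda t: t[:40])
print([(e.day.day, e.compressed) for e in kept])  # [(1, True), (2, False), (3, False)]
```

Only the oldest entry is compressed here because that alone brings the total under the cap; newer entries stay verbatim.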

4. On-demand Query from Main Session

The main session references the buffer subagent (similar to context_from) when it needs broader context:

"What decisions did we make about wp-memlog architecture?"
"Summarize all debugging sessions from the past week"
Only the summary enters the main context — preserving the 200k budget
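
From the main session's side, the query path might look like the sketch below. `query_buffer` and the `ask_buffer` callable are hypothetical: `ask_buffer` stands in for whatever mechanism actually reaches the buffer subagent, and the character limit is an assumed safeguard so that an overly long reply cannot consume the main context budget.

```python
from typing import Callable


def query_buffer(
    question: str,
    ask_buffer: Callable[[str], str],
    max_chars: int = 2000,
) -> str:
    """Send a question to the buffer subagent and return a bounded summary."""
    prompt = (
        "Using only the loaded history, answer with a short summary.\n"
        f"Question: {question}"
    )
    answer = ask_buffer(prompt).strip()
    if len(answer) > max_chars:
        # Clip long replies so only a bounded summary enters the main context.
        answer = answer[:max_chars].rstrip() + " [truncated]"
    return answer


# Stand-in for the buffer subagent, used only to show the call shape.
def fake_buffer(prompt: str) -> str:
    return "Decided to keep wp-memlog storage append-only."


print(query_buffer("What decisions did we make about wp-memlog architecture?", fake_buffer))
```

The main session sees only the returned string; the full history never leaves the buffer subagent.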

Open Questions

Merge sources: Full hindsight dump vs. filtered recall results? Include memory.md? Active project state?
Raw vs. Summarized: Load raw files (let the buffer model interpret) vs. pre-summarized entries (saves tokens but costs cron LLM calls)?
Refresh cadence: Daily cron vs. on session end vs. hybrid?
Overflow compression strategy: LLM summarization of oldest entries vs. simple truncation vs. recency-weighted eviction?
Multi-user: Should each Hermes user maintain their own buffer file?

Related Issues

#4949 — Persistent ACP background subagents (infrastructure prerequisite)
#4928 — Named delegation capability profiles (policy scoping)

extent analysis

TL;DR

Implement a context buffer subagent using a free large-context model like Gemini CLI to provide a middle ground for storing and querying context information.

Guidance

To implement the proposed behavior, add a buffer_profile option to delegate_task that selects a free large-context model and loads the pre-built context file.
Create a periodic cron job to merge bulk information into a single context file, including daily logs, hindsight dump, system state, and memory.md snapshot, with a hard cap of 600k tokens.
Develop an overflow management strategy, such as summarizing or compressing the oldest entries, to prevent the context buffer from exceeding the token limit.
Integrate the context buffer subagent with the main session, allowing it to reference the buffer on-demand for broader context queries.

Example

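# Sketch of a buffer_profile option for delegate_task.
# run_model and the profile keys are hypothetical placeholders, not Hermes APIs.
from pathlib import Path

def delegate_task(query, run_model, buffer_profile=None):
    """Run query via run_model(model_name, prompt); use the buffer when a profile is given."""
    if buffer_profile is None:
        return run_model("default", query)  # ordinary delegation, no buffer
    # Select the free large-context model and load the pre-built context file
    context = Path(buffer_profile["context_file"]).read_text(encoding="utf-8")
    prompt = f"{context}\n\nAnswer with a concise summary:\n{query}"
    # Return only the buffer model's concise answer to the parent session
    return run_model(buffer_profile["model"], prompt)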

Notes

The implementation details of the context buffer subagent and the cron job will depend on the specific requirements and infrastructure of the project. The open questions listed in the issue, such as merge sources and refresh cadence, will need to be addressed to ensure a robust and efficient solution.

Recommendation

Implement the context buffer subagent and integrate it with the main session. This gives Hermes a middle layer for storing and querying bulk context, and it adds no paid-token cost as long as the buffer runs on a free model.
