hermes: feat(delegation): Infinite Context Buffer — offload bulk memory to free large-context subagents

GitHub stats
NousResearch/hermes-agent#16742 (fetched 2026-04-28 06:51:09)
Comments: 0 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): labeled ×4

Use free large-context models (e.g. Gemini CLI, 1M tokens) as persistent context buffer subagents via delegate_task, pre-loaded with daily logs, session history, system state, and other bulk information. The main session queries this buffer on-demand and receives only a concise summary — achieving zero-information-loss recall at zero marginal cost.

Motivation

Current Hermes memory hierarchy:

Layer         Analogy    Access pattern            Limitation
Hindsight     HDD        Vector search (recall)    Search quality dependent, can miss relevant info
Memory.md     Registers  Auto-injected every turn  2,200 char hard cap
Main context  L1 Cache   Current task only         200k token limit

There is no RAM-equivalent layer — a middle ground where you can say "what did we work on over the past 2 weeks?" and get a comprehensive, non-lossy answer. RAG-based search always risks missing context.

Free large-context models (Gemini 1M, etc.) make it possible to maintain a full context buffer at $0 effective cost. Even if token usage doubles, the user-perceived cost is zero.

Proposed Behavior

1. Buffer Profile for delegate_task

Add a buffer_profile option to delegate_task that automatically:

  • Selects a free large-context model (Gemini CLI, etc.)
  • Loads the pre-built context file
  • Returns only concise summaries to the parent session
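As a rough illustration of what such a profile might bundle, the sketch below groups the settings named above into one object. `BufferProfile`, its field names, the `"gemini-cli"` model string, the file path, and the `max_summary_tokens` default are all hypothetical; none of them are existing Hermes APIs or values from the issue. Only the 1M window and the 600k cap come from the proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferProfile:
    """Hypothetical settings bundle for a buffer subagent (not a Hermes API)."""
    model: str                         # name of a free large-context model
    context_path: str                  # pre-built context file to load
    context_window: int = 1_000_000    # model window in tokens (from the proposal)
    buffer_cap: int = 600_000          # hard cap on loaded context (from the proposal)
    max_summary_tokens: int = 2_000    # assumed bound on what returns to the parent

    def headroom(self) -> int:
        """Tokens left in the window for the query and the generated answer."""
        return self.context_window - self.buffer_cap


profile = BufferProfile(model="gemini-cli", context_path="buffer/context.md")
print(profile.headroom())  # 400000
```

Keeping the cap and the window in the same object makes the headroom a derived value, so changing the model does not silently invalidate the cap.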

2. Context Accumulation (Cron Job)

A periodic cron job merges bulk information into a single context file:

  • Daily logs
  • Hindsight dump (or filtered recall results)
  • System state (active projects, configs, decisions)
  • Memory.md snapshot

Hard cap: 600k tokens (leaving ~400k headroom within a 1M context for queries + generation).
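The merge step with its hard cap could look something like the sketch below. It is a minimal illustration, not the proposed implementation: `estimate_tokens` uses a crude four-characters-per-token heuristic (an assumption; a real job would use the target model's tokenizer), and `build_context` and the section header format are hypothetical.

```python
def estimate_tokens(text: str) -> int:
    """Rough heuristic (assumption): about 4 characters per token."""
    return (len(text) + 3) // 4


def build_context(sources: list[tuple[str, str]], cap_tokens: int = 600_000) -> str:
    """Concatenate (label, text) sources in order, stopping before the cap is exceeded."""
    parts: list[str] = []
    used = 0
    for label, text in sources:
        section = f"## {label}\n{text.strip()}\n"
        cost = estimate_tokens(section)
        if used + cost > cap_tokens:
            break  # overflow is left to a separate cleanup step
        parts.append(section)
        used += cost
    return "\n".join(parts)


merged = build_context(
    [("Memory.md snapshot", "prefers short answers"), ("Daily log", "fixed the cron job")]
)
```

Because sources are added in order until the budget runs out, the caller decides priority simply by how it orders the list.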

3. Overflow Management

When accumulated context exceeds 600k tokens:

  • A cleanup cron summarizes/compresses the oldest entries
  • Priority weighting by recency and relevance
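One way to read the cleanup step is "compress the oldest entries first until the total fits". The sketch below does only that; it covers recency but not relevance weighting, and `Entry`, `compact`, and the injected `summarize` and `count` callables are hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Entry:
    day: int      # larger means more recent (hypothetical ordering key)
    text: str
    tokens: int


def compact(
    entries: list[Entry],
    cap: int,
    summarize: Callable[[str], str],
    count: Callable[[str], int],
) -> list[Entry]:
    """Summarize entries oldest-first until the running total is at or under cap."""
    ordered = sorted(entries, key=lambda e: e.day)  # oldest first
    total = sum(e.tokens for e in ordered)
    result: list[Entry] = []
    for entry in ordered:
        if total > cap:
            short = summarize(entry.text)
            new_tokens = count(short)
            total -= entry.tokens - new_tokens
            entry = Entry(entry.day, short, new_tokens)
        result.append(entry)
    return result
```

If `summarize` does not shrink entries enough, the result can still exceed the cap, so a real job would need a fallback such as truncation or eviction.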

4. On-demand Query from Main Session

The main session references the buffer subagent (similar to context_from) when it needs broader context:

  • "What decisions did we make about wp-memlog architecture?"
  • "Summarize all debugging sessions from the past week"
  • Only the summary enters the main context — preserving the 200k budget
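To keep the main context within its budget, the caller may also want to bound whatever the buffer returns rather than trust the subagent to be brief. The sketch below is one possible shape: `query_buffer`, the `ask_buffer` callable, the prompt wording, and the 4,000-character default are all assumptions for illustration, not part of Hermes.

```python
from typing import Callable


def query_buffer(
    question: str,
    ask_buffer: Callable[[str], str],
    max_chars: int = 4000,
) -> str:
    """Send a question to the buffer subagent and bound the size of the reply."""
    prompt = (
        "Answer from the loaded history only, as a short summary.\n\n"
        f"Question: {question}"
    )
    answer = ask_buffer(prompt).strip()
    if len(answer) > max_chars:
        # Hard stop so an overly long reply cannot flood the parent context
        answer = answer[:max_chars].rstrip() + " [truncated]"
    return answer
```

A character limit is a blunt guard; a token-based limit tied to the parent model's tokenizer would be more precise.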

Mental Model

┌──────────────────────────┐
│  Main Context (L1 Cache) │  ← Current task only, ~200k
│  Cost: Paid tokens       │
└──────────┬───────────────┘
           │ delegate_task (on-demand query)
┌──────────────────────────┐
│  Context Buffer (RAM)    │  ← Full history loaded, up to 600k
│  Model: Free (Gemini 1M) │
│  Cost: $0 effective      │
└──────────┬───────────────┘
           │ Periodic dump
┌──────────────────────────┐
│  Hindsight (HDD)         │  ← Vector search, long-term
│  Cost: Embedding only    │
└──────────────────────────┘

Open Questions

  • Merge sources: Full hindsight dump vs. filtered recall results? Include memory.md? Active project state?
  • Raw vs. Summarized: Load raw files (let the buffer model interpret) vs. pre-summarized entries (saves tokens but costs cron LLM calls)?
  • Refresh cadence: Daily cron vs. on session end vs. hybrid?
  • Overflow compression strategy: LLM summarization of oldest entries vs. simple truncation vs. recency-weighted eviction?
  • Multi-user: Should each Hermes user maintain their own buffer file?

Related Issues

  • #4949 — Persistent ACP background subagents (infrastructure prerequisite)
  • #4928 — Named delegation capability profiles (policy scoping)

extent analysis

TL;DR

Implement a context buffer subagent using a free large-context model like Gemini CLI to provide a middle ground for storing and querying context information.

Guidance

  • To implement the proposed behavior, add a buffer_profile option to delegate_task that selects a free large-context model and loads the pre-built context file.
  • Create a periodic cron job to merge bulk information into a single context file, including daily logs, hindsight dump, system state, and memory.md snapshot, with a hard cap of 600k tokens.
  • Develop an overflow management strategy, such as summarizing or compressing the oldest entries, to prevent the context buffer from exceeding the token limit.
  • Integrate the context buffer subagent with the main session, allowing it to reference the buffer on-demand for broader context queries.

Example

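# Hypothetical sketch of a buffer_profile option on delegate_task.
# query_model stands in for the call to a free large-context model (e.g. Gemini CLI);
# it is an assumption for illustration, not an existing Hermes API.
from pathlib import Path

def delegate_task(prompt, query_model, buffer_profile=None):
    if buffer_profile is None:
        raise NotImplementedError("regular delegation is outside this sketch")
    # Load the pre-built context file named by the profile
    context = Path(buffer_profile).read_text(encoding="utf-8")
    # Ask the buffer model the question; only its concise answer returns to the parent session
    return query_model(f"{context}\n\nQuestion: {prompt}\nAnswer concisely.")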

Notes

The implementation details of the context buffer subagent and the cron job will depend on the specific requirements and infrastructure of the project. The open questions listed in the issue, such as merge sources and refresh cadence, will need to be addressed to ensure a robust and efficient solution.

Recommendation

Apply the proposed behavior by implementing the context buffer subagent and integrating it with the main session, as this will provide a middle ground for storing and querying context information without incurring additional costs.
