hermes - Feature: progressive background pre-compaction cache

StepCodex · 2026-05-30T17:53:22Z




Feature

Add progressive/background pre-compaction so Hermes can do most of the summarization work before the session reaches the hard compression threshold.

Motivation

Current context compression is synchronous: when the threshold is hit, Hermes pauses the active turn and summarizes a large transcript. This makes long sessions feel slow exactly when the user is trying to continue working.

A better model is to maintain a validated precompact cache in the background, then use it when real compression is needed.

Proposed design

After a completed turn, if the session is above a lower soft threshold:

1. Select only the stable prefix of the transcript.
   - Exclude active tool calls.
   - Exclude the most recent `protect_last_n` messages.
   - Exclude anything that might be affected by `/retry`, `/undo`, queued/steered input, etc.
2. Summarize that stable prefix in a low-priority background job.
3. Store a candidate summary keyed by:
   - session id
   - last summarized message id
   - prefix hash
   - system prompt hash
   - compression prompt/model version
4. On actual compression:
   - validate that the prefix hash still matches
   - summarize only the delta since the candidate
   - merge candidate + delta + protected recent tail
   - rotate/split the session as today
5. If validation fails, discard the candidate and fall back to normal synchronous compression.
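
The design steps above could be sketched roughly as follows. This is a minimal illustration, not Hermes's actual code: `Message`, `Candidate`, the `tool_call_active` flag, and the `summarize` callable are all hypothetical names introduced here, and the sketch assumes messages carry stable ids and that an active tool call ends the stable prefix.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Message:  # hypothetical message shape, not Hermes's real type
    id: str
    role: str
    content: str
    tool_call_active: bool = False  # assumed flag for an in-flight tool call


@dataclass(frozen=True)
class Candidate:  # disposable cache entry; never canonical history
    session_id: str
    last_message_id: str
    prefix_hash: str
    system_prompt_hash: str
    compressor_version: str
    summary: str


Summarizer = Callable[[list], str]  # assumed: takes messages, returns a summary


def sha256(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def prefix_hash(prefix: list[Message]) -> str:
    # Hash ids, roles, and contents so any edit to the prefix changes the hash.
    return sha256(json.dumps([[m.id, m.role, m.content] for m in prefix]))


def stable_prefix(messages: list[Message], protect_last_n: int) -> list[Message]:
    # Drop the protected tail, then stop at the first active tool call.
    cutoff = max(len(messages) - protect_last_n, 0)
    prefix = []
    for msg in messages[:cutoff]:
        if msg.tool_call_active:
            break
        prefix.append(msg)
    return prefix


def build_candidate(session_id: str, messages: list[Message], protect_last_n: int,
                    system_prompt: str, version: str,
                    summarize: Summarizer) -> Optional[Candidate]:
    # Background step: works on a read-only view and never mutates the session.
    prefix = stable_prefix(messages, protect_last_n)
    if not prefix:
        return None
    return Candidate(session_id, prefix[-1].id, prefix_hash(prefix),
                     sha256(system_prompt), version, summarize(prefix))


def validated_end(cand: Candidate, session_id: str, messages: list[Message],
                  system_prompt: str, version: str) -> Optional[int]:
    # Return the index just past the summarized prefix, or None if stale.
    if (cand.session_id != session_id or cand.compressor_version != version
            or cand.system_prompt_hash != sha256(system_prompt)):
        return None
    ids = [m.id for m in messages]
    if cand.last_message_id not in ids:
        return None
    end = ids.index(cand.last_message_id) + 1
    return end if prefix_hash(messages[:end]) == cand.prefix_hash else None


def compress(cand: Optional[Candidate], session_id: str, messages: list[Message],
             protect_last_n: int, system_prompt: str, version: str,
             summarize: Summarizer) -> tuple[list[str], list[Message], bool]:
    # Returns (summary parts, protected tail, whether the candidate was used).
    cutoff = max(len(messages) - protect_last_n, 0)
    tail = messages[cutoff:]
    end = None
    if cand is not None:
        end = validated_end(cand, session_id, messages, system_prompt, version)
    if end is None or end > cutoff:
        # Cache miss: fall back to summarizing everything before the tail.
        return [summarize(messages[:cutoff])], tail, False
    parts = [cand.summary]
    delta = messages[end:cutoff]
    if delta:
        parts.append(summarize(delta))  # only the delta is summarized synchronously
    return parts, tail, True
```

In this sketch the caller would still take the existing compression lock before rotating or splitting the session with the returned parts. A miss costs the same as today's synchronous path, so the cache can only help latency.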

Requirements / guardrails

- Background jobs must never mutate live session messages.
- Compression locks should still guard the final session rotation.
- Candidate summaries must be treated as disposable cache, not canonical history.
- Cost controls: debounce, soft threshold, max one candidate per session, optional disable.
- Observability: report precompact hits/misses and saved synchronous summarization time.
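
One way the cost controls and observability counters might fit together is sketched below. Everything here is an assumption for illustration: `PrecompactPolicy`, `PrecompactGate`, `PrecompactStats`, and the default values are hypothetical, and "max one candidate per session" is interpreted as at most one background job in flight per session.

```python
from __future__ import annotations

import threading
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PrecompactPolicy:  # hypothetical settings; defaults are placeholders
    enabled: bool = True                 # optional disable
    soft_threshold_tokens: int = 60_000  # lower than the hard threshold
    debounce_seconds: float = 30.0


@dataclass
class PrecompactStats:
    hits: int = 0
    misses: int = 0
    saved_seconds: float = 0.0

    def record(self, hit: bool, saved_seconds: float = 0.0) -> None:
        # A hit also records the synchronous summarization time avoided.
        if hit:
            self.hits += 1
            self.saved_seconds += saved_seconds
        else:
            self.misses += 1


class PrecompactGate:
    """Decides whether a background pre-compaction job may start."""

    def __init__(self, policy: PrecompactPolicy,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.policy = policy
        self.clock = clock
        self._lock = threading.Lock()
        self._in_flight: set[str] = set()
        self._last_started: dict[str, float] = {}

    def try_start(self, session_id: str, session_tokens: int) -> bool:
        if not self.policy.enabled:
            return False
        if session_tokens < self.policy.soft_threshold_tokens:
            return False
        with self._lock:
            if session_id in self._in_flight:
                return False  # at most one job per session
            now = self.clock()
            last = self._last_started.get(session_id)
            if last is not None and now - last < self.policy.debounce_seconds:
                return False  # debounced
            self._in_flight.add(session_id)
            self._last_started[session_id] = now
            return True

    def finish(self, session_id: str) -> None:
        # Call when the background job ends, whether it succeeded or failed.
        with self._lock:
            self._in_flight.discard(session_id)
```

The clock is injectable so the debounce logic can be tested without sleeping. The gate only decides whether to start work; it never touches session messages.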

Why this is separate from stale payload eviction

Payload eviction reduces how often compression is needed. Pre-compaction reduces the latency when compression is eventually needed. They are complementary.

Related issues / concepts

- Existing compression path and session split behavior
- LCM/lossless-claw style incremental summaries
- KV-cache reuse for compression is related but different: this feature is model/provider-agnostic caching of validated summary work.
