hermes - [RFC] Session Memory Compact: zero-cost context compression for long sessions

StepCodex · 2026-05-30T23:10:59Z


Motivation

In long-running sessions, accumulated conversation history steadily eats into the context window. Eventually the model starts losing earlier context, tool outputs get truncated, and the user experiences "doesn't remember what we were doing" failures.

Hermes already has a compressor module, but it operates at a coarse level: it compresses entire blocks once a threshold is reached.

Proposal: Session Memory Compact

A lightweight, always-on compression layer that:

1. Runs at session start: before the model sees the history, it re-reads past turns.
2. Applies semantic compression: each user/assistant turn is compressed independently with a dedicated prompt such as "Summarize what happened in this exchange in 1-2 sentences. Keep all names, file paths, decisions, and error messages intact."
3. Is append-only: compressed versions are added (prepended to the history) rather than replacing anything; the originals remain until the token budget forces their eviction.
4. Adds no latency: compression runs in a background thread while the main API response is pending.
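A minimal Python sketch of the per-turn compression and background-thread steps, assuming a hypothetical summarize(prompt) -> str callable that wraps an LLM call and a hypothetical call_model() for the main API request. Only the prompt wording comes from the proposal; Turn, compress_pending, and call_model_with_background_compaction are illustrative names, not existing Hermes APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Optional

# Prompt wording taken from the RFC; everything else here is hypothetical.
COMPRESS_PROMPT = (
    "Summarize what happened in this exchange in 1-2 sentences. "
    "Keep all names, file paths, decisions, and error messages intact."
)


@dataclass
class Turn:
    user: str
    assistant: str
    summary: Optional[str] = None  # filled in by the background compressor


def compress_turn(turn: Turn, summarize: Callable[[str], str]) -> str:
    """Compress one user/assistant exchange on its own (per-turn granularity)."""
    exchange = f"User: {turn.user}\nAssistant: {turn.assistant}"
    return summarize(f"{COMPRESS_PROMPT}\n\n{exchange}")


def compress_pending(turns: list[Turn], summarize: Callable[[str], str]) -> None:
    """Add summaries to turns that lack one; the original text is never modified."""
    for turn in turns:
        if turn.summary is None:
            turn.summary = compress_turn(turn, summarize)


def call_model_with_background_compaction(
    turns: list[Turn],
    call_model: Callable[[], str],
    summarize: Callable[[str], str],
) -> str:
    """Run the main API call while pending turns are compressed on a worker thread."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        job = pool.submit(compress_pending, turns, summarize)
        reply = call_model()  # compaction overlaps with this wait
        # Wait for compaction and re-raise any error it hit; this adds latency
        # only if compression takes longer than the API call.
        job.result()
    return reply
```

The same compress_pending call could also run once at session start over the stored history. Because job.result() blocks, the "no latency" property in this sketch holds only while compression finishes before the API reply arrives; a version that lets compaction carry over to the next turn would avoid that wait.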

Key design choices

Per-turn granularity: each user/assistant exchange is compressed independently, rather than in slabs of N turns at once.
Readable output: compressed summaries retain key identifiers (files, errors, decisions) so the model can still reference them.
Zero-config: there are no thresholds to tune; the layer is always active.
Works with trailing cleanup: once history is compressed, it can be trimmed from the tail more aggressively.
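One possible reading of the append-only and eviction rules, sketched in Python: summaries for every compressed turn are prepended, then original turns are kept newest-first until a token budget is reached, so the oldest originals are evicted first. The word-count tokenizer and the budget parameter are stand-ins for illustration (in practice the budget would presumably come from the model's context window); build_context and word_count are hypothetical names, not part of Hermes.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Turn:
    user: str
    assistant: str
    summary: Optional[str] = None


def word_count(text: str) -> int:
    """Crude stand-in for a real tokenizer (an assumption for illustration)."""
    return len(text.split())


def original_text(turn: Turn) -> str:
    return f"User: {turn.user}\nAssistant: {turn.assistant}"


def build_context(
    turns: list[Turn],
    budget: int,
    count_tokens: Callable[[str], int] = word_count,
) -> list[str]:
    """Prepend per-turn summaries, then keep as many recent originals as fit."""
    # Summaries of compressed turns, oldest first, go at the front.
    summaries = [t.summary for t in turns if t.summary is not None]
    used = sum(count_tokens(s) for s in summaries)

    # Walk originals newest-first so that eviction removes the oldest ones.
    kept: list[str] = []
    for turn in reversed(turns):
        text = original_text(turn)
        cost = count_tokens(text)
        if used + cost > budget:
            break  # this original and every older one are evicted
        kept.append(text)
        used += cost

    # Restore chronological order for the surviving originals.
    return summaries + list(reversed(kept))
```

Because eviction stops at the first original that does not fit, a turn whose summary has not been produced yet would lose its content entirely; a fuller implementation would likely keep such turns until they are compressed.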

Benefits

Expected to extend effective context by 30-50 percent in long sessions
No user-facing latency (compression runs during API waits)
Graceful degradation: the more verbose the session, the more tokens are saved
Opens the door to longer autonomous runs (delegate_task, cron) without context collapse
