hermes - Feature: Idle-triggered context compression to avoid pre-flight delays



Problem

Context compression in Hermes is triggered purely by token count: when prompt_tokens exceeds compression.threshold * context_length, the next turn pauses to summarize. In practice this almost always fires at the worst possible moment — the user has just hit Enter on a new message, wants the agent to act, and instead has to wait for a compression pass.

Lowering the threshold reduces individual pause length but increases pause frequency. Raising it does the opposite. Either way, compression collides with the user's active typing window because that's when context grows.
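To make the trade-off concrete, here is a minimal sketch of the token-count trigger described above. The standalone function signature and the 200,000-token context window are illustrative assumptions; the real should_compress() in Hermes may look different.

```python
def should_compress(prompt_tokens: int, context_length: int, threshold: float) -> bool:
    """Hard pre-flight check: fire once prompt_tokens exceeds threshold * context_length.

    Hypothetical standalone version for illustration only.
    """
    return prompt_tokens > threshold * context_length


if __name__ == "__main__":
    context_length = 200_000  # assumed context window, not a Hermes default
    for threshold in (0.40, 0.60, 0.80):
        for prompt_tokens in (90_000, 130_000, 170_000):
            fires = should_compress(prompt_tokens, context_length, threshold)
            print(f"threshold={threshold:.2f} prompt_tokens={prompt_tokens:>7} fires={fires}")
```

Whatever value is chosen, the check only runs when a new turn starts, which is exactly when the user is waiting for a response.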

Proposal

Add an optional idle-triggered compression mode. While the user is actively typing or the agent is mid-turn, behavior stays as it is today. But when the session goes idle for N seconds and the context is already above a configurable lower threshold, run compression proactively in the background. When the user comes back, the work is already done.

Suggested config

compression:
  # Existing
  enabled: true
  threshold: 0.60          # hard pre-flight trigger (unchanged)
  target_ratio: 0.45
  protect_last_n: 60

  # New (all optional, default disabled = current behavior)
  idle_trigger_seconds: 300        # compress after N seconds of inactivity
  idle_lower_threshold: 0.50       # only if context already above this fraction
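One way the new keys could be read and validated is sketched below. The CompressionConfig class and load_compression_config helper are hypothetical names, not existing Hermes code, and two rules are assumptions of this sketch: both idle keys must be set together, and idle_lower_threshold must sit below threshold.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class CompressionConfig:
    # Existing keys (required here; real Hermes defaults are not assumed).
    enabled: bool
    threshold: float
    target_ratio: float
    protect_last_n: int
    # Proposed keys; None means the idle trigger is disabled (current behavior).
    idle_trigger_seconds: Optional[float] = None
    idle_lower_threshold: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumption: the two idle keys only make sense as a pair.
        if (self.idle_trigger_seconds is None) != (self.idle_lower_threshold is None):
            raise ValueError("set both idle_trigger_seconds and idle_lower_threshold, or neither")
        if self.idle_trigger_seconds is not None and self.idle_trigger_seconds <= 0:
            raise ValueError("idle_trigger_seconds must be positive")
        # Assumption: a "lower" threshold should be below the hard trigger.
        if self.idle_lower_threshold is not None and not (
            0 < self.idle_lower_threshold < self.threshold
        ):
            raise ValueError("idle_lower_threshold must be between 0 and threshold")

    @property
    def idle_enabled(self) -> bool:
        """True only when compression is on and the idle keys are configured."""
        return self.enabled and self.idle_trigger_seconds is not None


def load_compression_config(raw: Mapping[str, Any]) -> CompressionConfig:
    """Build a config from the already-parsed 'compression' mapping."""
    return CompressionConfig(**raw)
```

Leaving both idle keys out yields idle_enabled == False, which matches the "default disabled = current behavior" intent of the suggested config.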

Behavior matrix

| State | Action |
| --- | --- |
| Active turn, tokens < hard threshold | Nothing (unchanged) |
| Active turn, tokens > hard threshold | Pre-flight compression (unchanged) |
| Idle > N seconds, tokens > lower threshold | Background compression (new) |
| Idle, tokens < lower threshold | Nothing |
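The matrix can be expressed as a single pure decision function. This is a sketch under assumptions: the Action enum, the decide() name, and the token_fraction parameter (prompt tokens divided by context length) are all hypothetical, and a session that is not in a turn but has been idle for less than N seconds is assumed to do nothing.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NOTHING = "nothing"
    PREFLIGHT = "pre-flight compression"
    BACKGROUND = "background compression"


def decide(
    *,
    turn_active: bool,
    idle_seconds: float,
    token_fraction: float,
    hard_threshold: float,
    idle_trigger_seconds: Optional[float] = None,
    idle_lower_threshold: Optional[float] = None,
) -> Action:
    """Map session state to an action, following the behavior matrix."""
    if turn_active:
        # Rows 1 and 2: existing behavior, untouched by the proposal.
        if token_fraction > hard_threshold:
            return Action.PREFLIGHT
        return Action.NOTHING

    # The idle trigger is opt-in: without both settings it never fires.
    idle_enabled = idle_trigger_seconds is not None and idle_lower_threshold is not None
    if (
        idle_enabled
        and idle_seconds > idle_trigger_seconds
        and token_fraction > idle_lower_threshold
    ):
        return Action.BACKGROUND  # Row 3: the new case.
    return Action.NOTHING  # Row 4, and idle periods shorter than N seconds.
```

Keeping the decision free of I/O makes each row of the matrix easy to cover with a unit test.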

Why this helps

  • Eliminates the most common UX papercut: "I just typed something, now it's thinking for 20 seconds before doing anything."
  • Doesn't change any defaults; opt-in via config.
  • Plays nicely with both CLI (hermes chat) and gateway sessions (Telegram, Discord, Slack) — both have a clear notion of idle time between user inputs.
  • Uses the same ContextCompressor that already exists; only the trigger changes.

Implementation sketch

  • gateway/session.py / cli.py already track when the last user message arrived. Surface that as last_user_activity_ts.
  • Add a lightweight asyncio task per active session (gateway) or per CLI loop iteration that checks idle state against idle_trigger_seconds and the current token estimate against idle_lower_threshold.
  • On match: call the existing ContextCompressor.compress() path.
  • Skip if a turn is in progress or a compression is already running.
  • Keep the existing pre-flight should_compress() as the fallback safety net.
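The steps above could be wired together roughly as follows. This is a sketch, not Hermes code: IdleCompressionWatcher is a hypothetical class, compress stands in for an async wrapper around the existing ContextCompressor.compress() path (whose real signature is not shown here), and token_fraction is an assumed callable returning the current token estimate divided by the context length.

```python
import asyncio
import time
from typing import Awaitable, Callable


class IdleCompressionWatcher:
    """Hypothetical per-session idle watcher for background compression."""

    def __init__(
        self,
        *,
        compress: Callable[[], Awaitable[None]],  # stand-in for the existing compress path
        token_fraction: Callable[[], float],      # assumed token estimate / context length
        idle_trigger_seconds: float,
        idle_lower_threshold: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._compress = compress
        self._token_fraction = token_fraction
        self._idle_trigger_seconds = idle_trigger_seconds
        self._idle_lower_threshold = idle_lower_threshold
        self._clock = clock
        self.last_user_activity_ts = clock()
        self.turn_in_progress = False
        self._compressing = False

    def note_user_activity(self) -> None:
        """Call whenever a user message arrives."""
        self.last_user_activity_ts = self._clock()

    async def check_once(self) -> bool:
        """Run one idle check; return True if compression was triggered."""
        # Skip if a turn is in progress or a compression is already running.
        if self.turn_in_progress or self._compressing:
            return False
        idle_for = self._clock() - self.last_user_activity_ts
        if idle_for <= self._idle_trigger_seconds:
            return False
        if self._token_fraction() <= self._idle_lower_threshold:
            return False
        self._compressing = True
        try:
            await self._compress()
        finally:
            self._compressing = False
        return True

    async def run(self, poll_seconds: float = 5.0) -> None:
        """Poll until the task is cancelled (for example, when the session ends)."""
        while True:
            await asyncio.sleep(poll_seconds)
            await self.check_once()
```

A gateway session could start the loop with asyncio.create_task(watcher.run()) and cancel the task on teardown. With the suggested config, a successful pass should bring the context to roughly target_ratio (0.45), below idle_lower_threshold (0.50), so the watcher would not fire again until the context grows. The pre-flight check stays in place for turns that arrive before the idle timer elapses.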

Questions for maintainers

  1. Is this a direction you'd accept? Feature scope feels modest (~200-400 LOC + tests) but it does add a background task per session.
  2. Preferred home for the idle scheduler — gateway/ only, or also cli.py?
  3. Any concerns about background compression colliding with cron / scheduled jobs that might run during "idle" time?

Happy to implement and open a PR if there's interest. Wanted to confirm fit before writing code that might be rejected on direction grounds.
