hermes - ✅(Solved) Fix Allow alternative context engines to suppress or customize preflight compression status [2 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#25115Fetched 2026-05-14 03:48:47
View on GitHub
Comments
0
Participants
1
Timeline
8
Reactions
0
Participants
Timeline (top)
cross-referenced ×4labeled ×4

Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message.

Today, users can see host/gateway status such as:

📦 Preflight compression...

That wording is misleading when the active context engine is not the built-in compressor. For example, when context.engine: lcm is active, LCM owns threshold decisions and may be doing lossless context maintenance rather than built-in lossy compression. The generic host text makes a healthy plugin-backed session look like it has fallen back to core compression or is repeatedly doing something wrong.

Root Cause

For context-engine plugins, user-facing status text is part of the operational contract. If the host says "compression" while a plugin like LCM is doing lossless context maintenance, users/operators reasonably suspect a fallback, regression, or data-loss path.

The runtime may be correct, but the status message creates false alarms. Alternative context engines should have a small API surface to keep host-visible preflight messaging accurate.

Fix Action

Fixed

PR fix notes

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Description (problem / solution / changelog)

Summary

  • run_conversation now consults ContextEngine.should_compress_preflight() when the request is below threshold_tokens, so engines like hermes-lcm can run incremental leaf-chunk compaction (or other deferred maintenance) without waiting for the 75% context fill cutoff.
  • Default ContextEngine.should_compress_preflight() still returns False — the built-in ContextCompressor is unaffected.
  • Exceptions raised by the engine hook are caught at debug level and treated as "skip preflight", so a buggy plugin can't break an otherwise-healthy turn.

Closes #20316

Testing

  • scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_fires_below_threshold tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_skipped_when_returns_false tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_exception_does_not_break_turn -q
▶ running pytest with 4 workers, hermetic env, in /tmp/hermes-r2-1-fix
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
bringing up nodes...
bringing up nodes...

...                                                                      [100%]
3 passed in 4.03s
  • scripts/run_tests.sh tests/agent/test_context_engine.py -q
...................                                                      [100%]
19 passed in 1.73s
  • scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_context_compression_triggered tests/run_agent/test_run_agent.py::TestRunConversation::test_glm_prompt_exceeds_max_length_triggers_compression -q
..                                                                       [100%]
2 passed in 6.34s

Changed files

  • run_agent.py (modified, +31/-0)
  • tests/run_agent/test_run_agent.py (modified, +136/-0)

PR #15806: fix(run_agent): wire up should_compress_preflight() per-turn ingest hook

Description (problem / solution / changelog)

ContextEngine.should_compress_preflight() is documented as the per-turn ingest entry for plugin engines, but run_agent.py never calls it. PR #10088 explicitly noted this as dead code when skipping #9675:

#9675 (preflight check) — dead code, run_agent.py never calls should_compress_preflight()

This breaks plugin context engines that rely on the hook for per-turn message ingest. hermes-lcm overrides should_compress_preflight() to persist messages each turn into its DAG store, but with the hook never called, the lossless message store stays empty until compress() fires at the threshold (typically ~96K tokens). Reproducible:

$ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 0

(Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.)

Add two calls to should_compress_preflight(messages):

  1. Top of the main loop, right after api_call_count is incremented — per-turn ingest before each API call.
  2. End of run_conversation(), before the on_session_end plugin hook — final flush so the last assistant message reaches the engine when the turn exited via the no-tool-calls branch and skipped the per-turn hook above.

The return value is discarded; compression is still decided by the later should_compress(_real_tokens) call which uses the provider- reported token count. Both calls are wrapped in try/except so a misbehaving plugin engine cannot break the conversation loop.

Default ContextEngine.should_compress_preflight() returns False with no work, so this is zero overhead for the built-in ContextCompressor and any engine that does not override the hook.

After this fix: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 2

Refs:

  • #9675 (closed: feat(compressor): implement preflight compression check)
  • #10088 (merged body: skipped #9675 as dead code)
  • stephenschoettler/hermes-lcm#68 (LCM author flagged host integration issue but could not file upstream because GitHub Issues was off on a different fork)

What does this PR do?

<!-- Describe the change clearly. What problem does it solve? Why is this approach the right one? -->

Related Issue

<!-- Link the issue this PR addresses. If no issue exists, consider creating one first. -->

Fixes #

Type of Change

<!-- Check the one that applies. -->
  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

<!-- List the specific changes. Include file paths for code changes. -->

How to Test

<!-- Steps to verify this change works. For bugs: reproduction steps + proof that the fix works. -->

Checklist

<!-- Complete these before requesting review. -->

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: <!-- e.g. Ubuntu 24.04, macOS 15.2, Windows 11 -->

Documentation & Housekeeping

<!-- Check all that apply. It's OK to check "N/A" if a category doesn't apply to your change. -->
  • I've updated relevant documentation (README, docs/, docstrings) — or N/A
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

<!-- Only fill this out if you're adding a skill. Delete this section otherwise. -->
  • This skill is broadly useful to most users (if bundled) — see Contributing Guide
  • SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
  • No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
  • I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

<!-- If applicable, add screenshots or log output showing the fix/feature in action. -->

Changed files

  • run_agent.py (modified, +29/-0)

Code Example

📦 Preflight compression...
RAW_BUFFERClick to expand / collapse

Summary

Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message.

Today, users can see host/gateway status such as:

📦 Preflight compression...

That wording is misleading when the active context engine is not the built-in compressor. For example, when context.engine: lcm is active, LCM owns threshold decisions and may be doing lossless context maintenance rather than built-in lossy compression. The generic host text makes a healthy plugin-backed session look like it has fallen back to core compression or is repeatedly doing something wrong.

Problem

Hermes core currently emits a generic preflight-compression status string from the host side. Plugin/alternative context engines have limited ability to say:

  • this preflight is expected and should be silent
  • this preflight is maintenance, not built-in compression
  • this compaction is handled by a plugin-specific mechanism
  • this status should use engine-specific language

This creates avoidable operator confusion for context-engine plugins.

Concrete example: hermes-lcm

With hermes-lcm:

  • context.engine: lcm
  • compression.enabled: true remains enabled as the host-level compaction gate
  • LCM_CONTEXT_THRESHOLD is the threshold LCM owns
  • core compression.threshold belongs to the built-in compressor

Related clarification: stephenschoettler/hermes-lcm#68 established that if LCM is actually loaded, preflight/compaction checks should go through the active context engine, not the built-in compressor threshold. That issue also exposed how host-side compression/status signals can be misleading even when LCM is working.

There is also a plugin-side issue tracking the same UX symptom from the LCM perspective:

  • stephenschoettler/hermes-lcm#168

Expected behavior

When an alternative context engine is active, Hermes should avoid showing generic built-in-compressor wording unless the built-in compressor is actually being used.

Acceptable implementation shapes:

  1. Add a context-engine API for a custom preflight status label, e.g.:
    • context_engine.preflight_status_message(...) -> str | None
    • None means suppress host-visible status
  2. Let should_compress_preflight(...) return a richer result than a bool, such as:
    • should_run: bool
    • status_message: str | None
    • status_kind: "silent" | "maintenance" | "compression"
  3. Add a simple capability/property on context engines, e.g.:
    • suppress_generic_preflight_status = True
    • preflight_status_label = "LCM context maintenance..."
  4. At minimum, make the host check whether context.engine != default/builtin before emitting 📦 Preflight compression..., and use neutral wording like:
    • Checking context budget...
    • Running context maintenance...

Suggested acceptance criteria

  • With context.engine: lcm, users no longer see generic 📦 Preflight compression... unless the built-in compressor is actually the active engine.
  • Alternative context engines can opt into one of:
    • silent preflight maintenance
    • engine-specific preflight text
    • neutral host text that does not imply built-in compression
  • Existing built-in compressor UX is preserved for default Hermes compression.
  • The solution composes with related preflight/context-engine work, including:
    • #20316
    • #20424

Why this matters

For context-engine plugins, user-facing status text is part of the operational contract. If the host says "compression" while a plugin like LCM is doing lossless context maintenance, users/operators reasonably suspect a fallback, regression, or data-loss path.

The runtime may be correct, but the status message creates false alarms. Alternative context engines should have a small API surface to keep host-visible preflight messaging accurate.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

When an alternative context engine is active, Hermes should avoid showing generic built-in-compressor wording unless the built-in compressor is actually being used.

Acceptable implementation shapes:

  1. Add a context-engine API for a custom preflight status label, e.g.:
    • context_engine.preflight_status_message(...) -> str | None
    • None means suppress host-visible status
  2. Let should_compress_preflight(...) return a richer result than a bool, such as:
    • should_run: bool
    • status_message: str | None
    • status_kind: "silent" | "maintenance" | "compression"
  3. Add a simple capability/property on context engines, e.g.:
    • suppress_generic_preflight_status = True
    • preflight_status_label = "LCM context maintenance..."
  4. At minimum, make the host check whether context.engine != default/builtin before emitting 📦 Preflight compression..., and use neutral wording like:
    • Checking context budget...
    • Running context maintenance...

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix Allow alternative context engines to suppress or customize preflight compression status [2 pull requests, 1 participants]