When an alternative context engine is active, Hermes should avoid showing generic built-in-compressor wording unless the built-in compressor is actually being used. Acceptable implementation shapes: 1. Add a context-engine API for a custom preflight status label, e.g.: - `context_engine.preflight_status_message(...) -> str | None` - `None` means suppress host-visible status 2. Let `should_compress_preflight(...)` return a richer result than a bool, such as: - `should_run: bool` - `status_message: str | None` - `status_kind: "silent" | "maintenance" | "compression"` 3. Add a simple capability/property on context engines, e.g.: - `suppress_generic_preflight_status = True` - `preflight_status_label = "LCM context maintenance..."` 4. At minimum, make the host check whether `context.engine != default/builtin` before emitting `📦 Preflight compression...`, and use neutral wording like: - `Checking context budget...` - `Running context maintenance...`

hermes - ✅(Solved) Fix Allow alternative context engines to suppress or customize preflight compression status [2 pull requests, 1 participants]

barronlroth · 2026-05-13T16:54:27Z

[hermes] Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message. Today, users can… Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message. Today, users can see host/gateway status such as: ```text 📦 Preflight compression... ``` That wording is misleading when the active context engine is not the built-in compressor. For example, when `context.engine: lcm` is active, LCM owns threshold decisions and may be doing lossless context maintenance rather than built-in lossy compression. The generic host text makes a healthy plugin-backed session look like it has fallen back to core compression or is repeatedly doing something wrong. # PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316) - Repository: NousResearch/hermes-agent - Author: Beandon13 - State: open | merged: False - Link: https://github.com/NousResearch/hermes-agent/pull/20424 ## Description (problem / solution / changelog) ## Summary - ``run_conversation`` now consults ``ContextEngine.should_compress_preflight()`` when the request is below ``threshold_tokens``, so engines like hermes-lcm can run incremental leaf-chunk compaction (or other deferred maintenance) without waiting for the 75% context fill cutoff. - Default ``ContextEngine.should_compress_preflight()`` still returns ``False`` — the built-in ``ContextCompressor`` is unaffected. - Exceptions raised by the engine hook are caught at debug level and treated as "skip preflight", so a buggy plugin can't break an otherwise-healthy turn. Closes #20316 ## Testing - scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_fires_below_threshold tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_skipped_when_returns_false tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_exception_does_not_break_turn -q ``` ▶ running pytest with 4 workers, hermetic env, in /tmp/hermes-r2-1-fix (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset) bringing up nodes... bringing up nodes... ... [100%] 3 passed in 4.03s ``` - scripts/run_tests.sh tests/agent/test_context_engine.py -q ``` ................... [100%] 19 passed in 1.73s ``` - scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_context_compression_triggered tests/run_agent/test_run_agent.py::TestRunConversation::test_glm_prompt_exceeds_max_length_triggers_compression -q ``` .. [100%] 2 passed in 6.34s ``` ## Changed files - `run_agent.py` (modified, +31/-0) - `tests/run_agent/test_run_agent.py` (modified, +136/-0) --- # PR #15806: fix(run_agent): wire up should_compress_preflight() per-turn ingest hook - Repository: NousResearch/hermes-agent - Author: catgodtw - State: open | merged: False - Link: https://github.com/NousResearch/hermes-agent/pull/15806 ## Description (problem / solution / changelog) ContextEngine.should_compress_preflight() is documented as the per-turn ingest entry for plugin engines, but run_agent.py never calls it. PR #10088 explicitly noted this as dead code when skipping #9675: > #9675 (preflight check) — dead code, run_agent.py never calls > should_compress_preflight() This breaks plugin context engines that rely on the hook for per-turn message ingest. hermes-lcm overrides should_compress_preflight() to persist messages each turn into its DAG store, but with the hook never called, the lossless message store stays empty until compress() fires at the threshold (typically ~96K tokens). Reproducible: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 0 (Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.) Add two calls to should_compress_preflight(messages): 1. Top of the main loop, right after api_call_count is incremented — per-turn ingest before each API call. 2. End of run_conversation(), before the on_session_end plugin hook — final flush so the last assistant message reaches the engine when the turn exited via the no-tool-calls branch and skipped the per-turn hook above. The return value is discarded; compression is still decided by the later should_compress(_real_tokens) call which uses the provider- reported token count. Both calls are wrapped in try/except so a misbehaving plugin engine cannot break the conversation loop. Default ContextEngine.should_compress_preflight() returns False with no work, so this is zero overhead for the built-in ContextCompressor and any engine that does not override the hook. After this fix: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 2 Refs: - #9675 (closed: feat(compressor): implement preflight compression check) - #10088 (merged body: skipped #9675 as dead code) - stephenschoettler/hermes-lcm#68 (LCM author flagged host integration issue but could not file upstream because GitHub Issues

hermes2026-05-13 16:54:27

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#25115•Fetched 2026-05-14 03:48:47

View on GitHub

Comments

Participants

Timeline

Reactions

Author

barronlroth

Participants

barronlroth

Timeline (top)

cross-referenced ×4labeled ×4

Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message.

Today, users can see host/gateway status such as:

📦 Preflight compression...

That wording is misleading when the active context engine is not the built-in compressor. For example, when context.engine: lcm is active, LCM owns threshold decisions and may be doing lossless context maintenance rather than built-in lossy compression. The generic host text makes a healthy plugin-backed session look like it has fallen back to core compression or is repeatedly doing something wrong.

Root Cause

For context-engine plugins, user-facing status text is part of the operational contract. If the host says "compression" while a plugin like LCM is doing lossless context maintenance, users/operators reasonably suspect a fallback, regression, or data-loss path.

The runtime may be correct, but the status message creates false alarms. Alternative context engines should have a small API surface to keep host-visible preflight messaging accurate.

Fix Action

Fixed

Fixed by PR: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316) (https://github.com/NousResearch/hermes-agent/pull/20424)
Fixed by PR: fix(run_agent): wire up should_compress_preflight() per-turn ingest hook (https://github.com/NousResearch/hermes-agent/pull/15806)

PR fix notes

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Repository: NousResearch/hermes-agent
Author: Beandon13
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/20424

Description (problem / solution / changelog)

Summary

run_conversation now consults ContextEngine.should_compress_preflight() when the request is below threshold_tokens, so engines like hermes-lcm can run incremental leaf-chunk compaction (or other deferred maintenance) without waiting for the 75% context fill cutoff.
Default ContextEngine.should_compress_preflight() still returns False — the built-in ContextCompressor is unaffected.
Exceptions raised by the engine hook are caught at debug level and treated as "skip preflight", so a buggy plugin can't break an otherwise-healthy turn.

Closes #20316

Testing

scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_fires_below_threshold tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_skipped_when_returns_false tests/run_agent/test_run_agent.py::TestRunConversation::test_engine_preflight_exception_does_not_break_turn -q

▶ running pytest with 4 workers, hermetic env, in /tmp/hermes-r2-1-fix
  (TZ=UTC LANG=C.UTF-8 PYTHONHASHSEED=0; all credential env vars unset)
bringing up nodes...
bringing up nodes...

...                                                                      [100%]
3 passed in 4.03s

scripts/run_tests.sh tests/agent/test_context_engine.py -q

...................                                                      [100%]
19 passed in 1.73s

scripts/run_tests.sh tests/run_agent/test_run_agent.py::TestRunConversation::test_context_compression_triggered tests/run_agent/test_run_agent.py::TestRunConversation::test_glm_prompt_exceeds_max_length_triggers_compression -q

..                                                                       [100%]
2 passed in 6.34s

Changed files

run_agent.py (modified, +31/-0)
tests/run_agent/test_run_agent.py (modified, +136/-0)

PR #15806: fix(run_agent): wire up should_compress_preflight() per-turn ingest hook

Repository: NousResearch/hermes-agent
Author: catgodtw
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/15806

Description (problem / solution / changelog)

ContextEngine.should_compress_preflight() is documented as the per-turn ingest entry for plugin engines, but run_agent.py never calls it. PR #10088 explicitly noted this as dead code when skipping #9675:

#9675 (preflight check) — dead code, run_agent.py never calls should_compress_preflight()

This breaks plugin context engines that rely on the hook for per-turn message ingest. hermes-lcm overrides should_compress_preflight() to persist messages each turn into its DAG store, but with the hook never called, the lossless message store stays empty until compress() fires at the threshold (typically ~96K tokens). Reproducible:

$ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 0

(Verified on hermes-agent v0.11.0 with hermes-lcm v0.7.0.)

Add two calls to should_compress_preflight(messages):

Top of the main loop, right after api_call_count is incremented — per-turn ingest before each API call.
End of run_conversation(), before the on_session_end plugin hook — final flush so the last assistant message reaches the engine when the turn exited via the no-tool-calls branch and skipped the per-turn hook above.

The return value is discarded; compression is still decided by the later should_compress(_real_tokens) call which uses the provider- reported token count. Both calls are wrapped in try/except so a misbehaving plugin engine cannot break the conversation loop.

Default ContextEngine.should_compress_preflight() returns False with no work, so this is zero overhead for the built-in ContextCompressor and any engine that does not override the hook.

After this fix: $ hermes chat -q "test" -Q $ sqlite3 ~/.hermes/lcm.db "SELECT COUNT(*) FROM messages;" 2

Refs:

#9675 (closed: feat(compressor): implement preflight compression check)
#10088 (merged body: skipped #9675 as dead code)
stephenschoettler/hermes-lcm#68 (LCM author flagged host integration issue but could not file upstream because GitHub Issues was off on a different fork)

What does this PR do?

Related Issue

Fixes #

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
🔒 Security fix
📝 Documentation update
✅ Tests (adding or improving test coverage)
♻️ Refactor (no behavior change)
🎯 New skill (bundled or hub)

Changes Made

How to Test

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature (no unrelated commits)
I've run pytest tests/ -q and all tests pass
I've added tests for my changes (required for bug fixes, strongly encouraged for features)
I've tested on my platform:

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) — or N/A
I've updated cli-config.yaml.example if I added/changed config keys — or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A
I've updated tool descriptions/schemas if I changed tool behavior — or N/A

For New Skills

This skill is broadly useful to most users (if bundled) — see Contributing Guide
SKILL.md follows the standard format (frontmatter, trigger conditions, steps, pitfalls)
No external dependencies that aren't already available (prefer stdlib, curl, existing Hermes tools)
I've tested the skill end-to-end: hermes --toolsets skills -q "Use the X skill to do Y"

Screenshots / Logs

Changed files

run_agent.py (modified, +29/-0)

Code Example

📦 Preflight compression...

RAW_BUFFERClick to expand / collapse

Summary

Alternative context engines should be able to suppress or customize Hermes' generic user-facing preflight compression status message.

Today, users can see host/gateway status such as:

📦 Preflight compression...

Problem

Hermes core currently emits a generic preflight-compression status string from the host side. Plugin/alternative context engines have limited ability to say:

this preflight is expected and should be silent
this preflight is maintenance, not built-in compression
this compaction is handled by a plugin-specific mechanism
this status should use engine-specific language

This creates avoidable operator confusion for context-engine plugins.

Concrete example: hermes-lcm

With hermes-lcm:

context.engine: lcm
compression.enabled: true remains enabled as the host-level compaction gate
LCM_CONTEXT_THRESHOLD is the threshold LCM owns
core compression.threshold belongs to the built-in compressor

Related clarification: stephenschoettler/hermes-lcm#68 established that if LCM is actually loaded, preflight/compaction checks should go through the active context engine, not the built-in compressor threshold. That issue also exposed how host-side compression/status signals can be misleading even when LCM is working.

There is also a plugin-side issue tracking the same UX symptom from the LCM perspective:

stephenschoettler/hermes-lcm#168

Expected behavior

When an alternative context engine is active, Hermes should avoid showing generic built-in-compressor wording unless the built-in compressor is actually being used.

Acceptable implementation shapes:

Add a context-engine API for a custom preflight status label, e.g.:
- context_engine.preflight_status_message(...) -> str | None
- None means suppress host-visible status
Let should_compress_preflight(...) return a richer result than a bool, such as:
- should_run: bool
- status_message: str | None
- status_kind: "silent" | "maintenance" | "compression"
Add a simple capability/property on context engines, e.g.:
- suppress_generic_preflight_status = True
- preflight_status_label = "LCM context maintenance..."
At minimum, make the host check whether context.engine != default/builtin before emitting 📦 Preflight compression..., and use neutral wording like:
- Checking context budget...
- Running context maintenance...

Suggested acceptance criteria

With context.engine: lcm, users no longer see generic 📦 Preflight compression... unless the built-in compressor is actually the active engine.
Alternative context engines can opt into one of:
- silent preflight maintenance
- engine-specific preflight text
- neutral host text that does not imply built-in compression
Existing built-in compressor UX is preserved for default Hermes compression.
The solution composes with related preflight/context-engine work, including:
- #20316
- #20424

Why this matters

The runtime may be correct, but the status message creates false alarms. Alternative context engines should have a small API surface to keep host-visible preflight messaging accurate.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

When an alternative context engine is active, Hermes should avoid showing generic built-in-compressor wording unless the built-in compressor is actually being used.

Acceptable implementation shapes:

Add a context-engine API for a custom preflight status label, e.g.:
- context_engine.preflight_status_message(...) -> str | None
- None means suppress host-visible status
Let should_compress_preflight(...) return a richer result than a bool, such as:
- should_run: bool
- status_message: str | None
- status_kind: "silent" | "maintenance" | "compression"
Add a simple capability/property on context engines, e.g.:
- suppress_generic_preflight_status = True
- preflight_status_label = "LCM context maintenance..."
At minimum, make the host check whether context.engine != default/builtin before emitting 📦 Preflight compression..., and use neutral wording like:
- Checking context budget...
- Running context maintenance...

#api #container setup #orchestration issue #cache issue #memory leak

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Allow alternative context engines to suppress or customize preflight compression status [2 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fixed

PR fix notes

PR #20424: fix(run_agent): call should_compress_preflight() for sub-threshold engines (#20316)

Description (problem / solution / changelog)

Summary

Testing

Changed files

PR #15806: fix(run_agent): wire up should_compress_preflight() per-turn ingest hook

Description (problem / solution / changelog)

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

For New Skills

Screenshots / Logs

Changed files

Code Example

Summary

Problem

Concrete example: hermes-lcm

Expected behavior

Suggested acceptance criteria

Why this matters

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING