hermes - 💡(How to fix) Fix [Feature] ContextEngine: per-turn message observation hook (currently requires abusing compress() as a backdoor)

hermes2026-05-11 15:00:02

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

I'm building hermes-mneme — a retrieval-based ContextEngine plugin that maintains a persistent SQLite event store + embedding index for the conversation, then assembles each prompt from a token-budget-respecting mix of recent turns, retrieved fragments, and execution state.

To do this well, the plugin needs to see the full messages: List[Dict] on every turn so it can:

Ingest new messages into the event store (with deterministic event_ids for idempotent re-ingest)
Index them for semantic recall
Track an execution graph (tool_call → tool_output → decision)

Root Cause

should_compress lies. It always returns True, which defeats its purpose as a token-budget gate.
compression_count (tracked on ContextEngine for HUD display) inflates wildly — every turn looks like a compression event.
Plugins doing pure observation pay the cost of compress()'s wider contract — they must always return a valid messages list, even when they want to do nothing.
New plugin authors have to reverse-engineer this idiom from existing plugins; there's nothing in the docs about it. MemoryProvider has on_turn_start(turn_number, message, **kwargs) for analogous needs, but it gets only the latest message, not the full history.

Fix Action

Fix / Workaround

This pushes plugins that need per-turn observation into an architecturally wrong workaround: force should_compress() to always return True, then use compress() as a per-turn "see the messages" callback. From hermes-mneme's engine.py:495:

Code Example

def should_compress(self, prompt_tokens: int = None) -> bool:
    """Always returns True — this engine replaces the default compressor
    and must be called on every turn to assemble context.
    The plugin manages its own token budget inside compress().
    """
    return True

---

def on_turn_complete(
    self,
    messages: List[Dict[str, Any]],
    usage: Dict[str, Any],
    **kwargs,
) -> None:
    """Called after each turn's response, with the current full message list.

    Engines can use this for per-turn observation (ingestion, indexing,
    state tracking) without having to override should_compress / compress.
    Default is a no-op.
    """

RAW_BUFFERClick to expand / collapse

Context

To do this well, the plugin needs to see the full messages: List[Dict] on every turn so it can:

Ingest new messages into the event store (with deterministic event_ids for idempotent re-ingest)
Index them for semantic recall
Track an execution graph (tool_call → tool_output → decision)

The problem

ContextEngine (agent/context_engine.py) exposes the message list only inside compress(messages, ...). The other hooks — should_compress, update_from_response, on_session_start/end/reset — don't receive messages.

def should_compress(self, prompt_tokens: int = None) -> bool:
    """Always returns True — this engine replaces the default compressor
    and must be called on every turn to assemble context.
    The plugin manages its own token budget inside compress().
    """
    return True

And an in-line comment further down in compress() (engine.py:751) explicitly documents the intent:

"NB: we do NOT mark _pending_compression here. The plugin calls compress() on every turn purely to ingest messages and assemble a retrieval tail — this is NOT a session boundary."

Why this matters

should_compress lies. It always returns True, which defeats its purpose as a token-budget gate.
compression_count (tracked on ContextEngine for HUD display) inflates wildly — every turn looks like a compression event.
Plugins doing pure observation pay the cost of compress()'s wider contract — they must always return a valid messages list, even when they want to do nothing.
New plugin authors have to reverse-engineer this idiom from existing plugins; there's nothing in the docs about it. MemoryProvider has on_turn_start(turn_number, message, **kwargs) for analogous needs, but it gets only the latest message, not the full history.

Proposal

Add an optional lifecycle hook to ContextEngine:

def on_turn_complete(
    self,
    messages: List[Dict[str, Any]],
    usage: Dict[str, Any],
    **kwargs,
) -> None:
    """Called after each turn's response, with the current full message list.

    Engines can use this for per-turn observation (ingestion, indexing,
    state tracking) without having to override should_compress / compress.
    Default is a no-op.
    """

Wired into run_agent.py at the same call site as update_from_response() (which is already the per-turn callback site — it just doesn't receive messages).

Reasonable alternative names: on_messages_updated, observe_turn. The signature could also pass turn_number for symmetry with MemoryProvider.on_turn_start(turn_number, message, **kwargs).

Migration impact

Zero. The hook is optional with a no-op default. Existing engines (including the built-in ContextCompressor) need no changes. Plugins like hermes-mneme can migrate at their own pace and drop the should_compress = True hack.

Happy to PR

If this direction sounds reasonable I'd be glad to send a PR — happy to align on the signature first.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#latency issue #model loading #dependency error #configuration error #environment variable

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix [Feature] ContextEngine: per-turn message observation hook (currently requires abusing compress() as a backdoor)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Context

The problem

Why this matters

Proposal

Migration impact

Happy to PR

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix [Feature] ContextEngine: per-turn message observation hook (currently requires abusing compress() as a backdoor)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Context

The problem

Why this matters

Proposal

Migration impact

Happy to PR

Still need to ship something?

RELATED_DISCOVERY

TRENDING