hermes - 💡(How to fix) Fix [Feature] ContextEngine: per-turn message observation hook (currently requires abusing compress() as a backdoor)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

I'm building hermes-mneme — a retrieval-based ContextEngine plugin that maintains a persistent SQLite event store + embedding index for the conversation, then assembles each prompt from a token-budget-respecting mix of recent turns, retrieved fragments, and execution state.

To do this well, the plugin needs to see the full messages: List[Dict] on every turn so it can:

  • Ingest new messages into the event store (with deterministic event_ids for idempotent re-ingest)
  • Index them for semantic recall
  • Track an execution graph (tool_call → tool_output → decision)

Root Cause

  • should_compress lies. It always returns True, which defeats its purpose as a token-budget gate.
  • compression_count (tracked on ContextEngine for HUD display) inflates wildly — every turn looks like a compression event.
  • Plugins doing pure observation pay the cost of compress()'s wider contract — they must always return a valid messages list, even when they want to do nothing.
  • New plugin authors have to reverse-engineer this idiom from existing plugins; there's nothing in the docs about it. MemoryProvider has on_turn_start(turn_number, message, **kwargs) for analogous needs, but it gets only the latest message, not the full history.

Fix Action

Fix / Workaround

This pushes plugins that need per-turn observation into an architecturally wrong workaround: force should_compress() to always return True, then use compress() as a per-turn "see the messages" callback. From hermes-mneme's engine.py:495:

Code Example

def should_compress(self, prompt_tokens: int = None) -> bool:
    """Always returns Truethis engine replaces the default compressor
    and must be called on every turn to assemble context.
    The plugin manages its own token budget inside compress().
    """
    return True

---

def on_turn_complete(
    self,
    messages: List[Dict[str, Any]],
    usage: Dict[str, Any],
    **kwargs,
) -> None:
    """Called after each turn's response, with the current full message list.

    Engines can use this for per-turn observation (ingestion, indexing,
    state tracking) without having to override should_compress / compress.
    Default is a no-op.
    """
RAW_BUFFERClick to expand / collapse

Context

I'm building hermes-mneme — a retrieval-based ContextEngine plugin that maintains a persistent SQLite event store + embedding index for the conversation, then assembles each prompt from a token-budget-respecting mix of recent turns, retrieved fragments, and execution state.

To do this well, the plugin needs to see the full messages: List[Dict] on every turn so it can:

  • Ingest new messages into the event store (with deterministic event_ids for idempotent re-ingest)
  • Index them for semantic recall
  • Track an execution graph (tool_call → tool_output → decision)

The problem

ContextEngine (agent/context_engine.py) exposes the message list only inside compress(messages, ...). The other hooks — should_compress, update_from_response, on_session_start/end/reset — don't receive messages.

This pushes plugins that need per-turn observation into an architecturally wrong workaround: force should_compress() to always return True, then use compress() as a per-turn "see the messages" callback. From hermes-mneme's engine.py:495:

def should_compress(self, prompt_tokens: int = None) -> bool:
    """Always returns True — this engine replaces the default compressor
    and must be called on every turn to assemble context.
    The plugin manages its own token budget inside compress().
    """
    return True

And an in-line comment further down in compress() (engine.py:751) explicitly documents the intent:

"NB: we do NOT mark _pending_compression here. The plugin calls compress() on every turn purely to ingest messages and assemble a retrieval tail — this is NOT a session boundary."

Why this matters

  • should_compress lies. It always returns True, which defeats its purpose as a token-budget gate.
  • compression_count (tracked on ContextEngine for HUD display) inflates wildly — every turn looks like a compression event.
  • Plugins doing pure observation pay the cost of compress()'s wider contract — they must always return a valid messages list, even when they want to do nothing.
  • New plugin authors have to reverse-engineer this idiom from existing plugins; there's nothing in the docs about it. MemoryProvider has on_turn_start(turn_number, message, **kwargs) for analogous needs, but it gets only the latest message, not the full history.

Proposal

Add an optional lifecycle hook to ContextEngine:

def on_turn_complete(
    self,
    messages: List[Dict[str, Any]],
    usage: Dict[str, Any],
    **kwargs,
) -> None:
    """Called after each turn's response, with the current full message list.

    Engines can use this for per-turn observation (ingestion, indexing,
    state tracking) without having to override should_compress / compress.
    Default is a no-op.
    """

Wired into run_agent.py at the same call site as update_from_response() (which is already the per-turn callback site — it just doesn't receive messages).

Reasonable alternative names: on_messages_updated, observe_turn. The signature could also pass turn_number for symmetry with MemoryProvider.on_turn_start(turn_number, message, **kwargs).

Migration impact

Zero. The hook is optional with a no-op default. Existing engines (including the built-in ContextCompressor) need no changes. Plugins like hermes-mneme can migrate at their own pace and drop the should_compress = True hack.

Happy to PR

If this direction sounds reasonable I'd be glad to send a PR — happy to align on the signature first.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING