hermes - 💡(How to fix) Fix Context compaction can misread preserved todo/tool state as current user intent and leak MEDIA directives [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#14665Fetched 2026-04-24 06:15:29
View on GitHub
Comments
1
Participants
2
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×3commented ×1

Root Cause

There are two separate bad outcomes here:

RAW_BUFFERClick to expand / collapse

Bug description

context compression can preserve cross-session/tool state in a way that looks like a fresh user request in the new session.

In the failure mode I hit, three things stack together:

  1. the compaction summary carries forward an old ## Active Task
  2. the preserved todo list is injected as a normal user message
  3. tool outputs such as memory / session_search are serialized verbatim into the summarizer input, including strings like MEDIA:

That can cause the resumed assistant to follow an old task instead of the latest real user message, and can also make MEDIA: directives leak back into normal assistant text.

Why this matters

There are two separate bad outcomes here:

1) Wrong task resumption after compaction

The post-compaction todo injection currently looks like ordinary conversation text, so the model can treat it as the current user ask.

2) MEDIA: directive contamination

If memory / session_search / other tool results contain text like MEDIA:/tmp/foo.png, that text can be preserved in the compaction chain and later echoed by the model as plain content.

On gateway integrations that parse MEDIA: tags for file delivery, this can lead to bogus attachment attempts (for example trying to send a non-existent file path extracted from quoted prose or preference text).

Minimal repro shape

A deterministic repro can be built with a compressed conversation containing:

  • a compaction summary with an old ## Active Task
  • a preserved active todo snapshot
  • a memory or session_search tool result containing MEDIA: text
  • a latest real user message that should be the only active request

Observed behavior:

  • the assistant may resume the old task / preserved todo state instead of the latest real user message
  • MEDIA: text from tool state can survive into later assistant-visible context as if it were ordinary text

Suspect locations

  • agent/context_compressor.py
    • _serialize_for_summary() currently serializes tool result content and tool-call args directly into the summarizer input
  • tools/todo_tool.py
    • format_for_injection() renders preserved todo state as natural-language text
  • run_agent.py
    • _compress_context() injects the todo snapshot back into the compressed message list as a user message

Why the existing gateway-side MEDIA: hardening is not enough

I know there was already work around stricter MEDIA: extraction in gateway parsing, but this bug happens earlier in the pipeline:

  • summary contamination / stale task carry-over
  • todo state being injected as if it were a user utterance
  • tool-state text containing control directives being preserved and resurfaced

So even if gateway extraction is stricter, the conversation state can still get semantically polluted after compaction.

Suggested fix directions

  1. Treat memory, session_search, todo and similar tool state as non-intent state, not current user intent, when building summary input
  2. Mask control directives like MEDIA: before tool outputs are fed into compaction summaries
  3. Do not inject preserved todo state as natural-language text that looks like a fresh user message
  4. Ensure preserved todo state does not outrank the latest real user message after compaction

Regression coverage that would be useful

  • summary input containing memory / session_search results with MEDIA: should not preserve raw MEDIA: tokens
  • preserved todo state should be clearly machine-generated state, not look like a new user request
  • after compaction, the latest real user message should remain the active request even when summary + preserved todo state are both present

If helpful, I can turn the local repro/fix into a PR next.

extent analysis

TL;DR

The issue can be fixed by modifying the context compression to treat tool state as non-intent state and mask control directives like MEDIA:.

Guidance

  • Modify the _serialize_for_summary() function in agent/context_compressor.py to exclude tool result content and tool-call args from the summarizer input.
  • Update the format_for_injection() function in tools/todo_tool.py to render preserved todo state in a way that is clearly distinguishable from natural-language text.
  • Change the _compress_context() function in run_agent.py to inject preserved todo state in a way that does not outrank the latest real user message.
  • Consider adding regression tests to ensure that summary input containing memory / session_search results with MEDIA: does not preserve raw MEDIA: tokens.

Example

# agent/context_compressor.py
def _serialize_for_summary(self, tool_results):
    # Mask control directives like MEDIA:
    tool_results = [result.replace('MEDIA:', '') for result in tool_results]
    # ...

Notes

The suggested fix directions provided in the issue are a good starting point, but the actual implementation may require additional changes to ensure that the context compression is working correctly.

Recommendation

Apply workaround by modifying the context compression to treat tool state as non-intent state and mask control directives like MEDIA:. This should prevent the issue of wrong task resumption and MEDIA: directive contamination.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING