hermes - 💡(How to fix) Fix Context compaction can misread preserved todo/tool state as current user intent and leak MEDIA directives [1 comments, 2 participants]

Bug description

context compression can preserve cross-session/tool state in a way that looks like a fresh user request in the new session.

In the failure mode I hit, three things stack together:

the compaction summary carries forward an old ## Active Task
the preserved todo list is injected as a normal user message
tool outputs such as memory / session_search are serialized verbatim into the summarizer input, including strings like MEDIA:

That can cause the resumed assistant to follow an old task instead of the latest real user message, and can also make MEDIA: directives leak back into normal assistant text.

Why this matters

There are two separate bad outcomes here:

1) Wrong task resumption after compaction

The post-compaction todo injection currently looks like ordinary conversation text, so the model can treat it as the current user ask.

2) `MEDIA:` directive contamination

If memory / session_search / other tool results contain text like MEDIA:/tmp/foo.png, that text can be preserved in the compaction chain and later echoed by the model as plain content.

On gateway integrations that parse MEDIA: tags for file delivery, this can lead to bogus attachment attempts (for example trying to send a non-existent file path extracted from quoted prose or preference text).

Minimal repro shape

A deterministic repro can be built with a compressed conversation containing:

a compaction summary with an old ## Active Task
a preserved active todo snapshot
a memory or session_search tool result containing MEDIA: text
a latest real user message that should be the only active request

Observed behavior:

the assistant may resume the old task / preserved todo state instead of the latest real user message
MEDIA: text from tool state can survive into later assistant-visible context as if it were ordinary text

Suspect locations

agent/context_compressor.py
- _serialize_for_summary() currently serializes tool result content and tool-call args directly into the summarizer input
tools/todo_tool.py
- format_for_injection() renders preserved todo state as natural-language text
run_agent.py
- _compress_context() injects the todo snapshot back into the compressed message list as a user message

Why the existing gateway-side `MEDIA:` hardening is not enough

I know there was already work around stricter MEDIA: extraction in gateway parsing, but this bug happens earlier in the pipeline:

summary contamination / stale task carry-over
todo state being injected as if it were a user utterance
tool-state text containing control directives being preserved and resurfaced

So even if gateway extraction is stricter, the conversation state can still get semantically polluted after compaction.

Suggested fix directions

Treat memory, session_search, todo and similar tool state as non-intent state, not current user intent, when building summary input
Mask control directives like MEDIA: before tool outputs are fed into compaction summaries
Do not inject preserved todo state as natural-language text that looks like a fresh user message
Ensure preserved todo state does not outrank the latest real user message after compaction

Regression coverage that would be useful

summary input containing memory / session_search results with MEDIA: should not preserve raw MEDIA: tokens
preserved todo state should be clearly machine-generated state, not look like a new user request
after compaction, the latest real user message should remain the active request even when summary + preserved todo state are both present

If helpful, I can turn the local repro/fix into a PR next.

extent analysis

TL;DR

The issue can be fixed by modifying the context compression to treat tool state as non-intent state and mask control directives like MEDIA:.

Guidance

Modify the _serialize_for_summary() function in agent/context_compressor.py to exclude tool result content and tool-call args from the summarizer input.
Update the format_for_injection() function in tools/todo_tool.py to render preserved todo state in a way that is clearly distinguishable from natural-language text.
Change the _compress_context() function in run_agent.py to inject preserved todo state in a way that does not outrank the latest real user message.
Consider adding regression tests to ensure that summary input containing memory / session_search results with MEDIA: does not preserve raw MEDIA: tokens.

Example

# agent/context_compressor.py
def _serialize_for_summary(self, tool_results):
    # Mask control directives like MEDIA:
    tool_results = [result.replace('MEDIA:', '') for result in tool_results]
    # ...

Notes

The suggested fix directions provided in the issue are a good starting point, but the actual implementation may require additional changes to ensure that the context compression is working correctly.

Recommendation

Apply workaround by modifying the context compression to treat tool state as non-intent state and mask control directives like MEDIA:. This should prevent the issue of wrong task resumption and MEDIA: directive contamination.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Context compaction can misread preserved todo/tool state as current user intent and leak MEDIA directives [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Bug description

Why this matters

1) Wrong task resumption after compaction

2) `MEDIA:` directive contamination

Minimal repro shape

Suspect locations

Why the existing gateway-side `MEDIA:` hardening is not enough

Suggested fix directions

Regression coverage that would be useful

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix Context compaction can misread preserved todo/tool state as current user intent and leak MEDIA directives [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Bug description

Why this matters

1) Wrong task resumption after compaction

2) MEDIA: directive contamination

Minimal repro shape

Suspect locations

Why the existing gateway-side MEDIA: hardening is not enough

Suggested fix directions

Regression coverage that would be useful

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

2) `MEDIA:` directive contamination

Why the existing gateway-side `MEDIA:` hardening is not enough