hermes - ✅(Solved) Fix Context compression + session resume causes model to re-execute the original first task instead of continuing from compressed state [1 pull requests, 1 comments, 2 participants]

hermes2026-04-29 08:18:39

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#17344•Fetched 2026-04-30 06:48:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Zjianru

Participants

alt-glitch

Zjianru

Timeline (top)

labeled ×3commented ×1cross-referenced ×1

Root Cause

My issue and PR #17301 may share a common root cause — both involve the SUMMARY_PREFIX / compression handoff being interpreted too broadly by the model. PR #17301 exempts memory and skills; the fix needed for my issue may require clarifying how the model should treat the preserved first user message in the tail.

Fix Action

Fixed

Fixed by PR: fix(compressor): shrink protect_first_n on recompaction (#17344) (https://github.com/NousResearch/hermes-agent/pull/17349)

PR fix notes

PR #17349: fix(compressor): shrink protect_first_n on recompaction (#17344)

Repository: NousResearch/hermes-agent
Author: Sanjays2402
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/17349

Description (problem / solution / changelog)

Closes #17344.

Bug

Reporter traced a 6-session compression chain in which every child session carried the identical original first user request — as if no progress had been made. After resume (or on new sessions opened post-compression), the model re-executes the original first task instead of continuing from the handoff summary's ## Active Task.

Root cause

ContextCompressor protects protect_first_n=3 messages at the head — [system, user1, assistant1]. On every cycle that head is preserved verbatim:

[system + compaction note, user1 (ORIGINAL), assistant1, summary, …tail…, latest_user]

SUMMARY_PREFIX says "resume from ## Active Task," but user1 (ORIGINAL) is sitting right next to it as a still-prominent user-role message. The model latches onto the first plausible unanswered request and re-executes it — structured summary prose loses against direct attention on a user message. After 6 cycles that same user1 has been re-anchored 6 times.

Fix

On the second and subsequent compactions — detected by checking whether messages[0] (the system prompt) already carries the compaction note we appended last time — shrink protect_first_n to 1 for that call. The original [user1, assistant1] then flow into the summariser pool, and the structured ## Active Task section becomes the sole steering signal as designed.

The shrink is per-call; self.protect_first_n is left untouched so fresh sessions continue to use the configured default.

effective_protect_first_n = self.protect_first_n
if self._is_recompaction(messages) and self.protect_first_n > 1:
    effective_protect_first_n = 1

Detection signal

Reuses the existing compaction note already written to the system prompt on first compaction. A new _COMPRESSION_NOTE_SENTINEL constant captures a stable substring ("earlier conversation turns have been compacted into a handoff summary") so PR #17301 — which expands the note text — will not break detection. New helper ContextCompressor._is_recompaction(messages) does the lookup with no I/O, returns False on malformed input, and handles multimodal system content via the existing _content_text_for_contains() helper.

Why not just strengthen `SUMMARY_PREFIX`?

The prefix already says "Respond ONLY to the latest user message that appears AFTER this summary." Stronger prose helps marginally but cannot compete with structural attention on a head-preserved user message. The reporter explicitly noted: "the model responds as if the session had just started." That's an architectural problem, not a wording problem.

Coordination with PR #17301 / #17251

PR #17301 (open, by @HiddenPuppy) addresses a sibling problem: SUMMARY_PREFIX over-applies "background reference" framing to memory and skills. Both fixes stem from the same root concern (compaction handoff misinterpreted by the model) but are orthogonal — #17301 carves out exceptions inside SUMMARY_PREFIX text; this PR shrinks protect_first_n on recompaction. They compose cleanly; merge order doesn't matter.

Out of scope

The reporter also flagged parent_session_id = NULL observations on chained sessions. That's a separate DB-write concern — run_agent.py:8891 explicitly passes parent_session_id=old_session_id and resolve_resume_session_id (#15000) handles chain-walking. If NULL is observed it's likely a different write-path failure and deserves its own bug. This PR stays focused on the message-level fix that unbreaks the user-visible "restarts first task" behaviour.

Tests

TestIsRecompaction (6 cases) — sentinel detection edge cases:

test_fresh_system_prompt_is_not_recompaction
test_system_prompt_with_compaction_note_is_recompaction
test_empty_messages_safe
test_non_system_first_message_is_not_recompaction
test_multimodal_system_content_is_inspected
test_garbage_content_does_not_raise

TestRecompactionShrinksProtectFirstN (5 cases) — behavioural:

test_first_compaction_preserves_first_exchange_in_head (control)
test_recompaction_demotes_first_exchange_to_summary (the bug)
test_recompaction_preserves_latest_user_message_in_tail
test_recompaction_keeps_protect_first_n_attribute_unchanged
test_protect_first_n_one_no_op_for_recompaction

TestRecompactionMinForCompressGate (1 case) — _min_for_compress early-return uses the effective (post-shrink) head count.

$ python -m pytest tests/agent/test_context_compressor.py \
                   tests/agent/test_context_compressor_recompaction.py \
                   tests/run_agent/test_compression_boundary_hook.py \
                   tests/run_agent/test_compression_persistence.py \
                   tests/run_agent/test_413_compression.py -q
99 passed in 8.26s

87 pre-existing + 12 new, zero regressions.

Changed files

agent/context_compressor.py (modified, +48/-2)
tests/agent/test_context_compressor_recompaction.py (added, +234/-0)

Code Example

[0] system:     (with compaction note appended)
[1] user:       "original first user request"   ← preserved in tail
[2] assistant:  "[CONTEXT COMPACTION] Summary:
                ## Goal: ...
                ## Progress: ...
                ## Active Task: ...
                ## Next Steps: ..."
[3] user:       "latest user message"

RAW_BUFFERClick to expand / collapse

Bug Description

When a foreground session undergoes context compression and is subsequently resumed (via hermes resume or by reopening the session), the model appears to re-execute the original first task instead of continuing from the compressed state. All progress encoded in the compression summary is ignored, and the model responds as if the session had just started.

This behavior has been observed in:

New sessions opened after a compression cycle completes
Resumed sessions restored via hermes resume

Expected Behavior

After context compression, a resumed session should:

Load the compression summary (containing ## Goal, ## Progress, ## Active Task, ## Next Steps)
Continue from ## Active Task / ## Next Steps as described in the summary
Treat the preserved original first user message in the tail as historical context — not as an active instruction to execute

Actual Behavior

The model responds to the original first user message preserved in the tail, effectively restarting the first task from scratch. The compression summary is either ignored or overridden by the model's tendency to respond to the most prominent user message in context.

Technical Background

How compression is structured (from official documentation)

From the official context compression docs, after a compression cycle the message list has this structure:

[0] system:     (with compaction note appended)
[1] user:       "original first user request"   ← preserved in tail
[2] assistant:  "[CONTEXT COMPACTION] Summary:
                ## Goal: ...
                ## Progress: ...
                ## Active Task: ...
                ## Next Steps: ..."
[3] user:       "latest user message"

The design intention is: the model reads ## Active Task / ## Next Steps and continues from there, while the preserved tail messages serve as historical context.

What I observe in affected sessions

I traced a 6-session compression chain where each session ended in compression over ~18 hours. Every child session carried the identical original first user message, as if no progress had been made across the entire chain. The compression summaries described completed work, yet on resume the model appeared to restart from the preserved first message.

I also verified that parent_session_id in the SQLite sessions table shows NULL for sessions that are clearly part of a compression chain — which may indicate the session lineage is not being properly tracked, or that new sessions are being created instead of resumed ones during the compression handoff.

Related Issues and PRs

PR #17301 — "fix: exempt memory and skills from 'background reference' label in context compaction"

(Opened today by HiddenPuppy)

This PR addresses a related but distinct issue: after compaction, the SUMMARY_PREFIX instruction causes the agent to treat its own persistent memory and skills as "background reference," making them effectively inaccessible. This results in the agent "forgetting" facts about the session after restart.

Reference: https://github.com/NousResearch/hermes-agent/pull/17301

Issue #17251 — "Context Compaction Demotes Memory to Background Reference"

(Opened today by ifearghal)

Reporter describes identical symptoms: after gateway restart / context compaction, the agent "wakes up with no memory of the session," doesn't know key infrastructure details, and asks the user to re-explain everything. This confirms the issue is not isolated to my environment.

Reference: https://github.com/NousResearch/hermes-agent/issues/17251

Questions and Offer to Assist

Is there an intended mechanism that should prevent the model from responding to the preserved original first user message after compression? Or is the model expected to self-regulate by reading ## Active Task?
Could the role-merge logic (summary and tail getting merged when they share the same role) be causing the ## Active Task section to lose prominence in the model's attention?
Would PR #17301's fix (adding EXCEPTIONS to SUMMARY_PREFIX) also address this variant, or does this require a separate fix targeting how the preserved first user message is handled?
Is the parent_session_id = NULL behavior in the compression chain expected? This may be contributing to sessions being reopened as "new" instead of "resumed."

I am happy to provide any additional session data, traces, or diagnostic information that would help. If this is best addressed via a PR, I am also willing to work on a fix — please let me know which failure point you recommend targeting first.

extent analysis

TL;DR

The model likely needs a mechanism to ignore the preserved original first user message after context compression and instead continue from the ## Active Task described in the compression summary.

Guidance

Investigate the role-merge logic to determine if it's causing the ## Active Task section to lose prominence in the model's attention.
Review PR #17301 to see if its fix for exempting memory and skills from 'background reference' label could also address the issue of the model responding to the preserved original first user message.
Verify if the parent_session_id = NULL behavior in the compression chain is expected and if it's contributing to sessions being reopened as "new" instead of "resumed".
Consider adding a mechanism to explicitly mark the preserved first user message as historical context, preventing the model from responding to it as an active instruction.

Example

No code example is provided due to the lack of specific implementation details in the issue.

Notes

The issue may share a common root cause with PR #17301, and addressing one issue may help resolve the other. However, without further investigation, it's unclear if the same fix will work for both issues.

Recommendation

Apply a workaround to explicitly mark the preserved first user message as historical context, as this seems to be the most direct way to prevent the model from responding to it as an active instruction. This could involve modifying the compression summary or the model's attention mechanism to ignore the preserved message.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #ssr #memory leak #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.