hermes - 💡(How to fix) Fix Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe) [1 participants]

GitHub stats
NousResearch/hermes-agent#16501 · Fetched 2026-04-28 06:52:55
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×3

Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe)

Problem

When an agent goes down the wrong path and the user pushes back ("no, that's wrong, drop it"), the failed exploration stays in context for the rest of the session. Those tokens are not just wasted — they actively bias the model toward repeating the same mistake.

This is not a hypothesis. PR #6784 names the mechanism precisely:

"the problem isn't reasoning — it's token probability. The fix is removing the repeated wrong-name occurrences from context so the probability distribution shifts."

And #7619 is the canonical victim: at 149 turns, a clean topic switch is ignored because old tokens dominate attention. Memory recall worked. The model still anchored.

Today there is no clean way to reclaim a known-wrong mid-conversation range. /undo only reaches the last exchange. /compress runs on context pressure, not user feedback. #513 prunes tool outputs by age, not by direction. PR #6784 only fires on detectable loops.

Proposal

A new agent-callable tool, fold_turns, that collapses a contiguous mid-conversation range into a short recoverable summary, under three hard constraints.

Tool

fold_turns(
    start_turn_id: str,        # inclusive; must be after cache-safe boundary
    end_turn_id: str,          # inclusive; must be before active tail
    reason: str,               # required; rendered into the marker
    summary: str | None = None # optional; auxiliary LLM fills in if absent
)

Returns {fold_id, turns_folded, tokens_reclaimed_estimate, user_confirmation_required}.

A second tool, fold_unfold(fold_id), returns the original payload as a tool result for inspection — it does not re-insert it into context.

Layout

[ system + tool schemas ]   ← prefix, never touched
[ early cached turns    ]   ← never folded
[ foldable mid-band     ]   ← only this range is eligible
[ active tail (last N)  ]   ← protected

Folding is rejected if start is before the cache-safe boundary or end is inside the active tail.
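The rejection rule above can be sketched as a small range check. This is a minimal illustration, not the proposed implementation: it assumes turns are addressable by position in an ordered list of ids and that the protected bands are expressed as turn counts. `FoldBounds`, `FoldRangeError`, and `validate_fold_range` are hypothetical names; only the config keys `protect_prefix_turns` and `protect_tail_turns` come from the RFC.

```python
from dataclasses import dataclass


class FoldRangeError(ValueError):
    """Raised when a requested fold touches the protected prefix or tail."""


@dataclass(frozen=True)
class FoldBounds:
    # Key names come from the RFC's config section; the values used with
    # them here are illustrative, since the RFC does not state defaults.
    protect_prefix_turns: int
    protect_tail_turns: int


def validate_fold_range(turn_ids: list[str], start_turn_id: str,
                        end_turn_id: str, bounds: FoldBounds) -> tuple[int, int]:
    """Return inclusive (start, end) indices, or raise FoldRangeError."""
    try:
        start = turn_ids.index(start_turn_id)
        end = turn_ids.index(end_turn_id)
    except ValueError as exc:
        raise FoldRangeError(f"unknown turn id: {exc}") from exc
    if start > end:
        raise FoldRangeError("start_turn_id comes after end_turn_id")

    first_foldable = bounds.protect_prefix_turns
    last_foldable = len(turn_ids) - bounds.protect_tail_turns - 1
    if start < first_foldable:
        raise FoldRangeError(
            f"{start_turn_id} is before the cache-safe boundary "
            f"(first foldable index is {first_foldable})")
    if end > last_foldable:
        raise FoldRangeError(
            f"{end_turn_id} is inside the active tail "
            f"(last foldable index is {last_foldable})")
    return start, end
```

With ten turns, three protected prefix turns, and two protected tail turns, only indices 3 through 7 are eligible; anything else raises a descriptive error rather than being clamped silently.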

Marker

The folded range is replaced in-stream by one message:

[FOLDED · fld_01HXYZ · 14 turns · ~18.4k tokens]
Reason: user said the Postgres-on-RDS branch was wrong; pivoting to Aurora.
Kept: cost ceiling $2,400/mo, EU-only data residency.
Recover: fold_unfold("fld_01HXYZ")

Original turns spill atomically (temp + fsync + rename, per #8029) to ~/.hermes/sessions/<id>/folds/<fold_id>.jsonl.
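A rough sketch of the marker rendering and the temp + fsync + rename spill described above. The function names, the `kept` parameter, and the explicit `folds_dir` argument are assumptions made for illustration; the RFC only fixes the marker layout and the sidecar path pattern. `read_fold` stands in for the read half of fold_unfold and returns the payload without touching the conversation.

```python
import json
import os
import tempfile
from pathlib import Path


def render_marker(fold_id: str, turns_folded: int, tokens_estimate: int,
                  reason: str, kept: str) -> str:
    """Render the single in-stream message that replaces the folded range."""
    return "\n".join([
        f"[FOLDED · {fold_id} · {turns_folded} turns · "
        f"~{tokens_estimate / 1000:.1f}k tokens]",
        f"Reason: {reason}",
        f"Kept: {kept}",
        f'Recover: fold_unfold("{fold_id}")',
    ])


def spill_turns(folds_dir: Path, fold_id: str, turns: list[dict]) -> Path:
    """Write turns to <folds_dir>/<fold_id>.jsonl via temp + fsync + rename."""
    folds_dir.mkdir(parents=True, exist_ok=True)
    target = folds_dir / f"{fold_id}.jsonl"
    # The temp file lives in the same directory so the rename stays on one
    # filesystem, which is what makes os.replace atomic.
    fd, tmp_name = tempfile.mkstemp(dir=folds_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            for turn in turns:
                tmp.write(json.dumps(turn, ensure_ascii=False) + "\n")
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # Leave no partial state behind if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def read_fold(folds_dir: Path, fold_id: str) -> list[dict]:
    """Return the original turns for inspection; nothing is re-inserted."""
    with open(folds_dir / f"{fold_id}.jsonl", encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

A reader either sees the complete sidecar file or no file at all, which is the property the abort-on-failure constraint relies on.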

Three hard constraints

1. Cache-prefix safe. A fold invalidates prompt cache from the fold start onward. It must never touch the system prompt, tool schemas, or the early cached turns. We expose tokens_reclaimed_estimate and reject folds smaller than min_reclaim_tokens (default 5k) — below that, the cache miss outweighs the saving.

2. User-confirmed by default. An anchored agent often cannot tell it is anchored (the lesson of PR #6784 and #7619). So agent self-judgment is a proposal, not an action. Three trust levels:

| Trigger | Trust | Behavior |
|---|---|---|
| User feedback ("that direction was wrong") | high | execute immediately |
| System detection (loop detector, repeated tool errors) | high | execute, log loudly |
| Agent self-judgment | low | stage as proposal; one-line confirm in next turn; expires if ignored |

3. Fail loud, never silent. The bugs in the compression family (#11914, #10719, #11585, #13905, PR #14514) all share one root cause: failed summarization silently substituted a static "context lost" placeholder, destroying the user's task. fold_turns will not repeat this. If summary generation fails, the fold is aborted, the original turns stay untouched, the error is surfaced to the agent, and no partial state is persisted.
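Constraints 2 and 3 can be sketched together as one request handler. This is a simplified illustration under stated assumptions: turns are plain strings, `summarize` stands in for the auxiliary LLM call, and `Trigger`, `FoldAborted`, and `request_fold` are hypothetical names. Proposal expiry and persistence are left out.

```python
from enum import Enum
from typing import Callable, Optional


class Trigger(Enum):
    USER_FEEDBACK = "user_feedback"        # high trust
    SYSTEM_DETECTION = "system_detection"  # high trust, logged loudly
    AGENT_SELF_JUDGMENT = "agent"          # low trust


class FoldAborted(RuntimeError):
    """Summary generation failed; nothing was changed or persisted."""


def request_fold(turns: list[str], start: int, end: int, trigger: Trigger,
                 summarize: Callable[[list[str]], str],
                 summary: Optional[str] = None) -> dict:
    """Route a fold by trust level; replace turns only on full success."""
    if trigger is Trigger.AGENT_SELF_JUDGMENT:
        # Low trust: stage a proposal and wait for the user's confirmation.
        return {"status": "proposed",
                "user_confirmation_required": True,
                "turns": turns}

    if summary is None:
        try:
            summary = summarize(turns[start:end + 1])
        except Exception as exc:
            # Fail loud: surface the error instead of writing a placeholder.
            raise FoldAborted(f"summary generation failed: {exc}") from exc
    if not summary or not summary.strip():
        raise FoldAborted("summary generation returned empty text")

    # Build a new list so the caller's original turns are never mutated.
    folded = turns[:start] + [f"[FOLDED] {summary}"] + turns[end + 1:]
    return {"status": "executed",
            "user_confirmation_required": False,
            "log_loudly": trigger is Trigger.SYSTEM_DETECTION,
            "turns": folded}
```

Because the folded list is built only after a summary exists, an aborted fold leaves the caller holding exactly the turns it started with.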

How it composes

| Existing mechanism | Relationship |
|---|---|
| /undo | Different scope: last exchange only |
| /compress | Different trigger: context pressure, not feedback |
| #513 two-phase prune | Complementary: prune cheap tool outputs first, fold reasoning later |
| PR #6784 loop pruner | Becomes one of the high-trust triggers |
| #678 memory forget | Different layer: persistent memory, not working context |
| #15498 after_turn hook | Integration point for proposal staging |
| checkpoint_manager | After a high-trust fold, runtime offers matching filesystem rollback (never implicit) |

This RFC replaces nothing.

Implementation sketch

  • tools/fold_tool.py — new; fold_turns, fold_unfold, marker rendering, atomic sidecar IO
  • agent/turn_folder.py — new; boundary computation, range validation, summary generation with auxiliary fallback
  • agent/context_compressor.py — expose boundary helpers only; no logic change
  • run_agent.py — wire proposal staging through the existing after_turn hook
  • cli-config.yaml.example — new fold section: protect_prefix_turns, protect_tail_turns, min_reclaim_tokens, auto_confirm_sources
  • Tests cover: boundary enforcement, prefix-bit-identity, atomic write, abort-on-failure, proposal flow

First cut targets CLI/TUI. Gateway confirmation UX (Telegram/Discord) is a follow-up.

Expected behavior

These are the properties a first cut should aim for. They are not hard gates — happy to refine any of them in review:

  • Out-of-range folds (touching prefix or tail) are rejected with a clear error
  • Prompt-cache prefix stays bit-identical before and after a fold (hash-verifiable)
  • Folded payload is persisted atomically and recoverable via fold_unfold
  • Summary generation failure aborts the fold cleanly — original turns untouched, no placeholder written
  • Agent-initiated folds default to the proposal flow; user and system triggers execute immediately
  • Folds below the configured reclaim threshold are skipped by default
  • Key events (proposed / executed / aborted / unfolded) are observable

Concrete thresholds, config keys, and boundary detection strategy can all be settled in review — this section is just the intended mental model.
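The "hash-verifiable" prefix property and the reclaim threshold could be checked along these lines. This is a sketch under assumptions: messages are JSON-serializable dicts, and hashing their canonical JSON is only a stand-in for whatever bytes a given provider actually caches. `prefix_digest` and `fold_messages` are hypothetical names; the 5k default is the one stated in the RFC.

```python
import hashlib
import json
from typing import Optional

MIN_RECLAIM_TOKENS = 5_000  # default named in the RFC


def prefix_digest(messages: list[dict], fold_start: int) -> str:
    """Hash every message before the fold start, in order."""
    digest = hashlib.sha256()
    for message in messages[:fold_start]:
        encoded = json.dumps(message, sort_keys=True, ensure_ascii=False)
        digest.update(encoded.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def fold_messages(messages: list[dict], start: int, end: int, marker: dict,
                  tokens_reclaimed_estimate: int,
                  min_reclaim_tokens: int = MIN_RECLAIM_TOKENS
                  ) -> Optional[list[dict]]:
    """Return the folded message list, or None if the fold is skipped."""
    if tokens_reclaimed_estimate < min_reclaim_tokens:
        # Below the threshold the cache miss is assumed to cost more
        # than the fold saves, so the fold is skipped.
        return None
    before = prefix_digest(messages, start)
    folded = messages[:start] + [marker] + messages[end + 1:]
    if prefix_digest(folded, start) != before:
        raise RuntimeError("fold altered the cached prefix")
    return folded
```

A test suite could assert the same digest equality around any fold to cover the prefix-bit-identity case listed in the implementation sketch.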

Things I'm not sure about — would love input

  1. Cache-safe boundary detection — is a conservative "first ~25%" assumption good enough, or should we expose a per-provider hook?
  2. Summary generation — reuse the auxiliary client with a fallback chain (per the lessons in #11914), or require the caller to pass summary synchronously?
  3. Low-value fold proposals — silently dropped, or always surfaced to the user?
  4. Anything in the design above that conflicts with directions already on the roadmap that I might have missed.

On implementation

If the direction feels right, happy to drive this to PRs. A reasonable split would be:

  1. Tool skeleton — fold_turns / fold_unfold, atomic sidecar persistence, range validation, minimal proposal flow on top of the existing after_turn hook
  2. Cache-boundary estimation, summary generation with fallback chain, CLI/TUI confirmation surface
  3. Wire PR #6784's loop detector in as a high-trust trigger source

Each step is small enough to land independently. I'd plan to benchmark on the long-session scenarios from #7619 first (tokens reclaimed vs. cache-miss cost) so the conversation has data behind it.

If a maintainer wants to claim this, also great — happy to defer or collaborate. Open to feedback on whether the direction or the scoping needs to shift before any code lands.

Prior art

  • PR #6784 — token-level anchoring (theoretical basis)
  • #7619 — long-session topic-switch failure (motivating bug)
  • #513 — two-phase compression (complementary)
  • #11914 / #10719 / #11585 / #13905 / PR #14514 — silent-placeholder bug family (failure mode to avoid)
  • #678 — memory forgetting (adjacent layer)
  • #8029 — atomic transcript writes (persistence discipline)
  • #15498 — after_turn lifecycle hook (integration point)

extent analysis

TL;DR

Implement a new agent-callable tool, fold_turns, to collapse a contiguous mid-conversation range into a short recoverable summary, addressing the issue of failed explorations staying in context and biasing the model.

Guidance

  • Identify the cache-safe boundary to ensure folds do not touch the system prompt, tool schemas, or early cached turns.
  • Develop a summary generation mechanism with a fallback chain to handle failures and avoid silent placeholder substitutions.
  • Implement a proposal flow for agent-initiated folds, allowing user confirmation and execution of high-trust triggers.
  • Test the fold_turns tool with various scenarios, including boundary enforcement, atomic write, and abort-on-failure cases.
  • Consider exposing a per-provider hook for cache-safe boundary detection and decide on a strategy for handling low-value fold proposals.

Example

# Example usage of the fold_turns tool
fold_turns(
    start_turn_id="turn_123",
    end_turn_id="turn_150",
    reason="user said the direction was wrong",
    summary="pivoting to Aurora"
)

Notes

The implementation should prioritize cache safety, user confirmation, and failure handling to avoid introducing new issues. The design should be flexible to accommodate future changes and refinements.

Recommendation

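fold_turns is a proposal in this issue, not a shipped tool, so it cannot be applied as a workaround today. If the design is adopted, it would let a session drop failed explorations from context while leaving the cached prefix intact, which is the behavior the issue argues is missing from /undo and /compress.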
