These are the properties a first cut should aim for. They are not hard gates — happy to refine any of them in review: - Out-of-range folds (touching prefix or tail) are rejected with a clear error - Prompt-cache prefix stays bit-identical before and after a fold (hash-verifiable) - Folded payload is persisted atomically and recoverable via `fold_unfold` - Summary generation failure aborts the fold cleanly — original turns untouched, no placeholder written - Agent-initiated folds default to the proposal flow; user and system triggers execute immediately - Folds below the configured reclaim threshold are skipped by default - Key events (`proposed` / `executed` / `aborted` / `unfolded`) are observable Concrete thresholds, config keys, and boundary detection strategy can all be settled in review — this section is just the intended mental model.

hermes - 💡(How to fix) Fix Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe) [1 participants]

hermes2026-04-27 10:40:42

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#16501•Fetched 2026-04-28 06:52:55

View on GitHub

Comments

Participants

Timeline

Reactions

Author

sontianye

Participants

sontianye

Timeline (top)

labeled ×3

Error Message

3. Fail loud, never silent. The compression bug family (#11914, #10719, #11585, #13905, PR #14514) all share one root cause: failed summarization silently substituted a static "context lost" placeholder, destroying the user's task. fold_turns will not repeat this. If summary generation fails, the fold is aborted, the original turns stay untouched, the error is surfaced to the agent, no partial state is persisted.

Out-of-range folds (touching prefix or tail) are rejected with a clear error

Root Cause

And #7619 is the canonical victim: at 149 turns, a clean topic switch is ignored because old tokens dominate attention. Memory recall worked. The model still anchored.

Code Example

fold_turns(
    start_turn_id: str,        # inclusive; must be after cache-safe boundary
    end_turn_id: str,          # inclusive; must be before active tail
    reason: str,               # required; rendered into the marker
    summary: str | None = None # optional; auxiliary LLM fills in if absent
)

---

[ system + tool schemas ]   ← prefix, never touched
[ early cached turns    ]   ← never folded
[ foldable mid-band     ]   ← only this range is eligible
[ active tail (last N)  ]   ← protected

---

[FOLDED · fld_01HXYZ · 14 turns · ~18.4k tokens]
Reason: user said the Postgres-on-RDS branch was wrong; pivoting to Aurora.
Kept: cost ceiling $2,400/mo, EU-only data residency.
Recover: fold_unfold("fld_01HXYZ")

RAW_BUFFERClick to expand / collapse

Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe)

Problem

When an agent goes down the wrong path and the user pushes back ("no, that's wrong, drop it"), the failed exploration stays in context for the rest of the session. Those tokens are not just wasted — they actively bias the model toward repeating the same mistake.

This is not a hypothesis. PR #6784 names the mechanism precisely:

"the problem isn't reasoning — it's token probability. The fix is removing the repeated wrong-name occurrences from context so the probability distribution shifts."

And #7619 is the canonical victim: at 149 turns, a clean topic switch is ignored because old tokens dominate attention. Memory recall worked. The model still anchored.

Today there is no clean way to reclaim a known-wrong mid-conversation range. /undo only reaches the last exchange. /compress runs on context pressure, not user feedback. #513 prunes tool outputs by age, not by direction. PR #6784 only fires on detectable loops.

Proposal

A new agent-callable tool, fold_turns, that collapses a contiguous mid-conversation range into a short recoverable summary, under three hard constraints.

Tool

fold_turns(
    start_turn_id: str,        # inclusive; must be after cache-safe boundary
    end_turn_id: str,          # inclusive; must be before active tail
    reason: str,               # required; rendered into the marker
    summary: str | None = None # optional; auxiliary LLM fills in if absent
)

Returns {fold_id, turns_folded, tokens_reclaimed_estimate, user_confirmation_required}.

A second tool, fold_unfold(fold_id), returns the original payload as a tool result for inspection — it does not re-insert it into context.

Layout

[ system + tool schemas ]   ← prefix, never touched
[ early cached turns    ]   ← never folded
[ foldable mid-band     ]   ← only this range is eligible
[ active tail (last N)  ]   ← protected

Folding is rejected if start is before the cache-safe boundary or end is inside the active tail.

Marker

The folded range is replaced in-stream by one message:

[FOLDED · fld_01HXYZ · 14 turns · ~18.4k tokens]
Reason: user said the Postgres-on-RDS branch was wrong; pivoting to Aurora.
Kept: cost ceiling $2,400/mo, EU-only data residency.
Recover: fold_unfold("fld_01HXYZ")

Original turns spill atomically (temp + fsync + rename, per #8029) to ~/.hermes/sessions/<id>/folds/<fold_id>.jsonl.

Three hard constraints

1. Cache-prefix safe. A fold invalidates prompt cache from the fold start onward. It must never touch the system prompt, tool schemas, or the early cached turns. We expose tokens_reclaimed_estimate and reject folds smaller than min_reclaim_tokens (default 5k) — below that, the cache miss outweighs the saving.

2. User-confirmed by default. An anchored agent often cannot tell it is anchored (the lesson of PR #6784 and #7619). So agent self-judgment is a proposal, not an action. Three trust levels:

Trigger	Trust	Behavior
User feedback ("that direction was wrong")	high	execute immediately
System detection (loop detector, repeated tool errors)	high	execute, log loudly
Agent self-judgment	low	stage as proposal; one-line confirm in next turn; expires if ignored

How it composes

Existing mechanism	Relationship
`/undo`	Different scope — last exchange only
`/compress`	Different trigger — context pressure, not feedback
#513 two-phase prune	Complementary — prune cheap tool outputs first, fold reasoning later
PR #6784 loop pruner	Becomes one of the high-trust triggers
#678 memory forget	Different layer — persistent memory, not working context
#15498 `after_turn` hook	Integration point for proposal staging
`checkpoint_manager`	After a high-trust fold, runtime offers matching filesystem rollback (never implicit)

This RFC replaces nothing.

Implementation sketch

tools/fold_tool.py — new; fold_turns, fold_unfold, marker rendering, atomic sidecar IO
agent/turn_folder.py — new; boundary computation, range validation, summary generation with auxiliary fallback
agent/context_compressor.py — expose boundary helpers only; no logic change
run_agent.py — wire proposal staging through the existing after_turn hook
cli-config.yaml.example — new fold section: protect_prefix_turns, protect_tail_turns, min_reclaim_tokens, auto_confirm_sources
Tests cover: boundary enforcement, prefix-bit-identity, atomic write, abort-on-failure, proposal flow

First cut targets CLI/TUI. Gateway confirmation UX (Telegram/Discord) is a follow-up.

Expected behavior

These are the properties a first cut should aim for. They are not hard gates — happy to refine any of them in review:

Out-of-range folds (touching prefix or tail) are rejected with a clear error
Prompt-cache prefix stays bit-identical before and after a fold (hash-verifiable)
Folded payload is persisted atomically and recoverable via fold_unfold
Summary generation failure aborts the fold cleanly — original turns untouched, no placeholder written
Agent-initiated folds default to the proposal flow; user and system triggers execute immediately
Folds below the configured reclaim threshold are skipped by default
Key events (proposed / executed / aborted / unfolded) are observable

Concrete thresholds, config keys, and boundary detection strategy can all be settled in review — this section is just the intended mental model.

Things I'm not sure about — would love input

Cache-safe boundary detection — is a conservative "first ~25%" assumption good enough, or should we expose a per-provider hook?
Summary generation — reuse the auxiliary client with a fallback chain (per the lessons in #11914), or require the caller to pass summary synchronously?
Low-value fold proposals — silently dropped, or always surfaced to the user?
Anything in the design above that conflicts with directions already on the roadmap that I might have missed.

On implementation

If the direction feels right, happy to drive this to PRs. A reasonable split would be:

Tool skeleton — fold_turns / fold_unfold, atomic sidecar persistence, range validation, minimal proposal flow on top of the existing after_turn hook
Cache-boundary estimation, summary generation with fallback chain, CLI/TUI confirmation surface
Wire PR #6784's loop detector in as a high-trust trigger source

Each step is small enough to land independently. I'd plan to benchmark on the long-session scenarios from #7619 first (tokens reclaimed vs. cache-miss cost) so the conversation has data behind it.

If a maintainer wants to claim this, also great — happy to defer or collaborate. Open to feedback on whether the direction or the scoping needs to shift before any code lands.

Prior art

PR #6784 — token-level anchoring (theoretical basis)
#7619 — long-session topic-switch failure (motivating bug)
#513 — two-phase compression (complementary)
#11914 / #10719 / #11585 / #13905 / PR #14514 — silent-placeholder bug family (failure mode to avoid)
#678 — memory forgetting (adjacent layer)
#8029 — atomic transcript writes (persistence discipline)
#15498 — after_turn lifecycle hook (integration point)

extent analysis

TL;DR

Implement a new agent-callable tool, fold_turns, to collapse a contiguous mid-conversation range into a short recoverable summary, addressing the issue of failed explorations staying in context and biasing the model.

Guidance

Identify the cache-safe boundary to ensure folds do not touch the system prompt, tool schemas, or early cached turns.
Develop a summary generation mechanism with a fallback chain to handle failures and avoid silent placeholder substitutions.
Implement a proposal flow for agent-initiated folds, allowing user confirmation and execution of high-trust triggers.
Test the fold_turns tool with various scenarios, including boundary enforcement, atomic write, and abort-on-failure cases.
Consider exposing a per-provider hook for cache-safe boundary detection and decide on a strategy for handling low-value fold proposals.

Example

# Example usage of the fold_turns tool
fold_turns(
    start_turn_id="turn_123",
    end_turn_id="turn_150",
    reason="user said the direction was wrong",
    summary="pivoting to Aurora"
)

Notes

The implementation should prioritize cache safety, user confirmation, and failure handling to avoid introducing new issues. The design should be flexible to accommodate future changes and refinements.

Recommendation

Apply the proposed fold_turns tool as a workaround to address the issue of failed explorations staying in context, with the goal of improving the model's performance and user experience.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

These are the properties a first cut should aim for. They are not hard gates — happy to refine any of them in review:

Out-of-range folds (touching prefix or tail) are rejected with a clear error
Prompt-cache prefix stays bit-identical before and after a fold (hash-verifiable)
Folded payload is persisted atomically and recoverable via fold_unfold
Summary generation failure aborts the fold cleanly — original turns untouched, no placeholder written
Agent-initiated folds default to the proposal flow; user and system triggers execute immediately
Folds below the configured reclaim threshold are skipped by default
Key events (proposed / executed / aborted / unfolded) are observable

Concrete thresholds, config keys, and boundary detection strategy can all be settled in review — this section is just the intended mental model.

#LLM response #prompt template #agent execution #callback error #memory management

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe) [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe)

Problem

Proposal

Tool

Layout

Marker

Three hard constraints

How it composes

Implementation sketch

Expected behavior

Things I'm not sure about — would love input

On implementation

Prior art

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

TRENDING

hermes - 💡(How to fix) Fix Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe) [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Code Example

Feature: Agent-Initiated Turn Folding (Cache-Prefix Safe)

Problem

Proposal

Tool

Layout

Marker

Three hard constraints

How it composes

Implementation sketch

Expected behavior

Things I'm not sure about — would love input

On implementation

Prior art

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING