hermes - ✅(Solved) Fix Improve long-session continuity across compactions [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#23307Fetched 2026-05-11 03:30:03
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Author
Participants
Timeline (top)
labeled ×3cross-referenced ×1

Fix Action

Fixed

PR fix notes

PR #23308: feat: improve long-session continuity across compactions

Description (problem / solution / changelog)

What does this PR do?

Improves long-session continuity across repeated Hermes context compactions by adding recoverable session lineage, safer compression persistence/validation, durable run-ledger artifacts, memory pre-compress checkpointing, and reusable workflow templates for long autonomous work.

Related Issue / Plan / Control Doc

Fixes #23307

  • PRD / implementation plan: /mnt/data/hermes/plans/2026-05-10_032152-memory-across-compactions.md
  • Long-task control doc / resume capsule: /mnt/data/hermes/plans/2026-05-10_033003-memory-across-compactions-execution.md
  • Run ledger / state capsule handles: new hermes runs CLI and agent/run_ledger.py / agent/run_ledger_reader.py; branch validation did not mutate real run ledgers
  • Evidence or artifact manifest: /mnt/data/hermes/plans/2026-05-10_033003-final-review-context.md

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • hermes_state.py: resume resolution now follows validated compression continuation tips and avoids arbitrary child-session jumps.
  • scripts/backfill_sessions_from_json.py: adds an explicit manual JSON transcript -> SQLite repair script with dry-run, idempotent skip behavior, force repair, include-active opt-in, malformed-file reporting, and tests.
  • run_agent.py: persists parent transcript before compression rotation and persists the child compressed summary immediately after continuation creation.
  • agent/context_compressor.py: validates compression summaries and fails closed on invalid/truncated/template-leaking summaries; accepts bounded/redacted memory checkpoint source material.
  • agent/run_ledger.py: adds durable append-only run events, artifact references, and resume capsules for long runs.
  • agent/run_ledger_reader.py + hermes_cli/run_ledger_cli.py: adds read-only hermes runs list/events/capsule/recover retrieval surfaces.
  • plugins/memory/hindsight/__init__.py: adds bounded/redacted Hindsight pre-compress checkpoint extraction and non-blocking retain enqueueing.
  • .github/* and docs/templates/*: add long-task PR/issue/control-doc/evidence manifest templates.
  • Tests: focused coverage for session resume, backfill repair, compression split persistence, summary validation, run ledger writing/reading, CLI behavior, Hindsight pre-compress checkpoints, and template validation.

How to Test

  1. Run the focused impacted suite: scripts/run_tests.sh tests/hermes_state/test_resolve_resume_session_id.py tests/test_hermes_state.py tests/scripts/test_backfill_sessions_from_json.py tests/run_agent/test_compression_split_persistence.py tests/run_agent/test_compression_boundary_hook.py tests/run_agent/test_run_ledger_compression_capsule.py tests/run_agent/test_run_ledger_session_reset.py tests/run_agent/test_run_ledger_tool_events.py tests/run_agent/test_compress_focus_plugin_fallback.py tests/gateway/test_compress_command.py tests/agent/test_context_compressor.py tests/agent/test_context_compressor_summary_continuity.py tests/agent/test_run_ledger.py tests/agent/test_run_ledger_readonly.py tests/hermes_cli/test_run_ledger_cli.py tests/cli/test_cli_new_session.py tests/cli/test_branch_command.py tests/plugins/memory/test_hindsight_provider.py -q
  2. Run Ruff on touched code/tests.
  3. Validate workflow templates: parse GitHub issue YAML, parse JSON schema/template, run Draft 2020-12 schema validation against the template.
  4. Run git diff --check origin/main...HEAD.

TDD / Review Evidence

  • Planned tests vs. acceptance criteria: documented slice-by-slice in /mnt/data/hermes/plans/2026-05-10_033003-memory-across-compactions-execution.md before each implementation slice.
  • RED evidence: each code slice recorded expected failures before implementation (resume tip, backfill script missing/edge cases, compression split persistence, invalid summary fallback, run ledger, retrieval CLI, Hindsight pre-compress bridge).
  • GREEN evidence: final focused impacted suite passed: 498 passed in 23.55s.
  • Broader validation / regression checks: Ruff passed; workflow template YAML/JSON/schema validation passed; git diff --check origin/main...HEAD passed; worktree clean.
  • Independent review / second opinion: OpenRouter second review was run on plans/diffs/focused contexts per slice and on a final consolidated context. Final review passed with non-blocking notes triaged below.

Operational / Safety Impact

  • Config or migration impact: no required config migration. New run-ledger config defaults are additive. hermes runs retrieval paths are read-only.
  • Storage / retention / privacy impact: new run ledgers and capsules are durable local artifacts; code includes bounded artifact references and conservative redaction for memory checkpoints. Future retention/indexing policy is a follow-up area.
  • Rollback plan: revert this PR; manual backfill script is opt-in and defaults to non-destructive behavior.
  • Not authorized / out of scope: no production DB repair was run; no live credential mutation; no destructive repair without explicit operator command.

Reviewer / second-review notes triaged

  • Backfill safety: script includes dry-run, idempotent skip behavior, force opt-in, include-active opt-in, per-file malformed/error reporting, and focused tests. No automatic production repair was run.
  • Memory checkpoint token budgeting: checkpoint source text is redacted and hard-capped by character count; token-aware budgeting can be a follow-up but the bounded prompt insertion is acceptable for this slice.
  • Run-ledger list scalability: current reader favors correctness/read-only safety; indexing/pagination improvements can be added later if run volume warrants it.
  • Strict evidence schema: deliberate validation discipline with root/artifact extensions escape hatches for custom metadata.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs/issues and created related issue #23307
  • My PR contains only changes related to this fix/feature
  • I've run the focused impacted test suite and all tests pass
  • I've added tests for my changes
  • For behavioral code changes, I followed TDD or documented why it was not applicable
  • For long-running/autonomous work, I updated the control doc/resume capsule and evidence manifest, or marked them N/A
  • I resolved or explicitly triaged reviewer/second-review comments
  • I've tested on Ubuntu/Linux environment

Documentation & Housekeeping

  • I've updated relevant documentation/templates
  • I've updated cli-config.yaml.example if I added/changed config keys — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — N/A; workflow templates added instead
  • I've considered cross-platform impact — no platform-specific shell behavior in runtime paths beyond existing POSIX lock tests guarded by platform behavior
  • I've updated tool descriptions/schemas if I changed tool behavior — N/A

Screenshots / Logs

Final validation summary:

  • Focused impacted tests: 498 passed in 23.55s
  • Ruff: all checks passed
  • Workflow template YAML/JSON/schema validation: passed
  • git diff --check origin/main...HEAD: passed

Changed files

  • .github/ISSUE_TEMPLATE/feature_request.yml (modified, +46/-0)
  • .github/PULL_REQUEST_TEMPLATE.md (modified, +31/-2)
  • agent/context_compressor.py (modified, +192/-9)
  • agent/context_engine.py (modified, +5/-1)
  • agent/run_ledger.py (added, +726/-0)
  • agent/run_ledger_reader.py (added, +486/-0)
  • cli.py (modified, +6/-0)
  • docs/templates/evidence-manifest.schema.json (added, +201/-0)
  • docs/templates/evidence-manifest.template.json (added, +37/-0)
  • docs/templates/long-task-control-doc.md (added, +141/-0)
  • gateway/run.py (modified, +19/-7)
  • hermes_cli/config.py (modified, +11/-0)
  • hermes_cli/main.py (modified, +68/-1)
  • hermes_cli/run_ledger_cli.py (added, +148/-0)
  • hermes_state.py (modified, +20/-62)
  • plugins/memory/hindsight/__init__.py (modified, +158/-0)
  • run_agent.py (modified, +414/-9)
  • scripts/backfill_sessions_from_json.py (added, +356/-0)
  • tests/agent/test_context_compressor.py (modified, +333/-30)
  • tests/agent/test_context_compressor_summary_continuity.py (modified, +44/-2)
  • tests/agent/test_run_ledger.py (added, +315/-0)
  • tests/agent/test_run_ledger_readonly.py (added, +338/-0)
  • tests/cli/test_branch_command.py (modified, +1/-0)
  • tests/cli/test_cli_new_session.py (modified, +2/-0)
  • tests/gateway/test_compress_command.py (modified, +44/-0)
  • tests/hermes_cli/test_run_ledger_cli.py (added, +95/-0)
  • tests/hermes_state/test_resolve_resume_session_id.py (modified, +96/-46)
  • tests/plugins/memory/test_hindsight_provider.py (modified, +119/-1)
  • tests/run_agent/test_compression_boundary_hook.py (modified, +69/-0)
  • tests/run_agent/test_compression_split_persistence.py (added, +242/-0)
  • tests/run_agent/test_run_ledger_compression_capsule.py (added, +114/-0)
  • tests/run_agent/test_run_ledger_session_reset.py (added, +78/-0)
  • tests/run_agent/test_run_ledger_tool_events.py (added, +165/-0)
  • tests/scripts/test_backfill_sessions_from_json.py (added, +317/-0)
RAW_BUFFERClick to expand / collapse

Problem or use case

Long Hermes sessions can cross multiple context compactions. Before this work, resilience relied too heavily on lossy summaries and fragile session persistence/search behavior, making it harder for humans or fresh agents to resume long autonomous tasks safely.

Proposed solution

Improve long-session continuity across several layers:

  • Resolve compressed sessions to their latest continuation tip.
  • Provide a manual JSON transcript to SQLite/session-search backfill path.
  • Persist both sides of compression splits immediately.
  • Reject invalid or truncated compression summaries non-destructively.
  • Add durable run-ledger events, artifact manifests, and resume capsules.
  • Add read-only hermes runs retrieval CLI.
  • Bridge memory pre-compress checkpoints into compressor source material and Hindsight retention.
  • Add PR/issue/control-doc/evidence templates for long autonomous work.

Acceptance criteria

  • Resuming a compressed session ID lands on the latest compression continuation tip, not an arbitrary child.
  • Historical JSON transcripts with missing SQLite messages can be repaired manually with dry-run/idempotent behavior.
  • Compression split persists parent transcript and child summary immediately enough for resume/search recovery.
  • Invalid, truncated, or template-leaking summaries fail closed without destructive transcript replacement.
  • Long runs emit durable run-ledger events, artifacts, and resume capsules with read-only retrieval commands.
  • Memory providers can contribute bounded/redacted pre-compress checkpoint context; failures remain non-fatal.
  • Long autonomous work has reusable GitHub/control-doc/evidence templates.

TDD / validation plan

Implementation followed slice-by-slice TDD. Each code slice defined goal-to-test mappings, observed RED failures, implemented the minimal GREEN behavior, and ran focused plus adjacent regression tests.

Final local validation:

  • Focused impacted test set: 498 passed.
  • Ruff on touched code/tests: passed.
  • Workflow template YAML/JSON/schema validation: passed.
  • git diff --check origin/main...HEAD: passed.
  • OpenRouter second review: pass on slice reviews plus final consolidated context.

Durable context / evidence

Execution log/control doc: /mnt/data/hermes/plans/2026-05-10_033003-memory-across-compactions-execution.md Primary plan: /mnt/data/hermes/plans/2026-05-10_032152-memory-across-compactions.md Final review context: /mnt/data/hermes/plans/2026-05-10_033003-final-review-context.md

Worker contract / stop conditions

Scope: session continuity, compression persistence/validation, durable run-ledger recovery surfaces, memory pre-compress bridge, and workflow templates.

Stop/ask if any change requires live credential mutation, destructive production DB repair, or enabling new retention behavior outside tested local paths.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING