hermes - ✅(Solved) Fix Gateway session store leaks orphaned JSON files and desyncs sessions.json index [2 pull requests, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#20098 (fetched 2026-05-06 06:38:48)
Comments: 2
Participants: 2
Timeline: 7
Reactions: 0
Timeline (top): labeled ×3, commented ×2, cross-referenced ×2

Error Message

  • Session corruption caused bot to return truncated responses with error: invalid params, tool call id is invalid (2013)

Fix / Workaround

  • MCP server: Context7 (@upstash/context7-mcp) running at http://localhost:3456/mcp
  • Messaging platform: Telegram (bot token configured, user DM pairing authorized)
  • Session corruption caused bot to return truncated responses with error: invalid params, tool call id is invalid (2013)
  • Fix applied: deleted corrupted session JSON files and removed entries from sessions.json
  • A session corruption monitor cron job was added as a workaround (runs every 5 minutes)

PR fix notes

PR #20102: fix(gateway): reconcile orphaned session files on startup (#20098)

Description (problem / solution / changelog)

Summary

Fixes #20098

Root Cause

On every gateway restart, a new session JSON file is created in sessions_dir (e.g. ~/.hermes/sessions/session_20260505_063027_d6ec06.json) but old ones are never removed. After N restarts, sessions_dir accumulates N stale session files while sessions.json only tracks the current session. The index desync means session context from old sessions can contaminate new ones — manifesting as invalid tool call id errors when the gateway serves a new Telegram session but mistakenly locates an old CLI session's tool call IDs.

Fix

Add SessionStore.reconcile_orphaned_json_files():

def reconcile_orphaned_json_files(self) -> int:
    """Remove session JSON/JSONL files not tracked in sessions.json."""
    known_ids = {entry.session_id for entry in self._entries.values()}
    # scan sessions_dir, delete *.json / *.jsonl not in known_ids
    # skips: sessions.json itself, dot-prefix temp files, directories, other extensions

Called automatically during start_gateway() startup, immediately after suspend_recently_active(), so sessions_dir is clean before any new platform connections are established.
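
The PR excerpt above elides the method body. A minimal standalone sketch of the reconciliation logic it describes might look like the following. It is not the code from the PR: the function name `reconcile_orphaned_files`, its parameters, and the assumption that a file's stem equals its session ID are all hypothetical, chosen only to illustrate the skip rules listed in the comments above.

```python
from pathlib import Path
from typing import Iterable

INDEX_NAME = "sessions.json"  # the index file; never deleted


def reconcile_orphaned_files(sessions_dir: Path, known_ids: Iterable[str]) -> int:
    """Delete *.json / *.jsonl files whose stem is not a tracked session ID.

    ASSUMPTION: a session file is named "<session_id>.json" or
    "<session_id>.jsonl". Returns the number of files removed.
    """
    known = set(known_ids)
    removed = 0
    if not sessions_dir.is_dir():
        return 0
    for path in sorted(sessions_dir.iterdir()):
        if not path.is_file():
            continue  # skip directories
        if path.name == INDEX_NAME or path.name.startswith("."):
            continue  # skip the index itself and dot-prefixed temp files
        if path.suffix not in {".json", ".jsonl"}:
            continue  # skip other extensions
        if path.stem in known:
            continue  # tracked session: keep
        try:
            path.unlink()
            removed += 1
        except OSError:
            pass  # a file that cannot be removed should not abort startup
    return removed
```

In this sketch a missing directory or an undeletable file results in a lower count rather than an exception, on the assumption that cleanup should not block gateway startup.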

Example (issue reporter's case)

Before fix — 8 files, only 1 tracked:

session_20260505_053804_914eb0.json   ← orphan (removed)
session_20260505_054636_3956ed.json   ← orphan (removed)
session_20260505_055014_52e486.json   ← orphan (removed)
session_20260505_055040_43da0617.json ← orphan (removed)
session_20260505_061146_1acbd7.json   ← orphan (removed)
session_20260505_062453_2804db.json   ← orphan (removed)
session_20260505_063027_d6ec06.json   ← orphan (removed)
session_20260505_063204_f24f96d5.json ← LIVE (kept)
sessions.json                         ← index (never touched)

After fix — sessions_dir cleaned up on next gateway start; sessions.json stays consistent.

Tests

9 new tests in tests/gateway/test_session_orphan_reconciliation.py:

  • Orphaned .json removal
  • Orphaned .jsonl removal
  • No-op when all files are tracked
  • Index (sessions.json) never deleted
  • Dot-prefixed temp files never deleted
  • Multi-orphan batch (7 orphans at once — reproduces the reporter's scenario exactly)
  • Empty sessions_dir is safe
  • Unknown-extension files skipped (request_dump_* etc.)
  • Works when sessions.json does not exist yet (fresh install)

Changed files

  • gateway/run.py (modified, +11/-0)
  • gateway/session.py (modified, +68/-0)
  • tests/gateway/test_session_orphan_reconciliation.py (added, +213/-0)

PR #20462: fix(gateway): clean up orphaned session transcript files

Description (problem / solution / changelog)

Summary

Delete legacy JSONL transcript files when sessions are reset or pruned, and clean up orphaned transcripts on gateway startup.

Root Cause

SessionStore.get_or_create_session() replaced old SessionEntry objects when a session was reset (via reset policy, suspend, or force_new), but never deleted the corresponding transcript file ({session_id}.jsonl). Over multiple gateway restarts, orphaned transcript files accumulated in ~/.hermes/sessions/ while sessions.json only tracked the latest session per channel.

The existing prune_old_entries() method had the same gap — it removed entries from the index but left transcript files on disk.

Fix

  1. _delete_transcript(session_id) — safe helper that removes a JSONL file, ignoring missing files and permission errors
  2. get_or_create_session() — deletes the old transcript when a session is reset or replaced via force_new
  3. prune_old_entries() — deletes transcript files for pruned entries
  4. cleanup_orphaned_transcripts() — new method that scans sessions_dir for *.jsonl files with no matching session entry
  5. gateway/run.py — calls cleanup_orphaned_transcripts() on startup after stuck-loop detection
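
The first and fourth items in the list above can be sketched as standalone functions. This is an illustration under stated assumptions, not the PR's implementation: the real helpers are described as SessionStore methods, while the free functions below and their `sessions_dir` and `active_ids` parameters are hypothetical.

```python
from pathlib import Path
from typing import Iterable


def delete_transcript(sessions_dir: Path, session_id: str) -> bool:
    """Remove {session_id}.jsonl; return True only if a file was deleted."""
    path = sessions_dir / f"{session_id}.jsonl"
    try:
        path.unlink()
        return True
    except FileNotFoundError:
        return False  # already gone, nothing to do
    except PermissionError:
        return False  # leave the file in place rather than raise


def cleanup_orphaned_transcripts(sessions_dir: Path, active_ids: Iterable[str]) -> int:
    """Delete *.jsonl files with no matching active session ID."""
    active = set(active_ids)
    removed = 0
    # Materialize the listing first so deletions do not affect iteration.
    for path in list(sessions_dir.glob("*.jsonl")):
        if path.is_file() and path.stem not in active:
            if delete_transcript(sessions_dir, path.stem):
                removed += 1
    return removed
```

Swallowing the missing-file and permission errors inside the helper keeps every caller (reset, prune, startup cleanup) free of its own error handling, which matches the "safe helper" wording in item 1.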

Regression Coverage

12 tests in test_session_transcript_cleanup.py:

  • _delete_transcript: existing file, missing file, permission error
  • get_or_create_session: old transcript deleted on reset, not deleted on reuse
  • prune_old_entries: transcript files deleted for pruned entries, handles missing files
  • cleanup_orphaned_transcripts: removes orphans, preserves active, ignores non-JSONL, handles empty dir, handles multiple orphans

Testing

tests/gateway/test_session_transcript_cleanup.py — 12 passed
tests/gateway/test_session_store_prune.py — 18 passed (no regressions)
tests/gateway/test_session.py — 63 passed (no regressions)
tests/gateway/test_session_hygiene.py — 18 passed (no regressions)
tests/gateway/test_session_state_cleanup.py — 11 passed (no regressions)

Fixes [Bug]: Gateway session store leaks orphaned JSON files and desyncs sessions.json index #20098

Changed files

  • gateway/platforms/whatsapp.py (modified, +4/-1)
  • gateway/run.py (modified, +8/-0)
  • gateway/session.py (modified, +56/-0)
  • tests/gateway/test_session_transcript_cleanup.py (added, +261/-0)
  • tests/tools/test_file_tools.py (modified, +37/-0)
  • tools/file_tools.py (modified, +6/-6)

Bug Description

Gateway session storage creates orphaned JSON files in ~/.hermes/sessions/ on each restart, causing session index desynchronization and cross-session tool call ID contamination.

Steps to Reproduce

  1. Start Hermes gateway (e.g., hermes gateway start)
  2. Restart gateway multiple times during troubleshooting (e.g., hermes gateway restart)
  3. Observe: multiple orphaned session files accumulate in ~/.hermes/sessions/
  4. Check sessions.json index — only one session is tracked despite multiple files on disk
  5. Result: bot becomes unresponsive; API returns invalid tool call id errors

Observed Behavior

~/.hermes/sessions/ on a fresh install after ~6 hours of setup/troubleshooting:

session_20260505_053804_914eb0.json (CLI session)
session_20260505_054636_3956ed.json (CLI session)
session_20260505_055014_52e486.json (CLI session)
session_20260505_055040_43da0617.json (Telegram session)
session_20260505_061146_1acbd7.json (Telegram session)
session_20260505_062453_2804db.json (CLI session)
session_20260505_063027_d6ec06.json (CLI session)
session_20260505_063204_f24f96d5.json (Telegram session - current)
sessions.json (index - only 1 entry)

  • 8 session files on disk, but only 1 entry in sessions.json
  • The gateway appears to create a new session file on each restart without cleaning up old ones
  • The sessions.json index never updates to reflect all files
  • No cross-session ID isolation: tool call IDs from old CLI sessions leak into new Telegram sessions
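
The mismatch between files on disk and index entries can be checked with a small read-only script. The sketch below rests on an assumption the issue does not confirm: that sessions.json is a JSON object whose values are objects carrying a "session_id" field. The function name `find_untracked_session_files` is hypothetical.

```python
import json
from pathlib import Path


def find_untracked_session_files(sessions_dir: Path) -> list[str]:
    """List session_*.json files whose stem is not referenced by the index."""
    index_path = sessions_dir / "sessions.json"
    tracked: set[str] = set()
    if index_path.exists():
        index = json.loads(index_path.read_text(encoding="utf-8"))
        # ASSUMPTION: the index is an object mapping keys to entries,
        # and each entry is an object with a "session_id" field.
        if isinstance(index, dict):
            for entry in index.values():
                if isinstance(entry, dict) and "session_id" in entry:
                    tracked.add(entry["session_id"])
    # "session_*.json" does not match "sessions.json", so the index is excluded.
    return sorted(
        p.name for p in sessions_dir.glob("session_*.json") if p.stem not in tracked
    )
```

Calling `find_untracked_session_files(Path.home() / ".hermes" / "sessions")` only reads the directory and deletes nothing, so it is safe to run while investigating.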

Expected Behavior

  • Gateway should either (a) reuse the same session file on restart, or (b) properly register new sessions in the index and clean up orphaned ones
  • The documented prune_sessions() cleanup mechanism (hermes_state.py) should handle this automatically
  • Sessions should be isolated: tool call IDs from CLI context should never contaminate Telegram context

Root Cause Hypothesis

The gateway has a parallel session store layer (gateway/session.py → SessionStore) using JSON files + sessions.json index, separate from the primary SQLite store (hermes_state.py). The JSON file session store:

  1. Creates new session files on each restart
  2. Does not register them in sessions.json (index shows only latest per channel)
  3. Does not clean up old session files
  4. Does not leverage the prune_sessions() cleanup mechanism

Environment

  • Hermes version: v0.12.0
  • Installation method: curl installer
  • Platform: WSL2 (systemd=true)
  • Model: MiniMax-M2.7 via https://api.minimax.io/anthropic
  • Gateway runtime: systemd user service (hermes-gateway.service)

Additional Context

  • MCP server: Context7 (@upstash/context7-mcp) running at http://localhost:3456/mcp
  • Messaging platform: Telegram (bot token configured, user DM pairing authorized)
  • Session corruption caused bot to return truncated responses with error: invalid params, tool call id is invalid (2013)
  • Fix applied: deleted corrupted session JSON files and removed entries from sessions.json
  • A session corruption monitor cron job was added as a workaround (runs every 5 minutes)

extent analysis

TL;DR

The gateway's JSON session store does not clean up orphaned session files, so sessions.json falls out of sync with the files on disk and tool call IDs can leak across sessions. The issue suggests routing cleanup through the prune_sessions() mechanism; the linked PRs (#20102 and #20462) instead remove orphaned files at gateway startup.

Guidance

  • Review the gateway/session.py and hermes_state.py files to ensure the prune_sessions() function is correctly called and implemented to clean up old session files.
  • Verify that the sessions.json index is being updated correctly to reflect all active sessions, not just the latest per channel.
  • Consider adding a check to prevent the creation of new session files on each restart if a valid session file already exists.
  • Investigate why the prune_sessions() mechanism is not being leveraged by the JSON file session store.

Example

No code snippet is provided due to the complexity of the issue and the need for a thorough review of the codebase.

Notes

The provided information suggests a potential issue with the session storage mechanism, but a thorough review of the code and implementation is necessary to determine the root cause and implement a fix.

Recommendation

Apply a workaround by regularly cleaning up orphaned session files and ensuring the sessions.json index is correctly updated, until a permanent fix can be implemented to address the underlying issue with the session storage mechanism.
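
One way to make such a workaround less destructive is to move untracked files aside instead of deleting them. The sketch below is hypothetical throughout: `quarantine_orphans`, the `.orphaned` directory name, and the assumption that a file's stem equals its session ID do not come from the issue or the PRs. It is presumably safest to run while the gateway is stopped.

```python
import shutil
from pathlib import Path
from typing import Iterable


def quarantine_orphans(
    sessions_dir: Path,
    tracked_ids: Iterable[str],
    quarantine_name: str = ".orphaned",  # hypothetical directory name
) -> list[Path]:
    """Move untracked *.json / *.jsonl files into a quarantine subdirectory."""
    tracked = set(tracked_ids)
    quarantine = sessions_dir / quarantine_name
    moved: list[Path] = []
    for path in sorted(sessions_dir.iterdir()):
        if not path.is_file() or path.suffix not in {".json", ".jsonl"}:
            continue  # skip directories (including the quarantine) and other files
        if path.name == "sessions.json" or path.name.startswith("."):
            continue  # never touch the index or temp files
        if path.stem in tracked:
            continue  # still referenced by the index
        quarantine.mkdir(exist_ok=True)
        target = quarantine / path.name
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Quarantined files can be restored by moving them back, which is useful if the index turns out to be the stale side of the desync.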
