hermes - 💡(How to fix) Fix 【Bug】Messages lost on process termination: no SIGTERM handler or turn-end persistence [1 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

import signal import sys

def _flush_on_exit(signum, frame): """Best-effort: persist in-flight messages before termination.""" try: for session_id, msgs in agent._sessions.items(): if msgs: state.replace_messages(session_id, msgs) except Exception: pass sys.exit(0)

signal.signal(signal.SIGTERM, _flush_on_exit) signal.signal(signal.SIGINT, _flush_on_exit)

Root Cause

Messages are only written to state.db in two narrow scenarios:

  1. Context compressionreplace_messages() rewrites compressed messages (but discards the originals it doesn't compress)
  2. Error recovery_persist_session() only fires on provider rate-limit/disconnect fallbacks

Normal turn completion (Turn ended: reason=text_response) does not persist any messages to state.db.

Additionally:

  • No SIGTERM/SIGINT signal handler to flush messages before exit
  • No atexit hook for emergency persistence
  • hermes_state.py has append_message() but it's only called in tests, never in production

Fix Action

Fixed

Code Example

import signal
import sys

def _flush_on_exit(signum, frame):
    """Best-effort: persist in-flight messages before termination."""
    try:
        for session_id, msgs in agent._sessions.items():
            if msgs:
                state.replace_messages(session_id, msgs)
    except Exception:
        pass
    sys.exit(0)

signal.signal(signal.SIGTERM, _flush_on_exit)
signal.signal(signal.SIGINT, _flush_on_exit)
RAW_BUFFERClick to expand / collapse

Problem

When a Hermes dashboard process is terminated (SIGTERM from a process manager, idle reclaim, broker restart, OOM kill, etc.), all conversation messages stored only in agent._messages (in-memory list) are lost permanently.

The state.db messages table often has message_count = 0 for sessions that were active and had hundreds of messages, because messages are never persisted during normal operation.

Root Cause

Messages are only written to state.db in two narrow scenarios:

  1. Context compressionreplace_messages() rewrites compressed messages (but discards the originals it doesn't compress)
  2. Error recovery_persist_session() only fires on provider rate-limit/disconnect fallbacks

Normal turn completion (Turn ended: reason=text_response) does not persist any messages to state.db.

Additionally:

  • No SIGTERM/SIGINT signal handler to flush messages before exit
  • No atexit hook for emergency persistence
  • hermes_state.py has append_message() but it's only called in tests, never in production

Reproduction

  1. Start a Hermes session, have a multi-turn conversation (20+ messages)
  2. Check SELECT message_count FROM sessions WHERE id='...' → shows 0
  3. Kill the Hermes process (kill <pid> or let broker reclaim it)
  4. New process starts, resumes session → messages are empty

Expected Behavior

Messages should survive process termination. Two approaches:

Option A: SIGTERM handler (minimal, low-risk)

import signal
import sys

def _flush_on_exit(signum, frame):
    """Best-effort: persist in-flight messages before termination."""
    try:
        for session_id, msgs in agent._sessions.items():
            if msgs:
                state.replace_messages(session_id, msgs)
    except Exception:
        pass
    sys.exit(0)

signal.signal(signal.SIGTERM, _flush_on_exit)
signal.signal(signal.SIGINT, _flush_on_exit)

Option B: Persist on turn end (more durable) After Turn ended, asynchronously write the new messages since last persist to state.db.

Environment

  • Hermes agent v0.9x, model: glm-5.1 via custom provider
  • Deployment: multi-user broker (each user gets a Hermes dashboard process on a port)
  • Broker kills processes on idle timeout (1800s) — this is the primary trigger in production

Impact

  • Data loss: Any unexpected process termination loses all messages since last compression
  • User trust: Users lose hours of conversation history with no recovery
  • Broker incompatibility: The current in-memory-only design assumes the process won't be externally terminated, which breaks under any process manager (systemd, broker, orchestrator)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix 【Bug】Messages lost on process termination: no SIGTERM handler or turn-end persistence [1 pull requests]