hermes: Stale background processes permanently block session idle/daily reset

Summary

Session idle and daily reset (session_reset.mode: both) can be permanently blocked by background processes that Hermes started and then forgot about. The session keeps growing until it hits the compression threshold, and neither the user nor the agent has any way to detect or fix the situation.

Environment

  • Hermes Agent v0.14.0 (2026.5.16), 590 commits behind main
  • Platform: Telegram gateway on macOS
  • Model: GPT-5.5 via Codex device auth
  • Config: session_reset.mode: both, idle_minutes: 120, at_hour: 6

What happened

  1. On May 18, during a normal conversation, Hermes started two local preview servers via the terminal tool with background: true:

    python3 -m http.server 8765 --directory ~/Downloads
    python3 -m http.server 8766 --directory ~/Desktop/Files/AI/ivi/design

    These were for previewing HTML files Hermes had just created.

  2. Hermes never stopped these servers. They remained alive as children of the gateway process (PID 10037).

  3. Over the next 2+ days, the session never reset despite:

    • Multiple gaps exceeding idle_minutes: 120 (longest gap: ~36 hours between May 18 evening and May 20 morning)
    • Multiple passes of at_hour: 6 (6 AM daily reset)
    • The agent cache idle-TTL evicting the agent from RAM multiple times (logged as Agent cache idle-TTL evict)

  4. By May 20, the session had accumulated 190 messages across 4 compression-linked sessions spanning May 18–20, totaling ~982K input tokens in the DB.

  5. When the user sent a short message on May 20 at 10:19, the context already held more than 200K tokens of carried-over history. After a few tool-heavy turns, it reached ~237K tokens and triggered preflight compression.
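
To make the timeline concrete, here is a minimal sketch of which reset policies should have fired over the longest gap, had nothing been blocking them. The helper name reset_due and the reading of mode: both as "either the idle or the daily policy may trigger" are assumptions for illustration, not Hermes code. The endpoints are approximate, taken from the last May 18 session ID and the May 20 message time.

```python
from datetime import datetime, timedelta


def reset_due(last_activity: datetime, now: datetime,
              idle_minutes: int, at_hour: int) -> dict:
    """Report which reset policies would fire between last_activity and now.

    Hypothetical helper: assumes `mode: both` means idle OR daily may trigger.
    """
    idle_fired = (now - last_activity) >= timedelta(minutes=idle_minutes)

    # First occurrence of at_hour:00 strictly after last_activity.
    boundary = last_activity.replace(hour=at_hour, minute=0, second=0, microsecond=0)
    if boundary <= last_activity:
        boundary += timedelta(days=1)
    daily_fired = boundary <= now

    return {"idle": idle_fired, "daily": daily_fired}


# Approximate endpoints of the longest gap described in the report.
last = datetime(2026, 5, 18, 21, 44, 20)
now = datetime(2026, 5, 20, 10, 19)
print(reset_due(last, now, idle_minutes=120, at_hour=6))
# {'idle': True, 'daily': True}
```

Under these assumptions both policies are satisfied many times over, which points at the guard rather than the thresholds.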

Root cause: two bugs

Bug 1: No TTL or staleness check on background processes

The reset guard in session.py (around lines 759 and 799) checks has_active_for_session(session_key) and skips the reset if any process is registered. There is no maximum age, no heartbeat check, and no distinction between "actively doing work" and "idle server listening on a port." A preview server started 3 days ago blocks reset identically to a task that started 30 seconds ago.
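
The behavior described above can be modeled in a few lines. This is a toy stand-in, not the actual session.py code: the ProcessRegistry class, its storage layout, and the session key "telegram:123" are all hypothetical. Only the name has_active_for_session comes from the report.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProcessRegistry:
    """Toy stand-in for the gateway's process registry (hypothetical structure)."""
    # session_key -> list of process start timestamps (seconds since epoch)
    by_session: dict = field(default_factory=dict)

    def register(self, session_key: str, started_at: float) -> None:
        self.by_session.setdefault(session_key, []).append(started_at)

    def has_active_for_session(self, session_key: str) -> bool:
        # Mirrors the reported behavior: any registered process counts,
        # no matter how old it is or whether it is doing any work.
        return bool(self.by_session.get(session_key))


def should_reset(registry: ProcessRegistry, session_key: str, reset_is_due: bool) -> bool:
    """Reset only when due AND no process is registered for the session."""
    if registry.has_active_for_session(session_key):
        return False  # silent skip: nothing is logged
    return reset_is_due


now = time.time()

old = ProcessRegistry()
old.register("telegram:123", started_at=now - 3 * 24 * 3600)  # 3 days old
reset_with_old = should_reset(old, "telegram:123", reset_is_due=True)

new = ProcessRegistry()
new.register("telegram:123", started_at=now - 30)  # 30 seconds old
reset_with_new = should_reset(new, "telegram:123", reset_is_due=True)

print(reset_with_old, reset_with_new)  # False False
```

Because started_at is never consulted, the two cases are indistinguishable to the guard.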

Bug 2: Agent-facing process list is task-scoped, gateway reset guard is session-scoped

When the user asked Hermes "can you see if there are any subprocesses running?", the agent's process list tool returned {"processes": []} because it filters by task_id. But ~/.hermes/processes.json still contained the two servers under the session key, and the gateway's has_active_for_session() still saw them.

This means:

  • The user cannot discover the problem by asking Hermes
  • The agent cannot discover the problem through its own tools
  • The gateway silently blocks reset with no log message explaining why
  • Only direct inspection of processes.json + ps reveals the truth
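
The scope mismatch can be sketched as two filters over the same records. The ProcEntry fields, the key values, and list_for_session are assumptions made for illustration. The real registry schema may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcEntry:
    pid: int
    session_key: str
    task_id: str
    command: str


# Hypothetical registry contents: both servers were started under an earlier task.
ENTRIES = [
    ProcEntry(552, "session-A", "task-1", "python3 -m http.server 8765"),
    ProcEntry(1863, "session-A", "task-1", "python3 -m http.server 8766"),
]


def list_for_task(entries, task_id):
    """Agent-facing view: only processes started under the given task."""
    return [e for e in entries if e.task_id == task_id]


def has_active_for_session(entries, session_key):
    """Gateway-side view: any process under the session key counts."""
    return any(e.session_key == session_key for e in entries)


def list_for_session(entries, session_key, current_task_id):
    """One possible unified view: session-wide, flagging other tasks' processes."""
    return [
        {"pid": e.pid, "command": e.command,
         "from_other_task": e.task_id != current_task_id}
        for e in entries
        if e.session_key == session_key
    ]


agent_view = list_for_task(ENTRIES, "task-7")               # what the agent sees
gateway_view = has_active_for_session(ENTRIES, "session-A")  # what the guard sees
unified = list_for_session(ENTRIES, "session-A", "task-7")
print(agent_view, gateway_view, len(unified))  # [] True 2
```

A session-wide listing with a from_other_task flag is only one option. A warning appended to the task-scoped result would also make the blocker discoverable.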

Evidence from logs

# Session chain — no session_reset since May 18, only compression
20260518_084510  → compression (196 msgs)
20260518_181338  → compression (131 msgs)
20260518_214420  → compression (189 msgs)
20260520_105532  → active (new compressed child)

# Compare: sessions before May 18 ended normally
20260518_001320  → session_reset
20260517_202529  → session_reset
20260517_170318  → session_reset

# Agent cache evictions happened but did NOT trigger reset
2026-05-19 16:40:48 Agent cache idle-TTL evict (idle=3824s)
2026-05-19 18:46:09 Agent cache idle-TTL evict (idle=3736s)

# The two servers still alive as gateway children
PID 552  python3 -m http.server 8765  (started May 18)
PID 1863 python3 -m http.server 8766  (started May 18)
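
For anyone who wants to repeat the processes.json + ps check, a rough sketch follows. It assumes the file contains a JSON list of objects with a "pid" key, which is a guess at the schema and should be adjusted to whatever the file actually holds. The liveness probe uses signal 0, so it is POSIX-only (fine on macOS and Linux).

```python
import json
import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """POSIX liveness probe: signal 0 checks for existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def registered_processes(path: Path) -> list:
    """Load registry entries and annotate each one with an "alive" flag.

    Assumes a JSON list of objects with a "pid" key (hypothetical schema).
    """
    entries = json.loads(path.read_text())
    return [{**entry, "alive": pid_alive(int(entry["pid"]))} for entry in entries]
```

Calling registered_processes(Path.home() / ".hermes" / "processes.json") would then list each registered entry next to whether its PID still exists.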

Additional issue: compression summary timeout

When compression finally triggered at 10:47, the auxiliary summary generation on Codex timed out after 240s:

WARNING Failed to generate context summary: Codex auxiliary Responses stream exceeded 240.0s total timeout

Compression still "completed" via fallback (190→9 messages, ~237K→~24K tokens), but the summary was lost. This is likely related to #10719 / #11585 and the compression.abort_on_summary_failure option added in PR #28117.
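
A rough model of the two outcomes, for discussion only. The compress function, its keep_last parameter, and the exact meaning given to the abort flag are assumptions. The behavior of compression.abort_on_summary_failure in PR #28117 was not checked here.

```python
def compress(messages, summarize, keep_last=9, abort_on_summary_failure=False):
    """Replace older messages with a summary, keeping the most recent ones.

    Hypothetical sketch: `summarize` is any callable that may raise TimeoutError.
    Returns (messages_after, compressed_flag).
    """
    head, tail = messages[:-keep_last], messages[-keep_last:]
    try:
        summary = summarize(head)
    except TimeoutError:
        if abort_on_summary_failure:
            # Keep the full history rather than dropping the head unsummarized.
            return messages, False
        # Fallback as observed: compression "completes", but the summary is lost.
        return tail, True
    return [summary] + tail, True


def timing_out_summarizer(_messages):
    raise TimeoutError("summary stream exceeded 240.0s total timeout")


history = [f"msg {i}" for i in range(190)]

kept, compressed = compress(history, timing_out_summarizer)
print(len(kept), compressed)  # 9 True

kept_abort, compressed_abort = compress(
    history, timing_out_summarizer, abort_on_summary_failure=True
)
print(len(kept_abort), compressed_abort)  # 190 False
```

The first call mirrors the 190→9 result from the log. The second shows the alternative, where the turn count stays high but nothing is silently dropped.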

Suggested fixes

  1. Add a max age / TTL for background processes in the reset guard. A process older than idle_minutes (or a separate configurable threshold) should not block session reset. Preview servers and forgotten dev tools should not keep a session alive forever.

  2. Unify process visibility. The agent-facing process list tool should show session-scoped processes (or at least warn about them), not just task-scoped ones. If something is blocking session reset, the user and agent should be able to discover it.

  3. Log why reset was skipped. When has_active_for_session prevents a reset, log it at INFO level with the process details. Currently the skip is completely silent.

  4. Consider auto-cleanup of background processes on daily reset. If at_hour fires and there are stale background processes, either kill them or at least warn the user via the home channel.
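
Fixes 1 and 3 could look roughly like this. It is a sketch under stated assumptions, not a patch: BackgroundProc, blocking_processes, should_skip_reset, the logger name, and the 120-minute default (borrowed from idle_minutes) are all made up for illustration, as are PID 4242 and its command.

```python
import logging
import time
from dataclasses import dataclass

log = logging.getLogger("session_reset")  # hypothetical logger name


@dataclass(frozen=True)
class BackgroundProc:
    pid: int
    command: str
    started_at: float  # seconds since epoch


def blocking_processes(procs, now, max_age_seconds):
    """Return only the processes young enough to justify delaying a reset."""
    return [p for p in procs if now - p.started_at < max_age_seconds]


def should_skip_reset(procs, now=None, max_age_seconds=120 * 60):
    """Decide whether a due reset should be skipped, logging the reason."""
    now = time.time() if now is None else now
    fresh = blocking_processes(procs, now, max_age_seconds)
    for p in procs:
        if p not in fresh:
            log.warning("ignoring stale background process pid=%d age=%.0fs cmd=%r",
                        p.pid, now - p.started_at, p.command)
    if fresh:
        log.info("reset skipped: %d active process(es): %s",
                 len(fresh), [(p.pid, p.command) for p in fresh])
        return True
    return False


now = 1_000_000.0
old_server = BackgroundProc(552, "python3 -m http.server 8765", now - 2 * 24 * 3600)
new_task = BackgroundProc(4242, "pytest -q", now - 30)

print(should_skip_reset([old_server], now=now))            # False
print(should_skip_reset([old_server, new_task], now=now))  # True
```

Age is the simplest staleness signal. A heartbeat or "last output" timestamp would distinguish idle servers from long-running work more accurately, at the cost of extra bookkeeping.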

Related issues

  • #18516 — Gateway should support automatic session freshness resets (closed, discusses the reset-skip guard)
  • #26933 — Improved background process engine (visibility problem)
  • #28547 / #28596 — Guardrail: warn before /new when background tasks running (open, CLI-side only)
  • #1144 — Background processes lost on gateway restart (closed/fixed, adjacent)
  • #23975 — Context compression interrupted by gateway messages
  • #10719 / #11585 — Context compression drops turns when summary fails
