hermes - ✅(Solved) Fix [Bug]: Hindsight sync can race interpreter shutdown after successful one-shot CLI exit [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#15073Fetched 2026-04-25 06:24:41
View on GitHub
Comments
0
Participants
1
Timeline
7
Reactions
0
Participants
Timeline (top)
labeled ×4cross-referenced ×3

Error Message

WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown Traceback (most recent call last): File "/home/alex/.hermes/hermes-agent/plugins/memory/hindsight/init.py", line 920, in _sync _run_sync(client.aretain_batch( ... File ".../asyncio/base_events.py", line 830, in run_in_executor executor.submit(func, *args), loop=self) File ".../concurrent/futures/thread.py", line 169, in submit raise RuntimeError('cannot schedule new futures after interpreter shutdown') RuntimeError: cannot schedule new futures after interpreter shutdown ERROR asyncio: Unclosed client session client_session: <aiohttp.client.ClientSession object at 0x...>

Root Cause

The likely race is:

  1. sync_turn() launches a daemon hindsight-sync thread that performs _run_sync(client.aretain_batch(...)).
  2. Hermes teardown reaches shutdown() late in process exit.
  3. shutdown() only waits up to 5 seconds for self._sync_thread / self._prefetch_thread.
  4. If the retain thread is still active or starts additional async work after interpreter teardown begins, aiohttp eventually reaches run_in_executor(...) and fails with RuntimeError: cannot schedule new futures after interpreter shutdown.
  5. The failed retain path then leaves the aiohttp client session/connector unclosed, causing the follow-on warnings.

I also reproduced the timing problem with a controlled fake-client test locally: shutdown() returned after about 5 seconds while the sync thread was still alive, which strongly suggests the fixed 5-second join is not sufficient to guarantee clean exit.

This appears related to, but not identical with, #11923 / #14109 / #14605. Those focus on shared loop/session cleanup; this variant is specifically about CLI interpreter shutdown racing a still-active background retain thread.

PR fix notes

PR #15481: fix(gateway): pass session messages to shutdown_memory_provider (#15165)

Description (problem / solution / changelog)

Summary

  • Fixes #15165 Part A_cleanup_agent_resources previously invoked agent.shutdown_memory_provider() with no arguments, so every memory provider's on_session_end hook received an empty list. Providers with an early-return guard on empty input (Holographic, Hindsight, etc.) never extracted facts from the conversation.
  • Forward agent._session_messages — the transcript AIAgent maintains and refreshes every turn via _persist_session — so providers see the actual conversation instead of [].
  • Falls back to the legacy no-arg call whenever _session_messages is absent or not a list (test stubs built via object.__new__ or MagicMock) to keep every existing suite green.

The bug

Per #15165:

...any provider that tries to extract facts from the session's conversation gets an empty list. Providers like Holographic (on_session_end([])) have an early return guard... Every gateway restart wipes the session's conversational memory...

Users saw "抱歉,找不到相關的對話記錄" (roughly: "sorry, cannot find the corresponding conversation record") on the first turn after any gateway restart / idle expiry / session reset because no facts had ever been persisted.

The fix

AIAgent.shutdown_memory_provider already accepts messages: list = None (run_agent.py:4126), so this is a pure caller-side change in gateway/run.py:

session_messages = getattr(agent, "_session_messages", None)
if isinstance(session_messages, list):
    agent.shutdown_memory_provider(session_messages)
else:
    agent.shutdown_memory_provider()
  • _session_messages is set on AIAgent.__init__ (run_agent.py:1518) and refreshed at the end of every run_conversation turn (_persist_session, line 3264). By the time _cleanup_agent_resources fires, it holds the real transcript.
  • isinstance(..., list) discrimination is deliberate — it protects against MagicMock agents (whose attribute access auto-synthesises a child mock, not None) falling through and passing a bogus object to the provider's List[Dict] hook.
  • The try/except Exception: pass wrap is inherited; providers that raise never prevent close() from running.
  • Paths using skip_memory=True temporary agents (pre-reset memory flush, session hygiene auto-compress, /compress) are no-ops inside shutdown_memory_provider because self._memory_manager is None — so this change has no behaviour effect on them.

Scope

Part A only. Part B of #15165 (adding on_session_end to the Hindsight plugin) is a separate concern that benefits from this fix landing first — without Part A, a Hindsight on_session_end hook would still receive [].

Test plan

  • New regression suite: tests/gateway/test_shutdown_memory_provider_messages.py — 7 cases:
    • Populated _session_messages list is forwarded byte-for-byte
    • Empty list is still explicitly forwarded (matches pre-fix observable behaviour)
    • Agent without _session_messages falls back to no-arg call (stub compatibility)
    • MagicMock agent (non-list attribute) falls back to no-arg call
    • Provider exception is swallowed — close() still runs afterward
    • None agent is a no-op (idle-sweep race tolerance)
    • Agent without shutdown_memory_provider method still gets close()
  • Regression guard verified: reverted the fix → test_populated_messages_forwarded fails with Expected: mock([...]) / Actual: mock(); restored the fix → all 7 pass.
  • Full tests/gateway/ suite: 3703 passed, 9 pre-existing failures (dingtalk, matrix with missing mautrix, one approve-deny test — all fail identically on main, unrelated to this change).
  • Related suites pass clean (100/100): test_agent_cache.py, test_compress_command.py, test_background_command.py, test_flush_memory_stale_guard.py, test_session_boundary_hooks.py, test_compress_plugin_engine.py, test_clean_shutdown_marker.py.

Related

  • Fixes #15165 (Part A)
  • #7759 (closed) — /new / /reset memory commit, different code path
  • #14981 — on_session_finalize not fired on idle timeout (orthogonal)
  • #15073 — Hindsight sync races interpreter shutdown (orthogonal)

🤖 Generated with Claude Code

Changed files

  • gateway/run.py (modified, +15/-1)
  • tests/gateway/test_shutdown_memory_provider_messages.py (added, +148/-0)

Code Example

hermes chat -q "Reply with OK only." -Q

---

hermes chat -q "Reply with OK only." -Q

---

WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
Traceback (most recent call last):
  File "/home/alex/.hermes/hermes-agent/plugins/memory/hindsight/__init__.py", line 920, in _sync
    _run_sync(client.aretain_batch(
  ...
  File ".../asyncio/base_events.py", line 830, in run_in_executor
    executor.submit(func, *args), loop=self)
  File ".../concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after interpreter shutdown')
RuntimeError: cannot schedule new futures after interpreter shutdown
ERROR asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x...>

---

2026-04-24 10:30:30,947 WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
2026-04-24 10:30:32,488 ERROR [20260424_103025_10670d] asyncio: Unclosed client session

---

Report     https://paste.rs/Vk9oT
agent.log  https://paste.rs/HJ3qW
RAW_BUFFERClick to expand / collapse

Bug Description

When Hermes is configured with the Hindsight memory provider, a simple one-shot CLI run can succeed and still emit Hindsight cleanup errors during process exit.

In my case, this reproduces with:

hermes chat -q "Reply with OK only." -Q

The command returns OK, but shutdown still logs:

  • Hindsight sync failed: cannot schedule new futures after interpreter shutdown
  • Unclosed client session
  • sometimes Unclosed connector

This looks distinct from the already-reported long-running gateway leak in #11923: here the failure happens on single-query CLI exit during interpreter teardown.

Before submitting, I checked the closest existing issues/PRs I could find:

  • #11923 [Bug]: Hindsight still leaks aiohttp ClientSession/connector after fix #4762
  • #14109 fix(hindsight): preserve shared event loop across provider shutdowns
  • #14605 fix(memory): close embedded Hindsight async client cleanly

I did not find an existing issue specifically for the cannot schedule new futures after interpreter shutdown shutdown race.

Steps to Reproduce

  1. Configure Hermes to use the Hindsight memory provider.
  2. Confirm Hermes is current enough to report Up to date via hermes version.
  3. Run:
    hermes chat -q "Reply with OK only." -Q
  4. Observe that the command prints OK and exits 0.
  5. Inspect ~/.hermes/logs/errors.log.

Expected Behavior

Hermes should fully drain or cancel Hindsight background retention work before interpreter teardown, and process exit should not emit Hindsight warnings or aiohttp resource-leak errors.

Actual Behavior

The CLI succeeds, but exit appends a traceback like this to ~/.hermes/logs/errors.log:

WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
Traceback (most recent call last):
  File "/home/alex/.hermes/hermes-agent/plugins/memory/hindsight/__init__.py", line 920, in _sync
    _run_sync(client.aretain_batch(
  ...
  File ".../asyncio/base_events.py", line 830, in run_in_executor
    executor.submit(func, *args), loop=self)
  File ".../concurrent/futures/thread.py", line 169, in submit
    raise RuntimeError('cannot schedule new futures after interpreter shutdown')
RuntimeError: cannot schedule new futures after interpreter shutdown
ERROR asyncio: Unclosed client session
client_session: <aiohttp.client.ClientSession object at 0x...>

Fresh repro from today also produced:

2026-04-24 10:30:30,947 WARNING plugins.memory.hindsight: Hindsight sync failed: cannot schedule new futures after interpreter shutdown
2026-04-24 10:30:32,488 ERROR [20260424_103025_10670d] asyncio: Unclosed client session

Affected Component

  • CLI (interactive chat / one-shot CLI)
  • Agent Core (conversation loop, memory shutdown/lifecycle)

Messaging Platform (if gateway-related)

  • N/A (CLI only)

Debug Report

hermes debug share output:

Report     https://paste.rs/Vk9oT
agent.log  https://paste.rs/HJ3qW

Environment

  • Operating System: Ubuntu 24.04.4 LTS
  • Python Version: Hermes runtime Python 3.11.15 (hermes version); system python3 is 3.12.3
  • Hermes Version: Hermes Agent v0.11.0 (2026.4.23), reports Up to date

Additional Logs / Traceback

Relevant current source/log locations:

  • plugins/memory/hindsight/__init__.py:905-933sync_turn() starts a daemon background thread and calls client.aretain_batch() via _run_sync(...)
  • plugins/memory/hindsight/__init__.py:1012-1039shutdown() joins background threads for only 5 seconds, then closes the client and stops the shared event loop
  • ~/.hermes/logs/errors.log:6429-6466 — latest local repro showing the interpreter-shutdown traceback followed by Unclosed client session

Root Cause Analysis

The likely race is:

  1. sync_turn() launches a daemon hindsight-sync thread that performs _run_sync(client.aretain_batch(...)).
  2. Hermes teardown reaches shutdown() late in process exit.
  3. shutdown() only waits up to 5 seconds for self._sync_thread / self._prefetch_thread.
  4. If the retain thread is still active or starts additional async work after interpreter teardown begins, aiohttp eventually reaches run_in_executor(...) and fails with RuntimeError: cannot schedule new futures after interpreter shutdown.
  5. The failed retain path then leaves the aiohttp client session/connector unclosed, causing the follow-on warnings.

I also reproduced the timing problem with a controlled fake-client test locally: shutdown() returned after about 5 seconds while the sync thread was still alive, which strongly suggests the fixed 5-second join is not sufficient to guarantee clean exit.

This appears related to, but not identical with, #11923 / #14109 / #14605. Those focus on shared loop/session cleanup; this variant is specifically about CLI interpreter shutdown racing a still-active background retain thread.

Proposed Fix

A safe fix likely needs one or more of these:

  1. Prevent new background Hindsight retain work from being scheduled once shutdown begins.
  2. Drain or cancel the active retain thread deterministically instead of relying on a daemon thread plus a fixed 5-second join timeout.
  3. Move Hindsight memory shutdown earlier in CLI/session teardown so cleanup completes before Python interpreter shutdown starts.
  4. Add a regression test that runs a one-shot CLI session with Hindsight enabled and asserts that exit does not log cannot schedule new futures after interpreter shutdown, Unclosed client session, or Unclosed connector.

Are you willing to submit a PR for this?

I have a local diagnosis and reproduction and can help test a fix, but I am not attaching a PR with this report.

extent analysis

TL;DR

Prevent new background Hindsight retain work from being scheduled once shutdown begins by modifying the shutdown() method to wait for the retain thread to finish or cancel it deterministically.

Guidance

  1. Review the shutdown() method: Examine the shutdown() method in plugins/memory/hindsight/__init__.py to understand how it currently handles the retain thread and consider modifications to ensure clean exit.
  2. Implement a deterministic shutdown: Instead of relying on a fixed 5-second join timeout, explore ways to drain or cancel the active retain thread deterministically, such as using a threading.Event to signal the thread to exit.
  3. Add a regression test: Create a test that runs a one-shot CLI session with Hindsight enabled and asserts that exit does not log cannot schedule new futures after interpreter shutdown, Unclosed client session, or Unclosed connector to ensure the fix is effective.
  4. Consider moving Hindsight memory shutdown earlier: Investigate the possibility of moving Hindsight memory shutdown earlier in CLI/session teardown to prevent the retain thread from being active during interpreter shutdown.

Example

import threading

# Create an event to signal the retain thread to exit
exit_event = threading.Event()

# In the retain thread
while not exit_event.is_set():
    # Perform retain work

# In the shutdown method
def shutdown():
    # Signal the retain thread to exit
    exit_event.set()
    # Wait for the retain thread to finish
    self._sync_thread.join()

Notes

The proposed fix may require additional modifications to the Hindsight memory provider and the CLI/session teardown process. It is essential to thoroughly test the changes to ensure they do not introduce new issues.

Recommendation

Apply a workaround by modifying the shutdown() method to prevent new background Hindsight retain work from being scheduled once shutdown begins, as this approach is more targeted and less likely to introduce new

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix [Bug]: Hindsight sync can race interpreter shutdown after successful one-shot CLI exit [1 pull requests, 1 participants]