hermes - Fix RuntimeError: Timeout context manager should be used inside a task during aretain_batch when background event loop is silently replaced

NousResearch/hermes-agent#17226 (fetched 2026-04-29 06:36:38)
Error Message

Full traceback (from our logs):

Traceback (most recent call last):
  File "/home/hermes/.hermes/plugins/hermes-memory-hindsight/retain_batch.py", line 89, in aretain_batch
    async with aiohttp.ClientTimeout(total=30):
  File "/home/hermes/.local/lib/python3.13/site-packages/aiohttp/helpers.py", line 762, in __aenter__
    raise RuntimeError(
        "Timeout context manager should be used inside a task")
RuntimeError: Timeout context manager should be used inside a task

Root Cause

The plugin maintains a module-global background event loop (__event_loop / _ensure_loop()) shared across all HindsightMemoryProvider instances. The _ensure_loop() function silently replaces this loop if it detects __event_loop is None (e.g. after an unhandled exception kills the loop thread).

However, the provider lazily initializes _client (an aiohttp.ClientSession) on its first async request. That session is bound to the loop that was running at creation time. When _ensure_loop() creates a new loop, subsequent retain() calls run on the new loop, but the cached _client still references the old loop.

Inside aretain_batch(), a Timeout is created with _client._loop (the old loop). When aretain_batch() is called from a coroutine running on the new loop, asyncio.get_running_loop() returns the new loop, which does not match the loop the Timeout was created with, so the RuntimeError is raised.
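The loop mismatch described above can be illustrated with the standard library alone. The sketch below assumes that the timeout helper looks up the current task on the loop it was bound to (an assumption about aiohttp internals, not something confirmed in this report); task_seen_from, probe, and demo are hypothetical names used only for illustration.

```python
import asyncio


def task_seen_from(bound_loop):
    """Return the current task as seen by an object bound to `bound_loop`."""
    return asyncio.current_task(loop=bound_loop)


async def probe(bound_loop):
    # True when the caller runs as a task on the loop the object is bound to.
    return task_seen_from(bound_loop) is not None


def demo():
    old_loop = asyncio.new_event_loop()
    new_loop = asyncio.new_event_loop()
    try:
        # Object bound to old_loop, coroutine running on old_loop: a task is found.
        same = old_loop.run_until_complete(probe(old_loop))
        # Object still bound to old_loop, coroutine running on new_loop: no task is found.
        mismatched = new_loop.run_until_complete(probe(old_loop))
    finally:
        old_loop.close()
        new_loop.close()
    return same, mismatched
```

In this sketch the second probe finds no task, which is the condition under which an "inside a task" check would fail even though the caller is in fact running inside a task on a different loop.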


Bug Description

The Hindsight memory plugin intermittently fails with RuntimeError: Timeout context manager should be used inside a task during async retain operations. This occurs in long-running gateway processes (and more frequently under higher message volume, e.g. when cronjobs increase retain frequency).

Environment

  • hermes-agent version: main (2026-04-22)
  • hindsight-client: 0.4.2
  • hindsight-client-api: 0.3.1
  • aiohttp: 3.11.16
  • Python: 3.13.3
  • OS: Ubuntu 24.04

Reproduction Steps

  1. Start a long-running Hermes gateway with Hindsight memory enabled.
  2. Trigger enough retain operations that the background loop thread eventually crashes (or simulate by injecting an exception into the loop).
  3. Observe that _ensure_loop() creates a new loop thread.
  4. Subsequent retain() → aretain_batch() calls fail with RuntimeError.
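Steps 2 and 3 can be simulated without the plugin. The following is a minimal sketch, assuming a module-global loop that runs in a daemon thread and is replaced once that thread is no longer alive; the names _loop, _thread, and demo are hypothetical, and the liveness check is an assumption (the report only says the loop is replaced when __event_loop is None).

```python
import asyncio
import threading

_loop = None    # module-global background loop (hypothetical name)
_thread = None  # thread running that loop (hypothetical name)


def _ensure_loop():
    """Return the background loop, silently replacing it if its thread has died."""
    global _loop, _thread
    if _loop is None or _thread is None or not _thread.is_alive():
        _loop = asyncio.new_event_loop()
        _thread = threading.Thread(target=_loop.run_forever, daemon=True)
        _thread.start()
    return _loop


def demo():
    first = _ensure_loop()
    first_thread = _thread
    # Simulate the loop thread dying by stopping the loop from outside.
    first.call_soon_threadsafe(first.stop)
    first_thread.join(timeout=5)

    second = _ensure_loop()  # a new loop and thread are created silently
    replaced = second is not first

    # Clean up both loops.
    second.call_soon_threadsafe(second.stop)
    _thread.join(timeout=5)
    first.close()
    second.close()
    return replaced
```

Any object created while the first loop was active and cached afterwards would still reference that first loop after the replacement, which is the situation step 4 describes.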

Minimal reproduction (standalone)

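import asyncio
import aiohttp

# Create the session on loop1, mirroring the provider's cached _client.
loop1 = asyncio.new_event_loop()

async def make_client():
    return aiohttp.ClientSession()

client = loop1.run_until_complete(make_client())
# loop1 is left open but no longer runs, like a loop whose thread has died.

loop2 = asyncio.new_event_loop()
asyncio.set_event_loop(loop2)

async def broken():
    # ClientTimeout is a plain settings object, not a context manager,
    # so it is passed to the request instead of being entered directly.
    timeout = aiohttp.ClientTimeout(total=5)
    # client is bound to loop1, but this coroutine runs on loop2
    await client.get("http://localhost", timeout=timeout)

loop2.run_until_complete(broken())  # expected: RuntimeError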

Related Issues / PRs

  • #11923: [Bug]: Hindsight still leaks aiohttp ClientSession/connector after fix #4762 (same underlying session lifecycle problem)
  • #14109: fix(hindsight): preserve shared event loop across provider shutdowns (closed, not merged)
  • #17005: fix(hindsight): drain retain queue cleanly on shutdown (open, comprehensive rewrite of retain path)
  • #15073: [Bug]: Hindsight sync can race interpreter shutdown after successful one-shot CLI exit, and #15497: Hindsight provider can submit retain work during interpreter shutdown (related but different symptom)

Potential Fix Directions

  1. Invalidate cached client when loop changes: In _ensure_loop(), when a new loop is created, also set _client = None on all live providers to force recreation with the new loop.

  2. Bind session to loop at call time: Make _get_or_create_client() recreate _client if it detects a loop mismatch (asyncio.get_running_loop() is not _client._loop).

  3. Accept PR #17005 (fix(hindsight): drain retain queue cleanly on shutdown): The single-writer queue model in #17005 may reduce loop pressure, but it should be verified that it also handles session recreation on loop replacement.
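Direction 2 could look roughly like the sketch below. It uses only the standard library: FakeSession is a stand-in for aiohttp.ClientSession that records the loop it was created on, and Provider, session_factory, and demo are hypothetical names. The real plugin's _get_or_create_client() may differ in signature and behaviour.

```python
import asyncio


class FakeSession:
    """Stand-in for aiohttp.ClientSession: remembers the loop it was created on."""

    def __init__(self):
        self._loop = asyncio.get_running_loop()


class Provider:
    """Hypothetical provider holding a lazily created, loop-bound client."""

    def __init__(self, session_factory=FakeSession):
        self._session_factory = session_factory
        self._client = None

    async def _get_or_create_client(self):
        running = asyncio.get_running_loop()
        client = self._client
        if client is not None and client._loop is not running:
            # The cached client belongs to a replaced loop, so drop it.
            # Closing the stale session properly is left out of this sketch.
            self._client = client = None
        if client is None:
            self._client = client = self._session_factory()
        return client


def demo():
    provider = Provider()

    loop1 = asyncio.new_event_loop()
    first = loop1.run_until_complete(provider._get_or_create_client())
    again = loop1.run_until_complete(provider._get_or_create_client())
    loop1.close()

    # A replacement loop triggers recreation of the client.
    loop2 = asyncio.new_event_loop()
    second = loop2.run_until_complete(provider._get_or_create_client())
    loop2.close()
    return first, again, second
```

Compared with direction 1, this check lives next to the cache itself, so it does not need a registry of live providers, at the cost of one identity comparison per request.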

Suggested Priority

P2 — Gateway crash is non-fatal (gateway auto-restarts), but Hindsight memory loss during active sessions degrades agent context quality.

Labels

bug, memory, hindsight, asyncio, good first issue


Reported by Hermes Agent diagnostic session. Stack trace and root cause confirmed from live logs on gateway 10.0.30.4.

extent analysis

TL;DR

The most likely fix is to invalidate the cached client when the event loop changes, by setting _client = None in _ensure_loop() and recreating it with the new loop.

Guidance

  • Identify the _ensure_loop() function and modify it to set _client = None when a new loop is created, forcing the recreation of the client with the new loop.
  • Alternatively, update the _get_or_create_client() method to recreate the _client if it detects a loop mismatch.
  • Verify that the fix works by running the minimal reproduction code and checking that the RuntimeError is no longer raised.
  • Consider accepting PR #17005 (fix(hindsight): drain retain queue cleanly on shutdown) as a more comprehensive solution.

Example

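import asyncio

_event_loop = None    # module-global background loop (__event_loop in the plugin)
_live_providers = []  # hypothetical registry of live provider instances

def _ensure_loop():
    global _event_loop
    if _event_loop is None:
        _event_loop = asyncio.new_event_loop()
        # Invalidate cached clients so they are recreated on the new loop
        for provider in _live_providers:
            provider._client = None
    return _event_loop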

Notes

The minimal reproduction code demonstrates the issue and can be used to verify the fix. The suggested fix directions describe two targeted solutions, and PR #17005 (fix(hindsight): drain retain queue cleanly on shutdown) may offer a more comprehensive one.

Recommendation

Apply the workaround by invalidating the cached client when the event loop changes, as it is a simpler and more targeted solution. This fix should prevent the RuntimeError and ensure that the Hindsight memory plugin works correctly even when the event loop is replaced.
