hermes - [Feature]: Add 3-second degradation timeout for external memory providers (fallback to builtin when hindsight is down)

StepCodex · 2026-05-30T07:30:10Z






Problem

When the hindsight memory daemon is unresponsive (PostgreSQL crash, network issue, etc.), memory_manager.py's sync_all() and handle_tool_call() block on the external provider call, because the call has no timeout. There is also no fallback to the builtin memory provider.

As a result, the agent freezes for 178-300 s on every conversation turn, with no indication to the user that memory has degraded.

Proposed Solution

Wrap external (non-builtin) memory provider calls in a short timeout (3 seconds). If the provider times out, log a warning, increment a counter, and proceed without blocking. The builtin provider always executes directly (it's in-process and millisecond-fast).

Key Design Decisions

1. builtin provider is exempt: it's always fast and local, so no timeout is needed
2. Timeout, don't crash: TimeoutError is caught and logged, not re-raised
3. Stateless degradation: the next turn retries the external provider; if it has recovered, it works again immediately
4. Observability: _external_timeout_count and _external_timeout_name track degradation for monitoring
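The four decisions can be sketched together in one loop. This is a minimal sketch, not the reference implementation from #35195: the class name `MemoryManagerSketch`, the constructor, and the `*args` passed to `sync_turn` are assumptions made for illustration. Only `sync_all`, `sync_turn`, `provider.name`, and the two counter names come from the issue text.

```python
import logging
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

logger = logging.getLogger(__name__)


class MemoryManagerSketch:
    """Hypothetical stand-in for the manager in agent/memory_manager.py."""

    def __init__(self, providers, external_timeout=3.0):
        self.providers = providers
        self.external_timeout = external_timeout
        self._external_timeout_count = 0    # total timeouts seen so far
        self._external_timeout_name = None  # provider that timed out last

    def sync_all(self, *args, **kwargs):
        for provider in self.providers:
            if provider.name == "builtin":
                # Decision 1: in-process and fast, so call it directly.
                provider.sync_turn(*args, **kwargs)
                continue
            pool = ThreadPoolExecutor(max_workers=1)
            try:
                future = pool.submit(provider.sync_turn, *args, **kwargs)
                future.result(timeout=self.external_timeout)
            except FutureTimeout:
                # Decisions 2 and 4: log and count, do not re-raise.
                self._external_timeout_count += 1
                self._external_timeout_name = provider.name
                logger.warning("memory provider %r timed out after %.1fs",
                               provider.name, self.external_timeout)
            finally:
                # wait=False so a stuck worker cannot hold up the turn.
                pool.shutdown(wait=False)
        # Decision 3: no "provider is down" flag is stored, so the next
        # call simply tries the external provider again.
```

Two limits of this sketch are worth noting. Python cannot cancel a thread that is already running, so a timed-out worker keeps running until the underlying call returns; repeated timeouts can therefore leave several such threads alive at once. Exceptions other than a timeout still propagate to the caller here, which may or may not match what the real manager does.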

Affected Code

agent/memory_manager.py:

- sync_all(): the provider loop (around line 380)
- handle_tool_call(): the external provider dispatch (around line 470)
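For the handle_tool_call() path, the issue does not say what should be returned to the caller when the external provider times out. The sketch below assumes a JSON error string, so that the degradation is visible rather than silent. The function name `dispatch_tool_call`, the provider method signature `handle_tool_call(tool_name, args)`, and the error payload are all hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def dispatch_tool_call(provider, tool_name, args, timeout=3.0):
    """Hypothetical dispatch helper; the real handle_tool_call() may differ.

    Assumes the provider exposes handle_tool_call(tool_name, args) -> str.
    """
    if provider.name == "builtin":
        # The builtin provider is exempt from the timeout.
        return provider.handle_tool_call(tool_name, args)
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(provider.handle_tool_call, tool_name, args)
        return future.result(timeout=timeout)
    except FutureTimeout:
        # Assumed error shape: report that memory is degraded instead of
        # returning nothing.
        return json.dumps({
            "error": "memory_provider_timeout",
            "provider": provider.name,
            "timeout_s": timeout,
        })
    finally:
        # Do not wait for a worker that may still be stuck.
        pool.shutdown(wait=False)
```

Returning a structured error keeps the turn moving and gives the agent something it can surface to the user. Whether to route the call to the builtin provider instead is a separate design choice that the issue title hints at but the proposal does not spell out.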

Reference Implementation

A working implementation is available, with timeout counters, logging, and a ThreadPoolExecutor-based timeout wrapper. Key pattern:

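# In sync_all(): wrap non-builtin providers
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

if provider.name == 'builtin':
    provider.sync_turn(...)
else:
    # Avoid `with`: its exit calls shutdown(wait=True) and would block on a hung call
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(provider.sync_turn, ...).result(timeout=3)
    except FutureTimeout:
        pass  # log a warning and increment the timeout counter here
    finally:
        pool.shutdown(wait=False)  # return now; the worker finishes on its own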

Full reference: see comment in #35195.
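One caveat applies to any ThreadPoolExecutor-based timeout. Leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, which waits for the worker thread even after `result(timeout=...)` has raised. The comparison below is a small illustration with a simulated slow call; `slow_call` and both timing functions are hypothetical names, not part of hermes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def slow_call(seconds):
    time.sleep(seconds)  # stands in for a hung provider call


def timed_with_block(seconds, timeout):
    """Elapsed time when the pool is used as a context manager."""
    start = time.monotonic()
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            pool.submit(slow_call, seconds).result(timeout=timeout)
    except FutureTimeout:
        pass
    return time.monotonic() - start


def timed_without_wait(seconds, timeout):
    """Elapsed time when the pool is shut down with wait=False."""
    start = time.monotonic()
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(slow_call, seconds).result(timeout=timeout)
    except FutureTimeout:
        pass
    finally:
        pool.shutdown(wait=False)
    return time.monotonic() - start
```

With a 1-second `slow_call` and a 0.1-second timeout, `timed_with_block` takes about as long as the slow call itself, while `timed_without_wait` returns shortly after the timeout fires. The second form is the one that actually bounds the delay.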

Impact

- Hindsight failure: a delay of about 3 seconds per timed-out call instead of a freeze of 178 seconds or more
- User can keep using the agent with degraded memory (builtin still works)
- Zero behavior change when hindsight is healthy

Related

- #35195: hindsight-embed pg0 hardcoded openssl path (root cause of daemon failure)
- This feature would make memory degradation a non-event for end users
