hermes - [Feature]: Add 3-second degradation timeout for external memory providers (fallback to builtin when hindsight is down)

Problem

When the hindsight memory daemon is unresponsive (PostgreSQL crash, network issue, etc.), memory_manager.py's sync_all() and handle_tool_call() block indefinitely on the external provider call. There is no timeout or fallback to the builtin memory provider.

This causes the agent to freeze for 178-300s on every conversation turn, with no indication to the user that memory has degraded.

Proposed Solution

Wrap external (non-builtin) memory provider calls in a short timeout (3 seconds). If the provider times out, log a warning, increment a counter, and proceed without blocking. The builtin provider always executes directly (it's in-process and millisecond-fast).
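The wrapping step could be factored into a small helper along the following lines. This is a minimal sketch, not the code from the issue: `call_with_timeout` and `slow_provider` are hypothetical names introduced here for illustration, and the real `memory_manager.py` may structure this differently.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def call_with_timeout(fn, *args, timeout=3.0, **kwargs):
    """Run fn in a worker thread.

    Returns (True, result) on success or (False, None) on timeout.
    Hypothetical helper, not part of memory_manager.py.
    """
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        future = pool.submit(fn, *args, **kwargs)
        return True, future.result(timeout=timeout)
    except FutureTimeout:
        return False, None
    finally:
        # wait=False returns immediately. The hung call keeps running in its
        # worker thread, but the caller is no longer blocked on it.
        pool.shutdown(wait=False)


def slow_provider(turn):
    """Stand-in for an unresponsive external provider."""
    time.sleep(0.5)
    return f"synced {turn}"


start = time.monotonic()
ok, result = call_with_timeout(slow_provider, "turn-1", timeout=0.1)
elapsed = time.monotonic() - start
```

Returning a success flag alongside the result lets a caller that needs a value, such as a tool-call dispatcher, decide what to hand back when the provider is down. Note that a thread cannot be cancelled once it has started, so the abandoned call still occupies a thread until it returns on its own.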

Key Design Decisions

  1. builtin provider is exempt: it's always fast and local, no timeout needed
  2. Timeout, don't crash: TimeoutError is caught and logged, not re-raised
  3. Stateless degradation: next turn retries the external provider; if it recovers, it works again immediately
  4. Observability: _external_timeout_count and _external_timeout_name track degradation for monitoring
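Taken together, the four decisions above might look roughly like this inside a manager class. It is a sketch under stated assumptions: `Provider` and `MemoryManagerSketch` are hypothetical stand-ins, and the single-argument `sync_turn(turn)` signature is assumed for illustration. Only the counter names `_external_timeout_count` and `_external_timeout_name` come from the issue.

```python
import logging
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

logger = logging.getLogger("memory_manager_sketch")


class Provider:
    """Minimal stand-in for a memory provider (hypothetical interface)."""

    def __init__(self, name, delay=0.0):
        self.name = name
        self.delay = delay
        self.synced = []

    def sync_turn(self, turn):
        time.sleep(self.delay)  # simulates a slow or hung backend
        self.synced.append(turn)


class MemoryManagerSketch:
    def __init__(self, providers, timeout=3.0):
        self.providers = providers
        self.timeout = timeout
        # Decision 4: counters that monitoring can read.
        self._external_timeout_count = 0
        self._external_timeout_name = None

    def sync_all(self, turn):
        for provider in self.providers:
            if provider.name == "builtin":
                provider.sync_turn(turn)  # Decision 1: runs inline, no timeout
                continue
            pool = ThreadPoolExecutor(max_workers=1)
            try:
                pool.submit(provider.sync_turn, turn).result(timeout=self.timeout)
            except FutureTimeout:
                # Decision 2: record and log the timeout, do not re-raise.
                self._external_timeout_count += 1
                self._external_timeout_name = provider.name
                logger.warning("memory provider %r timed out after %.1fs",
                               provider.name, self.timeout)
            finally:
                pool.shutdown(wait=False)
            # Decision 3: no "provider is down" flag is kept, so the next
            # turn simply tries the external provider again.


builtin = Provider("builtin")
hindsight = Provider("hindsight", delay=0.5)
manager = MemoryManagerSketch([builtin, hindsight], timeout=0.1)
manager.sync_all("turn-1")  # external provider times out
hindsight.delay = 0.0       # provider recovers
manager.sync_all("turn-2")  # external provider works again
```

Because nothing is remembered between turns, every turn pays the full timeout while the provider stays down. That matches the stateless design described above, at the cost of a fixed per-turn delay during an outage.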

Affected Code

agent/memory_manager.py:

  • sync_all() — the provider loop (around line 380)
  • handle_tool_call() — the external provider dispatch (around line 470)

Reference Implementation

A working implementation is available, with timeout counters, logging, and a ThreadPoolExecutor-based timeout wrapper. Key pattern:

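from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# In sync_all(): wrap non-builtin providers
if provider.name == 'builtin':
    provider.sync_turn(...)
else:
    # Avoid `with`: its exit calls shutdown(wait=True) and would wait on a hung call
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(provider.sync_turn, ...).result(timeout=3)
    except FutureTimeout:
        pass  # log a warning and increment the timeout counter here
    finally:
        pool.shutdown(wait=False)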

Full reference: see comment in #35195.
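One detail worth checking in any ThreadPoolExecutor-based timeout wrapper: leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, which waits for the worker to finish, so a timeout raised inside the block only reaches the caller once the hung call has returned. The comparison below is a self-contained sketch of that behavior. `hung_call`, `timed`, `with_block`, and `explicit_shutdown` are hypothetical names, and the 0.5-second sleep stands in for an unresponsive provider.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def hung_call():
    time.sleep(0.5)  # stand-in for an unresponsive provider


def timed(fn):
    """Return how long fn() takes, in seconds."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


def with_block():
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            pool.submit(hung_call).result(timeout=0.1)
    except FutureTimeout:
        # Raised after 0.1s, but it only escapes the `with` block after
        # __exit__ has waited for the worker thread to finish.
        pass


def explicit_shutdown():
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        pool.submit(hung_call).result(timeout=0.1)
    except FutureTimeout:
        pass
    finally:
        pool.shutdown(wait=False)  # return without waiting for the worker


with_block_seconds = timed(with_block)
explicit_seconds = timed(explicit_shutdown)
```

In this sketch the `with` variant takes about as long as the hung call itself, while the explicit `shutdown(wait=False)` variant returns shortly after the timeout. The worker thread still runs to completion in the background in both cases.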

Impact

  • Hindsight failure → 3-second delay instead of 178-second freeze
  • User can keep using the agent with degraded memory (builtin still works)
  • Zero behavior change when hindsight is healthy

Related

  • #35195 — hindsight-embed pg0 hardcoded openssl path (root cause of daemon failure)
  • This feature would make memory degradation a non-event for end users
