hermes: Embedded Hindsight retain reports failure while async retain later appears in recall

Summary

When Hermes Agent uses embedded Hindsight in local_embedded mode backed by LM Studio, the hindsight_retain tool can report a hard failure to the agent even though the memory later appears in hindsight_recall.

This creates ambiguous memory state for the user: Hermes says retain failed or the server disconnected, but subsequent recall proves at least some of the facts were stored in Hindsight.

Environment

  • Hermes Agent: v0.14.0 (2026.5.16)
  • OS: macOS
  • Hindsight mode: local_embedded
  • Hindsight API: 0.6.2
  • Local LLM provider: LM Studio on http://127.0.0.1:1234/v1
  • Hindsight model: google/gemma-4-e4b
  • Memory bank: hermes
  • Relevant Hindsight config on disk:
    • llm_max_concurrent: 3
    • worker_max_slots: 4
    • worker_retain_max_slots: 3
    • worker_consolidation_max_slots: 1
    • retain_async: true
    • timeout: 300

Symptoms

From a Hermes session, calling hindsight_retain returned errors like:

Failed to store memory:
Failed to store memory: Server disconnected

However, a later hindsight_recall for the same content returned facts from that failed retain attempt, including:

  • Claude as a thinking peer that pushes back on bad ideas
  • Dario refusing Claude use for autonomous lethal drone strike decisions
  • Direct Claude API / MCP / custom multi-agent orchestration as Anthropic positioning
  • Personal/career background facts from the retained content

So the user-facing tool result said failure, while Hindsight later behaved as if the content was retained or partially retained.

Relevant daemon behavior observed

The embedded Hindsight daemon logs showed in-flight LM Studio extraction and shutdown/cancellation behavior:

Worker Dereks-MacBook-Pro waiting for 4 in-flight tasks
slow llm call: scope=retain_extract_facts, model=lmstudio/google/gemma-4-e4b
Worker shutdown timeout after 30.0s, cancelling remaining tasks
asyncio.exceptions.CancelledError: Task cancelled, timeout graceful shutdown exceeded
Received signal 15, shutting down...

During some of these states:

  • pgrep showed a hindsight-api process
  • lsof -iTCP:9177 -sTCP:LISTEN showed no listener
  • curl http://127.0.0.1:9177/health returned connection refused
  • The daemon appeared to be respawned by the Hermes plugin
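The "process exists but nothing is listening" state above can be told apart from a fully down or fully ready daemon with a small probe. This is a minimal sketch, not part of Hermes or Hindsight: `port_is_listening` and `describe_daemon_state` are hypothetical helper names, and only the host and port (127.0.0.1:9177) come from the observations above.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 9177,
                      timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def describe_daemon_state(process_running: bool, listening: bool) -> str:
    """Combine a process check (e.g. from pgrep) with a listener check.

    The state names are illustrative assumptions, not Hindsight terminology.
    """
    if process_running and listening:
        return "ready"
    if process_running:
        # Matches the report: pgrep finds hindsight-api, but no listener.
        return "starting_or_shutting_down"
    if listening:
        return "port_held_by_other_process"
    return "down"
```

Reporting `starting_or_shutting_down` separately would let a wrapper wait and re-probe instead of treating the daemon as permanently gone.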

Expected behavior

The Hermes memory tool should distinguish these states instead of returning an opaque hard failure:

  1. Request accepted / queued for async retain
  2. Request timed out, but operation may still be running
  3. Daemon unavailable
  4. Extraction failed definitively
  5. Extraction completed with zero memory units
  6. Extraction completed successfully

With retain_async: true, hindsight_retain should ideally return an operation ID or a queued status as soon as Hindsight accepts the request, so the agent and user do not have to infer the outcome from a later recall.
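One way to picture the six states is as an explicit result type that the tool returns instead of a bare error string. The following is a rough sketch under stated assumptions: `RetainOutcome`, `RetainResult`, and `classify_exception` are hypothetical names, not the Hermes or Hindsight API, and the mapping from Python exceptions to states is only one plausible choice.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RetainOutcome(Enum):
    QUEUED = "queued"                                    # state 1
    TIMED_OUT_MAYBE_RUNNING = "timed_out_maybe_running"  # state 2
    DAEMON_UNAVAILABLE = "daemon_unavailable"            # state 3
    EXTRACTION_FAILED = "extraction_failed"              # state 4
    COMPLETED_EMPTY = "completed_empty"                  # state 5
    COMPLETED = "completed"                              # state 6


@dataclass
class RetainResult:
    outcome: RetainOutcome
    operation_id: Optional[str] = None  # hypothetical async operation ID
    detail: str = ""

    @property
    def may_be_stored(self) -> bool:
        """True when the memory might exist now or later, so a blind retry is risky."""
        return self.outcome in (
            RetainOutcome.QUEUED,
            RetainOutcome.TIMED_OUT_MAYBE_RUNNING,
            RetainOutcome.COMPLETED,
        )


def classify_exception(exc: BaseException) -> RetainResult:
    """Map a transport-level error to a non-opaque outcome (assumed mapping)."""
    if isinstance(exc, ConnectionRefusedError):
        # Nothing was listening, so the request was never delivered.
        return RetainResult(RetainOutcome.DAEMON_UNAVAILABLE, detail=str(exc))
    if isinstance(exc, (TimeoutError, ConnectionError)):
        # A timeout or a mid-request disconnect leaves the outcome unknown:
        # the daemon may have accepted the request before the failure.
        return RetainResult(RetainOutcome.TIMED_OUT_MAYBE_RUNNING, detail=str(exc))
    return RetainResult(RetainOutcome.EXTRACTION_FAILED, detail=str(exc))
```

With a shape like this, a "Server disconnected" error would surface as "outcome unknown, check recall before retrying" rather than as a definitive failure.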

Related config/env fragility

There also appears to be a live-config mismatch risk in embedded mode:

  • ~/.hermes/hindsight/config.json had the intended 4-slot/e4b config.
  • ~/.hindsight/profiles/hermes.env materializes only a subset of settings.
  • Worker/concurrency overrides depend on the parent Hermes process environment.
  • If ~/.hermes/.env changes mid-session, daemon respawns can inherit stale env until Hermes itself restarts.

This means the on-disk config can look correct while the live daemon is running older worker/concurrency settings. It would help if hermes memory status or the Hindsight plugin exposed the live daemon config and warned when it differs from config.json.
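The drift warning described above could be computed with a plain key-by-key comparison. This sketch assumes the settings are available as flat dictionaries; the flat layout of config.json and the way the live values are obtained from the daemon are both assumptions, and `config_drift` is a hypothetical helper name.

```python
from typing import Any, Dict, Mapping, Sequence, Tuple

# Setting names taken from the Environment section of this report.
WATCHED_KEYS = (
    "llm_max_concurrent",
    "worker_max_slots",
    "worker_retain_max_slots",
    "worker_consolidation_max_slots",
    "retain_async",
    "timeout",
)

UNSET = "<unset>"  # placeholder for a key missing on one side


def config_drift(
    on_disk: Mapping[str, Any],
    live: Mapping[str, Any],
    keys: Sequence[str] = WATCHED_KEYS,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (on_disk_value, live_value)} for every watched key that differs."""
    drift: Dict[str, Tuple[Any, Any]] = {}
    for key in keys:
        disk_value = on_disk.get(key, UNSET)
        live_value = live.get(key, UNSET)
        # Values read from environment variables are strings and would need
        # normalizing first; this sketch compares them as given.
        if disk_value != live_value:
            drift[key] = (disk_value, live_value)
    return drift
```

A status command could print one warning per entry in the returned mapping and stay silent when it is empty.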

Why this matters

This is especially confusing for a personal-assistant memory system. The user cannot tell whether a fact is safely stored, queued, dropped, or partially processed. In practice this led to repeated retry attempts, more queue pressure, and uncertainty about what the assistant actually remembered.

Suggested fixes

  • Have the Hermes Hindsight wrapper surface async operation IDs / queued status when available.
  • Avoid reporting a definitive failure if the request may have been accepted by Hindsight and completed later.
  • Improve error messages when the daemon disconnects or shuts down mid-request.
  • Consider graceful draining or restart deferral while retain/consolidation operations are in-flight.
  • Expose live embedded daemon config in diagnostics/status, especially model, provider, worker slots, and concurrency settings.
  • Consider materializing all relevant worker/concurrency settings into the embedded profile env, or forcing the daemon to reload from config.json instead of relying on parent-process .env inheritance.
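The graceful-draining suggestion could look roughly like the following asyncio sketch. It is not how the Hindsight worker is implemented: `drain_before_restart` is a hypothetical helper, and the idea is only that a supervisor waits for in-flight tasks and defers the restart if any remain, instead of cancelling them once the grace period ends.

```python
import asyncio
from typing import Iterable, Set


async def drain_before_restart(
    tasks: Iterable["asyncio.Task"], grace_seconds: float
) -> Set["asyncio.Task"]:
    """Wait up to grace_seconds for in-flight tasks to finish.

    Returns the tasks that are still running. Nothing is cancelled here;
    the caller decides whether to defer the restart or cancel explicitly.
    """
    pending = {task for task in tasks if not task.done()}
    if not pending:
        # asyncio.wait() rejects an empty collection, so return early.
        return set()
    _done, still_pending = await asyncio.wait(pending, timeout=grace_seconds)
    return still_pending
```

A caller could re-check later while the returned set is non-empty, and report the affected retain operations as "still running" rather than failed.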
