hermes: Embedded Hindsight retain reports failure while async retain later appears in recall

hermes · 2026-05-20 04:01:10

When Hermes Agent uses embedded Hindsight in local_embedded mode backed by LM Studio, the hindsight_retain tool can report a hard failure to the agent even though the memory later appears in hindsight_recall.

This creates ambiguous memory state for the user: Hermes says retain failed or the server disconnected, but subsequent recall proves at least some of the facts were stored in Hindsight.

Error Message

Worker Dereks-MacBook-Pro waiting for 4 in-flight tasks
slow llm call: scope=retain_extract_facts, model=lmstudio/google/gemma-4-e4b
Worker shutdown timeout after 30.0s, cancelling remaining tasks
asyncio.exceptions.CancelledError: Task cancelled, timeout graceful shutdown exceeded
Received signal 15, shutting down...

Why this matters

This ambiguity is especially confusing for a personal-assistant memory system. The user cannot tell whether a fact is safely stored, queued, dropped, or partially processed. In practice this led to repeated retry attempts, more queue pressure, and uncertainty about what the assistant actually remembered.

Environment

Hermes Agent: v0.14.0 (2026.5.16)
OS: macOS
Hindsight mode: local_embedded
Hindsight API: 0.6.2
Local LLM provider: LM Studio on http://127.0.0.1:1234/v1
Hindsight model: google/gemma-4-e4b
Memory bank: hermes
Relevant Hindsight config on disk:
- llm_max_concurrent: 3
- worker_max_slots: 4
- worker_retain_max_slots: 3
- worker_consolidation_max_slots: 1
- retain_async: true
- timeout: 300

Symptoms

From a Hermes session, calling hindsight_retain returned errors like:

Failed to store memory:
Failed to store memory: Server disconnected

However, a later hindsight_recall for the same content returned facts from that failed retain attempt, including:

- Claude as a thinking peer that pushes back on bad ideas
- Dario refusing Claude use for autonomous lethal drone strike decisions
- Direct Claude API / MCP / custom multi-agent orchestration as Anthropic positioning
- Personal/career background facts from the retained content

So the user-facing tool result said failure, while Hindsight later behaved as if the content was retained or partially retained.

Relevant daemon behavior observed

The embedded Hindsight daemon logs showed in-flight LM Studio extraction and shutdown/cancellation behavior:

Worker Dereks-MacBook-Pro waiting for 4 in-flight tasks
slow llm call: scope=retain_extract_facts, model=lmstudio/google/gemma-4-e4b
Worker shutdown timeout after 30.0s, cancelling remaining tasks
asyncio.exceptions.CancelledError: Task cancelled, timeout graceful shutdown exceeded
Received signal 15, shutting down...

During some of these states:

- pgrep showed a hindsight-api process
- lsof -iTCP:9177 -sTCP:LISTEN showed no listener
- curl http://127.0.0.1:9177/health returned connection refused
- The daemon appeared to be respawned by the Hermes plugin
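
The "process exists but nothing is listening" state above can be checked programmatically. The following is a minimal sketch, not part of Hermes or Hindsight: `is_listening` is a hypothetical helper name, and only the host and port (127.0.0.1:9177) come from the observations in this report.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A False result while a hindsight-api process is still visible in pgrep
    would match the state described in this report: the process exists,
    but the listener is gone.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Port 9177 is the one observed in this report.
    print("listener present:", is_listening("127.0.0.1", 9177))
```

A wrapper could run a probe like this before reporting an error, so that "daemon unavailable" is reported separately from "request sent, outcome unknown".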

Expected behavior

The Hermes memory tool should distinguish these states instead of returning an opaque hard failure:

1. Request accepted / queued for async retain
2. Request timed out, but operation may still be running
3. Daemon unavailable
4. Extraction failed definitively
5. Extraction completed with zero memory units
6. Extraction completed successfully

If retain_async: true, ideally hindsight_retain should return an operation ID or queued status when Hindsight accepts the request, rather than forcing the agent/user to infer from a later recall.
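
One way to model the six states is an explicit status enum plus a result object that can carry an operation ID. This is a sketch under stated assumptions: `RetainStatus`, `RetainResult`, and `classify_transport_error` are hypothetical names, not the existing Hermes or Hindsight API, and the mapping from Python exceptions to states is an illustrative guess.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RetainStatus(Enum):
    """The six outcomes listed above (names are hypothetical)."""
    QUEUED = "queued"
    TIMED_OUT_MAYBE_RUNNING = "timed_out_maybe_running"
    DAEMON_UNAVAILABLE = "daemon_unavailable"
    EXTRACTION_FAILED = "extraction_failed"
    COMPLETED_EMPTY = "completed_empty"
    COMPLETED = "completed"


@dataclass
class RetainResult:
    status: RetainStatus
    operation_id: Optional[str] = None  # set only if the daemon returned one
    detail: str = ""

    @property
    def may_still_complete(self) -> bool:
        """True when the memory could still appear in a later recall."""
        return self.status in {
            RetainStatus.QUEUED,
            RetainStatus.TIMED_OUT_MAYBE_RUNNING,
        }


def classify_transport_error(exc: Exception) -> RetainResult:
    """Map a transport-level exception to a non-opaque status.

    Assumption: a refused connection means the request never reached the
    daemon, while a timeout or a mid-request disconnect leaves the outcome
    unknown because the daemon may already have accepted the request.
    """
    if isinstance(exc, ConnectionRefusedError):
        return RetainResult(RetainStatus.DAEMON_UNAVAILABLE, detail=str(exc))
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return RetainResult(RetainStatus.TIMED_OUT_MAYBE_RUNNING, detail=str(exc))
    raise exc  # anything else is not a transport error; let the caller decide
```

With a shape like this, the agent could tell the user "outcome unknown, check recall later" instead of "Failed to store memory", and avoid the blind retries that add queue pressure.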

Related config/env fragility

There also appears to be a live-config mismatch risk in embedded mode:

- ~/.hermes/hindsight/config.json had the intended 4-slot/e4b config.
- ~/.hindsight/profiles/hermes.env materializes only a subset of settings.
- Worker/concurrency overrides depend on the parent Hermes process environment.
- If ~/.hermes/.env changes mid-session, daemon respawns can inherit stale env until Hermes itself restarts.

This means the on-disk config can look correct while the live daemon is running older worker/concurrency settings. It would help if hermes memory status or the Hindsight plugin exposed the live daemon config and warned when it differs from config.json.
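
The warning described above amounts to a key-by-key comparison between the on-disk settings and whatever the live daemon reports. A minimal sketch, assuming both sides are available as flat dictionaries: how the live values would be obtained is not specified in this report, and `config_drift` is a hypothetical helper. The watched keys are the ones listed under Environment.

```python
from typing import Any, Dict, Iterable, Tuple

# Keys taken from the "Relevant Hindsight config on disk" list in this report.
WATCHED_KEYS = (
    "llm_max_concurrent",
    "worker_max_slots",
    "worker_retain_max_slots",
    "worker_consolidation_max_slots",
    "retain_async",
    "timeout",
)


def config_drift(
    on_disk: Dict[str, Any],
    live: Dict[str, Any],
    keys: Iterable[str] = WATCHED_KEYS,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (on_disk_value, live_value)} for every watched key that differs.

    A key missing on either side is reported as None on that side.
    """
    drift = {}
    for key in keys:
        disk_value = on_disk.get(key)
        live_value = live.get(key)
        if disk_value != live_value:
            drift[key] = (disk_value, live_value)
    return drift


if __name__ == "__main__":
    on_disk = {"llm_max_concurrent": 3, "worker_max_slots": 4,
               "worker_retain_max_slots": 3, "worker_consolidation_max_slots": 1,
               "retain_async": True, "timeout": 300}
    # Hypothetical stale daemon still running with fewer worker slots.
    live = dict(on_disk, worker_max_slots=2)
    for key, (disk_value, live_value) in config_drift(on_disk, live).items():
        print(f"WARNING: {key} is {disk_value} on disk but {live_value} in the live daemon")
```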

Suggested fixes

- Have the Hermes Hindsight wrapper surface async operation IDs / queued status when available.
- Avoid reporting a definitive failure if the request may have been accepted by Hindsight and completed later.
- Improve error messages when the daemon disconnects or shuts down mid-request.
- Consider graceful draining or restart deferral while retain/consolidation operations are in-flight.
- Expose live embedded daemon config in diagnostics/status, especially model, provider, worker slots, and concurrency settings.
- Consider materializing all relevant worker/concurrency settings into the embedded profile env, or force the daemon to reload from config.json instead of relying on parent-process .env inheritance.
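
The graceful-draining suggestion can be illustrated with a generic asyncio drain-then-cancel pattern. This sketch does not reproduce Hindsight's actual shutdown code. `drain_or_cancel` and its `grace_seconds` parameter are hypothetical, and the 30.0s figure in the daemon log is only the motivation for making the grace period explicit and reportable.

```python
import asyncio
from typing import Iterable, Tuple


async def drain_or_cancel(
    tasks: Iterable["asyncio.Task"], grace_seconds: float
) -> Tuple[int, int]:
    """Wait up to grace_seconds for in-flight tasks, then cancel the rest.

    Returns (finished_count, cancelled_count) so the caller can report how
    many operations were cut off instead of failing silently.
    """
    pending_tasks = set(tasks)
    if not pending_tasks:
        return (0, 0)  # asyncio.wait() rejects an empty set
    done, pending = await asyncio.wait(pending_tasks, timeout=grace_seconds)
    for task in pending:
        task.cancel()
    if pending:
        # Let cancelled tasks unwind; collect CancelledError instead of raising.
        await asyncio.gather(*pending, return_exceptions=True)
    return (len(done), len(pending))


async def _demo() -> None:
    fast = asyncio.create_task(asyncio.sleep(0.01))  # finishes within the grace period
    slow = asyncio.create_task(asyncio.sleep(10))    # stands in for a slow LLM extraction
    finished, cancelled = await drain_or_cancel([fast, slow], grace_seconds=0.2)
    print(f"finished={finished} cancelled={cancelled}")


if __name__ == "__main__":
    asyncio.run(_demo())
```

Returning the cancelled count gives the wrapper something concrete to surface, for example "1 retain operation was interrupted by shutdown", rather than a bare "Server disconnected".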
