hermes - 💡(How to fix) Fix docs(hindsight-plugin): missing local-LLM concurrency warning in plugin README [1 pull requests]

The Hermes Hindsight memory provider README at plugins/memory/hindsight/README.md does not warn users about Hindsight's default HINDSIGHT_API_LLM_MAX_CONCURRENT=32. When Hermes and Hindsight share the same local LLM endpoint (a common setup with a single llama-server instance serving both), this default saturates the endpoint's slot pool and starves Hermes of inference slots — appearing as a "frozen" conversation.

Fix Action

Fixed

Reproduction

Point both Hermes and Hindsight at a single llama-server (3 slots) on port 8081. After several turns, Hermes appears to freeze. Diagnostics show all 3 slots constantly reporting is_processing: true, along with roughly 32 ESTABLISHED connections from hindsight-api.
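The slot check described above could be scripted roughly as follows. This is a minimal sketch, not the reporter's exact commands: it assumes the llama-server build exposes its `/slots` monitoring endpoint (depending on the build, it may need to be enabled at startup), and `count_busy_slots` is a hypothetical helper name introduced here for illustration.

```shell
# Hypothetical helper: read the JSON from llama-server's /slots endpoint on
# stdin and print how many slots report "is_processing": true.
# Assumption: each slot object carries an is_processing boolean, as in the
# diagnostics quoted in this issue.
count_busy_slots() {
  grep -o '"is_processing"[[:space:]]*:[[:space:]]*true' | wc -l | tr -d ' '
}
```

Against the setup in this report, usage would look like `curl -s http://localhost:8081/slots | count_busy_slots`; a result that stays equal to the total slot count (3 here) matches the saturation described. On Linux, `ss -tn state established '( dport = :8081 )'` lists the established client connections to that port, which is one way to spot the roughly 32 connections held by hindsight-api.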

Fix applied locally

echo 'HINDSIGHT_API_LLM_MAX_CONCURRENT=1' >> ~/.hermes/.env
# Restart Hermes so the daemon respawns with the new env.

With HINDSIGHT_API_LLM_MAX_CONCURRENT=1, Hindsight holds at most one slot at a time, leaving the remaining slots free for Hermes.
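Because `>>` appends, running the command above more than once leaves several assignments of the same variable in `~/.hermes/.env`. A hedged sketch of an idempotent alternative is below; `set_env_var` is a hypothetical helper written for this example, not part of Hermes or Hindsight.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any earlier
# assignment of KEY so the file never accumulates duplicate lines.
set_env_var() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except previous assignments of the key
  # (grep exits non-zero when nothing is left, hence "|| true").
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file's permissions are preserved.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

For the fix in this issue, the call would be `set_env_var ~/.hermes/.env HINDSIGHT_API_LLM_MAX_CONCURRENT 1`, followed by the same Hermes restart so the daemon picks up the new value.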

Requested action

Add a short "Local LLM Concurrency" subsection to plugins/memory/hindsight/README.md (as a child of the "Local Embedded LLM" section), pointing users at the env var with diagnostic commands. Link to upstream docs for full detail.

Happy to submit a PR if this approach is acceptable.
