hermes - bug(honcho): self-hosted localhost setup silently fails (apiKey trap, no recursive fallback, 30s timeout, silent error masking)

Summary

Five independent issues in plugins/memory/honcho/ combine to make multi-profile self-hosted Honcho setups silently broken. Each one is small on its own, but together they took ~4 hours to diagnose in production, because every failure collapses to "No relevant context found" in tool results and the real cause is only visible in journalctl.

Environment

  • Hermes Agent v0.14.0
  • Honcho self-hosted (Docker stack, AUTH_USE_AUTH=true, workspace-scoped JWTs)
  • Native systemd profile gateway services (hermes-gateway-<profile>.service)
  • baseUrl: http://localhost:8000

Issue 1: apiKey lookup has no recursive fallback into the hermes default block

plugins/memory/honcho/client.py:386:

api_key = (
    host_block.get("apiKey")             # 1. hosts.hermes.<profile>.apiKey
    or raw.get("apiKey")                 # 2. top-level apiKey
    or os.environ.get("HONCHO_API_KEY")  # 3. env var
)

There is no fallback to hosts.hermes.apiKey (the default-profile block). When a user runs hermes honcho setup and provides an apiKey, it is stored in hosts.hermes. Profiles added later get hosts.hermes.<X> blocks without an apiKey, on the expectation that they inherit it from the default block. That inheritance never happens, so authentication for profile X silently breaks.

Expected: apiKey should fall back through hosts.hermes.<X> → hosts.hermes → top-level → env. The first three all express explicit user intent in the same config file.
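A minimal sketch of that lookup order. It assumes, as the hosts.get(config.host) call in the _is_local branch suggests, that hosts maps flat host names such as "hermes" and "hermes.<profile>" to their blocks. resolve_api_key and its parameters are hypothetical and are not the actual client.py code.

```python
import os
from typing import Any, Mapping, Optional


def resolve_api_key(
    raw: Mapping[str, Any],
    host: str,
    default_host: str = "hermes",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Return the first non-empty apiKey along the expected fallback chain.

    ASSUMPTION: raw["hosts"] is keyed by flat host names ("hermes", "hermes.<profile>").
    """
    env = os.environ if env is None else env
    hosts = raw.get("hosts") or {}
    candidates = (
        (hosts.get(host) or {}).get("apiKey"),          # 1. hosts.hermes.<profile>.apiKey
        (hosts.get(default_host) or {}).get("apiKey"),  # 2. hosts.hermes.apiKey (the missing step)
        raw.get("apiKey"),                               # 3. top-level apiKey
        env.get("HONCHO_API_KEY"),                       # 4. env var
    )
    # Same "first truthy value wins" semantics as the existing `or` chain.
    return next((key for key in candidates if key), None)


# A profile block added after setup, with the key stored only in the default block:
cfg = {"hosts": {"hermes": {"apiKey": "jwt-default"}, "hermes.coder": {}}}
print(resolve_api_key(cfg, "hermes.coder", env={}))  # prints: jwt-default
```

Passing env explicitly keeps the function testable; in production it would read os.environ.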

Issue 2: _is_local branch ignores top-level apiKey

client.py:758-771:

_is_local = resolved_base_url and (
    "localhost" in resolved_base_url
    or "127.0.0.1" in resolved_base_url
    or "::1" in resolved_base_url
)
if _is_local:
    _host_block = (_raw.get("hosts") or {}).get(config.host, {})
    _host_has_key = bool(_host_block.get("apiKey"))
    effective_api_key = config.api_key if _host_has_key else "local"
else:
    effective_api_key = config.api_key

For a localhost baseUrl, config.api_key is honored only when the host block itself contains an apiKey. A top-level apiKey (which config.api_key correctly resolved via raw.get("apiKey") at line 387) is silently discarded and replaced with the placeholder string "local".

This inverts the documented inheritance order. On self-hosted Honcho with AUTH_USE_AUTH=true, the placeholder "local" is rejected with a 401 (Invalid JWT).

Workaround used: copy apiKey into every hosts.hermes.<X> block. Brittle — every new profile must remember to do this.

Suggested fix: trust config.api_key when it's non-empty regardless of _is_local. The "skip cloud key on local" intent is fine, but it should be triggered by an explicit "localOnly": true flag, not by URL string matching.
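A minimal sketch of that suggestion, assuming the proposed "localOnly" flag sits at the top level of honcho.json (the flag is only a proposal; Hermes does not read it today). effective_api_key is a hypothetical helper, and is_local stands for the existing URL check, kept only to preserve today's behavior when no key is configured at all.

```python
from typing import Any, Mapping, Optional

PLACEHOLDER_KEY = "local"  # the placeholder client.py currently sends for localhost


def effective_api_key(
    config_api_key: Optional[str], raw: Mapping[str, Any], is_local: bool
) -> Optional[str]:
    """Hypothetical replacement for the _is_local branch in client.py."""
    # Explicit opt-in, not URL sniffing, decides whether the key is skipped.
    # ASSUMPTION: the proposed flag lives at the top level of honcho.json.
    if raw.get("localOnly") is True:
        return PLACEHOLDER_KEY
    # A non-empty resolved key wins, local or not, wherever it came from
    # (host block, top-level apiKey, or HONCHO_API_KEY).
    if config_api_key:
        return config_api_key
    # No key anywhere: keep the current behavior for each case.
    return PLACEHOLDER_KEY if is_local else config_api_key
```

The last line keeps unauthenticated localhost servers working exactly as before; only the case that caused the 401 (a real key that was being thrown away) changes.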

Issue 3: Default 30s HTTP timeout cuts off dialectic_query for reasoning_level≥medium

A direct dialectic chat against a peer with a rich representation regularly takes 30–60s at reasoning_level=medium (gpt-5.5 backend, ~5–10 KB representation). The default Hermes HTTP timeout is 30s:

WARNING plugins.memory.honcho.session: Honcho dialectic query failed: Request timed out after 30.0s

This creates a confusing failure pattern: honcho_search (a representation lookup with no LLM call) works because it returns in under a second, while honcho_reasoning fails. Users report "search works but reasoning doesn't", which leads the operator down the wrong debugging path (peer-pair representation, observation isolation, etc.) when the real issue is simply the timeout.

Suggested fix:

  • Bump the default timeout to 60s, OR
  • Auto-stretch the default timeout based on reasoning_level (e.g. min(60, 30 * max(1, level_to_int))), OR
  • Surface a timeout knob in the hermes honcho setup wizard
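A sketch of the auto-stretch option, with an explicit override standing in for a wizard-exposed knob. The level names and the level-to-integer mapping are assumptions, not Hermes' actual reasoning_level values; dialectic_timeout is a hypothetical helper.

```python
from typing import Optional

# HYPOTHETICAL mapping; the real reasoning_level names in Hermes may differ.
_LEVEL_TO_INT = {"minimal": 0, "low": 1, "medium": 2, "high": 3}

BASE_TIMEOUT_S = 30.0  # current default
MAX_TIMEOUT_S = 60.0   # cap from the formula above


def dialectic_timeout(reasoning_level: str, override_s: Optional[float] = None) -> float:
    """Return min(60, 30 * max(1, level_to_int)) unless an explicit override is set."""
    if override_s is not None:  # an explicit knob (e.g. set by the setup wizard) wins
        return override_s
    level = _LEVEL_TO_INT.get(reasoning_level, 1)  # unknown levels behave like "low"
    return min(MAX_TIMEOUT_S, BASE_TIMEOUT_S * max(1, level))


for name in ("low", "medium", "high"):
    print(name, dialectic_timeout(name))
```

With this mapping, medium already reaches the 60s cap, which covers the 30–60s latencies observed above.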

Issue 4: Silent error masking in dialectic_query

session.py:dialectic_query (around line 590):

except Exception as e:
    logger.warning("Honcho dialectic query failed: %s", e)
    return ""

Returning "" makes the tool output render as "No relevant context found", which is indistinguishable from a successful query that returned no relevant data. Auth errors, timeouts, and genuinely empty results all look the same to both the LLM and the user.

Suggested fix: surface a non-empty error marker back to the caller, e.g.:

return f"[honcho_error: {type(e).__name__}]"

so that the LLM sees the error and can adapt, and the operator sees something other than the "no data" message and knows to check journalctl.

Issue 5: resolve_config_path() skips profiles/<name>/honcho.json

client.py:resolve_config_path:

local_path = get_hermes_home() / "honcho.json"
if local_path.exists():
    return local_path
# ... falls back to default ~/.hermes/honcho.json

The systemd unit files that Hermes generates for --profile X do NOT set HERMES_HOME, so get_hermes_home() returns the default ~/.hermes and ~/.hermes/profiles/X/honcho.json is never read.

This leaves dead configuration in user setups: operators reasonably assume a per-profile honcho.json overrides the global one, but only the global file is read. Per-profile files accumulate as zombie configs (our first instance had 12 of them).

Suggested fix: Either (a) actually read per-profile honcho.json and merge with global, (b) emit a warning if profiles/<name>/honcho.json exists but is never loaded, or (c) document that profile-level configs require explicit HERMES_HOME override and emit warnings when they exist alongside an unset env var.

Full RCA from production incident

In our production setup (4 Hermes instances, 12 profiles), these issues compounded:

| Instance | baseUrl | apiKey pattern | Status |
|---|---|---|---|
| 1 (native) | http://localhost:8000 | hosts.hermes only | broken (Issue 1 + Issue 2) |
| 2 (native) | http://localhost:8000 | each hosts.hermes.<X> | OK (lucky workaround) |
| 3 (Docker) | http://api:8000 | top-level | OK (Issue 2 doesn't trigger) |
| 4 (Docker) | http://api:8000 | top-level | OK |

For 9 days (2026-05-22 → 2026-05-31), 1 profile silently failed all Honcho writes: journalctl logged 86 "Failed to sync messages" errors, but the tools and UI showed nothing. Afterwards we had to backfill ~1000 messages from Hermes' local state.db (which, fortunately, preserves everything in SQLite).

Priority order from operator perspective

  1. Issue 3 (timeout) — single config-knob fix, immediate win
  2. Issue 4 (error masking) — small change, huge debuggability improvement
  3. Issue 1 (recursive apiKey fallback) — matches user expectation
  4. Issue 2 (_is_local trap) — surprising behavior, hard to debug
  5. Issue 5 (dead profile files) — discoverability/UX issue

Happy to provide more journalctl excerpts / config samples / repro repo if useful, and could submit PRs for #3 and #4 if there's interest.
