hermes - 💡(How to fix) Fix Doctor is passive — silent setup drift goes unnoticed for weeks [1 participants]

NousResearch/hermes-agent#25098 (fetched 2026-05-14 03:49:00)

Summary

hermes doctor does an excellent job of detecting broken / missing components (memory provider not connected, plugin lib not installed, API keys missing, symlinks gone, system deps missing). The problem is that nothing ever invokes it. It only runs when the user remembers to type the command — which most users don't until they wonder why a feature stopped working. In practice this means stale configs, missing client libraries, expired keys, and disabled providers can sit broken for weeks before anyone notices. By the time they do, they often can't reconstruct what changed.

Concrete real-world example (today)

A user had memory.provider: honcho in config.yaml and a populated honcho: block. Honcho appeared "active" in the YAML and the gateway started without complaint. In reality:

  • ~/.hermes/honcho.json was empty (never initialized)
  • honcho-ai Python lib was not installed in the venv
  • hermes memory status showed Provider: (none — built-in only)
  • Conversations had not been synced for an unknown number of weeks.

hermes doctor immediately surfaced both problems:

    ◆ Memory Provider
    ✗ Honcho connection failed
    honcho-ai is required for Honcho integration. Install it with: pip install honcho-ai

The same was true for several other plugins (kanban, image_gen, browser, web tools, ...). Doctor knows. The user doesn't.
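The mismatch in this example (the YAML says memory.provider: honcho, but the client library is absent and ~/.hermes/honcho.json is empty) is something an offline check can catch without any network call. The following is a minimal sketch, not Hermes's actual doctor code. It assumes config.yaml has already been parsed into a dict, that the honcho-ai package is importable as honcho, and that a missing or empty JSON file means "never initialized". Finding, PROVIDER_REQUIREMENTS, and check_memory_provider are hypothetical names.

```python
import importlib.util
import json
from dataclasses import dataclass
from pathlib import Path
from typing import List


@dataclass(frozen=True)
class Finding:
    """One configured-but-broken component (hypothetical structure)."""
    component: str
    message: str


# Assumed mapping: provider name -> (import name, pip package, state file).
# The "honcho" import name and the honcho.json location are assumptions.
PROVIDER_REQUIREMENTS = {
    "honcho": ("honcho", "honcho-ai", Path("~/.hermes/honcho.json")),
}


def _json_is_empty(path: Path) -> bool:
    """Treat a missing, blank, unparsable, or empty JSON file as uninitialized."""
    path = path.expanduser()
    if not path.is_file():
        return True
    text = path.read_text(encoding="utf-8").strip()
    if not text:
        return True
    try:
        return not json.loads(text)
    except json.JSONDecodeError:
        return True


def check_memory_provider(config: dict) -> List[Finding]:
    """Compare what config.yaml claims against what is actually usable."""
    provider = (config.get("memory") or {}).get("provider")
    if provider not in PROVIDER_REQUIREMENTS:
        return []  # built-in only, or a provider this sketch does not know about
    module, package, state_file = PROVIDER_REQUIREMENTS[provider]
    findings = []
    # find_spec returns None when the module cannot be imported from this venv.
    if importlib.util.find_spec(module) is None:
        findings.append(Finding(
            f"memory.{provider}",
            f"{package} is not installed. Install it with: pip install {package}",
        ))
    if _json_is_empty(state_file):
        findings.append(Finding(
            f"memory.{provider}",
            f"{state_file} is empty or missing (provider never initialized)",
        ))
    return findings
```

Because the check only inspects the local venv and filesystem, it is cheap enough to run on every gateway start rather than only when a user types hermes doctor.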

Expected behaviour

The agent should proactively alert the user, at least once, when a configured-but-broken component is detected. Suggested layers (any one of them would help):

  1. On gateway start: run a fast subset of doctor checks. If a configured provider/plugin is broken, emit a WARNING log line and push a single notice through the active platform adapter (Telegram/Discord/Signal): "Heads-up: 2 components configured in config.yaml are not actually working. Run hermes doctor for details." (Throttle to once per gateway start and dedup by a hash of the finding.)
  2. Daily cron job (the built-in cron scheduler already exists): re-run doctor and emit a notice only if the result changed since the last run.
  3. hermes status: surface a one-line summary ("⚠ 2 doctor issues — run hermes doctor") so users see it whenever they check anything.
  4. First use of a degraded feature: when a feature whose backing config is broken is first invoked in a session, fail loud with the specific doctor finding inline (not just a generic "tool not available").

The key principle: doctor's diagnostics already exist and are excellent; they just need to be surfaced without the user asking.
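Layers 3 and 4 above are mostly about wording and timing. A minimal sketch, assuming doctor findings are available as a mapping from feature name to the specific doctor message. DegradedFeatureError, status_summary, and FeatureGuard are hypothetical names, and the summary text is only illustrative.

```python
from typing import Dict, Optional, Set


class DegradedFeatureError(RuntimeError):
    """Raised when a feature whose backing config is broken is invoked."""


def status_summary(issue_count: int) -> Optional[str]:
    """One-line summary for hermes status; None when doctor reports nothing."""
    if issue_count == 0:
        return None
    noun = "issue" if issue_count == 1 else "issues"
    return f"⚠ {issue_count} doctor {noun} (run hermes doctor)"


class FeatureGuard:
    """Per-session guard that fails loud when a degraded feature is used."""

    def __init__(self, findings: Dict[str, str]):
        # feature name -> specific doctor finding for that feature
        self._findings = dict(findings)
        self._reported: Set[str] = set()

    def check(self, feature: str) -> None:
        finding = self._findings.get(feature)
        if finding is None:
            return  # no doctor finding: treat the feature as healthy
        if feature not in self._reported:
            # First use in this session: put the specific finding inline.
            self._reported.add(feature)
            raise DegradedFeatureError(
                f"{feature} is configured but not working: {finding}"
            )
        # Later uses still fail loud, with a shorter message.
        raise DegradedFeatureError(
            f"{feature} is unavailable (doctor finding reported earlier this session)"
        )
```

A tool dispatcher would call guard.check(name) before running a tool, so the first failure carries the real cause (for example, the honcho-ai message) instead of a generic "tool not available". Whether later calls should repeat the full finding is a judgment call; this sketch shortens them.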

Why this matters

Hermes is positioned as an "agent that quietly improves over time" — a learning loop, persistent memory, autonomous skill creation. All of those rely on backend components that can degrade silently:

  • Memory provider stops syncing → "deepening model of who you are across sessions" silently stops deepening
  • Auxiliary client misconfigured → cost/title/embedding calls fail silently and either stall or get retried via fallback at higher cost
  • Plugin's Python dep removed by pip reinstall → feature that worked last week now silently no-ops
  • Cron scheduler tries to deliver to a Telegram chat whose token rotated → daily reports disappear and nobody notices for a month

The current behaviour is worse than failing loud: features appear to be working (the YAML says so) while actually being inert.

Suggested implementation sketch

A new gateway.startup_health.py module that:

  • Runs the existing doctor subset that doesn't require the network (tools, plugins, providers, lib presence, symlinks)
  • Compares results against a state file ~/.hermes/.doctor_state.json to detect new issues (so re-warnings don't spam)
  • For each new or escalated issue, writes one log line at WARNING level and posts one short notice via _send_unsolicited_message_to_user() on the primary platform
  • For resolved issues, posts an "✓ resolved: …" notice once

Total: ~150 LoC, plus a small config toggle (startup_health.enabled: true by default). If desired, I can prepare a PR. Please confirm whether the team wants this surface (on by default, opt-out) or something more conservative (opt-in flag, no platform pings, log-only).
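The state-file comparison described above can be sketched as follows. This is a minimal sketch, not the proposed module: it assumes doctor findings can be collected as (component, message) pairs, and run_startup_health, fingerprint, and the send_notice callback are hypothetical stand-ins (send_notice is where _send_unsolicited_message_to_user() would plug in; its real signature is not shown in the issue). The state-file path and the startup_health.enabled key come from the issue; the JSON layout of the state file is an assumption. Severity is not modelled, so an "escalated" finding is only re-announced if its message text changes, which also produces a resolved notice for the old text.

```python
import hashlib
import json
import logging
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple

# Logger name is an assumption, not taken from the Hermes codebase.
log = logging.getLogger("hermes.startup_health")

DEFAULT_STATE = Path("~/.hermes/.doctor_state.json")


def fingerprint(component: str, message: str) -> str:
    """Stable hash of one finding, used to dedup notices across restarts."""
    raw = f"{component}\n{message}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()[:16]


def _load_state(path: Path) -> Dict[str, str]:
    """Previous run's findings; empty on first run or unreadable state."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return {}
    return data if isinstance(data, dict) else {}


def run_startup_health(
    config: dict,
    findings: Iterable[Tuple[str, str]],
    send_notice: Callable[[str], None],
    state_path: Path = DEFAULT_STATE,
) -> None:
    """Announce each new finding once and each resolved finding once."""
    # Opt-out toggle, on by default as proposed (startup_health.enabled: true).
    if not (config.get("startup_health") or {}).get("enabled", True):
        return

    state_path = state_path.expanduser()
    previous = _load_state(state_path)
    current = {fingerprint(c, m): f"{c}: {m}" for c, m in findings}

    for key in sorted(current.keys() - previous.keys()):
        log.warning("doctor: %s", current[key])
        send_notice(f"Heads-up: {current[key]}. Run hermes doctor for details.")

    for key in sorted(previous.keys() - current.keys()):
        send_notice(f"✓ resolved: {previous[key]}")

    # Persist the current snapshot so unchanged findings stay quiet next time.
    state_path.parent.mkdir(parents=True, exist_ok=True)
    state_path.write_text(json.dumps(current, indent=2), encoding="utf-8")
```

Because only the set difference is announced, a gateway that restarts ten times with the same broken provider posts one notice rather than ten; the same function could also back the daily cron layer, since that layer likewise only reports changes since the last run.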

Environment

  • Hermes Agent v0.12.0 (2026.4.30) — known to be 1200+ commits behind main; checked the changelog and didn't see this specific behavior addressed in newer commits, but happy to be pointed at one if it exists.
  • Python 3.13, OpenAI SDK 2.33.0
  • Telegram + Ollama backend
