hermes: Doctor is passive, silent setup drift goes unnoticed for weeks

Orbitalo · 2026-05-13T16:26:48Z



Source: NousResearch/hermes-agent#25098 (fetched 2026-05-14 03:49:00)


`hermes doctor` does an excellent job of detecting broken or missing components (memory provider not connected, plugin lib not installed, API keys missing, symlinks gone, system deps missing). The problem is that **nothing ever invokes it**. It only runs when the user remembers to type the command, which most users don't do until they wonder why a feature stopped working. In practice, this means stale configs, missing client libraries, expired keys, and disabled providers can sit broken for **weeks** before anyone notices. By the time they do, they often can't reconstruct what changed.

Root Cause

Hermes is positioned as an "agent that quietly improves over time": a learning loop, persistent memory, autonomous skill creation. All of those rely on backend components that can degrade silently:

- Memory provider stops syncing → "deepening model of who you are across sessions" silently stops deepening
- Auxiliary client misconfigured → cost/title/embedding calls fail silently and either stall or get retried via fallback at higher cost
- Plugin's Python dep removed by a `pip` reinstall → feature that worked last week now silently no-ops
- Cron scheduler tries to deliver to a Telegram chat whose token rotated → daily reports disappear and nobody notices for a month

The current behaviour is *worse than failing loud*: features appear to be working (the YAML says so) while actually being inert.


Concrete real-world example (today)

A user had `memory.provider: honcho` in `config.yaml` and a populated `honcho:` block. Honcho appeared "active" in the YAML and the gateway started without complaint. In reality:

- `~/.hermes/honcho.json` was empty (never initialized)
- The `honcho-ai` Python lib was not installed in the venv
- `hermes memory status` showed `Provider: (none — built-in only)`
- Conversations were **not** being synced for an unknown number of weeks

`hermes doctor` immediately surfaced both problems:

    ◆ Memory Provider
    ✗ Honcho connection failed
    honcho-ai is required for Honcho integration.
    Install it with: pip install honcho-ai

The same applies to several other plugins (kanban, image_gen, browser, web tools, ...). Doctor knows. The user doesn't.
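To make the "configured but not installed" case concrete, here is a minimal sketch of the kind of offline check involved. It is not Hermes code: the `missing_modules` helper and the `REQUIRED_MODULES` mapping are hypothetical, and `honcho` as the import name behind the `honcho-ai` package is an assumption.

```python
import importlib.util

# Hypothetical mapping from a configured component to the top-level module
# it needs. The import name "honcho" for honcho-ai is an assumption.
REQUIRED_MODULES = {"honcho": "honcho"}


def missing_modules(configured, required=REQUIRED_MODULES):
    """Return configured component names whose backing module is not importable."""
    missing = []
    for name in configured:
        module = required.get(name)
        if module is None:
            continue  # no known dependency recorded for this component
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module) is None:
            missing.append(name)
    return missing
```

A check like this needs no network access, so it could plausibly run on every gateway start without slowing it down.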

Expected behaviour

The agent should proactively alert the user, at least once, when a configured-but-broken component is detected. Suggested layers (any one of them would help):

1. **On gateway start**: run a fast subset of `doctor` checks. If a *configured* provider/plugin is broken, emit a `WARNING` log line **and** push a single notice through the active platform adapter (Telegram/Discord/Signal): *"Heads-up: 2 components configured in config.yaml are not actually working. Run `hermes doctor` for details."* (Throttled to once per gateway start, deduplicated by a hash of the finding.)
2. **Daily cron job** (the built-in cron scheduler is already there): re-run `doctor` and only emit a notice if the result *changed* since the last run.
3. **`hermes status`**: surface a one-line summary, `⚠ 2 doctor issues — run hermes doctor`, so users see it whenever they check anything.
4. **First use of a degraded feature**: when a feature whose backing config is broken is first invoked in a session, fail loudly with the specific doctor finding inline (not just a generic "tool not available").

The key principle: doctor's diagnostics already exist and are excellent. They just need to be **surfaced without the user asking**.
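The throttling in layer 1 (one notice, deduplicated by a hash of the finding) could look roughly like the sketch below. The `finding_id` and `build_notice` names, and the `(component, message)` shape of a finding, are assumptions for illustration and not part of Hermes.

```python
import hashlib


def finding_id(component: str, message: str) -> str:
    """Stable short hash of a finding, used to avoid repeating the same notice."""
    digest = hashlib.sha256(f"{component}\n{message}".encode("utf-8")).hexdigest()
    return digest[:12]


def build_notice(findings, already_sent):
    """Return one summary notice for unseen findings, or None if nothing is new.

    findings: list of (component, message) tuples
    already_sent: set of finding ids; updated in place with the new ones
    """
    new_ids = {finding_id(c, m) for c, m in findings} - already_sent
    if not new_ids:
        return None
    already_sent.update(new_ids)
    count = len(new_ids)
    noun = "component" if count == 1 else "components"
    verb = "is" if count == 1 else "are"
    return (
        f"Heads-up: {count} {noun} configured in config.yaml {verb} not "
        "actually working. Run `hermes doctor` for details."
    )
```

Keeping `already_sent` in memory gives the once-per-gateway-start throttle. Persisting it to disk would instead give the "only if changed since last run" behaviour of layer 2.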


Suggested implementation sketch

A new `gateway.startup_health.py` module that:

- Runs the existing `doctor` subset that doesn't require the network (tools, plugins, providers, lib presence, symlinks)
- Compares results against a state file, `~/.hermes/.doctor_state.json`, to detect *new* issues (so repeated warnings don't spam the user)
- For each *new* `✗` or *escalated* `⚠`, writes one log line at WARNING level and posts one short notice via `_send_unsolicited_message_to_user()` on the primary platform
- For *resolved* issues, posts an "✓ resolved: …" notice once

Total: ~150 LoC plus a small toggle in config (`startup_health.enabled: true` by default). If desired, I can prepare a PR. Please confirm whether the team wants this surface (on by default in config, opt-out) or something more conservative (opt-in flag, no platform pings, log-only).
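The state-file comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the proposed module: the state format (check name mapped to `"warn"` or `"fail"`), the `SEVERITY` table, and the helper names are all hypothetical.

```python
import json
from pathlib import Path

# Hypothetical severity levels standing in for doctor's ⚠ and ✗ markers.
SEVERITY = {"warn": 1, "fail": 2}


def load_state(path: Path) -> dict:
    """Read the previous findings; a missing or corrupt file counts as empty."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def diff_findings(previous: dict, current: dict):
    """Split check names into (new, escalated, resolved) lists."""
    new = sorted(k for k in current if k not in previous)
    escalated = sorted(
        k for k in current
        if k in previous and SEVERITY[current[k]] > SEVERITY[previous[k]]
    )
    resolved = sorted(k for k in previous if k not in current)
    return new, escalated, resolved


def save_state(path: Path, current: dict) -> None:
    """Persist the current findings for the next comparison."""
    path.write_text(json.dumps(current, indent=2), encoding="utf-8")
```

Only the `new` and `escalated` lists would trigger a warning, and `resolved` would trigger the one-time "✓ resolved" notice. Unchanged findings produce no output, which is what keeps repeated runs quiet.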

Environment

- Hermes Agent v0.12.0 (2026.4.30), known to be 1200+ commits behind main. I checked the changelog and didn't see this specific behaviour addressed in newer commits, but I'm happy to be pointed at one if it exists.
- Python 3.13, OpenAI SDK 2.33.0
- Telegram + Ollama backend
