hermes - 💡(How to fix) Fix Feature: `hermes doctor` — one-command system health diagnostics

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

| #28126 | sync_skills() crashes on external_dirs collision — only visible in traceback |

Root Cause

  1. Reduces issue triage cost — maintainers can ask for hermes doctor output instead of guessing
  2. Empowers self-service — users catch 80% of setup problems before filing issues
  3. Single source of truth — replaces scattered hermes config check, manual log inspection, and "is my setup correct?" questions in Discord
  4. Extensible — plugins can register their own health checks via entry point

Code Example

$ hermes doctor
Config        — valid, 3 profiles loaded
Credentials2/2 providers reachable (zhipu, openrouter)
Plugins5/6 loaded (clitoolkit: pandoc binary not found in PATH)
Skills2 name conflicts in external_dirs
Logging       — gateway.log missing component prefix "hermes_plugins"
ConnectivityFeishu WS connected, API server reachable on :8080
Database      — state.db healthy (47 sessions, 2.1 MB)
Cron4/4 jobs healthy, last run 12 min ago
Disk          — logs/ 340 MB (within limit)

2 warnings, 1 issue found. Run `hermes doctor --fix` for auto-remediation.

---

$ hermes doctor --json | jq ".checks.plugins"
{
  "status": "warning",
  "loaded": 5,
  "total": 6,
  "failures": [{"plugin": "clitoolkit", "reason": "pandoc binary not found"}]
}

---

# Pseudo-structure
CHECKS = [
    ("config",       check_config),
    ("credentials",  check_credentials),
    ("plugins",      check_plugins),
    ("skills",       check_skills),
    ("logging",      check_logging),
    ("connectivity", check_connectivity),
    ("database",     check_database),
    ("cron",         check_cron),
    ("disk",         check_disk),
]

def run_doctor(fix: bool = False, json_output: bool = False):
    results = [(name, fn()) for name, fn in CHECKS]
    # render table / JSON
RAW_BUFFERClick to expand / collapse

Problem

A significant portion of "it doesn't work" issues stem from silent failures that are invisible at the default log level. In the past week alone:

IssueSilent Failure
#28137discover_plugins() exceptions swallowed by logger.debug (invisible at INFO)
#28138Plugin log lines filtered out of gateway.log by prefix mismatch
#27931CI tests CANCELLED across all PRs — no local way to detect environment drift
#28126sync_skills() crashes on external_dirs collision — only visible in traceback

Users currently need to:

  1. Know which log file to check (agent.log? gateway.log? errors.log?)
  2. Run at DEBUG level (noisy, expensive)
  3. Interpret stack traces to find the root cause

This creates friction for both users (can't self-diagnose) and maintainers (issues lack diagnostic context).

Proposal

Add a hermes doctor CLI command that runs a suite of health checks and outputs a clear, actionable report:

$ hermes doctor
✓ Config        — valid, 3 profiles loaded
✓ Credentials   — 2/2 providers reachable (zhipu, openrouter)
⚠ Plugins       — 5/6 loaded (clitoolkit: pandoc binary not found in PATH)
⚠ Skills        — 2 name conflicts in external_dirs
✗ Logging       — gateway.log missing component prefix "hermes_plugins"
✓ Connectivity  — Feishu WS connected, API server reachable on :8080
✓ Database      — state.db healthy (47 sessions, 2.1 MB)
✓ Cron          — 4/4 jobs healthy, last run 12 min ago
✓ Disk          — logs/ 340 MB (within limit)

2 warnings, 1 issue found. Run `hermes doctor --fix` for auto-remediation.

Check Categories

CategoryWhat it checks
ConfigYAML valid, required fields present, profile paths exist
CredentialsAPI keys non-empty, providers reachable (lightweight ping)
PluginsAll discovered plugins loaded, failed ones show ImportError
Skillssync_skills dry-run, external_dirs conflicts, orphan symlinks
LoggingComponent prefixes match loaded plugins, log rotation configured
ConnectivityGateway WS connected, API server health endpoint
Databasestate.db integrity check, session count, size
CronScheduler running, jobs not stuck, last run within expected window
DiskLog directory size, temp files, old session cleanup
DependenciesRequired binaries in PATH (git, pandoc, etc.)

--fix flag (optional, Phase 2)

Auto-remediate where safe:

  • Add missing component prefixes to logging config
  • Clean stale session files
  • Re-link orphan skill symlinks
  • Reset stuck cron jobs

--json flag

Machine-readable output for programmatic consumption:

$ hermes doctor --json | jq ".checks.plugins"
{
  "status": "warning",
  "loaded": 5,
  "total": 6,
  "failures": [{"plugin": "clitoolkit", "reason": "pandoc binary not found"}]
}

Implementation Sketch

The command would live in hermes_cli/commands/doctor.py as a new CLI subcommand. Each check is an independent function returning {status, message, details} — easy to extend and test.

# Pseudo-structure
CHECKS = [
    ("config",       check_config),
    ("credentials",  check_credentials),
    ("plugins",      check_plugins),
    ("skills",       check_skills),
    ("logging",      check_logging),
    ("connectivity", check_connectivity),
    ("database",     check_database),
    ("cron",         check_cron),
    ("disk",         check_disk),
]

def run_doctor(fix: bool = False, json_output: bool = False):
    results = [(name, fn()) for name, fn in CHECKS]
    # render table / JSON

Why This Matters

  1. Reduces issue triage cost — maintainers can ask for hermes doctor output instead of guessing
  2. Empowers self-service — users catch 80% of setup problems before filing issues
  3. Single source of truth — replaces scattered hermes config check, manual log inspection, and "is my setup correct?" questions in Discord
  4. Extensible — plugins can register their own health checks via entry point

Related

  • #27931 (CI environment drift — no local detection)
  • #28137 (discover_plugins silent failure)
  • #28138 (plugin log prefix mismatch)
  • #28126 (sync_skills collision)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - 💡(How to fix) Fix Feature: `hermes doctor` — one-command system health diagnostics