hermes - Dashboard gateway status shows "已停止" (Stopped) when the gateway is actually running [1 pull request]

Root Cause

The issue is a PID file path mismatch combined with a state override logic bug.

Fix Action

Fixed


Bug Description

The Hermes dashboard shows "网关状态:已停止" (Gateway Status: Stopped) when the gateway is actually running normally with all platforms connected. After restarting the gateway via the dashboard, the status displays correctly.

Root Cause Analysis

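The issue is a PID file path mismatch combined with a state override logic bug: the dashboard looks for the PID file at a path where none exists, and then discards the "running" state recorded in the runtime status file.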

Environment

  • Deployment: Docker on QNAP NAS
  • HERMES_HOME: /opt/data
  • Gateway process: PID 7 (running, all platforms connected)
  • Dashboard and gateway: Same container

Key Findings

  1. PID file does not exist at the expected path:

    • Dashboard reads from /opt/data/gateway.pid (determined by HERMES_HOME=/opt/data)
    • This file does not exist
    • A stale PID file exists at /opt/data/.hermes/gateway.pid with PID 16132 (dead process from a previous run)
  2. State file exists with correct state:

    • /opt/data/gateway_state.json contains {"pid": 7, "gateway_state": "running", ...} — correct
    • /opt/data/.hermes/gateway_state.json contains stale data — but the dashboard doesn't read this path
  3. Dashboard detection logic (web_server.py:get_status()):

# Line 531-532: Local PID check
gateway_pid = get_running_pid()        # Reads /opt/data/gateway.pid -> None (file missing)
gateway_running = gateway_pid is not None  # False

# Line 535-541: Remote health probe (only if GATEWAY_HEALTH_URL configured)
if not gateway_running and _GATEWAY_HEALTH_URL:
    alive, remote_health_body = _probe_gateway_health(...)
    # Skipped if no GATEWAY_HEALTH_URL

# Line 563: Read runtime status
runtime = read_runtime_status()  # Reads /opt/data/gateway_state.json -> {gateway_state: "running"}

# Line 578-579: THE BUG
if not gateway_running:  # True (PID check failed)
    gateway_state = gateway_state if gateway_state in ("stopped", "startup_failed") else "stopped"
    # "running" is NOT in ("stopped", "startup_failed") -> forced to "stopped"

The runtime status file explicitly says "running", but because the local PID check failed (missing PID file), any state other than "stopped" or "startup_failed" is overwritten with "stopped".
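The override can be reproduced in isolation. The following is a minimal sketch, not the actual `web_server.py` code: `resolve_state_buggy` is a hypothetical helper that mirrors only the branch quoted above, with the PID check result and the runtime status passed in as plain arguments.

```python
from typing import Optional


def resolve_state_buggy(gateway_running: bool, runtime: Optional[dict]) -> Optional[str]:
    """Mirror the override branch quoted from get_status() (illustrative only)."""
    gateway_state = runtime.get("gateway_state") if runtime else None
    if not gateway_running:
        # Only "stopped" and "startup_failed" survive; everything else,
        # including "running", is replaced with "stopped".
        gateway_state = (
            gateway_state if gateway_state in ("stopped", "startup_failed") else "stopped"
        )
    return gateway_state


# PID file missing -> gateway_running is False, yet the state file says "running".
print(resolve_state_buggy(False, {"pid": 7, "gateway_state": "running"}))  # prints: stopped
```

With the values from this report (PID check failed, state file reporting "running"), the helper returns "stopped", which matches what the dashboard displays.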

Why it works after dashboard restart

Restarting via the dashboard calls gateway restart, which writes a new PID file at the correct path (/opt/data/gateway.pid), synchronizing the state files.
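To confirm a mismatch like this before restarting, one could check both PID file locations mentioned in this report and test whether the recorded PID is still alive. This is a rough diagnostic sketch for POSIX systems. `pid_is_alive` and `inspect_pid_files` are hypothetical helpers, and the sketch assumes the PID file contains a bare integer, which the report does not confirm.

```python
import os
from pathlib import Path


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX semantics)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def inspect_pid_files(home: Path):
    """Report the status of each candidate PID file under `home`."""
    candidates = [home / "gateway.pid", home / ".hermes" / "gateway.pid"]
    report = []
    for path in candidates:
        if not path.is_file():
            report.append((str(path), "missing"))
            continue
        try:
            # Assumption: the file holds a plain integer PID.
            pid = int(path.read_text(encoding="utf-8").strip())
        except ValueError:
            report.append((str(path), "unreadable"))
            continue
        status = "live" if pid_is_alive(pid) else "stale"
        report.append((str(path), f"{status} pid {pid}"))
    return report
```

Calling `inspect_pid_files(Path("/opt/data"))` inside the container would list each candidate path as missing, unreadable, live, or stale. In the situation described here, the first path would be expected to come back as missing and the second as stale.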

Steps to Reproduce

  1. Start Hermes in Docker with HERMES_HOME=/opt/data
  2. Let the gateway run normally
  3. Open the dashboard. The gateway status shows "已停止" (Stopped)
  4. Click "重启网关" (Restart Gateway) in the dashboard
  5. Refresh. The status now shows "运行中" (Running)

Expected Behavior

The dashboard should report a running gateway as running. Any one of the following would achieve that:

  1. The dashboard falls back to the runtime status file when the PID file is missing
  2. The dashboard does not override a valid "running" state read from gateway_state.json
  3. The gateway process always creates its PID file at HERMES_HOME/gateway.pid
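The first option could be sketched as a fallback that trusts the state file only when the PID it records belongs to a live process. This is an illustration under assumptions, not the project's implementation: `running_pid_from_state_file` is a hypothetical helper, the `pid` and `gateway_state` keys are taken from the state file contents quoted in this report, and the liveness check relies on POSIX `os.kill(pid, 0)` semantics.

```python
import json
import os
from pathlib import Path
from typing import Optional


def running_pid_from_state_file(state_path: Path) -> Optional[int]:
    """Return the recorded PID if the file says "running" and that process exists."""
    try:
        data = json.loads(state_path.read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return None  # missing, unreadable, or malformed file
    if not isinstance(data, dict) or data.get("gateway_state") != "running":
        return None
    pid = data.get("pid")
    if isinstance(pid, bool) or not isinstance(pid, int) or pid <= 0:
        return None
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except (ProcessLookupError, OverflowError):
        return None  # state file is stale: no such process
    except PermissionError:
        pass  # process exists but is owned by another user
    return pid
```

A caller could use the returned PID in place of the missing PID file result, for example treating `running_pid_from_state_file(Path("/opt/data/gateway_state.json")) is not None` as evidence that the gateway is up. Checking liveness guards against a "running" entry left behind by a crashed process.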

Actual Behavior

Gateway status shows "已停止" (Stopped) until a manual restart synchronizes the PID file.

Suggested Fix

The logic at web_server.py:578-579 should be less aggressive about overriding state:

# Current (buggy):
if not gateway_running:
    gateway_state = gateway_state if gateway_state in ("stopped", "startup_failed") else "stopped"

# Suggested: only override when the runtime status is absent or already reports "stopped"
if not gateway_running and (runtime is None or runtime.get("gateway_state") in (None, "stopped")):
    gateway_state = "stopped"
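The effect of the suggested condition can be checked with a small standalone function. This is a sketch for comparison only: `resolve_state_suggested` is a hypothetical name, and it assumes that `gateway_state` is initially taken from the runtime status, which the quoted snippet implies but does not show.

```python
from typing import Optional


def resolve_state_suggested(gateway_running: bool, runtime: Optional[dict]) -> Optional[str]:
    """Apply the suggested condition to a PID check result and a runtime status dict."""
    # Assumption: the initial state comes from the runtime status file.
    gateway_state = runtime.get("gateway_state") if runtime else None
    if not gateway_running and (
        runtime is None or runtime.get("gateway_state") in (None, "stopped")
    ):
        gateway_state = "stopped"
    return gateway_state


# Outcomes when the local PID check fails, for several runtime status values.
for runtime in (None, {}, {"gateway_state": "running"}, {"gateway_state": "startup_failed"}):
    print(runtime, "->", resolve_state_suggested(False, runtime))
```

Under this condition a reported "running" or "startup_failed" state is preserved when the PID file is missing. One trade-off worth noting: the check trusts whatever the state file says, so a gateway that crashed without updating the file would still be displayed as running unless the recorded PID is also verified.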

Alternatively, ensure the gateway process always creates HERMES_HOME/gateway.pid on startup.
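A startup hook for this alternative might look like the sketch below. It is an assumption-laden illustration rather than Hermes code: `write_pid_file` is a hypothetical helper, the file is assumed to hold a bare integer, and the temporary-file-plus-rename step is a general technique to avoid readers seeing a half-written file.

```python
import os
from pathlib import Path
from typing import Optional


def write_pid_file(home: Path, pid: Optional[int] = None) -> Path:
    """Write the PID to <home>/gateway.pid and return the file path."""
    home.mkdir(parents=True, exist_ok=True)
    pid_path = home / "gateway.pid"
    tmp_path = pid_path.with_name(pid_path.name + ".tmp")
    value = os.getpid() if pid is None else pid
    tmp_path.write_text(f"{value}\n", encoding="utf-8")
    # os.replace overwrites the destination and is atomic on the same filesystem.
    os.replace(tmp_path, pid_path)
    return pid_path
```

The gateway could call `write_pid_file(Path(os.environ["HERMES_HOME"]))` early in startup, so that the dashboard and the gateway resolve the same path from the same environment variable.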

Environment

  • Version: v0.13.0 (2026.5.7)
  • Deployment: Docker on QNAP NAS
  • HERMES_HOME: /opt/data
  • Platforms: QQBot, WeChat, Yuanbao (all connected)
