hermes - 💡(How to fix) Fix [Bug]: gateway restart self-kills: KillMode=mixed SIGKILLs the restart bash process


Root Cause

Conducted multi-log correlation (gateway.log + agent.log + journalctl) across 78-minute outage window:
20:51:40 — restart triggered, drain begins
20:53:03 — systemctl stop called from bash child process inside gateway cgroup
20:53:38 — KillMode=mixed sends SIGKILL to entire cgroup, killing bash (PID 31774) before systemctl start executes
20:53:38 ~ 22:12:09 — gateway dead, zero log output, QQ offline
22:12:09 — manual restart recovers
Confirmed root cause: gateway/run.py restart logic runs the stop→start bash sequence inside the service cgroup, making it vulnerable to systemd's cgroup-wide SIGKILL during shutdown.
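The intervals in the timeline can be checked with a few lines of Python. This is only a sketch: the timestamps are copied from the entries above, the names `EVENTS` and `seconds_between` are made up for illustration, and all events are assumed to fall on the same day.

```python
from datetime import datetime

# Timestamps copied from the timeline above (assumed to be on the same day).
EVENTS = {
    "restart_triggered": "20:51:40",
    "systemctl_stop": "20:53:03",
    "sigkill": "20:53:38",
    "manual_recovery": "22:12:09",
}


def seconds_between(start: str, end: str) -> int:
    """Return the number of seconds from start to end (HH:MM:SS, same day)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


drain = seconds_between(EVENTS["restart_triggered"], EVENTS["systemctl_stop"])
stop_to_kill = seconds_between(EVENTS["systemctl_stop"], EVENTS["sigkill"])
outage = seconds_between(EVENTS["sigkill"], EVENTS["manual_recovery"])

print(f"drain before stop: {drain}s")                 # 83s
print(f"stop to SIGKILL:   {stop_to_kill}s")          # 35s
print(f"outage: {outage // 60}m {outage % 60}s")      # 78m 31s
```

The 35 seconds between systemctl stop and the SIGKILL is the window in which the bash helper was still waiting for the stop to finish; the 78 minutes that follow match the outage window quoted above.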

Code Example

May 08 20:53:38 CHINAMI-7PRNFOR systemd[207]: hermes-gateway.service: Killing process 31774 (bash) with signal SIGKILL.
May 08 20:53:38 CHINAMI-7PRNFOR systemd[207]: hermes-gateway.service: Failed with result 'signal'.
Process 31774 is the bash child that gateway/run.py spawns to execute systemctl stop && sleep 3 && systemctl start. It was killed mid-sequence, before systemctl start ran.

Bug Description

hermes gateway restart spawns a bash child that runs systemctl stop && sleep 3 && systemctl start. This bash process runs inside the gateway's own cgroup. With KillMode=mixed, systemd sends SIGKILL to everything left in the cgroup once TimeoutStopSec expires, which kills the bash process before it reaches systemctl start. The gateway stops but never restarts.

Steps to Reproduce

1. hermes gateway restart
2. Wait for the drain timeout
3. systemctl --user status hermes-gateway → dead
4. No auto-restart despite Restart=always
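Step 3 can also be checked from a script instead of reading the status output by eye. The sketch below queries systemctl --user show for a few unit properties and parses the KEY=VALUE lines. The helper names `parse_show_output` and `unit_state` are hypothetical, and the sample text is illustrative rather than captured from the affected machine.

```python
import subprocess


def parse_show_output(text: str) -> dict[str, str]:
    """Parse the KEY=VALUE lines printed by `systemctl show` into a dict."""
    props: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            props[key.strip()] = value.strip()
    return props


def unit_state(unit: str = "hermes-gateway") -> dict[str, str]:
    """Ask the user service manager for the unit's state (needs systemd)."""
    result = subprocess.run(
        ["systemctl", "--user", "show",
         "--property=ActiveState,SubState,Result", unit],
        capture_output=True, text=True, check=False,
    )
    return parse_show_output(result.stdout)


# Illustrative output only; the real values depend on the machine.
sample = "ActiveState=failed\nSubState=failed\nResult=signal\n"
state = parse_show_output(sample)
print(state["ActiveState"], state["Result"])
```

A Result of signal would be consistent with the "Failed with result 'signal'" line in the journal.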

Expected Behavior

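The gateway should come back up after hermes gateway restart. Instead, the journal shows the restart helper being killed:
May 08 20:53:38 systemd[207]: hermes-gateway.service: Killing process 31774 (bash) with signal SIGKILL.
May 08 20:53:38 systemd[207]: hermes-gateway.service: Failed with result 'signal'.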

Actual Behavior

hermes gateway restart stops the gateway, and it never starts again. qqbot disappears from the dashboard. Restart=always does not help, because systemd treats systemctl stop as a deliberate stop rather than a crash.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

WhatsApp

Operating System

Ubuntu 24.04.4 LTS (WSL2)

Python Version

3.11.15

Hermes Version

v0.12.0 (2026.4.30)

Root Cause Analysis (optional)

gateway/run.py executes the restart sequence via bash -c "systemctl stop && sleep 3 && systemctl start". This bash process inherits the gateway's cgroup. When shutdown exceeds TimeoutStopSec, KillMode=mixed sends SIGKILL to the entire cgroup — including the bash that was supposed to run systemctl start. Restart=always fails because systemctl stop is a deliberate stop, not a crash.

Proposed Fix (optional)

Replace the inline stop→start bash sequence with systemctl --user restart hermes-gateway. systemd then carries out the restart as its own job, outside the service cgroup, so the restart cannot be killed along with the service.
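A minimal sketch of what the replacement call might look like in Python follows. It is not the actual gateway/run.py code: the function names are hypothetical, and the --no-block flag is an addition that goes beyond the command proposed above. It makes systemctl return as soon as the restart job is queued, so the caller does not sit waiting inside the cgroup that is about to be torn down.

```python
import subprocess


def build_restart_command(unit: str = "hermes-gateway", user: bool = True) -> list[str]:
    """Build a single `systemctl restart` call instead of stop && sleep && start."""
    cmd = ["systemctl"]
    if user:
        cmd.append("--user")  # talk to the per-user service manager
    # --no-block: enqueue the restart job and return without waiting for it.
    cmd += ["restart", "--no-block", unit]
    return cmd


def request_restart(unit: str = "hermes-gateway") -> int:
    """Queue the restart with systemd and return systemctl's exit code."""
    return subprocess.run(build_restart_command(unit), check=False).returncode


print(" ".join(build_restart_command()))
```

Because the stop and the start are both steps of one job held by the service manager, the start no longer depends on any process inside the service's cgroup surviving the stop.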

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
