hermes: Fix macOS launchd: hermes gateway restart never invokes _graceful_restart_via_sigusr1 and always takes the non-drain SIGTERM path from fresh shells (1 pull request)



Context

On macOS (launchd), hermes gateway restart invoked from a fresh shell never exercises the drain-aware SIGUSR1 path, even though that infrastructure exists in the same module and is used by both systemd_restart and the hermes update flow on macOS. In-flight agent runs are not drained; they receive SIGTERM and the gateway exits without the cleanup tail.

Repository state used for analysis: 973f27e (v0.14.0).

Code path

hermes_cli/gateway.py defines two SIGUSR1 helpers with different scopes:

  • _request_gateway_self_restart(pid) at L190: guarded by _is_pid_ancestor_of_current_process(pid) (L194). Intended for a gateway descendant asking its own ancestor to restart.
  • _graceful_restart_via_sigusr1(pid, drain_timeout) at L203: no ancestor check. Sends SIGUSR1 and waits up to drain_timeout for the gateway to exit with code 75. This is the unconditional drain-aware helper.
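The drain protocol both helpers rely on (SIGUSR1 in, graceful exit with code 75 out) can be illustrated with a small self-contained sketch. Everything here is hypothetical: GATEWAY_STUB stands in for the gateway, and graceful_restart_via_sigusr1 is a simplified stand-in for the real _graceful_restart_via_sigusr1, whose body this issue does not quote. The sketch is POSIX-only and can read the exit code only because the stand-in gateway is its own child process.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the gateway: on SIGUSR1 it "drains" and exits 75.
GATEWAY_STUB = r"""
import signal, sys, time

def on_sigusr1(signum, frame):
    time.sleep(0.1)   # pretend to finish in-flight agent runs
    sys.exit(75)      # the "please restart me" exit code

signal.signal(signal.SIGUSR1, on_sigusr1)
print("ready", flush=True)  # handler installed; safe to signal now
while True:
    time.sleep(0.05)
"""


def graceful_restart_via_sigusr1(proc: subprocess.Popen, drain_timeout: float) -> bool:
    """Send SIGUSR1, then wait up to drain_timeout seconds for exit code 75."""
    proc.send_signal(signal.SIGUSR1)
    try:
        return proc.wait(timeout=drain_timeout) == 75
    except subprocess.TimeoutExpired:
        return False  # drain did not finish; a caller would fall back to SIGTERM


if __name__ == "__main__":
    gateway = subprocess.Popen([sys.executable, "-c", GATEWAY_STUB],
                               stdout=subprocess.PIPE, text=True)
    assert gateway.stdout.readline().strip() == "ready"  # wait for the handler
    print("drained:", graceful_restart_via_sigusr1(gateway, drain_timeout=5))
    gateway.stdout.close()
```

Waiting for the "ready" line matters: if SIGUSR1 arrives before the handler is installed, the default action terminates the process and no drain runs.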

launchd_restart() at L3013 calls only the ancestor-guarded variant:

def launchd_restart():
    ...
    pid = get_running_pid()
    if pid is not None and _request_gateway_self_restart(pid):
        print("✓ Service restart requested")
        return
    if pid is not None:
        try:
            terminate_pid(pid, force=False)  # SIGTERM, no drain wait
        ...
        if pid is not None:
            exited = _wait_for_gateway_exit(timeout=drain_timeout, force_after=None)
            ...
    subprocess.run(["launchctl", "kickstart", "-k", target], ...)

When hermes gateway restart is run from a normal shell, the calling Python process is not a descendant of the gateway, so _is_pid_ancestor_of_current_process returns False at L194, _request_gateway_self_restart returns False at L195, and execution falls straight through to terminate_pid(pid, force=False) (SIGTERM). No SIGUSR1 is ever sent; no drain handler runs.
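To see why the fresh-shell case fails the guard, here is a minimal sketch of an ancestor check. It is not the code of _is_pid_ancestor_of_current_process, which this issue does not quote; is_pid_ancestor, ps_parent_pid, and the process tree below are hypothetical. The parent lookup is injectable so the example runs without real processes.

```python
import subprocess
from typing import Callable, Optional


def ps_parent_pid(pid: int) -> Optional[int]:
    """Look up a parent PID with `ps -o ppid= -p PID` (works on macOS and Linux)."""
    result = subprocess.run(["ps", "-o", "ppid=", "-p", str(pid)],
                            capture_output=True, text=True)
    text = result.stdout.strip()
    return int(text) if text else None


def is_pid_ancestor(pid: int, start: int,
                    get_ppid: Callable[[int], Optional[int]] = ps_parent_pid) -> bool:
    """True if `pid` appears in the parent chain above `start`."""
    current, seen = start, set()
    while current > 1 and current not in seen:  # stop at PID 1 (launchd/init)
        seen.add(current)
        parent = get_ppid(current)
        if parent is None:
            return False
        if parent == pid:
            return True
        current = parent
    return False


# Hypothetical process tree (child -> parent):
#   launchd(1) -> gateway(500) -> agent tool(510) -> hermes CLI(520)
#   launchd(1) -> Terminal(200) -> zsh(300) -> hermes CLI(400)
TREE = {500: 1, 510: 500, 520: 510, 200: 1, 300: 200, 400: 300}

print(is_pid_ancestor(500, start=520, get_ppid=TREE.get))  # True: CLI runs under the gateway
print(is_pid_ancestor(500, start=400, get_ppid=TREE.get))  # False: CLI runs from a fresh shell
```

The gateway (PID 500) and the fresh-shell CLI (PID 400) only meet at launchd, so any check of this shape returns False and the caller falls through to SIGTERM.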

Contrast this with systemd_restart() at L2558:

print(f"⏳ {scope_label} service restarting gracefully (PID {pid})...")
if _graceful_restart_via_sigusr1(pid, drain_timeout + 5):
    ... # reset-failed + start

systemd_restart() always tries the drain-aware path first because it calls _graceful_restart_via_sigusr1 directly, with no ancestor check.

The hermes update flow in hermes_cli/main.py (L8493, L8704) also calls _graceful_restart_via_sigusr1 directly, for both service-managed and manual gateways on macOS. That shows the drain works under launchd; it just isn't wired into the gateway restart command.

Observed behavior

Six hermes gateway restart invocations on 2026-05-17 (after config edits to ~/.hermes/profiles/zeke/config.yaml) each produced the SIGTERM shutdown stanza in gateway.log:

WARNING gateway.run: Shutdown context: signal=SIGTERM under_systemd=yes parent_pid=1
                     parent_name=? loadavg_1m=2.84 parent_cmdline='(unknown)'

(The values parent_pid=1, parent_name=?, and parent_cmdline='(unknown)' reflect the launching hermes CLI having already exited and been reparented to launchd before the dying gateway logs its shutdown context. This is misleading but is not the bug.)

Neither the "Stopping gateway for restart..." notification nor the "Sent shutdown notification to active chat" line appears in the same logs, because the SIGUSR1 → drain handler never ran.
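One way to check a gateway.log for this symptom is to count SIGTERM shutdown stanzas against drain-path lines. A minimal sketch: the marker strings are taken from this issue, but their exact wording in hermes logs is an assumption, and classify_restarts is a hypothetical helper.

```python
from typing import Dict

# Marker strings quoted in this issue; exact hermes log wording is assumed.
SIGTERM_MARKER = "Shutdown context: signal=SIGTERM"
DRAIN_MARKERS = (
    "Stopping gateway for restart",
    "Sent shutdown notification to active chat",
)


def classify_restarts(log_text: str) -> Dict[str, int]:
    """Count SIGTERM shutdown stanzas and drain-path lines in a gateway log."""
    lines = log_text.splitlines()
    return {
        "sigterm_shutdowns": sum(SIGTERM_MARKER in line for line in lines),
        "drain_markers": sum(any(m in line for m in DRAIN_MARKERS) for line in lines),
    }


sample = """\
WARNING gateway.run: Shutdown context: signal=SIGTERM under_systemd=yes parent_pid=1
                     parent_name=? loadavg_1m=2.84 parent_cmdline='(unknown)'
"""
print(classify_restarts(sample))  # {'sigterm_shutdowns': 1, 'drain_markers': 0}
```

The reported pattern is one SIGTERM stanza and zero drain lines per restart; if the drain path ran, one would expect the drain markers to appear instead.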

Expected behavior

hermes gateway restart on launchd should match systemd's behavior: send SIGUSR1, wait for a graceful drain (with the standard drain_timeout + slack), and only fall back to SIGTERM / launchctl kickstart -k if the drain doesn't complete.
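The expected control flow can be written as a small, testable sketch. It mirrors the systemd path and the suggested fix, but it is not hermes code: launchd_restart_flow and its injected callables (graceful_restart, terminate_and_wait, run) are hypothetical, and the gui/<uid>/<label> target format with uid 501 is an assumption about how target is built.

```python
from typing import Callable, List, Optional


def launchd_restart_flow(
    pid: Optional[int],
    drain_timeout: float,
    graceful_restart: Callable[[int, float], bool],
    terminate_and_wait: Callable[[int, float], None],
    run: Callable[[List[str]], None],
    target: str,
) -> str:
    """Hypothetical drain-first restart: SIGUSR1 drain, else SIGTERM + kickstart -k."""
    if pid is not None and graceful_restart(pid, drain_timeout + 5):
        # Gateway exited cleanly with 75; relaunch without killing anything.
        run(["launchctl", "kickstart", target])
        return "drained"
    if pid is not None:
        # Drain failed or timed out: fall back to the existing SIGTERM path.
        terminate_and_wait(pid, drain_timeout)
    run(["launchctl", "kickstart", "-k", target])
    return "fallback"


calls: List[List[str]] = []
result = launchd_restart_flow(
    pid=500,
    drain_timeout=60,
    graceful_restart=lambda pid, timeout: True,  # pretend the drain succeeded
    terminate_and_wait=lambda pid, timeout: calls.append(["SIGTERM", str(pid)]),
    run=calls.append,
    target="gui/501/ai.hermes.gateway-zeke",  # hypothetical uid 501
)
print(result, calls)  # drained [['launchctl', 'kickstart', 'gui/501/ai.hermes.gateway-zeke']]
```

Injecting the side effects keeps the ordering (drain first, SIGTERM only on failure) checkable without a running gateway or launchd.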

Suggested fix

Replace the ancestor-guarded call in launchd_restart() with the unconditional helper, mirroring systemd_restart:

-        if pid is not None and _request_gateway_self_restart(pid):
-            print("✓ Service restart requested")
-            return
+        if pid is not None and _graceful_restart_via_sigusr1(pid, drain_timeout + 5):
+            subprocess.run(["launchctl", "kickstart", target], check=False, timeout=30)
+            print("✓ Service restart requested (drained)")
+            return

(Running kickstart without -k relaunches the LaunchAgent after the graceful exit with code 75, equivalent to systemd's reset-failed + start shortcut at L8528. Without it, KeepAlive.SuccessfulExit=false will eventually restart the gateway, but only after launchd's throttle interval.)
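Why exit code 75 eventually triggers a relaunch follows from the KeepAlive.SuccessfulExit rule. The sketch below builds such a LaunchAgent plist with the standard-library plistlib and models that rule. The ProgramArguments value is a placeholder (the real ai.hermes.gateway-zeke plist is not shown in this issue), and the model ignores launchd's throttling and signal-terminated exits.

```python
import plistlib
from typing import List


def launch_agent_plist(label: str, program_args: List[str]) -> bytes:
    """Build a LaunchAgent plist that keeps the job alive only after a failed exit."""
    return plistlib.dumps({
        "Label": label,
        "ProgramArguments": program_args,
        "KeepAlive": {"SuccessfulExit": False},
    })


def keepalive_would_relaunch(exit_code: int, successful_exit: bool = False) -> bool:
    """Model of KeepAlive.SuccessfulExit for a job that exited with `exit_code`.

    SuccessfulExit=True relaunches after exit status 0; False relaunches after a
    non-zero status. Throttling and signal-terminated jobs are not modeled.
    """
    return (exit_code == 0) == successful_exit


plist = plistlib.loads(launch_agent_plist("ai.hermes.gateway-zeke", ["/path/to/gateway"]))
print(plist["KeepAlive"])            # {'SuccessfulExit': False}
print(keepalive_would_relaunch(75))  # True: exit 75 counts as unsuccessful
print(keepalive_would_relaunch(0))   # False
```

Because launchd would relaunch on its own schedule anyway, the explicit kickstart in the suggested fix only removes the throttle delay; it does not change which exits lead to a restart.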

This interacts with #11932 / PR #24993 (which adds a kickstart after SIGUSR1 for the in-process self-restart case) and #25966 (a drain-timeout race on the SIGTERM fallback path). The fix here is upstream of both: in the fresh-shell case the SIGUSR1 path never fires today, so neither the post-SIGUSR1 relaunch nor the SIGTERM-fallback timing matters.

Environment

  • macOS 13.x (Darwin 22.6.0)
  • hermes-agent v0.14.0 @ 973f27e
  • LaunchAgent ai.hermes.gateway-zeke, KeepAlive={SuccessfulExit=false}
  • agent.restart_drain_timeout: default
