hermes - [Bug]: /restart does not relaunch the gateway under macOS launchd [1 pull request]

Fix Action

Fixed

Bug Description

On macOS, the gateway's /restart command (and any other code path that asks the gateway to relaunch via the service manager) does not actually trigger a launchd-driven restart. The gateway exits with code 0, launchd's KeepAlive { SuccessfulExit: false } policy treats that as "stopped successfully", and the gateway stays down until the user manually re-bootstraps it.

The same code path works correctly on Linux under systemd.

Steps to Reproduce

  1. Install the gateway as a launchd service (the standard macOS deployment via hermes gateway install).
  2. Confirm it's running: launchctl list ai.hermes.gateway shows a PID.
  3. Send /restart to the bot (or trigger any code path that calls _handle_restart_command).
  4. The gateway gracefully drains and exits.
  5. Wait — and observe that launchd does not relaunch it. launchctl list ai.hermes.gateway still references the previous (now-dead) PID and no new process spawns. Telegram / Discord / Feishu adapters all stay disconnected.

Expected vs Actual

Expected: After /restart, the gateway exits and launchd brings it right back up — same behaviour as systemd on Linux.

Actual: The gateway exits cleanly (code 0) and stays down. launchctl list shows the stale PID and no relaunch happens.

Operating System

macOS 15.4 (Darwin 25.4.0)

Python Version

3.11.14

Hermes Version

Working off main (HEAD 6a6766fb8).

Additional Logs / Traceback (optional)

~/.hermes/logs/gateway-exit-diag.log for a working systemd-style restart shows:

{"tag": "asyncio.run.SystemExit", "code": 75}
{"tag": "gateway.start", "pid": <new>}

For the failing macOS launchd /restart, the SystemExit(75) line is missing entirely: the gateway falls through to return True → sys.exit(0), and the next gateway.start entry only shows up much later, when the user manually runs launchctl kickstart -k.
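The comparison between the two logs can be checked mechanically. The following is a minimal sketch, assuming each line of gateway-exit-diag.log is a standalone JSON object with a "tag" field as in the excerpt above (the real file may carry more fields, and <new> in the excerpt stands for an integer PID). The helper name restart_was_service_driven is hypothetical and is not part of the gateway.

```python
import json
from typing import Iterable


def restart_was_service_driven(lines: Iterable[str]) -> bool:
    """Report whether the most recent gateway.start followed a SystemExit(75).

    Hypothetical helper: it assumes one JSON object per log line, with the
    "tag" and "code" keys shown in the excerpt above.
    """
    pending_exit_75 = False  # saw SystemExit(75) since the previous start
    latest_start_ok = False  # verdict for the most recent gateway.start
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # ignore lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        tag = entry.get("tag")
        if tag == "asyncio.run.SystemExit" and entry.get("code") == 75:
            pending_exit_75 = True
        elif tag == "gateway.start":
            latest_start_ok = pending_exit_75
            pending_exit_75 = False
    return latest_start_ok


if __name__ == "__main__":
    sample = [
        '{"tag": "asyncio.run.SystemExit", "code": 75}',
        '{"tag": "gateway.start", "pid": 4242}',  # PID is made up
    ]
    print(restart_was_service_driven(sample))
```

To run it against a real log, pass the lines of the file (for example via open(path) as the iterable). A False result for the latest start matches the failing launchd case described here.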

Root Cause Analysis

In gateway/run.py (~line 9720), the gateway decides between two restart strategies:

_under_service = bool(os.environ.get("INVOCATION_ID"))  # systemd sets this
_in_container = os.path.exists("/.dockerenv") or os.path.exists("/run/.containerenv")
if _under_service or _in_container:
    self.request_restart(detached=False, via_service=True)
else:
    self.request_restart(detached=True, via_service=False)

INVOCATION_ID is set only by systemd. macOS launchd uses a different convention — it injects XPC_SERVICE_NAME and XPC_FLAGS into the environment of managed jobs but does not set INVOCATION_ID.

So under launchd, _under_service is False, the code takes the detached-subprocess branch, and request_restart(via_service=False) flows through to the exit path:

# gateway/run.py ~line 18162
if runner._restart_via_service:
    raise SystemExit(75)
return True

Because _restart_via_service is False, the SystemExit(75) branch is skipped and the function returns True, which leads to sys.exit(0). launchd's KeepAlive { SuccessfulExit: false } policy relaunches a job only after a non-zero exit, so it refuses to relaunch after this "successful" exit.

The detached-subprocess fallback (the branch the code does take) doesn't start a replacement process under launchd either, because launchd reparents the spawned subprocess and tears it down when the parent exits. This is the same mechanism that the _under_service block already documents for systemd KillMode=mixed.
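The chain of decisions described in this section can be traced with a small model. This is a sketch under stated assumptions, not the gateway's code: the three function names are hypothetical, the container check is left out, and the launchd model covers only normal exits with KeepAlive { SuccessfulExit: ... } (crashes and signals are ignored). The XPC_SERVICE_NAME value is assumed to equal the job label used in the reproduction steps.

```python
from typing import Mapping


def under_service_current(env: Mapping[str, str]) -> bool:
    """Current probe: only systemd's INVOCATION_ID counts as a service."""
    return bool(env.get("INVOCATION_ID"))


def exit_code_for_restart(restart_via_service: bool) -> int:
    """Mirror the quoted exit path: SystemExit(75) or a clean exit (0)."""
    return 75 if restart_via_service else 0


def launchd_relaunches(exit_code: int, successful_exit: bool = False) -> bool:
    """Model KeepAlive { SuccessfulExit: <bool> } for a normal exit.

    True  -> relaunch only after exit status 0.
    False -> relaunch only after a non-zero exit status.
    """
    exited_ok = exit_code == 0
    return exited_ok if successful_exit else not exited_ok


if __name__ == "__main__":
    # Assumed environment of the launchd-managed gateway (illustrative).
    launchd_env = {"XPC_SERVICE_NAME": "ai.hermes.gateway", "XPC_FLAGS": "0x0"}

    via_service = under_service_current(launchd_env)
    code = exit_code_for_restart(via_service)
    print("via_service:", via_service)
    print("exit code:", code)
    print("launchd relaunches:", launchd_relaunches(code))
```

With the current probe, the launchd environment yields via_service False, exit code 0, and no relaunch. Any probe that returns True for this environment flips the exit code to 75 and the model's relaunch decision to True.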

Proposed Fix

Extend the probe to recognise launchd:

_under_service = bool(
    os.environ.get("INVOCATION_ID")        # systemd (Linux) sets this
    or os.environ.get("XPC_SERVICE_NAME")  # launchd (macOS) sets this
)

XPC_SERVICE_NAME is set by launchd for every managed job (LimitLoadToSessionType does not affect this). I've verified it is present in the live gateway process on macOS 15.4. The variable is launchd-specific so it can't false-positive on a Linux box.
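For testing, the probe can be written as a pure function over an environment mapping. The sketch below assumes nothing beyond the two variables named above. The function names and the expected_label parameter are hypothetical, and the default label is taken from the launchctl commands in the reproduction steps. One thing worth checking before relying on plain truthiness: shells started from a terminal app on macOS may also inherit an XPC_SERVICE_NAME value (commonly reported as "0" or an application identifier), so a gateway run by hand could be misdetected as a managed job. The stricter variant compares against the job label instead.

```python
import os
from typing import Mapping, Optional


def is_under_service(env: Optional[Mapping[str, str]] = None) -> bool:
    """Proposed probe: systemd's INVOCATION_ID or launchd's XPC_SERVICE_NAME."""
    env = os.environ if env is None else env
    return bool(env.get("INVOCATION_ID") or env.get("XPC_SERVICE_NAME"))


def is_under_launchd_job(
    env: Optional[Mapping[str, str]] = None,
    expected_label: str = "ai.hermes.gateway",  # assumed job label
) -> bool:
    """Stricter variant: require XPC_SERVICE_NAME to equal the job label."""
    env = os.environ if env is None else env
    return env.get("XPC_SERVICE_NAME") == expected_label


if __name__ == "__main__":
    cases = {
        "systemd": {"INVOCATION_ID": "abc123"},
        "launchd job": {"XPC_SERVICE_NAME": "ai.hermes.gateway"},
        "macOS terminal (assumed value)": {"XPC_SERVICE_NAME": "0"},
        "plain shell": {},
    }
    for name, env in cases.items():
        print(name, is_under_service(env), is_under_launchd_job(env))
```

Passing the environment in explicitly keeps a regression test free of monkeypatching. Whether the stricter label comparison is worth the extra coupling to the plist label is a judgment call for the PR.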

PR with the fix and a regression test: see linked PR.

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR.
