hermes - ✅(Solved) Fix macOS: launchd_restart() returns early after SIGUSR1, leaving gateway permanently dead [1 pull requests, 1 comments, 2 participants]

NousResearch/hermes-agent#11932 (fetched 2026-04-18 05:58:10)

On macOS, when hermes update is triggered from within the Gateway process tree (e.g., agent executing via terminal tool), launchd_restart() sends SIGUSR1 and returns immediately without waiting for the gateway to exit or issuing launchctl kickstart. The gateway exits with code 75, but launchd does not restart it, leaving the service permanently dead until manual intervention.

Root Cause

In hermes_cli/gateway.py, launchd_restart() has two code paths:

Path A (SIGUSR1): Triggered when the gateway PID is an ancestor of the current process. Sends SIGUSR1, prints "Service restart requested", then returns immediately — no wait for exit, no kickstart.

Path B (SIGTERM + kickstart): Triggered when the gateway PID is NOT an ancestor. Sends SIGTERM, waits for exit, then runs launchctl kickstart -k.

When Path A is taken, the gateway receives SIGUSR1 and begins a graceful shutdown (drain + exit code 75). However, since launchd_restart() already returned, nobody is responsible for restarting the service. macOS launchd does not automatically restart after exit(75) in this configuration — system logs show "pending spawn, domain in on-demand-only mode" with no follow-up WILL_SPAWN.
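
How an ancestor test might choose between Path A and Path B can be sketched as follows. This is a minimal illustration, not the project's _is_pid_ancestor_of_current_process(); the names is_ancestor and ppid_of are hypothetical, and the parent lookup is passed in so the walk can be exercised without real processes.

```python
from typing import Callable, Optional


def is_ancestor(candidate: int, start: int,
                ppid_of: Callable[[int], Optional[int]]) -> bool:
    """Return True if `candidate` is a proper ancestor of `start`.

    `ppid_of` is a hypothetical lookup returning the parent PID of a
    process, or None when the parent is unknown.
    """
    seen = set()
    current = ppid_of(start)
    # Walk upwards until the root, an unknown parent, or a cycle is reached.
    while current is not None and current > 0 and current not in seen:
        if current == candidate:
            return True
        seen.add(current)
        current = ppid_of(current)
    return False


# Toy process tree: launchd (1) -> gateway (100) -> agent (200) -> shell (300)
tree = {300: 200, 200: 100, 100: 1, 1: 0}
print(is_ancestor(100, 300, tree.get))  # gateway sits above the shell
print(is_ancestor(300, 100, tree.get))  # the shell does not sit above the gateway
```

In the toy tree, a command run from the shell sees the gateway as an ancestor, which corresponds to the situation where Path A is selected.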

Fix Action

Fixed

PR fix notes

PR #11934: fix(gateway): remove early return in launchd_restart() for SIGUSR1 path

Description (problem / solution / changelog)

Summary

Fixes #11932

One-line fix: remove the early return after SIGUSR1 in launchd_restart() and change the second if pid is not None to elif, so both code paths converge on _wait_for_gateway_exit() + launchctl kickstart -k.

Before (buggy)

pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):  # SIGUSR1 path
    print("✓ Service restart requested")
    return                          # ← returns immediately, no kickstart
if pid is not None:                 # SIGTERM path
    terminate_pid(pid, force=False)
    # ... wait + kickstart
subprocess.run(["launchctl", "kickstart", "-k", target])

When the gateway PID is an ancestor of the calling process, SIGUSR1 is sent and launchd_restart() returns immediately. The gateway exits with code 75, but nobody calls kickstart, so launchd does not restart it. The service stays dead until manual hermes gateway start.

After (fixed)

pid = get_running_pid()
if pid is not None and _request_gateway_self_restart(pid):  # SIGUSR1 path
    print("✓ Service restart requested")                   # no return — falls through
elif pid is not None:                 # SIGTERM path (only if SIGUSR1 was NOT sent)
    terminate_pid(pid, force=False)
    # ... wait
subprocess.run(["launchctl", "kickstart", "-k", target])    # always executed

Both paths now reach kickstart -k, ensuring the gateway is always restarted.
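
The converged control flow can be modelled in isolation to check that every path ends at kickstart. The sketch below is a stand-in built on assumptions, not the code in hermes_cli/gateway.py: the helpers are injected as hypothetical parameters, and the placement of the wait step for the SIGUSR1 path follows the PR summary rather than the abbreviated snippet.

```python
from typing import Callable, List, Optional


def restart_flow(pid: Optional[int],
                 request_self_restart: Callable[[int], bool],
                 terminate: Callable[[int], None],
                 wait_for_exit: Callable[[int], None],
                 kickstart: Callable[[], None]) -> None:
    """Model of the fixed flow: signal, wait, then always kickstart."""
    if pid is not None:
        if not request_self_restart(pid):   # SIGUSR1 was not sent
            terminate(pid)                  # fall back to SIGTERM
        wait_for_exit(pid)                  # both paths wait here
    kickstart()                             # reached on every path


def trace(pid: Optional[int], is_ancestor: bool) -> List[str]:
    """Run restart_flow with recording stubs and return the call order."""
    calls: List[str] = []

    def request(_pid: int) -> bool:
        if is_ancestor:
            calls.append("SIGUSR1")
        return is_ancestor

    restart_flow(pid, request,
                 lambda _pid: calls.append("SIGTERM"),
                 lambda _pid: calls.append("wait"),
                 lambda: calls.append("kickstart"))
    return calls


print(trace(1234, True))   # ancestor case
print(trace(1234, False))  # non-ancestor case
print(trace(None, False))  # no running gateway
```

Each trace ends with "kickstart", which is the property the fix is meant to guarantee.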

Impact

  • SIGUSR1 path (ancestor process): Now correctly waits for exit and kickstarts. Previously broken — gateway died permanently.
  • SIGTERM path (non-ancestor): Unchanged behavior — already worked correctly.
  • Normal /update command: Unaffected — uses setsid to detach from gateway process tree, takes SIGTERM path.
  • Linux: Unaffected — uses systemd_restart(), separate code path.

Testing

  • Syntax verified via py_compile.
  • Local deployment tested: hermes update via agent terminal tool now correctly recovers gateway.
  • Existing tests unaffected (launchd is macOS-only, not covered by CI on Linux).

Related

  • Issue: #11932
  • Similar fix for Linux (systemd path): PR #9850

Changed files

  • hermes_cli/gateway.py (modified, +1/-2)

Reproduction

  1. Have Hermes gateway running on macOS with launchd.
  2. From a Telegram conversation, ask the agent to run hermes update directly via its terminal tool (NOT using the /update slash command).
  3. The agent process is a child of the gateway, so _is_pid_ancestor_of_current_process() returns True → Path A is taken.
  4. Gateway exits with code 75 → launchd does not restart → service stays dead.

Note: The normal /update command avoids this by spawning hermes update --gateway via setsid + start_new_session=True, which detaches from the gateway process tree and takes Path B. This bug only manifests when the update command runs inside the gateway process tree.
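
The session change that start_new_session=True produces can be observed with the standard library alone. This sketch assumes a POSIX system; it does not reproduce how Hermes spawns hermes update --gateway, and the child command is a placeholder that only reports its own session ID.

```python
import os
import subprocess
import sys


def child_session_id(detach: bool) -> int:
    """Spawn a child Python process and return the session ID it reports."""
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getsid(0))"],
        start_new_session=detach,  # child calls setsid() when True
        capture_output=True, text=True, check=True,
    ).stdout
    return int(out.strip())


parent_sid = os.getsid(0)
print(child_session_id(detach=False) == parent_sid)  # child shares the parent's session
print(child_session_id(detach=True) == parent_sid)   # child runs in a new session
```

The sketch only demonstrates the session change; whether the ancestor check then selects Path B depends on how that check is implemented in the project.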

Contrast with Linux

PR #9850 (merged) fixed a similar issue for Linux by adding systemctl is-active health checks and retry logic after systemctl restart. The macOS launchd path was completely omitted from that fix.
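
The general shape of a restart followed by a health check with retries can be sketched as below. This is not the code from PR #9850; restart and is_active are hypothetical callables standing in for commands such as systemctl restart and systemctl is-active, and the retry counts are arbitrary.

```python
import time
from typing import Callable


def restart_with_health_check(restart: Callable[[], None],
                              is_active: Callable[[], bool],
                              attempts: int = 3,
                              checks_per_attempt: int = 5,
                              delay: float = 1.0) -> bool:
    """Restart a service and poll until it reports active.

    Returns True once a health check passes, False if every attempt fails.
    """
    for _ in range(attempts):
        restart()
        for _ in range(checks_per_attempt):
            if is_active():
                return True
            time.sleep(delay)
    return False


# Fake service that becomes healthy on the third health check.
state = {"restarts": 0, "checks": 0}


def fake_restart() -> None:
    state["restarts"] += 1


def fake_is_active() -> bool:
    state["checks"] += 1
    return state["checks"] >= 3


result = restart_with_health_check(fake_restart, fake_is_active, delay=0.0)
print(result, state)
```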

Evidence

macOS system logs consistently show exit(75) followed by no restart:

  • "exited due to exit(75)"
  • "pending spawn, domain in on-demand-only mode: ai.hermes.gateway"
  • No WILL_SPAWN entry follows

In contrast, when the gateway is killed by an external signal (SIGTERM/SIGKILL from outside the process tree), launchd immediately issues WILL_SPAWN and the service recovers within seconds.
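
A check for the pattern described above could be scripted roughly as follows. The matched substrings are the ones quoted in this report; the surrounding line layout in the sample is invented for illustration, and how the lines are collected from the system log is left out.

```python
from typing import Iterable

EXIT_MARKER = "exited due to exit(75)"
SPAWN_MARKER = "WILL_SPAWN"


def restarted_after_exit(lines: Iterable[str]) -> bool:
    """Return True if the most recent exit(75) line is followed by WILL_SPAWN.

    Also returns True when no exit(75) appears at all, since there is
    nothing to recover from in that case.
    """
    pending = False
    for line in lines:
        if EXIT_MARKER in line:
            pending = True
        elif SPAWN_MARKER in line and pending:
            pending = False
    return not pending


# Sample lines built from the quoted fragments; the layout is hypothetical.
dead = [
    "ai.hermes.gateway: exited due to exit(75)",
    "pending spawn, domain in on-demand-only mode: ai.hermes.gateway",
]
recovered = dead[:1] + ["ai.hermes.gateway: WILL_SPAWN"]
print(restarted_after_exit(dead))
print(restarted_after_exit(recovered))
```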

Suggested Fix

Remove the early return in Path A and let both paths converge on _wait_for_gateway_exit() + launchctl kickstart -k. This ensures the gateway is always restarted regardless of how the update was triggered.
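
The wait step that both paths converge on could look roughly like the polling loop below. This is a sketch under assumptions, not the project's _wait_for_gateway_exit(): the name wait_for_exit, the timeout values and the injectable is_alive probe are all hypothetical.

```python
import os
import time
from typing import Callable


def pid_alive(pid: int) -> bool:
    """Probe a PID with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # process exists but belongs to another user
    return True


def wait_for_exit(pid: int, timeout: float = 30.0, interval: float = 0.2,
                  is_alive: Callable[[int], bool] = pid_alive) -> bool:
    """Poll until `pid` is gone; return False if `timeout` elapses first."""
    deadline = time.monotonic() + timeout
    while is_alive(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
    return True


# Simulated process that disappears on the third probe.
probes = iter([True, True, False])
exited = wait_for_exit(1234, interval=0.0, is_alive=lambda _pid: next(probes))
print(exited)
```

Returning a boolean rather than raising lets the caller decide whether to kickstart anyway when the old process is slow to exit.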

extent analysis

TL;DR

The issue can be fixed by modifying the launchd_restart() function to remove the early return in Path A, ensuring that both code paths converge on _wait_for_gateway_exit() and launchctl kickstart -k to restart the gateway service.

Guidance

  • Review the hermes_cli/gateway.py file and locate the launchd_restart() function to understand the two code paths (Path A and Path B) and their differences.
  • Modify Path A to remove the early return after sending SIGUSR1, allowing the function to wait for the gateway exit and issue launchctl kickstart -k to restart the service.
  • Verify that the modified launchd_restart() function works as expected by reproducing the issue and checking the system logs for the WILL_SPAWN entry after the gateway exits.
  • Consider adding logging or debugging statements to monitor the execution of launchd_restart() and ensure that it takes the correct path based on the process tree.

Example

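import os
import signal
import subprocess

def launchd_restart():
    # ... (`target` is the launchd service target, defined elsewhere as in the PR)
    gateway_pid = get_running_pid()
    if gateway_pid is not None:
        # Passing the PID to the ancestor check is an assumption about its signature
        if _is_pid_ancestor_of_current_process(gateway_pid):
            # Path A: ask the gateway to drain and exit gracefully
            os.kill(gateway_pid, signal.SIGUSR1)
        else:
            # Path B: terminate the gateway from outside its process tree
            os.kill(gateway_pid, signal.SIGTERM)
        # Both paths wait for the old process to exit
        _wait_for_gateway_exit()
    # Always kickstart so launchd respawns the service
    subprocess.run(['launchctl', 'kickstart', '-k', target])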

Notes

The suggested fix assumes that the issue is solely caused by the early return in Path A. However, additional testing and verification may be necessary to ensure that the modified launchd_restart() function works correctly in all scenarios.

Recommendation

Apply the fix by removing the early return in Path A of launchd_restart(), so that the gateway service is restarted regardless of how the update was triggered.
