hermes - ✅ (Solved) Fix [Bug]: `hermes cron list` falsely reports "Gateway is not running" on macOS (two-stage detection failure in find_gateway_pids) [1 pull request, 1 participant]

GitHub stats
NousResearch/hermes-agent#15225 · Fetched 2026-04-25 06:23:37
Comments: 0
Participants: 1
Timeline: 5
Reactions: 0
Timeline (top): labeled ×4, cross-referenced ×1

Root Cause

This issue is related to but not a duplicate of #9069 (same function, FreeBSD, different failure mode) and #9723 (same function, Docker, closed via commit c483b4c). The macOS case fails in a different way because both detection stages — launchctl and the ps fallback — fail for distinct reasons on Darwin.

Fix Action

Fixed

PR fix notes

PR #15318: fix(gateway): correct macOS gateway-pid detection (#15225)

Description (problem / solution / changelog)

What does this PR do?

Fixes `#15225`. `hermes cron list` falsely reports "Gateway is not running" on macOS even when launchd has the service loaded and cron jobs fire correctly.

Two independent bugs in `hermes_cli/gateway.py` each drop the macOS PID-detection path into an empty-result state. Fixing either alone clears the warning; both are real and both are worth fixing — the detector is the source for several other views (`cron list` warning, secondary `gateway status` check, `hermes update`'s broad sweep) and each should be resilient.

Bug 1 — `_get_service_pids()` mis-parses `launchctl list <label>`

macOS `launchctl list` has two output formats:

| Invocation | Format |
| --- | --- |
| `launchctl list` (no label) | tab-separated table: `PID\tStatus\tLabel` |
| `launchctl list <label>` | plist-dict dump: `"PID" = 855;`, `"Label" = "ai.hermes.gateway";`, … |

The old code always called `launchctl list <label>` but parsed with `string.split()` expecting the tab-separated format. On a real macOS install that means `parts[2]` on the label line is `'"ai.hermes.gateway";'` (quoted, semicolon'd) — so `parts[2] == label` never matched and no PID was ever extracted, even though the service was actively running.

Fix: extracted a `_parse_launchd_list_output(stdout, label)` helper that tries the plist-dict `"PID" = N;` regex first and falls back to the tab-separated path when no plist matches are found. Handles both formats so a future change to the caller can't silently re-break detection.

  • Regex anchored to `"PID"` key — sibling fields like `LastExitStatus` can't match.
  • PID 0 rejected — downstream `os.kill(0, …)` would affect the whole process group.
  • Whitespace-tolerant (`"PID" =`, `"PID"=`, `"PID" = `, leading tabs all match) — launchd's plist dumper is not a stable format across Apple releases.
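The PR's actual helper is not reproduced on this page. The following is a minimal sketch of the behaviour the bullets describe, assuming a standalone function named `parse_launchd_list_output` (a hypothetical name, not the PR's code) and sample outputs modelled on the issue's repro:

```python
import re

# Anchored to the quoted "PID" key so sibling keys such as
# "LastExitStatus" cannot match. Whitespace around "=" is optional.
_PLIST_PID_RE = re.compile(r'^\s*"PID"\s*=\s*(\d+)\s*;', re.MULTILINE)


def parse_launchd_list_output(stdout: str, label: str) -> set[int]:
    """Extract PIDs for `label` from either `launchctl list` output format."""
    # Format 1: plist-style dict printed by `launchctl list <label>`.
    pids = {int(m) for m in _PLIST_PID_RE.findall(stdout)}
    pids.discard(0)  # os.kill(0, ...) would signal the whole process group
    if pids:
        return pids

    # Format 2: table printed by bare `launchctl list` (PID, Status, Label).
    for line in stdout.splitlines():
        parts = line.split()
        # Unloaded services show "-" in the PID column; isdigit() skips them.
        if len(parts) >= 3 and parts[2] == label and parts[0].isdigit():
            pid = int(parts[0])
            if pid > 0:
                pids.add(pid)
    return pids


plist_output = (
    '{\n\t"LastExitStatus" = 0;\n\t"PID" = 855;\n'
    '\t"Label" = "ai.hermes.gateway";\n};'
)
table_output = (
    "PID\tStatus\tLabel\n855\t0\tai.hermes.gateway\n-\t0\tcom.example.other"
)
print(parse_launchd_list_output(plist_output, "ai.hermes.gateway"))  # {855}
print(parse_launchd_list_output(table_output, "ai.hermes.gateway"))  # {855}
```

Trying the plist pattern first means label-scoped output wins when both formats appear in the same string, which matches the "plist-dict wins" case in the test list.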

Bug 2 — `_scan_gateway_pids()` passes `eww` to `ps`

Old: `["ps", "-A", "eww", "-o", "pid=,command="]`

  • Darwin: rejects `eww` as "illegal argument", exits 1. Empty `stdout`, no PIDs extracted (#15225).
  • FreeBSD: accepts `eww` but the embedded `e` attaches environment variables to the command column. `split(None, 1)` then picks up the first env var as the command (#9069). Env vars can include API keys — leaking them into any log line that echoes the command.

Fix: replaced with `["ps", "-A", "-ww", "-o", "pid=,command="]` — portable across Linux (procps), Darwin, FreeBSD, busybox. Drops env-var leakage as a side benefit.
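As a rough illustration of the portable invocation, the sketch below separates the `ps` call from the parsing so the parsing can be exercised without a live process table. The function names, the sample command lines, and the match pattern are all assumptions made for illustration; the real gateway patterns are not shown on this page.

```python
import subprocess

# The argv quoted in the fix above: no "e" flag, so no environment
# variables are appended to the command column.
PS_ARGV = ["ps", "-A", "-ww", "-o", "pid=,command="]


def parse_ps_output(stdout: str, patterns: tuple[str, ...]) -> list[int]:
    """Return PIDs whose command column contains any of `patterns`."""
    pids = []
    for line in stdout.splitlines():
        fields = line.strip().split(None, 1)  # "<pid> <full command line>"
        if len(fields) != 2 or not fields[0].isdigit():
            continue
        pid, command = int(fields[0]), fields[1]
        if any(p in command for p in patterns):
            pids.append(pid)
    return pids


def scan_pids(patterns: tuple[str, ...]) -> list[int]:
    """Run ps and parse its output; any failure yields an empty list."""
    try:
        result = subprocess.run(
            PS_ARGV, capture_output=True, text=True, timeout=10
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return parse_ps_output(result.stdout, patterns)


# Hypothetical ps output and pattern, for illustration only.
sample = (
    "  855 /Users/me/.hermes/hermes-agent/venv/bin/python -m hermes gateway run\n"
    "  901 /usr/sbin/cfprefsd agent\n"
)
print(parse_ps_output(sample, ("hermes gateway",)))  # [855]
```

Keeping `parse_ps_output` pure makes it straightforward to feed it captured Darwin, FreeBSD, or procps output in tests.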

Related Issue

Fixes #15225

Related: #9069 (same function, different FreeBSD failure mode), #9723 (same function, Docker, fixed via `c483b4c` for that narrower case).

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Test plan

  • 19 new tests in `tests/hermes_cli/test_gateway_pid_detection_macos.py` — all green on py3.11 venv
  • 1 pre-existing test in `test_gateway.py` updated to match the portable `-ww` invocation (explicit `# Note: since #15225` comment for future readers)
  • Full `tests/hermes_cli/test_gateway.py` suite still green (21 tests)
  • Verified regression guards: temporarily reverted Bug 1 and Bug 2 independently; the relevant test classes correctly failed with clear messages pointing at the regressed invariant. Restored fix → all 39 tests green.

Test coverage detail

`TestParseLaunchdListOutput` (8 cases) — exercises the new helper directly:

  • plist-dict output from the real #15225 repro → extracts PID 855
  • tab-separated `launchctl list` (no label) format → extracts correct PID, skips other services
  • unloaded-service `-\t0\tlabel` row → no PID extracted (no `int("-")` crash)
  • empty stdout → empty set (no crash)
  • defensive: both formats concatenated → plist-dict wins (label-scoped is more trustworthy)
  • PID 0 rejected
  • other-label row in tab-separated output → filtered out
  • 4 whitespace variants of the plist PID key

`TestGetServicePidsMacOS` (3 cases) — end-to-end macOS branch with mocked `subprocess.run`:

  • returns `{855}` for a running service (the #15225 repro state)
  • nonzero launchctl exit → empty set
  • missing `launchctl` binary → empty set (no `FileNotFoundError` propagation)

`TestPsInvocationPortability` (4 cases) — captures exact argv:

  • `"eww"` never on the command line (the core regression guard)
  • pins `["ps", "-A", "-ww", "-o", "pid=,command="]` shape
  • parses realistic Darwin-style ps output end-to-end, extracts gateway PID
  • nonzero returncode path returns `[]` without crashing
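The test file itself is not included here. A regression guard of the kind listed under `TestPsInvocationPortability` could look roughly like the sketch below, which patches `subprocess.run` and inspects the captured argv. `scan_gateway_pids_stub` is a hypothetical stand-in for the real scanner, written only so the example is self-contained.

```python
import subprocess
from unittest import mock


def scan_gateway_pids_stub() -> list[int]:
    """Hypothetical stand-in: runs ps and returns every PID it lists."""
    result = subprocess.run(
        ["ps", "-A", "-ww", "-o", "pid=,command="],
        capture_output=True, text=True, timeout=10,
    )
    if result.returncode != 0:
        return []
    return [
        int(line.split(None, 1)[0])
        for line in result.stdout.splitlines()
        if line.strip()
    ]


def test_ps_argv_is_portable() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=0, stdout="855 python gateway\n", stderr=""
    )
    with mock.patch("subprocess.run", return_value=fake) as run:
        pids = scan_gateway_pids_stub()
    argv = run.call_args.args[0]
    assert "eww" not in argv  # core regression guard
    assert argv == ["ps", "-A", "-ww", "-o", "pid=,command="]
    assert pids == [855]


def test_nonzero_exit_returns_empty() -> None:
    fake = subprocess.CompletedProcess(
        args=[], returncode=1, stdout="", stderr="ps: illegal argument: eww\n"
    )
    with mock.patch("subprocess.run", return_value=fake):
        assert scan_gateway_pids_stub() == []


test_ps_argv_is_portable()
test_nonzero_exit_returns_empty()
```

Pinning the exact argv is deliberately strict: a later change that reintroduces `eww` or drops `-ww` fails the test immediately rather than surfacing as a silent empty result on one platform.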

Not in scope

  • A broader refactor to pass the launchctl plist through a proper plist parser rather than a regex — the regex is narrow, anchored, and tolerant; bigger tooling is unnecessary for two well-defined keys.
  • Fixing the `gateway status` secondary check that shares this helper — it already reports correctly via a different code path (`launchctl list` in `gateway/status.py`). The detector fix clears the warning everywhere as a side effect.

Changed files

  • hermes_cli/gateway.py (modified, +75/-13)
  • tests/hermes_cli/test_gateway.py (modified, +3/-1)
  • tests/hermes_cli/test_gateway_pid_detection_macos.py (added, +314/-0)


---

Bug Description

On macOS, hermes cron list always prints a false-positive warning:

⚠  Gateway is not running — jobs won't fire automatically.
   Start it with: hermes gateway install

even when the gateway is actively running under launchd and the cron jobs themselves continue to fire correctly.

The warning is purely a display bug in find_gateway_pids(). It has no effect on cron firing (a separate code path under gateway/run.py drives job execution), but it is misleading and may cause users to run hermes gateway install unnecessarily (which is a no-op when the plist is already current).

This issue is related to but not a duplicate of #9069 (same function, FreeBSD, different failure mode) and #9723 (same function, Docker, closed via commit c483b4c). The macOS case fails in a different way because both detection stages — launchctl and the ps fallback — fail for distinct reasons on Darwin.

Steps to Reproduce

  1. Install hermes-agent on macOS and configure a cron job (e.g. github-release-watch via the gateway).

  2. Start the gateway service: hermes gateway install (the plist at ~/Library/LaunchAgents/ai.hermes.gateway.plist is created and loaded; RunAtLoad=true so it starts automatically).

  3. Verify the gateway is actually running:

    $ launchctl list | grep hermes
    855 0 ai.hermes.gateway
    $ hermes gateway status
    ✓ Service definition matches the current Hermes install
    ✓ Gateway service is loaded
    ...
    "PID" = 855;

  4. Run hermes cron list.

  5. Observe the warning appended to the job list:

    ⚠ Gateway is not running — jobs won't fire automatically.

Expected Behavior

No warning should appear when the gateway is actually running under launchd. hermes gateway status already correctly reports the service as loaded; hermes cron list's detector should agree.

Actual Behavior

The warning appears on every invocation of hermes cron list (and in other sites that call find_gateway_pids()), despite the gateway being confirmed running via launchctl list, ps -p <pid>, and a healthy gateway_state.json showing "gateway_state": "running" with a connected Telegram platform.

Cron firing itself is unaffected — jobs continue to fire on schedule and deliveries arrive via Telegram. Only the display is wrong.

Affected Component

CLI (interactive chat), Gateway (Telegram/Discord/Slack/WhatsApp)

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

Not attached in this report. I can share `hermes debug share` output on request. The reproduction above and the root-cause analysis below should be sufficient to identify the bug from source.

Operating System

macOS 15.2 (Darwin 25.4.0, Apple Silicon)

Python Version

3.11.15

Hermes Version

v0.10.0 (2026.4.16)

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

Two independent bugs cause find_gateway_pids() to return an empty list on macOS. Either bug alone would cause the macOS detection to fall through to the other stage, but because both stages are broken, the overall result is an empty list and the warning fires.

Bug 1: _get_service_pids() parses launchctl list <label> as tab-separated format

hermes_cli/gateway.py (around lines 82–102, in the is_macos() branch):

if is_macos():
    try:
        label = get_launchd_label()
        result = subprocess.run(
            ["launchctl", "list", label],
            capture_output=True, text=True, timeout=5,
        )
        if result.returncode == 0:
            # Output: "PID\tStatus\tLabel" header, then one data line
            for line in result.stdout.strip().splitlines():
                parts = line.split()
                if len(parts) >= 3 and parts[2] == label:
                    try:
                        pid = int(parts[0])
                        if pid > 0:
                            pids.add(pid)
                    except ValueError:
                        pass
    except (FileNotFoundError, subprocess.TimeoutExpired):
        pass

The inline comment says the expected format is "PID\tStatus\tLabel". That format is what launchctl list (with no label argument) produces. When called with a label argument, macOS launchctl instead returns a plist-dict:

$ launchctl list ai.hermes.gateway
{
    "StandardOutPath" = "/Users/<user>/.hermes/logs/gateway.log";
    "LimitLoadToSessionType" = "Aqua";
    "StandardErrorPath" = "/Users/<user>/.hermes/logs/gateway.error.log";
    "Label" = "ai.hermes.gateway";
    "OnDemand" = true;
    "LastExitStatus" = 0;
    "PID" = 855;
    "Program" = "/Users/<user>/.hermes/hermes-agent/venv/bin/python";
    ...
};
$ echo "exit=$?"
exit=0

line.split() on the "Label" = "ai.hermes.gateway"; line yields ['"Label"', '=', '"ai.hermes.gateway";']. parts[2] is '"ai.hermes.gateway";' (with quotes and trailing semicolon), which never equals the bare label string. No PID is ever extracted and _get_service_pids() returns an empty set.
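The mismatch can be reproduced in isolation with plain string operations, using the label line from the dump above and, for contrast, a row in the tab-separated format the old code expected:

```python
label = "ai.hermes.gateway"

# A line from the plist-dict dump printed by `launchctl list <label>`.
line = '    "Label" = "ai.hermes.gateway";'
parts = line.split()
print(parts)              # ['"Label"', '=', '"ai.hermes.gateway";']
print(parts[2] == label)  # False: quotes and semicolon are still attached

# A row in the tab-separated format printed by bare `launchctl list`.
table_row = "855\t0\tai.hermes.gateway"
print(table_row.split()[2] == label)  # True
```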

Bug 2: ps -A eww -o pid=,command= is an illegal argument on macOS

hermes_cli/gateway.py (around lines 224–229, in the find_gateway_pids() non-Windows fallback):

result = subprocess.run(
    ["ps", "-A", "eww", "-o", "pid=,command="],
    capture_output=True, text=True, timeout=10,
)

Darwin's ps rejects eww as a positional argument:

$ ps -A eww -o pid=,command=
ps: illegal argument: eww
usage: ps [-AaCcEefhjlMmrSTvwXx] ...

$ echo "exit=$?"
exit=1

stdout is empty, the parse loop iterates nothing, and no PID is extracted.

This is related to but distinct from #9069 (FreeBSD: the command runs but the output contains an environment-variable prefix that breaks split(None, 1)) and #9723 (Docker with procps-ng: already addressed by c483b4c, but that fix does not touch the macOS code path).

Combined effect

| Stage | Expected result on macOS | Actual result |
| --- | --- | --- |
| _get_service_pids() via launchctl | {855} | set() (Bug 1) |
| find_gateway_pids() ps fallback | PIDs matching gateway patterns | [] (Bug 2) |
| Final find_gateway_pids() | [855] | [] |
| cron list warning check | no warning | warning printed |

The same empty list propagates to other call sites of find_gateway_pids() (the cron list warning, the gateway status secondary check, and anywhere else that uses the helper to decide whether to skip further work).
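The way the two failures compound can be modelled with a small sketch. It assumes the second stage is consulted only when the first returns nothing, as the "fall through" wording suggests; the real `find_gateway_pids()` may combine the stages differently, and `find_pids` here is a hypothetical name.

```python
from typing import Callable, Iterable


def find_pids(
    service_stage: Callable[[], Iterable[int]],
    ps_stage: Callable[[], Iterable[int]],
) -> list[int]:
    """Return PIDs from the service manager, else from the ps fallback."""
    pids = sorted(set(service_stage()))
    if pids:
        return pids
    return sorted(set(ps_stage()))


# Both stages broken (the reported state): nothing is found.
print(find_pids(lambda: set(), lambda: []))     # []
# Only the launchctl stage fixed: the PID is found.
print(find_pids(lambda: {855}, lambda: []))     # [855]
# Only the ps stage fixed: the fallback finds it.
print(find_pids(lambda: set(), lambda: [855]))  # [855]
```

Under this model either fix alone is enough to clear the warning, and fixing both keeps detection working if one stage regresses.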

Proposed Fix (optional)

I'm reporting this as an issue only — I don't plan to submit a PR for this one. Suggestions for whoever does:

For Bug 1 (_get_service_pids() launchctl parsing):

  • Either drop the label argument and run launchctl list (no args), which returns the documented PID\tStatus\tLabel tab-separated format that the current code already expects; then filter by parts[2] == label.
  • Or keep passing label but parse the plist-dict output properly — look for "PID" = NNN; lines with a simple regex, e.g. re.search(r'"PID"\s*=\s*(\d+)\s*;', result.stdout).
  • The first option is smaller and keeps the existing loop structure; the second is more robust if launchctl behavior diverges further.

For Bug 2 (ps fallback arguments):

  • ["ps", "-A", "-ww", "-o", "pid=,command="] works on macOS and still produces the wide, unabridged command column. Dropping the e (which attaches env vars) is arguably desirable anyway — see #9069 where the env-var prefix breaks FreeBSD parsing even when the command runs.
  • Alternatively, special-case macOS the same way the Windows branch is special-cased, using launchctl more aggressively (it already has the PID from Bug 1's fix).

Fixing only one of the two bugs would already resolve the macOS warning (either path alone is enough to detect the PID). Fixing both makes the detector resilient to a regression in either route.

Are you willing to submit a PR for this?

  • [ ] I'd like to fix this myself and submit a PR

extent analysis

TL;DR

To fix the false-positive warning on macOS, update the _get_service_pids() function to correctly parse the launchctl list output and modify the ps fallback arguments to be compatible with macOS.

Guidance

  • Update the _get_service_pids() function to either drop the label argument and parse the tab-separated output or keep the label argument and parse the plist-dict output using a regex.
  • Modify the ps fallback arguments to ["ps", "-A", "-ww", "-o", "pid=,command="] to make it compatible with macOS.
  • Test the changes on macOS to ensure the warning is resolved and the gateway is correctly detected.
  • Consider special-casing macOS to use launchctl more aggressively to make the detector more resilient.

Example

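# Example of parsing plist-dict output using regex
import re
import subprocess

def _get_service_pids():
    pids = set()
    # ... (non-macOS branches omitted)
    label = get_launchd_label()  # existing helper in hermes_cli/gateway.py
    try:
        result = subprocess.run(
            ["launchctl", "list", label],
            capture_output=True, text=True, timeout=5,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        return pids  # launchctl missing or hung: report no PIDs
    if result.returncode == 0:
        # `launchctl list <label>` prints a plist-style dict, e.g. "PID" = 855;
        pid_match = re.search(r'"PID"\s*=\s*(\d+)\s*;', result.stdout)
        if pid_match:
            pid = int(pid_match.group(1))
            if pid > 0:  # PID 0 must never reach os.kill()
                pids.add(pid)
    return pids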

Notes

The provided fix suggestions assume that the issue is solely related to the parsing of launchctl list output and the ps fallback arguments. Additional testing and debugging may be necessary to ensure the changes resolve the issue.

Recommendation

Apply both changes: parse the plist-dict output in _get_service_pids() and replace eww with -ww in the ps fallback arguments. Either change alone should clear the false-positive warning on macOS; applying both keeps the detector working if one path regresses.
