hermes - ✅(Solved) Fix gateway run detects calling CLI process as running gateway instance (self false positive) [2 pull requests]

Root Cause

find_gateway_pids() in hermes_cli/gateway.py (and/or gateway/status.py) uses ps aux pattern matching against generic keywords like hermes, gateway, hermes_cli.main gateway. It does not filter out:

  1. The current process PID (os.getpid())
  2. The parent process PID (the shell / CLI that invoked hermes gateway run)
  3. Any process that is an ancestor in the current process tree

This is the same underlying pattern as #4402 (profile-agnostic PID matching) but manifests differently — instead of killing the wrong profile's gateway, it blocks startup entirely when invoked from CLI.

Fix Action

Workaround

Run from a separate terminal session (not the same one where the hermes CLI is active), or use hermes gateway run --replace (which introduces its own race condition per #11718).

PR fix notes

PR #13250: fix(gateway): exclude CLI invocations from _looks_like_gateway_process (Closes #13242)

Description (problem / solution / changelog)

Summary

Fixes the false-positive "gateway already running" detection when starting the gateway from an interactive CLI session (hermes gateway run / hermes gateway run --replace).

Closes #13242.

Root Cause

_looks_like_gateway_process() in gateway/status.py used a broad "hermes gateway" substring match that also matched CLI invocations like:

  • hermes gateway run
  • hermes gateway status
  • hermes gateway restart
  • etc.

When the gateway scans /proc/<pid>/cmdline to verify a PID-file entry, it would incorrectly classify the calling CLI process as a running gateway daemon, triggering the duplicate-instance guard and aborting startup.
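The misclassification can be reproduced in isolation. The following is a minimal sketch, not the code in gateway/status.py: parse_cmdline, read_cmdline and broad_match are hypothetical names, and the argv bytes are an invented example of what /proc/<pid>/cmdline might contain for the invoking CLI.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> str:
    """Join the NUL-separated argv bytes from /proc/<pid>/cmdline with spaces.

    Empty fields (including the trailing NUL) are dropped.
    """
    return " ".join(p.decode("utf-8", "replace") for p in raw.split(b"\0") if p)


def read_cmdline(pid: int) -> str:
    """Return the command line of pid, or '' if it cannot be read (Linux only)."""
    try:
        return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())
    except OSError:
        return ""


def broad_match(cmdline: str) -> bool:
    # Pre-fix behaviour as described above: a plain substring test.
    return "hermes gateway" in cmdline


# Hypothetical argv of the CLI process that is trying to start the gateway.
cli_cmdline = parse_cmdline(b"/usr/local/bin/hermes\0gateway\0run\0")
is_false_positive = broad_match(cli_cmdline)
```

Because the CLI's own argv contains the substring, is_false_positive is true here, which is the condition that trips the duplicate-instance guard.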

Fix

Split the broad "hermes gateway" pattern into two checks:

  1. Precise gateway patterns: hermes_cli.main gateway, hermes_cli/main.py gateway, gateway/run.py. These always identify a real gateway process.
  2. CLI-subcommand exclusion — if "hermes gateway" is matched but followed by a known CLI subcommand (run, stop, status, restart, install, uninstall, start, setup, doctor), the process is classified as a CLI invocation and rejected.

This preserves backward compatibility for processes that legitimately match "hermes gateway" without subcommands while eliminating the self-detection false positive.
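The two checks might look roughly like the sketch below. It is written from the description above rather than from the PR diff, so the function name, the constant names and the exact tokenisation are assumptions; only the pattern strings and the subcommand list are taken from the text.

```python
PRECISE_PATTERNS = (
    "hermes_cli.main gateway",
    "hermes_cli/main.py gateway",
    "gateway/run.py",
)
CLI_SUBCOMMANDS = frozenset(
    {"run", "stop", "status", "restart", "install",
     "uninstall", "start", "setup", "doctor"}
)


def looks_like_gateway_process(cmdline: str) -> bool:
    """Classify a command line as a gateway process (True) or not (False)."""
    # Check 1: precise patterns always count as a gateway process.
    if any(pattern in cmdline for pattern in PRECISE_PATTERNS):
        return True

    # Check 2: the broad pattern counts only when it is not followed by
    # a known CLI subcommand.
    marker = "hermes gateway"
    idx = cmdline.find(marker)
    if idx == -1:
        return False
    rest = cmdline[idx + len(marker):].split()
    if rest and rest[0] in CLI_SUBCOMMANDS:
        return False  # CLI invocation such as "hermes gateway run"
    return True


results = {
    cmd: looks_like_gateway_process(cmd)
    for cmd in (
        "hermes gateway run",
        "hermes gateway status",
        "python -m hermes_cli.main gateway run",
        "hermes gateway",
    )
}
```

Checking the precise patterns first is what keeps python -m hermes_cli.main gateway run detectable even though it also ends in a subcommand.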

Changed Files

  • gateway/status.py: _looks_like_gateway_process() (lines 114-146)

Testing

Manual test scenarios:

  • hermes gateway run — should NOT detect self as running
  • hermes gateway status — should NOT detect CLI as running gateway
  • Actual gateway process (python -m hermes_cli.main gateway run) — SHOULD still be detected
  • Stale PID file pointing to CLI process — should be cleaned up correctly

Changed files

  • gateway/status.py (modified, +23/-4)

PR #13273: fix(gateway): exclude CLI invoker ancestor chain in _append_unique_pid

Description (problem / solution / changelog)

Problem

`hermes gateway run` invoked from an interactive CLI session gets a self false-positive: the `ps aux` scan in `_scan_gateway_pids` matches generic substrings like `hermes_cli.main gateway`, which picks up the invoking CLI process AND its ancestor chain (parent shell, tmux session, launcher wrapper). The gateway then concludes it's 'already running' and shuts down immediately.

Reported by @yes999zc in #13242.

Fix

`_append_unique_pid` already filtered `os.getpid()` but not the ancestor chain. The repo already has an ancestor-detection helper `_is_pid_ancestor_of_current_process` used by `_request_gateway_self_restart`. Extend `_append_unique_pid` to reuse it — one gate, covers every call site (`find_gateway_pids`, service probes, `_scan_gateway_pids`).
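A minimal sketch of the single-gate idea follows. The real signatures of `_append_unique_pid` and `_is_pid_ancestor_of_current_process` are not shown in this PR description, so `append_unique_pid`, `is_ancestor_of`, the `self_pid` argument and the injected `get_ppid` lookup are all hypothetical stand-ins chosen to keep the example free of real process calls.

```python
from typing import Callable, List, Optional

PpidLookup = Callable[[int], Optional[int]]


def is_ancestor_of(pid: int, start: int, get_ppid: PpidLookup) -> bool:
    """Return True if pid is a strict ancestor of start."""
    seen = set()
    current = get_ppid(start)
    # Walk parent -> grandparent -> ... until the chain ends or loops.
    while current and current > 0 and current not in seen:
        if current == pid:
            return True
        seen.add(current)
        current = get_ppid(current)
    return False


def append_unique_pid(
    pids: List[int], pid: int, self_pid: int, get_ppid: PpidLookup
) -> None:
    """Append pid unless it is invalid, a duplicate, self, or an ancestor of self."""
    if pid <= 0 or pid in pids:
        return
    if pid == self_pid:
        return
    if is_ancestor_of(pid, self_pid, get_ppid):
        return
    pids.append(pid)


# Fake process table mirroring the chain used in the new test: 5000 -> 4000 -> 1.
parents = {5000: 4000, 4000: 1, 7777: 1}
found: List[int] = []
for candidate in (5000, 4000, 7777, 7777):
    append_unique_pid(found, candidate, self_pid=5000, get_ppid=parents.get)
```

With this table only the unrelated PID 7777 survives, and it is recorded once.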

Pre-implement audit

  • A (existing helper): `_is_pid_ancestor_of_current_process` already defined at `hermes_cli/gateway.py:150`. Reuse rather than re-walking the parent chain. ✓
  • B (shared callers): `_append_unique_pid` is called from `find_gateway_pids` (service + PID-file + ps-scan paths). Adding an extra exclusion strictly narrows what's returned — existing callers that wanted to kill/probe gateway processes still get real gateway PIDs, they just stop seeing their own invoker tree. Contract preserved. ✓
  • C (broader rival): No rival on #13242. ✓

Testing

  • New `test_append_unique_pid_excludes_current_process_ancestors` — monkeypatches the parent chain to return `5000 → 4000 → 1`, asserts that `_append_unique_pid` rejects both 5000 (current) and 4000 (parent), keeps only the unrelated 7777 gateway PID.
  • Touched the existing `test_find_gateway_pids_falls_back_to_pid_file_when_process_scan_fails` to stub `_is_pid_ancestor_of_current_process` (so the fallback path doesn't shell out to real `ps -o ppid=` for the fake service pid 321).
  • Full `test_gateway.py` + `test_gateway_service.py` suites pass (117/117).

Fixes #13242

Changed files

  • hermes_cli/gateway.py (modified, +8/-0)
  • tests/hermes_cli/test_gateway.py (modified, +32/-0)

Bug Description

When running hermes gateway run (or hermes gateway run --replace) from an interactive CLI session, the gateway immediately detects itself as "already running" and shuts down. This happens because find_gateway_pids() scans ps aux output for processes matching hermes / gateway keywords, and the calling CLI process itself appears in the process list.

Steps to Reproduce

  1. Open a terminal / CLI session
  2. Run hermes gateway run (or hermes gateway run --replace)
  3. The gateway startup script runs find_gateway_pids() which scans ps aux
  4. The scan matches the current hermes CLI process (command line contains hermes)
  5. Gateway concludes "already running" and immediately shuts down
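The steps above can be imitated without a running gateway. The sketch below scans a hard-coded, invented ps aux listing instead of calling ps, and naive_scan is a hypothetical stand-in for the keyword matching described in this report. Treating any keyword hit as a match is an assumption about how the real scan combines its keywords.

```python
# Invented sample output; PID 5000 plays the role of the invoking CLI process.
SAMPLE_PS_AUX = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
alice     4000  0.0  0.1  10000  5000 pts/0    Ss   10:00   0:00 -zsh
alice     5000  1.2  0.8 200000 40000 pts/0    S+   10:02   0:01 /usr/bin/python3 /usr/local/bin/hermes gateway run
alice     6000  0.0  0.1  12000  3000 pts/1    S+   10:03   0:00 vim notes.txt
"""

KEYWORDS = ("hermes", "gateway", "hermes_cli.main gateway")


def naive_scan(ps_output: str, keywords=KEYWORDS) -> list:
    """Return PIDs whose COMMAND column contains any keyword (no self filter)."""
    pids = []
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)        # COMMAND is the 11th column
        if len(fields) < 11:
            continue
        pid, command = int(fields[1]), fields[10]
        if any(keyword in command for keyword in keywords):
            pids.append(pid)
    return pids


matched = naive_scan(SAMPLE_PS_AUX)
```

If the scanning process is PID 5000, the only match is the scanner itself, so an unfiltered result reads as "already running".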

Expected Behavior

find_gateway_pids() should exclude the current process (and its parent process tree) from the scan. The PID detection should only find actual background gateway processes, not the CLI process that is initiating the start.

Actual Behavior

The CLI process is falsely detected as a running gateway, causing startup to fail with "Gateway already running" unless --replace is used (which itself has race condition issues, see #11718).

Suggested Fix

  1. In find_gateway_pids() / _scan_gateway_pids(), exclude os.getpid() and all ancestor PIDs (walk up /proc/self/status PPID chain on Linux, use ps -o ppid= on macOS).
  2. Alternatively, only match processes with the specific gateway entry point pattern (e.g., gateway/run.py as the main script), not generic hermes keyword matches.

Environment

  • Platform: macOS / Linux (both affected, since both use ps aux scanning as a fallback)
  • Hermes Version: 0.8.0+

Related Issues

  • #4402 — gateway stop kills ALL profile gateways (same root cause: unscoped ps aux matching)
  • #11718 — --replace race condition
  • #12849 — macOS PID detection format issues
  • #6666 — Restart from Telegram session causes process death

extent analysis

TL;DR

Modify the find_gateway_pids() function to exclude the current process PID and its ancestors to prevent false detection of the gateway as "already running".

Guidance

  • Review the find_gateway_pids() function in hermes_cli/gateway.py and gateway/status.py to understand the current implementation of PID matching.
  • Implement a filter to exclude the current process PID (os.getpid()) and its ancestor PIDs from the scan results.
  • Consider using a more specific pattern match for the gateway entry point (e.g., gateway/run.py) instead of generic hermes keyword matches.
  • Test the modified function to ensure it correctly identifies running gateway processes and excludes the current CLI process.

Example

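import os
import subprocess

def _read_ppid(pid):
    """Return the parent PID of pid, or None if it cannot be determined."""
    try:
        # Linux: parse the PPid field of /proc/<pid>/status.
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith("PPid:"):
                    return int(line.split()[1])
    except (OSError, ValueError):
        pass
    try:
        # macOS (no /proc): ask ps for the parent PID.
        out = subprocess.run(["ps", "-o", "ppid=", "-p", str(pid)],
                             capture_output=True, text=True).stdout
        return int(out.strip())
    except (OSError, ValueError):
        return None

def get_ancestor_pids(pid):
    """Return the set of ancestor PIDs of pid (parent, grandparent, ...)."""
    ancestors = set()
    ppid = _read_ppid(pid)
    while ppid and ppid > 0 and ppid not in ancestors:
        ancestors.add(ppid)
        ppid = _read_ppid(ppid)
    return ancestors

def find_gateway_pids(candidate_pids):
    # candidate_pids stands in for the PIDs matched by the existing scan.
    current_pid = os.getpid()
    # Exclude the current process and all of its ancestors.
    pids_to_exclude = {current_pid} | get_ancestor_pids(current_pid)
    return [pid for pid in candidate_pids if pid not in pids_to_exclude]

Notes

The example is a sketch, not code from the Hermes repository. find_gateway_pids() takes the scan results as a parameter in place of the elided scanning logic, and _read_ppid() is an illustrative helper. It reads /proc/<pid>/status on Linux and falls back to ps -o ppid= where /proc is unavailable (for example macOS). The original snippet read /proc/self/status on every recursive call, which always returns the caller's own parent and would never terminate. The version above reads the status file of each PID in turn and stops when the chain ends or repeats.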

Recommendation

Apply the suggested fix by modifying the find_gateway_pids() function to exclude the current process PID and its ancestors, as this approach directly addresses the root cause of the issue.
