hermes - [Bug]: macOS PID lock check fails when a system process occupies the same PID, causing a zombie lock on gateway restart

Bug Description

The gateway's PID lock check fails on macOS when a non-Hermes process occupies the PID recorded in the lock. Because /proc does not exist on macOS, _get_process_start_time() returns None, which disables stale lock detection. When a system process (e.g., CloudDocs iCloud) later receives that PID, psutil.pid_exists() returns True even though the process is not Hermes, so the stale lock is treated as live (a zombie lock). Subsequent gateway restarts report "bot token already in use (PID XXX)" and fail to connect the Telegram, Feishu, and WeChat platforms.

Steps to Reproduce

  1. Start Hermes Gateway
  2. Gateway starts and registers platform tokens (Telegram, Feishu, WeChat)
  3. The original gateway process exits/crashes or gets SIGTERM
  4. macOS reuses the same PID for a system process (e.g., CloudDocs)
  5. Run hermes gateway restart
  6. Result: All platform connections fail with "already in use (PID XXX). Stop the other gateway first."
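The false positive in steps 3 to 6 can be simulated without waiting for macOS to recycle a PID. The sketch below is an illustration, not Hermes code: naive_lock_is_held is a hypothetical stand-in for an existence-only check, and a short-lived Python child process plays the role of the unrelated system process that inherited the PID.

```python
import subprocess
import sys

import psutil


def naive_lock_is_held(pid: int) -> bool:
    """Existence-only check: True for any live process that has this PID."""
    return psutil.pid_exists(pid)


# Stand-in for the unrelated process that ends up with the recycled PID.
unrelated = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"]
)
try:
    # Pretend a stale gateway lock recorded exactly this PID.
    recorded_pid = unrelated.pid
    held = naive_lock_is_held(recorded_pid)
    print(f"lock for PID {recorded_pid} treated as held: {held}")
finally:
    # Clean up the stand-in process.
    unrelated.terminate()
    unrelated.wait()
```

The check reports the lock as held even though the process behind the PID has nothing to do with the gateway, which matches the "already in use" failure described above.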

Expected Behavior

The gateway should detect that the PID is not actually running a Hermes process (e.g., by checking process name/cmdline) and either release the stale lock or cleanly replace it.

Actual Behavior

Gateway starts but only api_server connects successfully. Telegram, Feishu, and WeChat all fail with "bot token already in use (PID <system_pid>)". The gateway reports "Gateway running with 1 platform(s)" instead of 3+.

Root Cause Analysis

In gateway/platforms/base.py, _get_process_start_time() returns None on macOS because there is no /proc filesystem, which disables the stale PID check entirely. The lock mechanism then relies solely on psutil.pid_exists(), which returns True whenever any process holds that PID, including macOS system processes.

Confirmed via investigation: PID 622 was occupied by com.apple.CloudDocs.iCloudDriveFileProvider (macOS system process) at the time of the second gateway start attempt.
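One way to restore a start-time comparison without /proc is psutil's cross-platform Process.create_time(). The following is a minimal sketch, assuming the lock record stores the gateway's start time next to its PID; get_process_start_time, lock_is_stale, and the one-second tolerance are hypothetical and do not reflect the actual code in base.py.

```python
import os
from typing import Optional

import psutil


def get_process_start_time(pid: int) -> Optional[float]:
    """Return the start time in seconds since the epoch, or None if unavailable."""
    try:
        return psutil.Process(pid).create_time()
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        # Process is gone, or this user may not inspect it.
        return None


def lock_is_stale(recorded_pid: int, recorded_start: float,
                  tolerance: float = 1.0) -> bool:
    """Treat the lock as stale if the PID is gone or belongs to a newer process."""
    current_start = get_process_start_time(recorded_pid)
    if current_start is None:
        # Assumption: a PID we cannot inspect is not a gateway we started.
        return True
    # A recycled PID has a different start time than the one recorded.
    return abs(current_start - recorded_start) > tolerance


# Usage: the current process matches its own recorded start time.
me = os.getpid()
my_start = get_process_start_time(me)
print(lock_is_stale(me, my_start))         # same process: not stale
print(lock_is_stale(me, my_start - 3600))  # different start time: stale
```

Comparing start times catches PID reuse even when the new occupant happens to have a similar name, at the cost of having to persist one extra field in the lock.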

Environment

  • OS: macOS 26.2 (Sequoia)
  • Python: 3.11.15
  • Hermes: 0.13.0
  • CPU: Apple M3 Max

Proposed Fix

Add process name/cmdline verification in addition to the PID existence check:

  1. After psutil.pid_exists(pid) returns True, use psutil.Process(pid).name() or .exe() to verify the process is actually Hermes (e.g., matches Python/hermes binary name).
  2. If the process is not Hermes, treat the lock as stale and proceed with startup.
  3. On macOS, also check that the PID belongs to a process running under the current user with a recognizable Hermes Python path.

This makes the PID lock check reliable on macOS, where /proc is unavailable and PIDs can be recycled to system processes.
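The proposal above could be sketched roughly as follows. This is an assumption-laden illustration rather than the project's implementation: GATEWAY_MARKERS, looks_like_gateway, and lock_owner_is_gateway are hypothetical names, and matching on the substring "hermes" is only a guess at how a gateway process might be recognized.

```python
from typing import Sequence

import psutil

# Hypothetical marker; the real gateway may identify itself differently.
GATEWAY_MARKERS = ("hermes",)


def looks_like_gateway(name: str, cmdline: Sequence[str],
                       markers: Sequence[str] = GATEWAY_MARKERS) -> bool:
    """Pure decision step: does the name or command line mention a marker?"""
    haystack = " ".join([name, *cmdline]).lower()
    return any(marker in haystack for marker in markers)


def lock_owner_is_gateway(pid: int) -> bool:
    """Probe the live process behind a PID and decide whether it is the gateway."""
    try:
        proc = psutil.Process(pid)
        name = proc.name()
        try:
            cmdline = proc.cmdline()
        except (psutil.AccessDenied, psutil.ZombieProcess):
            # The command line of other users' processes may be unreadable.
            cmdline = []
    except (psutil.NoSuchProcess, psutil.AccessDenied):
        # Gone or not inspectable: not a gateway owned by this user.
        return False
    return looks_like_gateway(name, cmdline)


# Usage with the process name reported in this issue:
print(looks_like_gateway("com.apple.CloudDocs.iCloudDriveFileProvider", []))
print(looks_like_gateway("python3.11", ["python3", "-m", "hermes", "gateway"]))
```

Keeping the matching logic in a pure function makes it easy to test without a live process. A caller would treat the lock as stale when lock_owner_is_gateway returns False, and could additionally compare proc.username() with the current user to cover step 3.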
