hermes: macOS gateway scoped locks can block adapters after PID reuse when start_time is null

Bug Description

On macOS, a stale machine-local gateway scoped lock can permanently block a platform adapter when the recorded PID is later reused by an unrelated system process.

Observed with the Weixin adapter, but the issue appears to be in the shared scoped-lock helper used by gateway platform token locks.

The gateway reported:

Weixin bot token already in use (PID 1379). Stop the other gateway first.

However, PID 1379 was not a Hermes gateway process. macOS had reused it for:

/System/Library/CoreServices/TextInputMenuAgent.app/Contents/MacOS/TextInputMenuAgent

The stale lock file looked like this:

{
  "pid": 1379,
  "kind": "hermes-gateway",
  "argv": ["/Users/.../.hermes/hermes-agent/hermes_cli/main.py", "gateway", "run", "--replace"],
  "start_time": null,
  "scope": "weixin-bot-token",
  "identity_hash": "567d27d5ea481343",
  "metadata": {"platform": "weixin"},
  "updated_at": "2026-05-24T22:15:49.471816+00:00"
}

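Because start_time was null, acquire_scoped_lock() had no start marker to compare against and could not detect PID reuse. os.kill(pid, 0) only checks that some process with that PID exists, so it succeeded for the reused PID; Hermes therefore treated the lock as live and refused to start the Weixin adapter.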
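To illustrate why the probe alone is not enough, here is a minimal sketch of a signal-0 liveness check. The helper name pid_exists is hypothetical and is not taken from the Hermes source; it only shows what os.kill(pid, 0) can and cannot tell you.

```python
import os


def pid_exists(pid: int) -> bool:
    """Return True if some process currently owns this PID.

    Signal 0 performs error checking only. It reports whether the PID
    exists, not which program is running under it. (Hypothetical helper,
    not the Hermes implementation.)
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process has this PID
    except PermissionError:
        return True  # a process exists but belongs to another user
    return True


# The probe succeeds for any live PID, including one that macOS has
# handed to an unrelated process after the original gateway exited.
print(pid_exists(os.getpid()))
```

A lock check built only on this probe cannot tell a live gateway from an unrelated process that inherited the PID, which matches the behaviour described above.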

Steps to Reproduce

  1. Run Hermes gateway on macOS with a platform adapter that uses acquire_scoped_lock() (for example Weixin).
  2. Leave behind a scoped lock file where:
    • pid points to a previous gateway process PID
    • kind is hermes-gateway
    • start_time is null
  3. Let macOS reuse that PID for a non-Hermes process.
  4. Restart Hermes gateway.

Expected Behavior

Hermes should detect that the lock owner is stale or that the PID has been reused by a non-gateway process, remove or replace the stale scoped lock, and allow the adapter to connect.

Actual Behavior

Hermes treats the reused PID as a live gateway process and refuses to acquire the platform lock:

[Weixin] Weixin bot token already in use (PID 1379). Stop the other gateway first.

Environment

  • OS: macOS
  • Install path: ~/.hermes/hermes-agent
  • Gateway service: launchd
  • Affected lock dir: ~/.local/state/hermes/gateway-locks/
  • Branch observed locally: main, behind origin/main at the time of debugging

Suggested Fix

Two changes would prevent this class of issue:

  1. Make _get_process_start_time(pid) return a useful process start marker on macOS, e.g. via:

    ps -p <pid> -o lstart=

    This makes new lock records contain a non-null start_time, allowing PID reuse to be detected.

  2. Add a legacy-lock fallback for existing records with start_time: null: if the recorded PID is alive but the command line does not look like a Hermes gateway process, treat the scoped lock as stale instead of blocking the token indefinitely.
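The first change could look roughly like the sketch below. It is an illustration under assumptions, not the actual _get_process_start_time() code: the names _run_ps, get_start_marker and pid_was_reused are hypothetical, and the lstart text is treated as an opaque string that is only compared for equality, since its exact format depends on the platform and locale.

```python
import subprocess
from typing import Callable, List, Optional


def _run_ps(args: List[str]) -> Optional[str]:
    """Run ps with the given arguments; return stdout, or None on failure."""
    try:
        result = subprocess.run(
            ["ps", *args],
            capture_output=True, text=True, timeout=5, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None  # ps missing, not executable, or timed out
    if result.returncode != 0:
        return None  # ps exits non-zero when the PID does not exist
    return result.stdout


def get_start_marker(
    pid: int, run_ps: Callable[[List[str]], Optional[str]] = _run_ps
) -> Optional[str]:
    """Return a whitespace-normalised start marker for pid, or None."""
    out = run_ps(["-p", str(pid), "-o", "lstart="])
    if out is None:
        return None
    marker = " ".join(out.split())
    return marker or None


def pid_was_reused(
    recorded_marker: Optional[str],
    pid: int,
    run_ps: Callable[[List[str]], Optional[str]] = _run_ps,
) -> bool:
    """True only when both markers are known and they differ."""
    current = get_start_marker(pid, run_ps)
    if recorded_marker is None or current is None:
        return False  # not enough information to decide
    return current != recorded_marker
```

The run_ps parameter exists only so the logic can be exercised without spawning a process. Note that pid_was_reused deliberately answers False for a null recorded marker, which is why existing legacy records need the separate fallback in item 2.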

A local patch that fixed the observed issue added:

  • macOS fallback start marker using ps -p <pid> -o lstart=
  • command-line fallback using ps -p <pid> -o command= when /proc/<pid>/cmdline is unavailable
  • stale detection for legacy start_time: null records whose live PID does not look like a gateway process
  • a regression test covering a legacy Weixin lock whose PID is reused by TextInputMenuAgent
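For reference, the legacy-record check and its command-line fallback might be sketched as follows. This is not the patch itself: read_command_line, looks_like_gateway and legacy_lock_is_stale are hypothetical names, and the substring heuristic for recognising a gateway command line is an assumption made for illustration.

```python
import subprocess
from typing import Callable, Optional


def read_command_line(pid: int) -> Optional[str]:
    """Best-effort command line: /proc first, then ps -o command=."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            raw = fh.read()
        if raw:
            return raw.replace(b"\x00", b" ").decode("utf-8", "replace").strip()
    except OSError:
        pass  # macOS has no /proc, so fall through to ps
    try:
        result = subprocess.run(
            ["ps", "-p", str(pid), "-o", "command="],
            capture_output=True, text=True, timeout=5, check=False,
        )
    except (OSError, subprocess.SubprocessError):
        return None
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


def looks_like_gateway(command: str) -> bool:
    """Assumed heuristic: both words appear somewhere in the command line."""
    lowered = command.lower()
    return "hermes" in lowered and "gateway" in lowered


def legacy_lock_is_stale(
    record: dict,
    pid_alive: Callable[[int], bool],
    command_of: Callable[[int], Optional[str]] = read_command_line,
) -> bool:
    """Decide staleness for records whose start_time is null."""
    if record.get("start_time") is not None:
        return False  # not a legacy record; compare start markers instead
    pid = record["pid"]
    if not pid_alive(pid):
        return True  # owner is gone
    command = command_of(pid)
    if command is None:
        return False  # cannot inspect the process; keep the lock
    return not looks_like_gateway(command)


# The situation from this report: PID 1379 is alive but belongs to
# TextInputMenuAgent, so the legacy lock is reported as stale.
record = {"pid": 1379, "kind": "hermes-gateway", "start_time": None}
agent = (
    "/System/Library/CoreServices/TextInputMenuAgent.app"
    "/Contents/MacOS/TextInputMenuAgent"
)
print(legacy_lock_is_stale(record, lambda pid: True, lambda pid: agent))
```

The sketch stays conservative when the command line cannot be read, so a lock is only discarded when there is positive evidence that the PID belongs to something other than a gateway.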

The local test run, which included the new regression test, passed:

43 passed in 0.51s

Workaround

Manually remove the stale scoped lock file and restart the gateway:

rm ~/.local/state/hermes/gateway-locks/weixin-bot-token-<hash>.lock
hermes gateway restart

This restores the adapter, but it does not prevent recurrence if a future stale lock has start_time: null and the PID is reused.
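Before deleting anything, it may help to see which lock files are legacy records at all. The following sketch lists lock files whose start_time is null. It assumes the lock directory and the .lock suffix shown in this report and the JSON layout of the example record; find_legacy_locks is a hypothetical helper, not a Hermes command.

```python
import json
from pathlib import Path
from typing import List, Tuple


def find_legacy_locks(lock_dir: Path) -> List[Tuple[Path, int]]:
    """Return (path, pid) for every lock file with a null start_time."""
    found = []
    for path in sorted(lock_dir.glob("*.lock")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable or not valid JSON; skip it
        if (
            isinstance(record, dict)
            and "pid" in record
            and record.get("start_time") is None
        ):
            found.append((path, record["pid"]))
    return found


if __name__ == "__main__":
    # Directory taken from the Environment section of this report.
    lock_dir = Path.home() / ".local/state/hermes/gateway-locks"
    for path, pid in find_legacy_locks(lock_dir):
        print(f"{path}: pid={pid}, start_time is null")
```

The script only reads files. Checking each reported PID with ps -p <pid> -o command= before removing the lock avoids deleting one that a running gateway still owns.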
