hermes - ✅(Solved) Fix os.kill(pid, 0) in gateway/status.py raises unhandled OSError on Windows [1 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#14359 (fetched 2026-04-24 06:17:39)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×4, commented ×1, cross-referenced ×1

On Windows, when a stale gateway.pid file exists (e.g. after a non-graceful shutdown), restarting the gateway fails because os.kill(existing_pid, 0) raises OSError: [WinError 11] instead of ProcessLookupError.

Error Message

OSError: [WinError 11]

Root Cause

On Windows, os.kill(existing_pid, 0) does not raise ProcessLookupError when the PID recorded in gateway.pid no longer exists; it raises a platform-specific OSError (e.g. [WinError 11]). The existing handler catches only ProcessLookupError and PermissionError, so the error escapes and the gateway fails to start.

Fix Action

Fixed by PR #14364 (adds OSError to the exception tuple).

PR fix notes

PR #14364: fix(gateway): catch OSError from os.kill on Windows for stale PID detection

Description (problem / solution / changelog)

Problem

Fixes #14359.

On Windows, os.kill(existing_pid, 0) raises OSError([WinError 11]) (and other OSError variants) instead of ProcessLookupError when the process no longer exists. The current handler only catches ProcessLookupError and PermissionError:

# Before
except (ProcessLookupError, PermissionError):
    stale = True

This causes every gateway restart on Windows after a non-graceful shutdown (terminal close, crash, Ctrl+C) to fail with an unhandled OSError, so the PID file has to be deleted manually each time.

Root Cause

Windows does not raise ProcessLookupError (ESRCH) for dead PIDs in os.kill. It raises a platform-specific OSError. This is a known Python/Windows platform difference.
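The platform difference can be illustrated without a Windows machine. The sketch below is a minimal, hypothetical stand-in (`old_handler` is not a function in gateway/status.py): it raises a given exception inside the original `except (ProcessLookupError, PermissionError)` clause. Python's OSError constructor maps errno ESRCH to ProcessLookupError, so the Unix case is caught, while an error carrying another code (errno 11 here, only standing in for the Windows failure) falls through.

```python
import errno


def old_handler(exc: OSError) -> str:
    """Hypothetical stand-in for the pre-fix handler in gateway/status.py."""
    try:
        raise exc
    except (ProcessLookupError, PermissionError):
        return "stale"
    except OSError:
        # gateway/status.py has no such clause, so there the error escapes.
        return "unhandled"


# A dead PID on Unix surfaces as ESRCH, which OSError maps to ProcessLookupError.
unix_style = OSError(errno.ESRCH, "No such process")
print(type(unix_style).__name__, old_handler(unix_style))  # ProcessLookupError stale

# Stand-in for the Windows failure: code 11 is not ESRCH, EPERM or EACCES.
windows_style = OSError(11, "simulated WinError 11")
print(old_handler(windows_style))  # unhandled
```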

Fix

Add OSError to the exception tuple at the one affected call site:

# After
except (ProcessLookupError, PermissionError, OSError):
    stale = True

This matches the pattern already used at two other os.kill call sites in the same file (lines 522 and 633 in the current main), which already include OSError.

Changes

  • gateway/status.py: add OSError to the exception tuple at line 499
  • tests/test_gateway_status_pid_check.py: new test file with 4 regression tests

Tests

New test file tests/test_gateway_status_pid_check.py covers:

  • ✅ Windows path: os.kill raises OSError(11, ...) → treated as stale (the regression)
  • ✅ Unix path: os.kill raises ProcessLookupError → treated as stale (non-regression)
  • PermissionError path — no unhandled exception (non-regression)
  • ✅ Missing PID file → None returned (baseline)
python -m pytest tests/test_gateway_status_pid_check.py -v
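The four regression cases listed above could look roughly like the following sketch. It does not reproduce the PR's actual test file: `read_pid_status` is a hypothetical helper that mirrors the patched handler, and `os.kill` is patched with `unittest.mock` so the Windows path can be exercised on any OS. pytest collects `unittest.TestCase` classes, so the pytest command above would also pick this up if it were saved under tests/.

```python
import os
import tempfile
import unittest
from pathlib import Path
from typing import Optional
from unittest import mock


def read_pid_status(pid_path: Path) -> Optional[bool]:
    """Hypothetical helper mirroring the patched check.

    Returns None when the PID file is missing, True when the recorded PID
    looks stale, and False when the process appears to be alive.
    """
    if not pid_path.exists():
        return None
    existing_pid = int(pid_path.read_text().strip())
    stale = False
    try:
        os.kill(existing_pid, 0)
    except (ProcessLookupError, PermissionError, OSError):
        stale = True
    return stale


class PidCheckTests(unittest.TestCase):
    def setUp(self):
        self.tmp = tempfile.TemporaryDirectory()
        self.pid_path = Path(self.tmp.name) / "gateway.pid"
        self.pid_path.write_text("12345")

    def tearDown(self):
        self.tmp.cleanup()

    def test_windows_oserror_is_stale(self):
        # Regression: a generic OSError must no longer escape.
        with mock.patch("os.kill", side_effect=OSError(11, "simulated WinError 11")):
            self.assertTrue(read_pid_status(self.pid_path))

    def test_unix_process_lookup_error_is_stale(self):
        with mock.patch("os.kill", side_effect=ProcessLookupError()):
            self.assertTrue(read_pid_status(self.pid_path))

    def test_permission_error_does_not_escape(self):
        with mock.patch("os.kill", side_effect=PermissionError()):
            self.assertTrue(read_pid_status(self.pid_path))

    def test_missing_pid_file_returns_none(self):
        self.pid_path.unlink()
        self.assertIsNone(read_pid_status(self.pid_path))
```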

Changed files

  • gateway/status.py (modified, +1/-1)
  • tests/test_gateway_status_pid_check.py (added, +95/-0)

Code Example

# Before
try:
    os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError):
    stale = True

# After
try:
    os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError, OSError):
    stale = True

Description

On Windows, when a stale gateway.pid file exists (e.g. after a non-graceful shutdown), restarting the gateway fails because os.kill(existing_pid, 0) raises OSError: [WinError 11] instead of ProcessLookupError.

Location

gateway/status.py, around line 364:

try:
    os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError):
    stale = True

Problem

On Windows, when the PID no longer exists, os.kill can raise a generic OSError (e.g. [WinError 11]), which is not caught by the current handler. This causes the gateway to fail to start until the PID file is manually deleted.

This issue recurs every time the gateway exits non-gracefully (closing terminal, system crash, unhandled Ctrl+C, etc.).

Suggested Fix

Add OSError to the exception tuple:

except (ProcessLookupError, PermissionError, OSError):
    stale = True

Note: ProcessLookupError (like PermissionError) is already a subclass of OSError, so the new tuple is a superset of the old one; in effect it catches every OSError. If that feels too broad, an alternative is to check sys.platform == "win32" or to catch OSError and filter by error code.
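As a sketch of the narrower alternative, assuming the goal is to keep Unix behavior strict: `is_stale` (a hypothetical name) keeps the original two exceptions, treats any other OSError as stale only when `sys.platform == "win32"`, and re-raises it elsewhere. Filtering on the Windows error code instead would use the exception's `winerror` attribute, which only exists on Windows.

```python
import os
import sys


def is_stale(existing_pid: int) -> bool:
    """Hypothetical narrower variant of the stale-PID check.

    ProcessLookupError/PermissionError are treated as stale, as in the
    original handler; any other OSError counts as stale only on Windows,
    and is re-raised on other platforms instead of being swallowed.
    """
    try:
        os.kill(existing_pid, 0)
    except (ProcessLookupError, PermissionError):
        return True
    except OSError:
        if sys.platform == "win32":
            return True
        raise
    return False
```

Re-raising outside Windows keeps unexpected failures visible on Unix, at the cost of one more platform-specific branch to maintain.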

Environment

  • OS: Windows 10/11
  • Python: 3.x (venv)
  • Hermes Gateway: latest

extent analysis

TL;DR

Adding OSError to the exception tuple in gateway/status.py is likely to fix the issue with the gateway failing to start due to a stale gateway.pid file on Windows.

Guidance

  • Modify the exception handling in gateway/status.py around line 364 to include OSError as suggested: except (ProcessLookupError, PermissionError, OSError):
  • If catching every OSError is broader than desired, consider filtering by error code (on Windows, the exception's winerror attribute) so that only the specific Windows error is treated as a stale PID.
  • Verify the fix by simulating a non-graceful shutdown and checking if the gateway can restart without manual intervention.
  • Review the gateway/status.py code for any other potential issues related to process management and error handling on Windows.
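One way to reproduce the stale-PID condition without crashing the gateway is to record the PID of a process that has already exited. This is a minimal sketch under assumptions: `pid_file_is_stale` and `simulate_stale_pid_file` are hypothetical helpers, and the check mirrors the patched handler rather than calling gateway/status.py, so a full verification still means restarting the gateway itself.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path


def pid_file_is_stale(pid_path: Path) -> bool:
    """Hypothetical check mirroring the patched handler in gateway/status.py."""
    existing_pid = int(pid_path.read_text().strip())
    try:
        os.kill(existing_pid, 0)
    except (ProcessLookupError, PermissionError, OSError):
        return True
    return False


def simulate_stale_pid_file(directory: Path) -> Path:
    """Write the PID of an already-exited process, as a crash would leave it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])
    child.wait()  # the child has exited and been reaped, so its PID is free
    pid_path = directory / "gateway.pid"
    pid_path.write_text(str(child.pid))
    return pid_path


with tempfile.TemporaryDirectory() as tmp:
    pid_path = simulate_stale_pid_file(Path(tmp))
    # True, unless the OS has already reused the PID for another process.
    print(pid_file_is_stale(pid_path))
```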

Example

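try:
    os.kill(existing_pid, 0)
except (ProcessLookupError, PermissionError):
    stale = True
except OSError as e:
    # "[WinError 11]" is reported in e.winerror (Windows only); e.errno holds
    # a translated C errno value, so checking e.errno == 11 would miss it.
    if getattr(e, "winerror", None) == 11:
        stale = True
    else:
        raise  # surface unexpected OSErrors instead of silently ignoring them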

Notes

This fix assumes that any OSError raised here means the PID in gateway.pid no longer refers to a running process, rather than some other failure. Additional logging or error handling may be needed to keep the gateway stable if that assumption does not hold.
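If extra logging is wanted, one option is to record which exception produced the stale verdict. A minimal sketch; the logger name `gateway.status` and the function `check_stale` are assumptions, not taken from the hermes-agent code.

```python
import logging
import os

logger = logging.getLogger("gateway.status")  # hypothetical logger name


def check_stale(existing_pid: int) -> bool:
    """Hypothetical variant of the patched check that logs why a PID was judged stale."""
    try:
        os.kill(existing_pid, 0)
    except (ProcessLookupError, PermissionError, OSError) as exc:
        # Record the concrete exception so an unexpected OSError is visible
        # in the logs instead of being silently treated as a stale PID file.
        logger.warning(
            "Treating PID %s as stale after %s: %s",
            existing_pid,
            type(exc).__name__,
            exc,
        )
        return True
    return False
```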

Recommendation

Apply the suggested fix by adding OSError to the exception tuple; it is a minimal, low-risk change that resolves the described failure on Windows.
