hermes - (Solved) Fix dashboard: PermissionError on stale root-owned gateway.lock crashes /api/status with HTTP 500 [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#18935 (fetched 2026-05-03 04:53:29)
Comments: 1
Participants: 2
Timeline events: 7 (labeled ×4, cross-referenced ×2, commented ×1)
Reactions: 0

Error Message

PermissionError, raised by open(resolved_lock_path, 'a+') in is_gateway_runtime_lock_active() when gateway.lock is owned by root. The exception propagates to the dashboard as HTTP 500 on /api/status.

Root Cause

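is_gateway_runtime_lock_active() in gateway/status.py opens the lock file with 'a+' (append-and-read mode, which requires write permission) even when it only needs to check lock status. If the file is owned by another user and not writable by the caller (here root:root 0600), open() raises PermissionError, and nothing catches it.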

Fix Action

Fix

Wrap the open() call in a PermissionError handler. Since the hermes user owns the parent directory ($HERMES_HOME), it can unlink() the stale file despite not owning it:

try:
    handle = open(resolved_lock_path, 'a+', encoding='utf-8')
except PermissionError:
    logger.warning('gateway.lock at %s not accessible; removing stale lock file.', resolved_lock_path)
    try:
        resolved_lock_path.unlink()
    except OSError:
        pass
    return False

This is self-healing: the stale file is removed on first poll and subsequent checks work normally.

PR fix notes

PR #18940: fix(gateway): handle PermissionError on stale root-owned gateway.lock

Description (problem / solution / changelog)

Problem

Fixes #18935: When the gateway container runs without entrypoint.sh (bypassing the gosu privilege drop), gateway.lock is created owned by root:root. After restoring the entrypoint and restarting, the dashboard (uid 10000) cannot open the lock file, causing PermissionError that propagates as HTTP 500 on /api/status.

The frontend shows: events feed disconnected — tool calls may not appear

Fix

Wrap the open(resolved_lock_path, "a+") call in is_gateway_runtime_lock_active() with a PermissionError handler. Since the hermes user owns the parent directory ($HERMES_HOME), it can unlink() the stale file:

try:
    handle = open(resolved_lock_path, "a+", encoding="utf-8")
except PermissionError:
    logger.warning(
        "gateway.lock at %s not accessible (PermissionError); "
        "removing stale lock file.",
        resolved_lock_path,
    )
    try:
        resolved_lock_path.unlink()
    except OSError:
        pass
    return False

This is self-healing: the stale file is removed on first poll and subsequent checks work normally.

Testing

  • Verified the code compiles (no syntax errors)
  • The fix returns False (no active lock) when the file is inaccessible, which is correct — if no process can hold the lock (file is stale), then the lock is not active
  • Related to #18936 (entrypoint privilege drop bypass)

Screenshots

N/A — backend fix only

Changed files

  • gateway/status.py (modified, +18/-1)


Problem

When the hermes gateway container is started without entrypoint.sh (i.e. the gosu-based privilege drop is bypassed), gateway processes run as root and create gateway.lock owned by root:root 0600.

After correcting the entrypoint, the dashboard process (running as hermes, uid 10000) polls /api/status, which calls get_running_pid() → is_gateway_runtime_lock_active() → open(lock_path, 'a+'). Because the file is root-owned, this raises PermissionError.

There is no try/except around get_running_pid() in the /api/status handler (web_server.py ~line 518), so the exception propagates as HTTP 500. The SSE/events client receives the error response and the frontend shows:

events feed disconnected — tool calls may not appear
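
As a complementary, defensive measure (not part of the PR, which only changes gateway/status.py), the status handler could avoid turning lock-file errors into HTTP 500. The following is a minimal, framework-free sketch under stated assumptions: build_status, its get_running_pid argument, and the payload field names are hypothetical and are not taken from web_server.py.

```python
import logging

logger = logging.getLogger(__name__)


def build_status(get_running_pid):
    """Return (http_status, payload) without letting lock-file errors escape.

    `get_running_pid` is passed in as a callable so the sketch stays
    independent of the real module layout (hypothetical design).
    """
    try:
        pid = get_running_pid()
    except OSError as exc:  # PermissionError is a subclass of OSError
        logger.warning("could not determine gateway pid: %s", exc)
        # Degrade to "gateway not running" instead of failing the request.
        return 200, {
            "gateway_running": False,
            "gateway_pid": None,
            "error": type(exc).__name__,
        }
    return 200, {
        "gateway_running": pid is not None,
        "gateway_pid": pid,
        "error": None,
    }
```

With a guard like this, a root-owned lock file would show up as a degraded status payload rather than a failed request, regardless of whether the lock check itself has been patched.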

Reproduction

  1. Start hermes container without entrypoint.sh so it runs as root.
  2. Let it create gateway.lock (owned root:root).
  3. Fix the entrypoint (re-add entrypoint.sh), restart the container.
  4. Dashboard polls /api/status → PermissionError → HTTP 500 → events feed shows disconnected.

Root cause

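is_gateway_runtime_lock_active() in gateway/status.py opens the lock file with 'a+' (append-and-read mode, which requires write permission) even when it only needs to check lock status. If the file is owned by another user and not writable by the caller (here root:root 0600), open() raises PermissionError, and nothing catches it.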

Fix

Wrap the open() call in a PermissionError handler. Since the hermes user owns the parent directory ($HERMES_HOME), it can unlink() the stale file despite not owning it:

try:
    handle = open(resolved_lock_path, 'a+', encoding='utf-8')
except PermissionError:
    logger.warning('gateway.lock at %s not accessible; removing stale lock file.', resolved_lock_path)
    try:
        resolved_lock_path.unlink()
    except OSError:
        pass
    return False

This is self-healing: the stale file is removed on first poll and subsequent checks work normally.
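
To show the handler in context, here is a self-contained sketch of a lock check built around the same try/except. It is an illustration only: the function name lock_is_active, the opener parameter (added so the failure can be simulated without root), and the use of fcntl.flock to decide whether the lock is held are assumptions, not the actual body of is_gateway_runtime_lock_active(). It is Unix-only because of fcntl.

```python
import fcntl
import logging
from pathlib import Path

logger = logging.getLogger(__name__)


def lock_is_active(lock_path: Path, opener=open) -> bool:
    """Return True if another process appears to hold the lock file."""
    if not lock_path.exists():
        return False
    try:
        handle = opener(lock_path, "a+", encoding="utf-8")
    except PermissionError:
        logger.warning(
            "lock at %s not accessible; removing stale lock file.", lock_path
        )
        try:
            # Works when the caller can write to the parent directory,
            # even if it does not own the file itself.
            lock_path.unlink()
        except OSError:
            pass
        return False
    with handle:
        try:
            # Assumption: the lock holder uses an exclusive flock.
            fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return True  # someone else holds the lock
        fcntl.flock(handle.fileno(), fcntl.LOCK_UN)
        return False
```

Passing an opener that raises PermissionError exercises the self-healing path: the first call removes the file and returns False, and later calls see no file at all.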

Related

  • PR #17887 fixes the related WebSocket 4403 issue with --insecure + reverse proxy
  • Issue #18415 tracks the WebSocket loopback rejection

extent analysis

TL;DR

To resolve the PermissionError issue, wrap the open() call in is_gateway_runtime_lock_active() with a PermissionError handler to remove the stale lock file.

Guidance

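  • Verify that the hermes user owns the parent directory ($HERMES_HOME), or at least has write and search permission on it. On POSIX systems that directory permission, not ownership of the file, is what allows unlink() to remove the stale lock file.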
  • Implement the proposed fix by wrapping the open() call in a try-except block to catch PermissionError and remove the stale lock file using unlink().
  • Test the fix by reproducing the issue and verifying that the events feed no longer shows as disconnected after applying the fix.
  • Review related issues (PR #17887 and Issue #18415) to ensure that other potential issues with WebSocket connections are addressed.
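
The first check in the list above can be approximated in code. This is a rough sketch assuming POSIX semantics: can_unlink is a hypothetical helper, not part of hermes, and it ignores ACLs, immutable flags, and read-only mounts.

```python
import os
import stat
from pathlib import Path


def can_unlink(path: Path) -> bool:
    """Best-effort guess at whether the current user may remove `path`."""
    if not os.path.lexists(path):
        return False
    parent = path.parent
    # Removing a directory entry needs write + search permission on the
    # directory. Note: os.access checks against the real uid/gid.
    if not os.access(parent, os.W_OK | os.X_OK):
        return False
    parent_stat = parent.stat()
    if parent_stat.st_mode & stat.S_ISVTX:
        # Sticky directory (like /tmp): only root, the directory owner,
        # or the file owner may remove the entry.
        euid = os.geteuid()
        return euid == 0 or euid in (parent_stat.st_uid, path.lstat().st_uid)
    return True
```

For example, can_unlink(Path(os.environ["HERMES_HOME"]) / "gateway.lock") would be expected to return True for the hermes user in the scenario described here, even though the file itself is owned by root.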

Example

The provided code snippet demonstrates how to wrap the open() call in a try-except block to handle PermissionError:

try:
    handle = open(resolved_lock_path, 'a+', encoding='utf-8')
except PermissionError:
    logger.warning('gateway.lock at %s not accessible; removing stale lock file.', resolved_lock_path)
    try:
        resolved_lock_path.unlink()
    except OSError:
        pass
    return False

Notes

This fix assumes that the hermes user has ownership of the parent directory ($HERMES_HOME) and that removing the stale lock file is a safe and effective solution.

Recommendation

Apply the proposed fix: catch PermissionError around the open() call and remove the stale lock file. The check then heals itself on the first poll, and no other code changes are needed.
