openclaw - 💡(How to fix) Fix openclaw gateway restart hangs on macOS with SMB-mounted volumes (lsof stat() timeout) [1 comments, 2 participants]

openclaw/openclaw#75767 (fetched 2026-05-02 05:30:32)

Description

Environment

  • OS: macOS 15.x (Darwin 25.4.0, arm64)
  • Node: v25.8.2
  • OpenClaw: 2026.4.26 (be8c246)
  • Network mounts: Synology DS224plus via SMB, used for Time Machine backup
    • Mount path: /Volumes/.timemachine/DS224plus._smb._tcp.local./.../Mac_backup

Symptoms

Running openclaw gateway restart or openclaw gateway --force causes the process to hang for extended periods (30s to minutes). During the hang, the following warning repeats many times:

lsof: WARNING: can't stat() smbfs file system /Volumes/.timemachine/DS224plus._smb._tcp.local./XXXX/Mac_backup
      Output information may be incomplete.
      assuming "dev=3600007c" from mount table

Eventually it either times out or fails with:

Force: Error: lsof permission denied while inspecting gateway port: ...

Root Cause Analysis

OpenClaw's port-checking logic uses lsof in 4 separate call sites across 3 files:

File                     Function                   Call Type
ports-bfXSW6vy.js        readUnixListeners          async runCommandWithTimeout
ports-CUmvj7Fu.js        listPortListeners          sync execFileSync (used by --force)
restart-stale-pids-*.js  findGatewayPidsOnPortSync  sync spawnSync
restart-stale-pids-*.js  pollPortOnce               sync spawnSync

The command used is: lsof -nP -iTCP:{port} -sTCP:LISTEN -FpFc[n]
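
For readers unfamiliar with the -F flag: it switches lsof to field output, where every line starts with a one-character field identifier (p for process ID, c for command name, n for the file or socket name). The sketch below shows how such output could be turned into records. The function name parseLsofFields and the sample text are made up for illustration, and this is not OpenClaw's actual parser.

```javascript
// Parse lsof field output (-F) into one record per process.
// Each line is a one-character field identifier followed by its value:
//   p = process ID, c = command name, n = file/socket name.
// parseLsofFields is a hypothetical helper, not an OpenClaw function.
function parseLsofFields(output) {
  const listeners = [];
  let current = null;
  for (const line of output.split('\n')) {
    if (line.length === 0) continue;
    const field = line[0];
    const value = line.slice(1);
    if (field === 'p') {
      // A "p" line starts a new process set.
      current = { pid: Number(value), command: null, names: [] };
      listeners.push(current);
    } else if (current && field === 'c') {
      current.command = value;
    } else if (current && field === 'n') {
      current.names.push(value);
    }
    // Other fields (for example f = file descriptor) are ignored here.
  }
  return listeners;
}

// Made-up sample resembling what a single listener might produce.
const sample = 'p4242\ncnode\nf23\nn127.0.0.1:18789\n';
console.log(parseLsofFields(sample));
```

The point of the sketch is that the call sites need the PID from lsof, not just a busy/free answer, which matters when weighing the replacements proposed further down.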

On macOS, lsof traverses all mounted filesystems to build its file descriptor table, even when the -i flag limits results to a single network port. When an SMB/NFS mount is unresponsive (e.g., NAS is busy with Time Machine backup, network congestion, disk sleep wake-up), lsof blocks for the SMB timeout (10-30s+) on each stat() call.

This affects every CLI operation that needs to inspect port 18789, causing the Gateway restart sequence to feel like it takes "10+ minutes" as it goes through multiple retry/timeout cycles.

Attempted workaround: -b flag

The lsof -b flag (which skips blocking kernel calls like stat(), readlink()) was tested but does not resolve the issue — macOS lsof still traverses SMB mounts at the VFS layer regardless of -b.

Proposed fix

Replace lsof-based port checking with a pure Node.js approach:

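// Instead of calling lsof, probe the port directly.
// probePort is an illustrative name; onResult(err, busy) receives the outcome.
const net = require('net');

function probePort(port, onResult) {
  const server = net.createServer();
  // Attach the error handler before listen() so EADDRINUSE is always caught.
  server.once('error', (err) => {
    if (err.code === 'EADDRINUSE') onResult(null, true); // port is busy
    else onResult(err); // some other failure, e.g. EACCES
  });
  server.listen(port, '127.0.0.1', () => {
    server.close(() => onResult(null, false)); // port is free
  });
}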

This would be immune to filesystem mount issues and also faster than spawning lsof.
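
Since the restart path also has to wait for the old gateway to release the port, the same bind probe could be wrapped in a polling loop. The following is a minimal sketch under stated assumptions: isPortBusy and waitForPortFree are hypothetical names, and the default timeout and interval are arbitrary. Note that a bind probe only answers "busy or free"; it does not report which PID holds the port, so it presumably cannot replace a call site that needs the PID.

```javascript
const net = require('net');

// Resolves true if the port is already in use, false if it could be bound.
// isPortBusy is a hypothetical helper name, not an OpenClaw function.
function isPortBusy(port, host = '127.0.0.1') {
  return new Promise((resolve, reject) => {
    const server = net.createServer();
    server.once('error', (err) => {
      if (err.code === 'EADDRINUSE') resolve(true);
      else reject(err); // unexpected failure, e.g. EACCES
    });
    server.listen(port, host, () => {
      // Bound successfully: release the port again and report it as free.
      server.close(() => resolve(false));
    });
  });
}

// Polls until the port is free. Resolves true once it is free,
// or false if it is still busy when the deadline passes.
async function waitForPortFree(port, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (!(await isPortBusy(port))) return true;
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A caller could await waitForPortFree(18789) after signalling the old gateway process and start the new one only when it resolves true.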

Alternatively, at minimum, the readUnixListeners async code path already has an ss (socket statistics) fallback. Extending this fallback to the sync paths, or making the sync paths async, could help.
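
One way the sync call sites could at least be bounded is the timeout option that Node's spawnSync accepts. The sketch below is an assumption about how that might look, not a description of OpenClaw's code: runBounded is a hypothetical helper, and the 2000 ms budget in the commented usage is arbitrary. Whether the kill takes effect promptly while lsof is blocked inside a kernel call on a dead mount is not something this sketch can guarantee.

```javascript
const { spawnSync } = require('child_process');

// Run a command synchronously, but give up after timeoutMs.
// runBounded is a hypothetical helper name, not an OpenClaw function.
function runBounded(command, args, timeoutMs) {
  const result = spawnSync(command, args, {
    encoding: 'utf8',
    timeout: timeoutMs,    // the child is sent killSignal once this elapses
    killSignal: 'SIGKILL',
  });
  // On timeout, spawnSync reports an error whose code is ETIMEDOUT.
  const timedOut = Boolean(result.error && result.error.code === 'ETIMEDOUT');
  return {
    timedOut,
    status: result.status,
    stdout: result.stdout || '',
    stderr: result.stderr || '',
  };
}

// Possible usage (the flag spelling here is simplified, not copied from OpenClaw):
//   const r = runBounded('lsof', ['-nP', '-iTCP:18789', '-sTCP:LISTEN', '-Fpcn'], 2000);
//   if (r.timedOut) { /* fall back to another probe instead of retrying lsof */ }
```

The design choice is to treat a timeout as a signal to switch strategy rather than to retry, since repeated retries against an unresponsive mount are what stretch the restart into minutes.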

extent analysis

TL;DR

Replace lsof-based port checking with a pure Node.js approach to avoid hangs caused by unresponsive SMB mounts.

Guidance

  • Identify and review the four call sites in OpenClaw's codebase where lsof is used for port checking: ports-bfXSW6vy.js, ports-CUmvj7Fu.js, and two functions in restart-stale-pids-*.js.
  • Consider implementing the proposed Node.js approach using net.createServer() to probe ports directly, as shown in the example code snippet.
  • Alternatively, explore extending the ss fallback in the async readUnixListeners code path to the sync paths or converting the sync paths to async to mitigate the issue.
  • Verify the fix by running openclaw gateway restart or openclaw gateway --force and checking for the absence of hangs and lsof warnings.

Example

The snippet under "Proposed fix" in the issue description can be used as a starting point for replacing lsof-based port checking.

Notes

The lsof -b flag does not resolve the issue on macOS, so alternative approaches like the proposed Node.js solution or extending the ss fallback are necessary.

Recommendation

Apply the proposed workaround by replacing lsof-based port checking with the pure Node.js approach, as it is immune to filesystem mount issues and potentially faster than spawning lsof.
