openclaw - 💡(How to fix) Fix macOS: gateway PID tracking becomes stale after launchd KeepAlive respawn [1 participant]

GitHub stats: openclaw/openclaw#62385 (fetched 2026-04-08 03:05:07) · Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Problem

When running the OpenClaw gateway as a macOS LaunchAgent with KeepAlive enabled, the gateway PID tracking file becomes stale after launchd respawns the process. This causes openclaw gateway status to report "stopped" when the gateway is actually running, and makes the stop/start cycle unreliable.

This issue persists on v2026.4.5 despite the excellent macOS gateway improvements in recent releases (#54801, #60085, #43766).

Steps to reproduce

  1. Configure a LaunchAgent plist for the gateway with KeepAlive: true:
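<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">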
<dict>
  <key>Label</key>
  <string>com.openclaw.myprofile</string>
  <key>ProgramArguments</key>
  <array>
    <string>/opt/homebrew/bin/openclaw</string>
    <string>gateway</string>
    <string>--port</string>
    <string>18789</string>
  </array>
  <key>KeepAlive</key>
  <true/>
  <key>RunAtLoad</key>
  <true/>
  <key>WorkingDirectory</key>
  <string>/Users/me/.openclaw-myprofile</string>
</dict>
</plist>
  2. Load the LaunchAgent via launchctl bootstrap gui/<uid> /path/to/plist
  3. The gateway starts with PID 1000 and writes it to its PID/lock file
  4. Stop the gateway: openclaw gateway stop or kill 1000
  5. launchd detects that the process exited and respawns it with a new PID (e.g., 2000)
  6. The PID file still contains PID 1000 (stale)
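The stop-and-respawn steps above can be scripted so the staleness is easy to demonstrate on demand. This is a minimal sketch, not an OpenClaw command: `repro_stale_pid` is a hypothetical helper name, it assumes the PID file holds a bare integer (as the manual `echo 2000 > ~/.openclaw-myprofile/gateway.pid` fix later in this issue implies), and it assumes `lsof` is available, as it is on stock macOS. The timeout is generous because launchd by default spawns a job at most once every 10 seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of OpenClaw): reproduce the stale PID file.
# Assumes the PID file contains a plain integer PID.

# repro_stale_pid PIDFILE PORT [TIMEOUT_SECONDS]
# Kills the recorded gateway PID, waits for launchd (KeepAlive) to bring up a
# new listener on PORT, then prints the recorded PID next to the real one.
repro_stale_pid() {
  local pidfile=$1 port=$2 timeout=${3:-30}
  local old new="" i
  old=$(tr -dc '0-9' < "$pidfile")
  kill "$old" || return 1

  for ((i = 0; i < timeout; i++)); do
    sleep 1
    # -t prints only PIDs; keep the first listener if there are several.
    new=$(lsof -t -iTCP:"$port" -sTCP:LISTEN -P -n 2>/dev/null | head -n 1)
    # The old process may still hold the port briefly; wait for a new PID.
    if [ -n "$new" ] && [ "$new" != "$old" ]; then
      break
    fi
    new=""
  done

  # Re-read the file: while the bug is present it still holds the old PID.
  echo "recorded=$(tr -dc '0-9' < "$pidfile") listener=${new:-none}"
}
```

Running `repro_stale_pid ~/.openclaw-myprofile/gateway.pid 18789` against the LaunchAgent above should print something like `recorded=1000 listener=2000` when the PID file has gone stale.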

Observed behavior after respawn

# Status reports "stopped" because PID 1000 no longer exists
$ openclaw gateway status
Gateway: stopped

# But the gateway IS running — a new process owns the port
$ lsof -iTCP:18789 -sTCP:LISTEN -P -n
node  2000  me  22u  IPv4  ...  TCP 127.0.0.1:18789 (LISTEN)

# Trying to start creates a process that immediately fails (port in use)
$ openclaw gateway start
# Writes PID 3000 to PID file, but PID 3000 exits immediately
# Actual listener is still PID 2000

# Only manual intervention fixes it
$ echo 2000 > ~/.openclaw-myprofile/gateway.pid
$ openclaw gateway status
Gateway: running (PID 2000)  # now correct

Root cause analysis

The issue is a fundamental mismatch between two process lifecycle owners:

  1. launchd owns the process lifecycle via KeepAlive — it will always respawn the gateway after exit
  2. OpenClaw's PID tracking assumes it is the sole lifecycle owner — it writes the PID at start time and expects it to remain valid

When launchd respawns the process, the new instance does not update the PID file because:

  • The PID file was written by the launcher (or previous instance), not the gateway process itself
  • The respawned process has no knowledge that it should update a PID file
  • There is no mechanism to reconcile the PID file with the actual port owner
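The mismatch is easiest to see by asking the actual lifecycle owner. launchd always knows which PID it spawned for a job, and `launchctl list` prints it in the first column (or `-` when the job is loaded but not running). A minimal sketch, where `launchd_pid` is a hypothetical helper name and `com.openclaw.myprofile` is the label from the plist above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PID launchd currently tracks for LABEL.
# `launchctl list` prints three columns: PID, Status, Label.
# The PID column is "-" when the job is loaded but not running, and the
# label matches nothing if the job is not loaded, so both print nothing.
launchd_pid() {
  launchctl list | awk -v label="$1" '$3 == label && $1 != "-" { print $1 }'
}
```

After a respawn, `launchd_pid com.openclaw.myprofile` and the contents of `~/.openclaw-myprofile/gateway.pid` should disagree; writing the former into the file is the same manual correction shown above, with launchd as the source of truth instead of `lsof`.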

Impact

This affects any deployment using LaunchAgent/LaunchDaemon for supervised gateway management, which is the recommended macOS deployment pattern. The impact compounds with multiple gateway instances (different profiles on different ports), where manual PID correction becomes a recurring maintenance burden.

Failure modes:

  • status reports incorrect state → monitoring/alerting cannot be trusted
  • stop targets wrong or nonexistent PID → cannot cleanly stop the gateway
  • start after a "stopped" status fails silently because the port is occupied
  • Automation scripts that rely on status checks break

Notes on v2026.4.5 fixes

v2026.4.5 introduced significant macOS improvements:

  • PID recycling detection (#54801, #60085) — detects when a PID in the lock file belongs to a different process. This helps but does not solve the case where the old PID simply does not exist anymore (process exited, launchd spawned a new one with a different PID).
  • LaunchAgent recovery (#43766) — recovers installed-but-unloaded agents during start/restart. Great for unload scenarios but does not address the PID file staleness after respawn.
  • Supervised-exit delay — avoids launchd crash-loop unloads. Helpful for stability but orthogonal to PID tracking.
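The gap between these fixes and this issue comes down to which state the recorded PID is in. The sketch below classifies it. The function name and the command-name heuristic (treating a live PID whose command contains `node` or `openclaw` as the gateway) are assumptions for illustration, not how OpenClaw's recycling detection is actually implemented.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify the PID recorded in PIDFILE.
#   missing  - no PID file, or no digits in it
#   dead     - recorded PID no longer exists (the launchd respawn case)
#   recycled - PID is alive but does not look like the gateway
#   ok       - PID is alive and looks like the gateway (heuristic)
classify_recorded_pid() {
  local pidfile=$1 recorded="" comm
  [ -r "$pidfile" ] && recorded=$(tr -dc '0-9' < "$pidfile")
  if [ -z "$recorded" ]; then
    echo missing
    return
  fi
  # kill -0 sends no signal; it only checks that the PID exists and can be
  # signalled. It also fails for live processes owned by another user, so a
  # PID recycled into such a process would be reported as "dead" here.
  if ! kill -0 "$recorded" 2>/dev/null; then
    echo dead
    return
  fi
  comm=$(ps -p "$recorded" -o comm= 2>/dev/null)
  case "$comm" in
    *node*|*openclaw*) echo ok ;;
    *) echo recycled ;;
  esac
}
```

Recycling detection covers the `recycled` branch; the respawn scenario in this issue always lands in `dead`, where the PID file alone cannot tell whether a gateway is running.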

Suggested approaches

  1. Port-based liveness as primary check: Instead of relying solely on the PID file, check if the configured port is bound. If it is, resolve the owning PID via OS APIs (lsof / proc) and adopt it. This is resilient to any external process manager (launchd, systemd, supervisord).

  2. Self-written PID file: Have the gateway process write its own PID on startup (inside the process, not in the launcher). This way, launchd-respawned instances always write the correct PID.

  3. Lock file with port validation: On status check, if the PID in the lock file is dead but the port is bound, update the lock file with the actual owner PID and report "running".

  4. Document launchd interaction: At minimum, document the known interaction between KeepAlive and PID tracking, and recommend best practices (e.g., disable KeepAlive if using openclaw gateway start/stop directly).
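Approaches 1 and 3 combine naturally: treat the listening socket as the source of truth and repair the PID file when it disagrees. A minimal sketch in the same shell style as the commands above; `reconcile_gateway_pid` and `port_owner_pids` are hypothetical names, and the plain-integer PID file format is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of approaches 1 + 3 (not OpenClaw code).

# Print the PID(s) listening on TCP port $1, one per line; empty if none.
port_owner_pids() {
  lsof -t -iTCP:"$1" -sTCP:LISTEN -P -n 2>/dev/null
}

# reconcile_gateway_pid PIDFILE PORT
# Prints "running <pid>" (exit 0) or "stopped" (exit 1). When the port is
# bound but PIDFILE names a different or dead PID, PIDFILE is rewritten.
reconcile_gateway_pid() {
  local pidfile=$1 port=$2 recorded="" owner
  [ -r "$pidfile" ] && recorded=$(tr -dc '0-9' < "$pidfile")
  owner=$(port_owner_pids "$port" | head -n 1)

  if [ -z "$owner" ]; then
    echo stopped
    return 1
  fi
  # A production version should also confirm the listener really is an
  # OpenClaw gateway before adopting it; this sketch trusts the port.
  if [ "$recorded" != "$owner" ]; then
    printf '%s\n' "$owner" > "$pidfile"
  fi
  echo "running $owner"
}
```

Because the check keys on the port rather than on who started the process, it behaves the same under `openclaw gateway start`, launchd, systemd, or supervisord. It does not make `stop` work on its own: with KeepAlive enabled, killing the adopted PID just triggers another respawn, which is why approach 4 still matters.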

Environment

  • OpenClaw v2026.4.5 (3e72c03)
  • macOS 15.x (Sequoia), Apple Silicon
  • LaunchAgent with KeepAlive enabled
  • Multiple gateway instances on different ports
  • Reproduced consistently across v2026.4.1 through v2026.4.5

Extent analysis

TL;DR

Update the OpenClaw gateway to use a port-based liveness check as the primary method for determining its status, rather than relying solely on the PID file.

Guidance

  1. Implement port-based liveness check: Modify the openclaw gateway status command to check if the configured port is bound, and if so, resolve the owning PID via OS APIs (lsof / proc) to determine the gateway's status.
  2. Update PID file handling: Consider having the gateway process write its own PID on startup, ensuring that launchd-respawned instances always write the correct PID.
  3. Validate port ownership on status check: When checking the gateway's status, check whether the PID in the lock file is dead while the port is still bound; if so, rewrite the lock file with the PID that actually owns the port.
  4. Document launchd interaction: Document the known interaction between KeepAlive and PID tracking, and provide best practices for using openclaw gateway start/stop together with a LaunchAgent.
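Until the gateway writes its own PID, guidance item 2 can be approximated from outside with a launchd wrapper script. It relies on `exec` replacing the shell without changing its PID, so the PID written just before `exec` is the gateway's PID, and every KeepAlive respawn re-runs the wrapper and rewrites the file. This is a sketch under assumptions: `run_with_pidfile` is a hypothetical name, the wrapper's location is up to you, and it only helps if `openclaw gateway` keeps serving in the process it was started as rather than handing off to a child process.

```shell
#!/usr/bin/env bash
# Hypothetical launchd wrapper helper (not part of OpenClaw).

# run_with_pidfile PIDFILE CMD [ARGS...]
# Writes this shell's PID to PIDFILE, then execs CMD in place of the shell.
# exec keeps the PID, so PIDFILE names the process that ends up serving.
# Call it from the script's top level, not inside a ( subshell ): $$ always
# names the top-level shell, which is the process exec would replace there.
run_with_pidfile() {
  local pidfile=$1
  shift
  printf '%s\n' "$$" > "$pidfile"
  exec "$@"
}

# Example wrapper body, using the paths from the plist in this issue; the
# plist's ProgramArguments would then point at this script instead:
#   run_with_pidfile /Users/me/.openclaw-myprofile/gateway.pid \
#     /opt/homebrew/bin/openclaw gateway --port 18789
```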

Example

# Example of using lsof to resolve the owning PID of a bound port
lsof -iTCP:18789 -sTCP:LISTEN -P -n

Notes

The suggested approaches aim to address the fundamental mismatch between launchd's process lifecycle management and OpenClaw's PID tracking. However, the optimal solution may depend on the specific requirements and constraints of the OpenClaw gateway and its deployment environment.

Recommendation

Make a port-based liveness check the primary way of determining the gateway's status. Unlike the PID file, it does not depend on which process manager started the gateway, so it stays correct after a launchd KeepAlive respawn.
