openclaw - ✅(Solved) Fix [Bug] Gateway restart fails - stale process misdetection [1 pull request, 3 comments, 2 participants]

GitHub stats: openclaw/openclaw#50074, fetched 2026-04-08 00:59:25
Comments: 3 · Participants: 2 · Timeline events: 9 · Reactions: 0
Timeline (top): commented ×3, referenced ×3, labeled ×2, cross-referenced ×1

Gateway restart fails due to stale process misdetection and race condition

Root Cause

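During openclaw gateway restart, the stale-process check treats the gateway the CLI has just started as stale: in the log, PID 95920 starts at 01:12:04 and is reported as a stale process at 01:12:09. The CLI then cleans it up and retries, spawning further gateway processes that race for port 18789 and the gateway lock until the 60s health-check timeout expires.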

Fix Action

Fix / Workaround

Workaround: instead of openclaw gateway restart, run openclaw gateway stop and then openclaw gateway start as two separate commands.

PR fix notes

PR #50097: fix: avoid killing current gateway pid in stale-restart retry path

Description (problem / solution / changelog)

Summary

  • prevent stale-pid cleanup from terminating the currently running gateway pid during restart recovery
  • preserve stale cleanup + restart retry behavior for genuinely stale listeners
  • add a regression test for mixed stale lists (self pid + stale pid)

Changes

  • filter stale PID candidates before termination in daemon restart flow
  • exclude runtime pid from kill candidates when runtime is currently running
  • add lifecycle test verifying only non-self stale PID is terminated
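The Changes list amounts to a filter applied before any stale PID is terminated. A minimal sketch of that filter follows, assuming a hypothetical RuntimeStatus shape with running and pid fields; filterKillCandidates is an illustrative name, not the actual function in src/cli/daemon-cli/lifecycle.ts, whose code is not reproduced here.

```typescript
// Hypothetical runtime status; the real type used in lifecycle.ts may differ.
interface RuntimeStatus {
  running: boolean;
  pid?: number;
}

// Hypothetical helper: returns the stale PIDs that are safe to terminate.
// If the gateway runtime is currently running, its own PID is removed from
// the candidates, so cleanup can no longer kill the process it just started.
function filterKillCandidates(
  stalePids: readonly number[],
  runtime: RuntimeStatus,
): number[] {
  if (!runtime.running || runtime.pid === undefined) {
    // Nothing is running (or its PID is unknown): keep every candidate.
    return [...stalePids];
  }
  return stalePids.filter((pid) => pid !== runtime.pid);
}

// Mixed list (self PID + stale PID), the case covered by the PR's
// regression test. The PIDs are illustrative.
const toKill = filterKillCandidates([4242, 5151], { running: true, pid: 4242 });
console.log(toKill); // → [ 5151 ]
```

Keeping the filter free of process calls makes the mixed-list case easy to unit-test without spawning real gateway processes.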

Testing

  • npm install
  • npm exec -- vitest run src/cli/daemon-cli/lifecycle.test.ts
  • result: 9 tests passed

Fixes #50074

Changed files

  • src/cli/daemon-cli/lifecycle.test.ts (modified, +23/-0)
  • src/cli/daemon-cli/lifecycle.ts (modified, +10/-3)


---

Bug type

Regression (worked before, now fails)

Summary

Gateway restart fails due to stale process misdetection and race condition

Steps to reproduce

  1. Run openclaw gateway restart
  2. Observe logs or wait for timeout

Expected behavior

Stop old process → wait for clean exit → start new process → verify health
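The expected sequence is strictly ordered: the new gateway is not spawned until the old one has exited, which is what removes the race. A minimal sketch, assuming hypothetical process operations; GatewayOps, waitFor and restartGateway are illustrative names, not OpenClaw APIs, and the 60s default only mirrors the health-check timeout seen in the log.

```typescript
// Hypothetical process operations; none of these are real OpenClaw APIs.
interface GatewayOps {
  stop(pid: number): Promise<void>; // ask the old gateway to shut down
  isAlive(pid: number): Promise<boolean>; // is that PID still running?
  start(): Promise<number>; // spawn a new gateway, resolve with its PID
  isHealthy(pid: number): Promise<boolean>; // health check on the new gateway
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Polls `check` every `intervalMs`; resolves true as soon as it passes,
// or false once `timeoutMs` has elapsed.
async function waitFor(
  check: () => Promise<boolean>,
  timeoutMs: number,
  intervalMs = 100,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (await check()) return true;
    if (Date.now() >= deadline) return false;
    await sleep(intervalMs);
  }
}

// Stop -> wait for clean exit -> start -> verify health.
async function restartGateway(
  ops: GatewayOps,
  oldPid: number | undefined,
  timeoutMs = 60_000,
): Promise<number> {
  if (oldPid !== undefined) {
    const target = oldPid; // narrowed to number for use in the closure below
    await ops.stop(target);
    const exited = await waitFor(async () => !(await ops.isAlive(target)), timeoutMs);
    if (!exited) throw new Error(`old gateway ${target} did not exit within ${timeoutMs}ms`);
  }
  const newPid = await ops.start();
  const healthy = await waitFor(() => ops.isHealthy(newPid), timeoutMs);
  if (!healthy) throw new Error(`gateway ${newPid} failed health checks within ${timeoutMs}ms`);
  // newPid is the live gateway: it must never be handed to stale cleanup.
  return newPid;
}
```

In a real CLI, stop, isAlive and isHealthy would wrap whatever process-control and health-probe code already exists; the sketch only pins down the ordering and the timeouts.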

Actual behavior

  1. Gateway starts, gets PID 95920
  2. CLI detects "stale process 95920" (the one it just started!)
  3. Multiple new Gateway processes are started
  4. After ~60s timeout: "Gateway restart timed out"
  5. Port 18789 is already in use

Key log excerpt:

01:12:04 Gateway starting, PID 95920
01:12:09 Found stale gateway process(es) after restart: 95920. Cleaning up...
01:12:22 Gateway starting, PID 93984
01:12:23 Gateway starting, PID 83788
01:12:27 Found stale gateway process(es): 93984.
01:12:27 Stopping stale process(es) and retrying restart...
01:12:46 Gateway restart timed out after 60s waiting for health checks.
01:12:46 Port 18789 is already in use.
Gateway failed to start: gateway already running (pid XXX); lock timeout after 5000ms

OpenClaw version

2026.3.61d171a

Operating system

Windows 11 (Windows_NT 10.0.22631)

Install method

No response

Model

N/A (Gateway issue, not model-related)

Provider / routing chain

N/A (Gateway issue, not provider-related)

Config file / key location

No response

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

  • Affected: Gateway users on Windows with a scheduled task
  • Severity: High (blocks gateway restart)
  • Frequency: 100% reproducible
  • Consequence: Gateway cannot be restarted via the CLI; the scheduled task fails

Workaround: instead of openclaw gateway restart, run openclaw gateway stop and then openclaw gateway start as two separate commands.

Additional information

No response

Extended analysis

Fix Plan

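To resolve the restart failure caused by stale process misdetection and the race condition, the restart logic must wait for the old process to exit before starting a new one, and must never treat the gateway it has just started as stale. The steps:

  • Modify the openclaw gateway restart command to:
    1. Stop the gateway process.
    2. Wait for the old process to exit cleanly.
    3. Start the new gateway process.
    4. Verify its health.
  • Exclude the currently running gateway's PID from stale-process cleanup (the approach taken in PR #50097).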
