openclaw - ✅(Solved) Fix [Bug]: Windows gateway self-restart enters infinite retry loop — stale process never killed [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#60878 · Fetched 2026-04-08 02:46:14
Comments: 0 · Participants: 1 · Timeline: 2 (closed ×1, cross-referenced ×1) · Reactions: 0

On Windows, the in-process self-restart path (triggerOpenClawRestart -> relaunchGatewayScheduledTask) fails to kill the old gateway process before launching the new one. The new gateway instance cannot bind port 18789, producing an infinite retry loop:

[gateway] already running under schtasks; waiting 5000ms before retrying startup

The root cause is that findGatewayPidsOnPortSync() in src/infra/restart-stale-pids.ts returns [] immediately on win32, so cleanStaleGatewayProcessesSync() never finds or terminates stale gateway processes.

Note: openclaw daemon restart is unaffected because it uses a separate code path (restartScheduledTask() -> terminateScheduledTaskGatewayListeners()) that correctly uses the Windows-aware findVerifiedGatewayListenerPidsOnPortSync().

Root Cause

The root cause is that findGatewayPidsOnPortSync() in src/infra/restart-stale-pids.ts returns [] immediately on win32, so cleanStaleGatewayProcessesSync() never finds or terminates stale gateway processes.

Fix / Workaround

  • Affected: All Windows users using the schtasks daemon supervisor with config-triggered or SIGUSR1 in-process restarts
  • Severity: High — gateway becomes permanently stuck, requires manual intervention (taskkill or Task Scheduler restart)
  • Frequency: 100% reproducible on any Windows self-restart trigger
  • Workaround: Use openclaw daemon restart (which uses a different code path that works correctly)

PR fix notes

PR #60480: fix: implement Windows stale gateway process cleanup before restart

Description (problem / solution / changelog)

Summary

Fixes #60878

findGatewayPidsOnPortSync() in src/infra/restart-stale-pids.ts returned [] immediately on Windows (process.platform === 'win32'), causing cleanStaleGatewayProcessesSync() to silently skip killing old gateway processes during self-restart via the schtasks supervisor path (triggerOpenClawRestart).

This caused an infinite retry loop on Windows:

[gateway] already running under schtasks; waiting 5000ms before retrying startup

The new gateway instance could never bind port 18789 because the old process was never terminated.

Root Cause

The self-restart path on Windows (triggerOpenClawRestart -> relaunchGatewayScheduledTask) calls cleanStaleGatewayProcessesSync() before spawning the schtasks restart script. But since findGatewayPidsOnPortSync returns [] on Windows, no stale processes are ever found or killed. The new schtasks-launched gateway then races the old one for the port and enters an unbounded retry loop.
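
The control flow described above can be illustrated with a small sketch. This is a hypothetical reconstruction, not the project's actual code: the function names carry a `Sketch` suffix, and the scanner and kill callbacks are injected stand-ins introduced only for illustration. It shows why an early `return []` on win32 turns the cleanup step into a no-op even when a stale gateway is still listening.

```typescript
// Hypothetical reconstruction of the pre-fix behaviour described in the issue.
// All names ending in "Sketch" and both callback types are assumptions.
type PortScanner = (port: number) => number[];
type KillFn = (pid: number) => void;

function findGatewayPidsOnPortSketch(
  port: number,
  platform: string,
  scan: PortScanner,
): number[] {
  if (platform === "win32") {
    // Early return: the port is never inspected on Windows.
    return [];
  }
  return scan(port);
}

function cleanStaleGatewayProcessesSketch(
  port: number,
  platform: string,
  scan: PortScanner,
  kill: KillFn,
): number[] {
  const pids = findGatewayPidsOnPortSketch(port, platform, scan);
  // With an empty list there is nothing to kill, so the old gateway survives.
  for (const pid of pids) {
    kill(pid);
  }
  return pids;
}
```

With a scanner that would report one listener, the win32 branch still kills nothing, while any other platform value reaches the scanner and terminates the reported PID.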

Note: openclaw daemon restart works correctly because it uses the restartScheduledTask() path in service.ts, which properly calls terminateScheduledTaskGatewayListeners() with the Windows-aware findVerifiedGatewayListenerPidsOnPortSync(). Only the in-process self-restart path was broken.

Changes

New: src/infra/windows-port-pids.ts

Extracted all Windows-specific port scanning and process-args helpers from gateway-processes.ts into a shared module with a configurable timeoutMs parameter. This:

  • Breaks the circular import between restart-stale-pids.ts and gateway-processes.ts (both now import from windows-port-pids.ts instead of from each other)
  • Fixes poll budget overshoot: Windows poll calls use POLL_SPAWN_TIMEOUT_MS (400ms) instead of the 5000ms default, keeping each poll within the waitForPortFreeSync 2s budget
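
The budget arithmetic behind the second bullet can be worked through with a short sketch. It assumes, as a simplification, that the wait loop only checks its deadline between polls and that each poll may block for its full timeout; the function name and the loop shape are hypothetical, and only the 400ms, 5000ms, and 2s figures come from the PR description.

```typescript
// Worst-case time spent waiting when every poll blocks for its full timeout.
// Assumption: the deadline is checked only before starting the next poll.
function worstCaseWaitMsSketch(budgetMs: number, perPollTimeoutMs: number): number {
  let spentMs = 0;
  while (spentMs < budgetMs) {
    spentMs += perPollTimeoutMs; // one poll that runs until it times out
  }
  return spentMs;
}

const withDefaultTimeout = worstCaseWaitMsSketch(2000, 5000); // a single poll already exceeds the budget
const withPollTimeout = worstCaseWaitMsSketch(2000, 400); // five polls fit the budget exactly
```

Under these assumptions a single 5000ms poll overshoots the 2s budget by 3s, whereas 400ms polls stop at the budget boundary.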

src/infra/restart-stale-pids.ts

  • findGatewayPidsOnPortSync: On win32, discovers listening PIDs via readWindowsListeningPidsOnPortSync, then verifies each with readWindowsProcessArgsSync / isGatewayArgv
  • pollPortOnceWindows: Calls readWindowsListeningPidsOnPortSync(port, 400) and only checks whether the port still has any listener; no gateway verification is needed at this point because the stale gateway has already been killed
  • terminateStaleProcessesSync: Adds terminateStaleProcessesWindows(), which uses taskkill.exe (graceful /T first, then /F force-kill) instead of SIGTERM/SIGKILL
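
The graceful-then-forced termination in the last bullet might look roughly like the sketch below. The `run` and `isAlive` callbacks are hypothetical injection points so the logic can be exercised off Windows, and whether the real implementation keeps /T on the forced call or waits between the two attempts is not stated in the PR; both are assumptions here. The /PID, /T, and /F switches themselves are standard taskkill options.

```typescript
// Hypothetical sketch of a graceful-then-forced taskkill sequence.
type RunTaskkill = (args: string[]) => void; // stand-in for invoking taskkill.exe
type IsAlive = (pid: number) => boolean; // stand-in for a liveness check

function taskkillArgsSketch(pid: number, force: boolean): string[] {
  const args = ["/PID", String(pid), "/T"]; // /T also terminates child processes
  if (force) {
    args.push("/F"); // /F forces termination
  }
  return args;
}

function terminateStaleProcessesWindowsSketch(
  pids: number[],
  run: RunTaskkill,
  isAlive: IsAlive,
): number[] {
  const forced: number[] = [];
  for (const pid of pids) {
    run(taskkillArgsSketch(pid, false)); // graceful attempt first
    if (isAlive(pid)) {
      run(taskkillArgsSketch(pid, true)); // escalate only if still running
      forced.push(pid);
    }
  }
  return forced; // PIDs that needed the forced kill
}
```

On a real Windows host, `run` could wrap `spawnSync("taskkill.exe", args)` from `node:child_process`; keeping it injected makes the escalation order easy to test without spawning anything.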

src/infra/gateway-processes.ts

  • Delegates Windows helpers to the new windows-port-pids.ts module
  • Removes ~100 lines of inlined Windows functions
  • Keeps findVerifiedGatewayListenerPidsOnPortSync public API unchanged

src/infra/restart-stale-pids.test.ts

  • Mocks windows-port-pids.js (port scanning + process args) for the win32 platform-mock tests
  • Updated win32 tests verify delegation to readWindowsListeningPidsOnPortSync and readWindowsProcessArgsSync
  • Tests use real isGatewayArgv for full integration coverage

Testing

  • Lightly tested: verified fix resolves the infinite restart loop on Windows 11
  • Confirmed openclaw daemon restart and openclaw gateway call health work after fix
  • Existing Unix tests unaffected — all 38 restart-stale-pids tests pass
  • All 6 gateway-processes tests pass after refactor
  • Updated test mocks verify win32 delegation with real isGatewayArgv validation

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/infra/gateway-processes.ts (modified, +5/-108)
  • src/infra/restart-stale-pids.test.ts (modified, +262/-9)
  • src/infra/restart-stale-pids.ts (modified, +151/-3)
  • src/infra/windows-port-pids.ts (added, +151/-0)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

On Windows, the in-process self-restart path (triggerOpenClawRestart -> relaunchGatewayScheduledTask) fails to kill the old gateway process before launching the new one. The new gateway instance cannot bind port 18789, producing an infinite retry loop:

[gateway] already running under schtasks; waiting 5000ms before retrying startup

The root cause is that findGatewayPidsOnPortSync() in src/infra/restart-stale-pids.ts returns [] immediately on win32, so cleanStaleGatewayProcessesSync() never finds or terminates stale gateway processes.

Note: openclaw daemon restart is unaffected because it uses a separate code path (restartScheduledTask() -> terminateScheduledTaskGatewayListeners()) that correctly uses the Windows-aware findVerifiedGatewayListenerPidsOnPortSync().

Steps to reproduce

  1. Install OpenClaw on Windows with the schtasks-based daemon supervisor.
  2. Start the gateway normally (openclaw daemon start).
  3. Trigger an in-process self-restart (e.g., config change that fires triggerOpenClawRestart, or SIGUSR1-equivalent restart).
  4. Observe the new gateway instance failing to start, retrying in a loop every 5 seconds.

Expected behavior

The self-restart path should:

  1. Detect the old gateway process listening on port 18789.
  2. Kill it using taskkill.exe (graceful /T, then forced /F).
  3. Wait for the port to be released.
  4. Launch the new gateway, which binds successfully.
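
The four expected steps can be sketched as a single orchestration function. Everything here is hypothetical: the `RestartDepsSketch` interface and its member names are invented for illustration, and the "port-busy" outcome is an assumption about what a bounded failure might look like in place of an endless retry.

```typescript
// Hypothetical orchestration of the expected restart sequence.
interface RestartDepsSketch {
  findGatewayPids(port: number): number[]; // step 1: detect the old gateway
  terminate(pids: number[]): void; // step 2: kill it
  waitForPortFree(port: number): boolean; // step 3: wait for the port
  launch(): void; // step 4: start the new gateway
}

function selfRestartSketch(
  port: number,
  deps: RestartDepsSketch,
): "launched" | "port-busy" {
  const stale = deps.findGatewayPids(port);
  if (stale.length > 0) {
    deps.terminate(stale);
  }
  if (!deps.waitForPortFree(port)) {
    return "port-busy"; // surface the failure instead of launching into a bind conflict
  }
  deps.launch();
  return "launched";
}
```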

Actual behavior

findGatewayPidsOnPortSync() returns [] on Windows (early return, no port inspection), so cleanStaleGatewayProcessesSync() is a no-op. The old gateway keeps running, the new one cannot bind the port, and the schtasks supervisor enters an unbounded 5-second retry loop that never resolves.

OpenClaw version

2026.4.3 (and earlier — the return [] for win32 has been present since the function was introduced)

Operating system

Windows 11

Install method

npm global

Model

N/A — affects all configurations

Provider / routing chain

N/A — affects all configurations

Additional provider/model setup details

No response

Logs, screenshots, and evidence

# Gateway log output during the infinite loop:
[gateway] already running under schtasks; waiting 5000ms before retrying startup
[gateway] already running under schtasks; waiting 5000ms before retrying startup
[gateway] already running under schtasks; waiting 5000ms before retrying startup
...

Impact and severity

  • Affected: All Windows users using the schtasks daemon supervisor with config-triggered or SIGUSR1 in-process restarts
  • Severity: High — gateway becomes permanently stuck, requires manual intervention (taskkill or Task Scheduler restart)
  • Frequency: 100% reproducible on any Windows self-restart trigger
  • Workaround: Use openclaw daemon restart (which uses a different code path that works correctly)

Additional information

Proposed fix: #60480

The fix:

  1. Extracts Windows port/process helpers into a shared src/infra/windows-port-pids.ts module with configurable timeoutMs
  2. Makes findGatewayPidsOnPortSync discover + verify Windows gateway PIDs via PowerShell/netstat
  3. Adds pollPortOnceWindows with a 400ms budget-compliant timeout for port-free polling
  4. Adds terminateStaleProcessesWindows using taskkill.exe (graceful /T then forced /F)
  5. Breaks the circular import between restart-stale-pids.ts and gateway-processes.ts
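
As an illustration of the discover-and-verify idea in item 2, the sketch below parses `netstat -ano` style output for listeners on a port and then filters the PIDs through a verification callback. It is not the PR's implementation: the function names, the injected `readArgs` and `isGateway` callbacks, and the reliance on the English "LISTENING" state label are all assumptions made for this example.

```typescript
// Hypothetical parser for `netstat -ano` output on Windows.
// Expected columns: Proto, Local Address, Foreign Address, State, PID.
function parseListeningPidsSketch(netstatOutput: string, port: number): number[] {
  const pids = new Set<number>();
  const suffix = ":" + String(port);
  for (const line of netstatOutput.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    if (cols.length < 5) continue; // headers, blank lines, UDP rows
    const [proto, local, , state, pidText] = cols;
    if (proto !== "TCP" || state !== "LISTENING") continue;
    if (!local.endsWith(suffix)) continue; // match the local port only
    const pid = Number.parseInt(pidText, 10);
    if (Number.isInteger(pid) && pid > 0) pids.add(pid);
  }
  return Array.from(pids); // one entry per PID even if IPv4 and IPv6 both listen
}

// Keep only PIDs whose command line passes the supplied gateway check.
function filterVerifiedPidsSketch(
  pids: number[],
  readArgs: (pid: number) => string[] | null,
  isGateway: (argv: string[]) => boolean,
): number[] {
  return pids.filter((pid) => {
    const argv = readArgs(pid);
    return argv !== null && isGateway(argv);
  });
}
```

Verifying the command line before killing matters because another program could be listening on the same port; terminating an unverified PID would be unsafe.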

extent analysis

TL;DR

The most likely fix involves updating the findGatewayPidsOnPortSync function to correctly discover and verify Windows gateway PIDs, and then terminate stale processes using taskkill.exe.

Guidance

  • Review the proposed fix in #60480, which extracts Windows port/process helpers into a shared module and updates the findGatewayPidsOnPortSync function to work correctly on Windows.
  • Verify that the pollPortOnceWindows function is implemented with a suitable timeout to avoid excessive port-free polling.
  • Test the updated terminateStaleProcessesWindows function to ensure it correctly terminates stale gateway processes using taskkill.exe.
  • Check for any circular imports between restart-stale-pids.ts and gateway-processes.ts and refactor the code to break these imports.

Example

No code example is provided as the fix involves updating existing functions and implementing new ones, which is clearly outlined in the proposed fix #60480.

Notes

The fix should be applied to the src/infra/restart-stale-pids.ts file and may require additional changes to other related files. It's essential to thoroughly test the updated code to ensure it works correctly on Windows and does not introduce any new issues.

Recommendation

Apply the workaround by using openclaw daemon restart until the proposed fix in #60480 is implemented and verified to work correctly. This will allow the gateway to restart correctly on Windows without getting stuck in an infinite retry loop.

