openclaw: [Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready

GitHub stats: openclaw/openclaw#63491, fetched 2026-04-09 07:53:09. Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


[Bug]: Windows Scheduled Task gateway restart/health becomes inconsistent after ready; mixes known probe false negatives with cron/session stale state and post-ready HTTP loss

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

On Windows with the gateway installed as a Scheduled Task, openclaw gateway restart can repeatedly time out with:

  • Timed out after 60s waiting for gateway port 18789 to become healthy
  • Service runtime: status=unknown
  • Port 18789 is already in use

This environment appears to hit more than one problem at once:

  1. A known local loopback probe false negative on restart (ws ... code=1008 reason=connect failed / device-required)
  2. Cron/job/session state corruption after restart (runningAtMs / stale cron session state)
  3. An additional post-ready instability where the gateway can log ready (...) and even bind 18789, but /health and / later stop responding or the port becomes free again

I am filing this because the first two have close neighbors in existing issues/PRs, but I have not found a single Windows issue that covers the full combined behavior end-to-end.

OpenClaw version

OpenClaw 2026.4.8 (9ece252)

Operating system

Windows (PowerShell 5.1.22621.4249)

Install method

npm global install + openclaw gateway install / Scheduled Task

Model

bailian/qwen3.5-plus

Provider / routing chain

Ali / Bailian

Additional provider/model setup details

  • Node.js upgraded to v22.22.2
  • Repro observed both before and after upgrade from 2026.4.5 to 2026.4.8
  • Repro observed with normal config and also with external channels/providers largely disabled during bisecting

Steps to reproduce

  1. Install/run gateway on Windows via Scheduled Task
  2. Have existing cron jobs in ~/.openclaw/cron/jobs.json
  3. Run openclaw gateway restart
  4. Observe one or more of the following sequences:

Sequence A:

  • CLI waits 60s and prints timeout
  • log shows local WS probe closed with 1008 / connect failed
  • gateway may actually already be alive

Sequence B:

  • gateway reaches:
    • starting HTTP server...
    • ready (... plugins, ...s)
    • cron: started
  • but http://127.0.0.1:18789/health and / later time out or the port becomes free again

Sequence C:

  • after UI edits or a restart, cron jobs come back in a stale state
  • previously seen local failures included TypeError: Cannot read properties of undefined (reading 'runningAtMs')
  • stale runningAtMs / stale cron session state prevented clean recovery without manual intervention
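The runningAtMs failure in Sequence C suggests a startup cleanup step along the following lines. This is only a sketch, not OpenClaw's code: the job shape is hypothetical (only the runningAtMs field name comes from this report), and it assumes that no job can legitimately be running at the moment the gateway process starts.

```typescript
// Hypothetical job shape; only the `runningAtMs` field name comes from the report.
interface CronJobState {
  runningAtMs?: number;
}

interface CronJob {
  id: string;
  state?: CronJobState; // assumed to be possibly missing, e.g. after an upgrade or UI edit
}

// Returns copies of the jobs with any leftover running marker removed,
// plus the ids of the jobs that were cleaned. The input is not mutated.
function clearStaleRunningState(jobs: CronJob[]): { jobs: CronJob[]; cleared: string[] } {
  const cleared: string[] = [];
  const cleaned = jobs.map((job): CronJob => {
    // Optional chaining tolerates a missing `state` object instead of throwing
    // "Cannot read properties of undefined (reading 'runningAtMs')".
    if (job.state?.runningAtMs === undefined) {
      return job;
    }
    cleared.push(job.id);
    const state: CronJobState = { ...job.state };
    delete state.runningAtMs;
    return { ...job, state };
  });
  return { jobs: cleaned, cleared };
}
```

Returning the list of cleared ids makes it possible to log which jobs were recovered, which may help when comparing restart cycles.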

Expected behavior

  • openclaw gateway restart should succeed when the restarted local gateway is already healthy enough to reject unauthenticated loopback probes
  • Scheduled Task runtime and port ownership should stay consistent
  • Cron startup should not preserve impossible stale running state
  • Once the gateway logs ready (...), /health and / should remain responsive instead of later hanging or disappearing
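To check the last expectation, it can help to record when /health stops answering relative to the ready log line. A minimal sketch, assuming probe results have already been collected as timestamped samples (for example by polling http://127.0.0.1:18789/health at a fixed interval); `HealthSample` and `firstLossAfterHealthy` are hypothetical names, not part of OpenClaw.

```typescript
// One probe of /health. How the probe is performed is left out on purpose.
interface HealthSample {
  atMs: number; // when the probe was sent, in epoch milliseconds
  ok: boolean;  // true if /health answered successfully in time
}

// Finds the first transition from a healthy sample to an unhealthy one.
// Returns null if the endpoint never answered, or never stopped answering.
function firstLossAfterHealthy(
  samples: HealthSample[],
): { lastOkMs: number; firstFailMs: number } | null {
  let lastOkMs: number | null = null;
  for (const sample of samples) {
    if (sample.ok) {
      lastOkMs = sample.atMs;
    } else if (lastOkMs !== null) {
      return { lastOkMs, firstFailMs: sample.atMs };
    }
  }
  return null;
}
```

The returned window (last success, first failure) bounds the moment the gateway stopped responding, so it can be matched against gateway log timestamps.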

Actual behavior

Observed across repeated runs on 2026-04-08 and 2026-04-09:

  • openclaw gateway restart times out after 60s
  • logs show loopback WS probe closure:
    • code=1008 reason=connect failed
    • "cause":"device-required"
  • sometimes port 18789 is reported busy while runtime status is unknown
  • sometimes gateway logs ready (...) and later port 18789 becomes free again
  • sometimes /health is briefly reachable, then later times out
  • cron previously failed with missing or stale runningAtMs-related state

Representative log lines:

2026-04-09T09:55:37.924+08:00 [gateway/ws] closed before connect ... code=1008 reason=connect failed
2026-04-09T10:02:48.014+08:00 Timed out after 60s waiting for gateway port 18789 to become healthy.
2026-04-09T10:02:48.045+08:00 Service runtime: status=unknown
2026-04-09T10:02:48.049+08:00 Gateway port 18789 status: free.
2026-04-09T10:05:23.293+08:00 [gateway] ready (0 plugins, 27.5s)
2026-04-09T10:05:28.021+08:00 [cron] cron: started
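
The timestamps in these lines can be compared directly. The sketch below computes how long after the CLI timeout the ready line was logged. It assumes the excerpt is in chronological order and that the 27.5s in the ready line is startup time measured from process start; neither is confirmed by the report, and the lines may come from different restart attempts.

```typescript
// Timestamps copied from the representative log lines above.
const timedOutAt = "2026-04-09T10:02:48.014+08:00"; // CLI gave up waiting
const readyAt = "2026-04-09T10:05:23.293+08:00";    // gateway logged ready
const startupSeconds = 27.5;                        // from "ready (0 plugins, 27.5s)"

// Difference between two ISO 8601 timestamps, in seconds.
function secondsBetween(earlierIso: string, laterIso: string): number {
  return (Date.parse(laterIso) - Date.parse(earlierIso)) / 1000;
}

const readyAfterTimeout = secondsBetween(timedOutAt, readyAt);
// Estimated gap between the timeout and the start of the boot that logged ready
// (assumption: the 27.5s is measured from process start).
const bootStartedAfterTimeout = readyAfterTimeout - startupSeconds;

console.log(readyAfterTimeout.toFixed(3), bootStartedAfterTimeout.toFixed(3));
// prints "155.279 127.779"
```

Under those assumptions, the ready line belongs to a process that started roughly two minutes after the CLI stopped waiting. One possible reading is that it is a later launch, not the boot the restart command was probing.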

Related issues / likely overlap

  • #48771 and PR #48801: Windows/local restart false negative when loopback WS probe is closed with 1008 / connect failed / device required
  • #44920: stale cron runningAtMs after restart
  • #59511: local http://127.0.0.1:18789/health not usable after gateway run
  • #60295: different OS, but similar “restart times out while service state/port ownership is inconsistent”

What I found during local debugging

I did substantial local debugging because the machine was stuck in production use:

  • upgraded OpenClaw from 2026.4.5 to 2026.4.8
  • upgraded Node.js to 22.22.2
  • isolated/remediated several local issues:
    • old incompatible channel config fields after upgrade
    • untracked local plugin auto-loading
    • stale cron job/session state
  • after that cleanup, the remaining issue was still reproducible:
    • gateway reaches ready (...)
    • HTTP health/UI later become unreachable or unstable

I also locally patched the CLI to treat loopback HTTP /health and local 1008 policy closes as healthy enough for restart probing, which reduced one class of false negatives, but did not eliminate the post-ready instability.

That suggests there may still be a deeper Windows gateway/runtime bug after startup, beyond the already-known restart probe issue.
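
The local patch itself is not included in this report. The sketch below only illustrates the classification idea it describes: a loopback probe that gets a successful /health response, or whose WebSocket is closed with code 1008, is treated as evidence that the gateway is alive. All type and function names are hypothetical and do not reflect OpenClaw's internals.

```typescript
// Hypothetical probe outcomes; not OpenClaw's real types.
type ProbeResult =
  | { kind: "http"; status: number }                   // response from GET /health
  | { kind: "ws-close"; code: number; reason: string } // WebSocket closed by the server
  | { kind: "refused" }                                // nothing is listening on the port
  | { kind: "timeout" };                               // no answer within the probe deadline

// Decides whether a probe result is good enough to end the restart wait.
function isAliveForRestart(result: ProbeResult, isLoopback: boolean): boolean {
  if (result.kind === "http") {
    // Any 2xx from /health counts as healthy.
    return result.status >= 200 && result.status < 300;
  }
  if (result.kind === "ws-close") {
    // 1008 is the WebSocket "policy violation" close code (RFC 6455).
    // A server that rejects an unauthenticated local probe this way is
    // running and handling connections, so it is treated as alive.
    return isLoopback && result.code === 1008;
  }
  // Refused connections and timeouts are never treated as healthy.
  return false;
}
```

Restricting the 1008 rule to loopback targets keeps the relaxation narrow: a policy close from a remote host says nothing about the local service that was just restarted. As noted above, this kind of change only addresses probe false negatives, not the post-ready instability.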

Impact and severity

High for Windows users relying on Scheduled Task mode:

  • restart automation becomes unreliable
  • control UI availability becomes inconsistent
  • cron jobs can be left in broken/stale state after restart cycles
  • users may see a mixture of “service is up”, “service is unknown”, and “port is free” across the same debugging session

Logs, screenshots, and evidence

I can provide:

  • full openclaw-2026-04-08.log / openclaw-2026-04-09.log
  • openclaw gateway restart terminal output
  • openclaw gateway status --json output from both healthy and unhealthy moments
  • details of the stale cron/session state observed in ~/.openclaw/cron/jobs.json and session index cleanup

Additional information

If helpful, I can also open follow-up issues, each with a narrower repro focused on only one of:

  1. Windows Scheduled Task + restart probe false negative
  2. Cron stale runningAtMs / session state after restart
  3. Post-ready HTTP hang / port disappearance

I am reporting them together here because on this machine they appeared stacked on top of each other.

extent analysis

TL;DR

No single fix is known. The restart timeouts on Windows Scheduled Task installs appear to combine three problems: loopback probe false negatives, stale cron/session state, and post-ready HTTP instability. The first two have closely related issues and PRs; the third persisted after the upgrade to OpenClaw 2026.4.8 and after the reporter's local probe patch.

Guidance

  1. Investigate and apply fixes from related issues: Review issues #48771, #44920, #59511, and #60295 and PR #48801, and apply any relevant fixes or workarounds for the loopback probe false negatives, cron stale state, and HTTP health instability.
  2. Verify cron job and session state management: Ensure that cron jobs and session states are properly managed and cleaned up after restarts to prevent stale states from causing issues.
  3. Monitor and adjust the gateway's restart probing: Consider adjusting the restart probing mechanism to treat loopback HTTP /health and local 1008 policy closes as healthy enough for restart probing, as the user's local patch suggests this reduces false negatives.
  4. Upgrade OpenClaw and Node.js: The issue persisted after upgrading to OpenClaw 2026.4.8 and Node.js 22.22.2, so upgrading alone is not a fix. Keep watching later releases for changes that address the underlying problems.

Example

The issue does not include the reporter's local patch. Adjusting the restart probing logic, as described in the guidance section, is the main change users can explore.

Notes

The issue appears to combine several problems, so no single definitive fix is available. The guidance above is meant to help mitigate or work around the known issues; a complete resolution may require further investigation and updates to OpenClaw or its dependencies.

Recommendation

Apply workaround: Given the complexity and the fact that the issue involves multiple known problems, applying the workarounds and fixes from related issues and adjusting the configuration as suggested seems to be the most practical approach at this time.
