openclaw - 💡(How to fix) Fix Regression: Windows gateway probe/connect timeout returns in 2026.4.26 [2 comments, 3 participants]

GitHub stats: openclaw/openclaw#74279, fetched 2026-04-30 06:26:16. Comments: 2. Participants: 3. Timeline events: 6 (cross-referenced ×3, commented ×2, closed ×1). Reactions: 4.


This looks like a regression of the Windows gateway probe / connect-timeout class of failures that was previously marked fixed around 2026.3.22.

Related locked/closed issues:

  • #45560: gateway probe timeout while gateway healthy
  • #49726: connect challenge / handshake timeout under gateway load

Environment:

  • Windows: Microsoft Windows NT 10.0.26200.0
  • Install: npm global
  • OpenClaw latest tested: 2026.4.26
  • Stable workaround: 2026.4.23 (a979721)
  • Node runtime used by gateway: v22.22.2
  • Codex CLI: 0.125.0
  • Gateway: local, loopback, token auth, port 18789

Symptoms on 2026.4.26:

  • Gateway process starts and listens on 127.0.0.1:18789.
  • Logs show http server listening / gateway ready.
  • openclaw gateway probe --timeout 60000 reports loopback connect timeout / unreachable.
  • Gateway logs show handshake-timeout with lastFrameMethod:"connect".
  • This persisted with Telegram disabled and with minimal/near-zero plugins.
  • In contrast, Control UI/webchat and runtime RPC paths could still work, so gateway probe became an unreliable health source.
  • A watchdog relying on gateway probe repeatedly restarted an otherwise usable gateway.
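
The contradiction in the symptoms above (port listening, probe reporting unreachable) can be checked independently of the OpenClaw CLI. The following is a minimal sketch, not part of OpenClaw: the function names are hypothetical, and the only values taken from the report are the loopback address and port 18789. It uses a plain TCP connect, which says nothing about the WebSocket handshake, and that is exactly the distinction it is meant to expose.

```python
import socket


def port_is_listening(host: str = "127.0.0.1", port: int = 18789,
                      timeout_s: float = 2.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds.

    This only proves that something accepts connections on the port;
    it does not exercise the gateway's WebSocket connect handshake.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False


def classify(listening: bool, probe_ok: bool) -> str:
    """Combine the TCP check with a probe result (however it was obtained)."""
    if probe_ok:
        return "healthy"
    if listening:
        # The state described in this report: the socket accepts
        # connections but the probe handshake times out.
        return "listening-but-probe-failed"
    return "down"
```

A result of "listening-but-probe-failed" matches the state reported here and suggests the process should not be restarted on the probe result alone.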

Additional Windows-specific finding:

  • Bundled provider extensions anthropic and lmstudio attempted to load even though they were disabled in config.
  • They failed with ERR_PACKAGE_PATH_NOT_EXPORTED from their bundled @mariozechner/pi-ai dependency.
  • Those load failures appeared to block startup/handshake long enough to trigger more WS handshake timeouts.
  • Quarantining those two broken bundled extension directories plus using longer OPENCLAW_HANDSHAKE_TIMEOUT_MS / OPENCLAW_CONNECT_CHALLENGE_TIMEOUT_MS helped stabilize the local install.
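
As a rough illustration of the two mitigations in the last bullet, the sketch below builds an environment with longer timeouts and moves the two failing extension directories aside. Several things here are assumptions: the report does not give the timeout values used (30000 ms is a placeholder), it does not say where the bundled extensions live (the caller must supply both directories), and the helper names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def gateway_env(handshake_ms: int = 30000, challenge_ms: int = 30000,
                base=None) -> dict:
    """Return a copy of the environment with longer gateway timeouts.

    The variable names come from the issue; the millisecond values are
    placeholders, since the report does not state which values were used.
    """
    env = dict(os.environ if base is None else base)
    env["OPENCLAW_HANDSHAKE_TIMEOUT_MS"] = str(handshake_ms)
    env["OPENCLAW_CONNECT_CHALLENGE_TIMEOUT_MS"] = str(challenge_ms)
    return env


def quarantine_extensions(extensions_dir: Path, quarantine_dir: Path,
                          names=("anthropic", "lmstudio")) -> list:
    """Move the named extension directories out of the load path.

    Both directory arguments are supplied by the caller; the real
    location of bundled extensions is not given in the report.
    Returns the names that were actually moved.
    """
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = extensions_dir / name
        dst = quarantine_dir / name
        if src.is_dir() and not dst.exists():
            shutil.move(str(src), str(dst))
            moved.append(name)
    return moved
```

The returned environment can be passed as `env=` to whatever launches the gateway. Moving directories rather than deleting them keeps the change reversible once the packaging issue is fixed.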

Working workaround:

  • Downgrade/pin OpenClaw to 2026.4.23.
  • Use channels status --deep / channels status --probe as the watchdog health check instead of gateway probe, because gateway probe was a false negative.
  • Keep gateway handshake timeout env vars high on Windows.
  • Keep the broken bundled provider extensions out of the load path until their packaging/export issue is fixed.
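
A watchdog built on the second bullet might look like the sketch below. It assumes the check is invoked as `openclaw channels status --probe` and that a zero exit code means healthy; neither is confirmed by the report, so verify both against your install. The consecutive-failure threshold is an addition of this sketch, not something the reporter describes, and is there only to avoid restarting on a single bad reading.

```python
import shutil
import subprocess


def channels_healthy(args=("channels", "status", "--probe"),
                     timeout_s: float = 90.0,
                     run=subprocess.run,
                     which=shutil.which) -> bool:
    """Run the channels status check and report whether it succeeded.

    Assumption: exit code 0 means healthy. shutil.which is used so that
    an npm-installed launcher such as openclaw.cmd resolves on Windows.
    """
    exe = which("openclaw")
    if exe is None:
        return False
    try:
        result = run([exe, *args], capture_output=True, text=True,
                     timeout=timeout_s)
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def should_restart(history, threshold: int = 3) -> bool:
    """Restart only after `threshold` consecutive failed checks."""
    recent = list(history)[-threshold:]
    return len(recent) == threshold and not any(recent)
```

In use, a loop would append each `channels_healthy()` result to a list and restart the gateway only when `should_restart(history)` returns True.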

Why I think this should be treated as a regression:

  • The previously fixed handshake/probe failure is reproducible again on a newer Windows build.
  • The strongest signal is the same contradiction described in prior issues: gateway/channel runtime healthy and Telegram working, while gateway probe reports timeout/unreachable.
  • There may be two interacting issues: the CLI/probe WS handshake race, plus disabled/broken provider extension loading causing event-loop stalls during gateway/connect handling.

Sanitized key signatures:

  • gateway ready
  • gateway probe: Local loopback ws://127.0.0.1:18789 -> Connect: failed - timeout
  • gateway log: handshake timeout ... lastFrameMethod=connect
  • provider load errors: anthropic/lmstudio failed to load ... ERR_PACKAGE_PATH_NOT_EXPORTED ... @mariozechner/pi-ai
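
For anyone comparing their own logs against these signatures, a small matcher along the following lines may help. The patterns are derived from the sanitized strings above, not from OpenClaw's actual log format, so treat them as a starting point and adjust them to your real output. The names `SIGNATURES` and `classify_line` are hypothetical.

```python
import re

# Patterns derived from the sanitized signatures in this report.
# Real log lines may be formatted differently.
SIGNATURES = {
    "gateway_ready": re.compile(r"gateway ready", re.IGNORECASE),
    "probe_timeout": re.compile(r"Connect:\s*failed\s*-\s*timeout",
                                re.IGNORECASE),
    "handshake_timeout": re.compile(
        r"handshake[- ]timeout.*lastFrameMethod[=:]\"?connect",
        re.IGNORECASE),
    "provider_export_error": re.compile(r"ERR_PACKAGE_PATH_NOT_EXPORTED"),
}


def classify_line(line: str) -> list:
    """Return the names of all signatures found in a single log line."""
    return [name for name, pattern in SIGNATURES.items()
            if pattern.search(line)]


def summarize(lines) -> dict:
    """Count how often each signature appears across many log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        for name in classify_line(line):
            counts[name] += 1
    return counts
```

Seeing "gateway_ready" together with "probe_timeout" and "handshake_timeout" in the same run would correspond to the false-negative pattern described in this issue.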

I can provide more sanitized logs if useful.

extent analysis

TL;DR

Downgrade OpenClaw to version 2026.4.23 and adjust the watchdog health check to use channels status --deep or channels status --probe instead of gateway probe to mitigate the regression.

Guidance

  • This appears to be a regression of the Windows gateway probe/connect-timeout failures that were marked fixed around 2026.3.22 (see #45560 and #49726).
  • On 2026.4.26, gateway probe reports a loopback connect timeout/unreachable while the gateway logs show a handshake-timeout with lastFrameMethod:"connect", even though the gateway is listening and other paths still work.
  • Quarantining the broken bundled provider extensions (anthropic and lmstudio) and raising the handshake timeout environment variables (OPENCLAW_HANDSHAKE_TIMEOUT_MS and OPENCLAW_CONNECT_CHALLENGE_TIMEOUT_MS) helped stabilize the reporter's local install.
  • Raising the probe timeout alone is unlikely to be enough: the reporter's probe still timed out with --timeout 60000. Switching the health check to channels status --deep or channels status --probe is the more reliable mitigation.

Notes

The root cause of the regression is not entirely clear, but it appears to be related to the interaction between the CLI/probe WS handshake and the loading of disabled/broken provider extensions. Further investigation and sanitized logs may be necessary to fully resolve the issue.

Recommendation

Pin OpenClaw to 2026.4.23 and switch the watchdog health check to channels status --deep or channels status --probe. The reporter found that this combination stabilized their local Windows install.
