openclaw - ✅(Solved) Fix CLI → Gateway WebSocket connections timeout in v2026.3.12+ (Regression) [1 pull request, 4 comments, 4 participants]

GitHub stats

openclaw/openclaw#51438 (fetched 2026-04-08 01:11:15)

  • Comments: 4
  • Participants: 4
  • Timeline events: 7 (commented ×4, labeled ×2, subscribed ×1)
  • Reactions: 0

After upgrading from v2026.3.11 to v2026.3.12 or v2026.3.13, all CLI commands that connect to the gateway (openclaw logs, openclaw status, etc.) fail with WebSocket connection timeouts in the 3–10 second range. Downgrading to v2026.3.11 resolves the issue immediately. The 3-second WebSocket handshake timeout introduced in PR #44089 is too aggressive for resource-constrained environments.

Error Message

gateway connect failed: Error: gateway closed (1000): Gateway not reachable. Is it running and accessible?

Root Cause

PR #44089 ("Gateway/WebSocket: add per-connection handshake timeout") changed DEFAULT_HANDSHAKE_TIMEOUT_MS from 10_000 (10s, v2026.3.11) to 3_000 (3s, v2026.3.12) in src/gateway/server-constants.ts.

Version table:
| Version | DEFAULT_HANDSHAKE_TIMEOUT_MS | CLI → Gateway |
|---|---|---|
| 2026.3.8 | 10,000 | works |
| 2026.3.11 | 10,000 | works |
| 2026.3.12 | 3,000 | timeout (3–10s) |
| 2026.3.13 | 3,000 | timeout (3–10s) |

There is no production environment variable to override this value. The only override is OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS, which requires VITEST=1.

References:
- PR #44089: https://github.com/openclaw/openclaw/pull/44089
- v2026.3.12 release: https://github.com/openclaw/openclaw/releases/tag/v2026.3.12

Fix Action

Fix / Workaround

Downgrading to v2026.3.11 is the only known workaround. The sequence below reproduces both the regression and the recovery:

  1. Install OpenClaw v2026.3.11 → openclaw logs --follow works fine
  2. Upgrade to v2026.3.12 (or v2026.3.13) → CLI commands time out with the "gateway closed (1000)" error
  3. Downgrade back to v2026.3.11 → works again

Impact:

  • Affected: Users on resource-constrained hardware (VMs, low-end devices) running OpenClaw CLI commands against the gateway.
  • Severity: High (completely blocks CLI access to the gateway; openclaw logs, status, and all other commands fail).
  • Frequency: 100%. Every CLI command that connects to the gateway is affected.
  • Consequence: Cannot use openclaw logs, openclaw status, openclaw logs --follow, or any other CLI command that connects to the gateway. Forces an immediate downgrade to v2026.3.11 as the only workaround.

PR fix notes

PR #44089: Hardening: tighten preauth WebSocket handshake limits

Description (problem / solution / changelog)

Summary

  • Problem: unauthenticated WebSocket clients could keep handshake sockets open for the full timeout window and send large pre-auth connect frames into application-layer parsing.
  • Why it matters: this increases pre-auth resource exposure on unsupported publicly exposed deployments.
  • What changed: shorten the unauthenticated handshake timeout and reject oversized pre-auth payloads before JSON parsing/auth logic runs.
  • What did NOT change (scope boundary): authenticated session behavior and supported deployment guidance.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #N/A
  • Related #GHSA-jv4g-m82p-2j93
  • Related #GHSA-xwx2-ppv2-wx98

User-visible / Behavior Changes

Unauthenticated WebSocket handshakes now time out faster, and oversized pre-auth frames are closed with 1009 before application-layer auth handling.
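
To make the size gate concrete, here is a minimal sketch of a pre-auth payload check that runs before any JSON parsing. It is not the code from PR #44089: the function name checkPreauthFrame, the PreauthDecision type, and the 64 KiB limit are assumptions chosen for illustration, and the real limit is not stated on this page. Only the close code 1009 ("Message Too Big" in RFC 6455) comes from the PR description.

```typescript
// Hypothetical limit for illustration only; the real value is not given in the PR notes.
const MAX_PREAUTH_PAYLOAD_BYTES = 64 * 1024;

// WebSocket close code 1009 means "Message Too Big" (RFC 6455).
const CLOSE_MESSAGE_TOO_BIG = 1009;

type PreauthDecision =
  | { action: "parse" }
  | { action: "close"; code: number; reason: string };

// Decide what to do with an incoming frame based on its raw byte length.
// The check happens before JSON parsing, so oversized unauthenticated
// input never reaches application-layer auth handling.
function checkPreauthFrame(
  payloadBytes: number,
  authenticated: boolean,
  maxBytes: number = MAX_PREAUTH_PAYLOAD_BYTES,
): PreauthDecision {
  if (!authenticated && payloadBytes > maxBytes) {
    return {
      action: "close",
      code: CLOSE_MESSAGE_TOO_BIG,
      reason: "pre-auth payload too large",
    };
  }
  // Authenticated sessions and normal-sized handshake frames are left alone.
  return { action: "parse" };
}
```

In this sketch a 4 MiB frame from an unauthenticated client yields a close decision with code 1009, while the same frame on an authenticated session is passed through, which mirrors the scope boundary described in the PR summary.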

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22 / pnpm
  • Model/provider: N/A
  • Integration/channel (if any): Gateway WebSocket server
  • Relevant config (redacted): default unauthenticated gateway WebSocket path

Steps

  1. Open a gateway WebSocket connection without authenticating.
  2. Observe the connect.challenge and leave the socket idle.
  3. Repeat with a 4 MiB connect frame before pairing.

Expected

  • Idle unauthenticated sockets should close quickly.
  • Oversized pre-auth frames should be rejected before JSON parse/auth dispatch.

Actual

  • Before this change, idle sockets stayed open for about 10 seconds and large pre-auth frames reached application-layer parsing.
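
The idle-close behavior described above can be pictured as a per-connection timer that fires unless the client authenticates first. The sketch below is an assumption about the general shape, not OpenClaw's implementation: armHandshakeTimer, the Scheduler type, and the closeSocket callback are hypothetical names, and the scheduler is injectable only so the logic can be exercised without waiting on real timers.

```typescript
// Minimal timer interface so the sketch can run on real or fake timers.
type Scheduler = {
  set: (fn: () => void, ms: number) => unknown;
  clear: (handle: unknown) => void;
};

const realScheduler: Scheduler = {
  set: (fn, ms) => setTimeout(fn, ms),
  clear: (handle) => clearTimeout(handle as ReturnType<typeof setTimeout>),
};

// Arms a handshake timer for one connection.
// `closeSocket` is a hypothetical callback that would close the WebSocket.
// Returns a function to call once the client has authenticated.
function armHandshakeTimer(
  timeoutMs: number,
  closeSocket: () => void,
  scheduler: Scheduler = realScheduler,
): () => void {
  let settled = false;
  const handle = scheduler.set(() => {
    if (!settled) {
      settled = true;
      closeSocket(); // handshake did not finish within timeoutMs
    }
  }, timeoutMs);
  return () => {
    if (!settled) {
      settled = true;
      scheduler.clear(handle); // handshake finished in time; cancel the timer
    }
  };
}
```

With timeoutMs at 3_000, a client that needs longer than three seconds to complete the handshake would be closed by this timer even though it is not idle, which is consistent with the CLI regression reported on constrained hardware.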

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: reproduced unauthenticated connect.challenge, measured idle close timing, reproduced a 4 MiB pre-auth frame reaching auth handling before the fix, and reran the new regression tests.
  • Edge cases checked: authenticated flow remains unchanged by the pre-auth size gate; normal small pre-auth messages still parse.
  • What you did not verify: throughput/perf impact under production load.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly: revert this branch.
  • Files/config to restore: gateway handshake constants and pre-auth message handling.
  • Known bad symptoms reviewers should watch for: legitimate pre-auth clients with unusually large connect payloads closing with 1009.

Risks and Mitigations

  • Risk: clients with oversized pre-auth payloads may now fail earlier.
    • Mitigation: the limit only applies before authentication and is set high enough for normal handshake payloads.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/gateway/server-constants.ts (modified, +2/-1)
  • src/gateway/server-runtime-state.ts (modified, +2/-2)
  • src/gateway/server.preauth-hardening.test.ts (added, +77/-0)
  • src/gateway/server/ws-connection/message-handler.ts (modified, +39/-1)

Bug type

Regression (worked before, now fails)

Summary

After upgrading from v2026.3.11 to v2026.3.12 or v2026.3.13, all CLI commands that connect to the gateway (openclaw logs, openclaw status, etc.) fail with WebSocket connection timeouts in the 3–10 second range. Downgrading to v2026.3.11 resolves the issue immediately. The 3-second WebSocket handshake timeout introduced in PR #44089 is too aggressive for resource-constrained environments.

Steps to reproduce

  1. Install OpenClaw v2026.3.11 → openclaw logs --follow works fine
  2. Upgrade to v2026.3.12 (or v2026.3.13) → CLI commands time out with the "gateway closed (1000)" error
  3. Downgrade back to v2026.3.11 → works again

Expected behavior

CLI commands (openclaw logs, openclaw status, openclaw logs --follow, etc.) connect to the gateway and execute successfully, as they did in v2026.3.11 and earlier versions.

Actual behavior

Commands hang for 3–10 seconds then fail with: gateway connect failed: Error: gateway closed (1000): Gateway not reachable. Is it running and accessible?

OpenClaw version

2026.3.11 (works) / 2026.3.12 and 2026.3.13 (broken)

Operating system

Linux (VM with limited resources — not a performant Mac Mini or MacBook)

Install method

npm global (npm install -g)

Model

N/A (CLI → gateway connection issue, no model involved)

Provider / routing chain

Local connection (CLI and gateway on same machine, loopback)

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Root cause: PR #44089 ("Gateway/WebSocket: add per-connection handshake timeout") changed DEFAULT_HANDSHAKE_TIMEOUT_MS from 10_000 (10s, v2026.3.11) to 3_000 (3s, v2026.3.12) in src/gateway/server-constants.ts.

Version table:
| Version | DEFAULT_HANDSHAKE_TIMEOUT_MS | CLI → Gateway |
|---|---|---|
| 2026.3.8 | 10,000 | works |
| 2026.3.11 | 10,000 | works |
| 2026.3.12 | 3,000 | timeout (3–10s) |
| 2026.3.13 | 3,000 | timeout (3–10s) |

Error seen:
gateway connect failed: Error: gateway closed (1000):
Gateway not reachable. Is it running and accessible?

There is no production environment variable to override this — only OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS which requires VITEST=1.

References:
- PR #44089: https://github.com/openclaw/openclaw/pull/44089
- v2026.3.12 release: https://github.com/openclaw/openclaw/releases/tag/v2026.3.12
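
The following sketch illustrates why setting OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS alone has no effect outside a test run. The gating logic is an assumption inferred from the statement that the variable requires VITEST=1; the actual OpenClaw code is not shown in this issue, and resolveHandshakeTimeoutMs is a hypothetical name.

```typescript
// Default in v2026.3.12 and v2026.3.13, per the version table in this issue.
const DEFAULT_HANDSHAKE_TIMEOUT_MS = 3_000;

type Env = Record<string, string | undefined>;

// Assumed shape of the gate: the test override is honored only under Vitest.
function resolveHandshakeTimeoutMs(env: Env): number {
  if (env.VITEST === "1") {
    const parsed = Number.parseInt(env.OPENCLAW_TEST_HANDSHAKE_TIMEOUT_MS ?? "", 10);
    if (Number.isFinite(parsed) && parsed > 0) {
      return parsed;
    }
  }
  // Outside a test run the override is ignored and the default applies.
  return DEFAULT_HANDSHAKE_TIMEOUT_MS;
}
```

Under this assumption, calling resolveHandshakeTimeoutMs(process.env) in a production gateway always returns the 3-second default, so affected users have no setting they can raise.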

Impact and severity

  • Affected: Users on resource-constrained hardware (VMs, low-end devices) running OpenClaw CLI commands against the gateway.
  • Severity: High (completely blocks CLI access to the gateway; openclaw logs, status, and all other commands fail).
  • Frequency: 100%. Every CLI command that connects to the gateway is affected.
  • Consequence: Cannot use openclaw logs, openclaw status, openclaw logs --follow, or any other CLI command that connects to the gateway. Forces an immediate downgrade to v2026.3.11 as the only workaround.

Additional information

Regression: Last known good version 2026.3.11, first known bad version 2026.3.12.

Proposed fix: Add a production-usable environment variable to override the handshake timeout (e.g. OPENCLAW_HANDSHAKE_TIMEOUT_MS), so users on constrained hardware are not locked out. A more conservative default (e.g. 10s) would also be more forgiving across diverse deployment environments.

extent analysis

Fix Plan

To resolve the WebSocket connection timeout issue, we will introduce a new environment variable OPENCLAW_HANDSHAKE_TIMEOUT_MS that allows users to override the default handshake timeout.

Step-by-Step Solution

  1. Add environment variable: Introduce a new environment variable OPENCLAW_HANDSHAKE_TIMEOUT_MS in src/gateway/server-constants.ts.
  2. Update default timeout: Consider increasing the default handshake timeout to a more conservative value, such as 10 seconds.
  3. Apply environment variable: Use the OPENCLAW_HANDSHAKE_TIMEOUT_MS environment variable to override the default handshake timeout in the WebSocket connection setup.

Example Code

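// src/gateway/server-constants.ts
export const DEFAULT_HANDSHAKE_TIMEOUT_MS = 10_000; // Proposed: restore the 10s default used up to v2026.3.11

// src/gateway/websocket.ts
import { DEFAULT_HANDSHAKE_TIMEOUT_MS } from './server-constants';

// Proposed production override. Fall back to the default when the variable
// is unset, non-numeric, or not a positive number.
const parsedTimeoutMs = Number.parseInt(process.env.OPENCLAW_HANDSHAKE_TIMEOUT_MS ?? '', 10);
const handshakeTimeoutMs =
  Number.isFinite(parsedTimeoutMs) && parsedTimeoutMs > 0
    ? parsedTimeoutMs
    : DEFAULT_HANDSHAKE_TIMEOUT_MS;

// Use handshakeTimeoutMs in WebSocket connection setup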

Verification

To verify the fix, set the OPENCLAW_HANDSHAKE_TIMEOUT_MS environment variable to a value greater than the default (e.g., 30000 for 30 seconds) on the gateway and run the OpenClaw CLI commands that previously failed (openclaw logs, openclaw status). The commands should no longer time out and should execute successfully.

Extra Tips

  • Consider adding documentation for the new environment variable to help users understand how to override the default handshake timeout.
  • Review the default handshake timeout value to ensure it is suitable for diverse deployment environments.
  • Test the fix thoroughly to ensure it resolves the issue without introducing new problems.
