openclaw - ✅ (Solved) Fix CLI gateway handshake timeout too short for cold-start module compilation [1 pull request, 3 comments, 3 participants]

GitHub stats: openclaw/openclaw#51469, fetched 2026-04-08 01:10:47
Comments: 3 · Participants: 3 · Timeline: 13 · Reactions: 0
Timeline (top): referenced ×8, commented ×3, cross-referenced ×1, subscribed ×1

CLI commands that connect to the gateway (e.g. openclaw cron list) fail with "gateway closed (1000 normal closure): no close reason" on systems where Node.js ESM module compilation takes longer than the gateway's 3-second WebSocket handshake timeout.

Error Message

gateway connect failed: Error: gateway closed (1000): Error: gateway closed (1000 normal closure): no close reason Gateway target: ws://127.0.0.1:18789

Root Cause

Through debugging, the following timeline was identified:

  1. T+0ms: CLI creates WebSocket connection to gateway
  2. T+~1ms: Gateway accepts TCP connection, sends connect.challenge event
  3. T+~8ms: Node.js setImmediate fires — event loop appears free
  4. T+3000ms: Gateway's handshake timer expires (3s default), closes WebSocket with code 1000
  5. T+~12000ms: CLI's event loop finally processes WebSocket open and message events
  6. CLI calls sendConnect() → request("connect") → finds WebSocket already in CLOSING state → error
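
The timeline above reduces to a comparison between two durations: how long the client's event loop stays blocked, and how long the gateway waits for the handshake. The following is a minimal sketch of that arithmetic, not code from the project; the type and function names are hypothetical.

```typescript
// Hypothetical model of the race described in the timeline above.
type HandshakeOutcome = "connected" | "closed-before-connect";

interface HandshakeTimeline {
  // How long the gateway waits for the client to complete the handshake.
  handshakeTimeoutMs: number;
  // How long the client's event loop stays blocked after creating the socket.
  clientBlockedMs: number;
}

function predictHandshakeOutcome(t: HandshakeTimeline): HandshakeOutcome {
  // The client can only answer connect.challenge once its event loop is free,
  // so it must become free before the gateway's timer expires.
  return t.clientBlockedMs < t.handshakeTimeoutMs
    ? "connected"
    : "closed-before-connect";
}

// Numbers taken from the issue: ~12 s blocked, 3 s timeout vs. the proposed 15 s.
const withOldDefault = predictHandshakeOutcome({ handshakeTimeoutMs: 3000, clientBlockedMs: 12000 });
const withNewDefault = predictHandshakeOutcome({ handshakeTimeoutMs: 15000, clientBlockedMs: 12000 });
console.log(withOldDefault, withNewDefault);
```

With the figures reported in the issue, the 3-second timer loses the race and a 15-second timer wins it, which is why the suggested default leaves a margin above the observed 12 seconds.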

The ~12-second gap between WebSocket creation and event processing is caused by Node.js ESM module compilation blocking the event loop. The CLI's large bundled dist files (with 41 dynamic import() calls) take significant time to compile on cold start.

The gateway's DEFAULT_HANDSHAKE_TIMEOUT_MS = 3000 (in src/gateway/server-constants.ts) is insufficient for CLI clients that experience this cold-start delay.

Key evidence:

  • Standalone WebSocket test connects in ~120ms (no module compilation overhead)
  • Inside the CLI process, the same connection takes ~12 seconds
  • A setInterval with a 100 ms period, registered right after client.start(), doesn't fire its first tick until about 12 seconds later
  • Webchat is unaffected because browser JS is pre-compiled/bundled
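
The setInterval observation can be reproduced in isolation. The sketch below is an illustration under the assumption that a busy-wait loop is a fair stand-in for synchronous module compilation; the helper names are hypothetical and nothing here comes from the openclaw code base.

```typescript
// Hypothetical helper: occupy the event loop synchronously for `ms` milliseconds.
function blockEventLoop(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Busy-wait; stands in for synchronous work such as module compilation.
  }
}

function demoTimerStarvation(blockMs: number): { firedDuringBlock: boolean; blockedForMs: number } {
  let fired = false;
  const started = Date.now();
  // A 10 ms interval would normally tick several times within blockMs.
  const timer = setInterval(() => {
    fired = true;
  }, 10);
  blockEventLoop(blockMs);
  // Timer callbacks cannot run while synchronous code holds the event loop,
  // so `fired` is still false at this point.
  const firedDuringBlock = fired;
  clearInterval(timer);
  return { firedDuringBlock, blockedForMs: Date.now() - started };
}

const starvation = demoTimerStarvation(50);
console.log(starvation);
```

The same mechanism delays WebSocket open and message events: they are queued while the loop is busy and only dispatched once it is free again.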

Fix / Workaround

Current Workaround

The current workaround is to patch the compiled dist files (gateway-cli-*.js), changing DEFAULT_HANDSHAKE_TIMEOUT_MS from 3e3 to 15e3, and then restart the gateway. The patch is lost on every openclaw update.
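
Because the patch has to be reapplied after each update, it may help to script the text substitution. The sketch below only transforms a string; it assumes the bundle contains the literal assignment DEFAULT_HANDSHAKE_TIMEOUT_MS = 3e3 (the minified output may differ), and the function name is hypothetical.

```typescript
// Assumption: the compiled bundle contains text of the form
// "DEFAULT_HANDSHAKE_TIMEOUT_MS = 3e3". Check the actual file before relying on this.
function patchHandshakeTimeout(source: string): { patched: string; replacements: number } {
  const pattern = /(DEFAULT_HANDSHAKE_TIMEOUT_MS\s*=\s*)3e3\b/g;
  let replacements = 0;
  const patched = source.replace(pattern, (_match, prefix: string) => {
    replacements += 1;
    return `${prefix}15e3`;
  });
  return { patched, replacements };
}

const sample = "const DEFAULT_HANDSHAKE_TIMEOUT_MS = 3e3;";
const patchResult = patchHandshakeTimeout(sample);
console.log(patchResult.patched, patchResult.replacements);
```

Reporting the replacement count makes it obvious when an update has changed the bundle so that the pattern no longer matches; running the function on already patched text leaves it unchanged.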

PR fix notes

PR #51503: gateway: add gateway.handshakeTimeoutMs config and raise default to 15s

Description (problem / solution / changelog)

Summary

Fixes #51469

  • Raise DEFAULT_PREAUTH_HANDSHAKE_TIMEOUT_MS from 10s to 15s to cover CLI cold-start ESM compilation that can block the event loop for 12s+
  • Add gateway.handshakeTimeoutMs config field (range 1000-120000) so operators can tune the timeout without env vars
  • Extend getPreauthHandshakeTimeoutMsFromEnv with a gatewayConfig parameter and a clear 4-layer priority chain: OPENCLAW_HANDSHAKE_TIMEOUT_MS env → test env var (VITEST only, no minimum) → gateway.handshakeTimeoutMs config → 15s default
  • Add parseTimeoutOverride / validateConfigTimeout helpers for range validation (env vars silently fall back; config uses zod hard errors)
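
The priority chain in the summary can be sketched as a single resolver function. This is an illustration of the described ordering, not the PR's implementation: the function names, the test-only variable name HYPOTHETICAL_TEST_HANDSHAKE_TIMEOUT_MS, and the exact parsing rules are assumptions. The environment is passed in as a plain object so the sketch has no Node.js dependency.

```typescript
const DEFAULT_TIMEOUT_MS = 15_000; // default stated in the PR summary
const MIN_TIMEOUT_MS = 1_000; // lower bound stated in the PR summary
const MAX_TIMEOUT_MS = 120_000; // upper bound stated in the PR summary

type Env = Record<string, string | undefined>;

// Returns the parsed integer if it lies in [min, max]; otherwise undefined,
// so that a bad env value silently falls through to the next layer.
function parseInRange(raw: string | undefined, min: number, max: number): number | undefined {
  if (raw === undefined || raw.trim() === "") return undefined;
  const n = Number(raw);
  return Number.isInteger(n) && n >= min && n <= max ? n : undefined;
}

function resolveHandshakeTimeoutMs(env: Env, gatewayConfig?: { handshakeTimeoutMs?: number }): number {
  // 1. Production env var override.
  const fromEnv = parseInRange(env["OPENCLAW_HANDSHAKE_TIMEOUT_MS"], MIN_TIMEOUT_MS, MAX_TIMEOUT_MS);
  if (fromEnv !== undefined) return fromEnv;

  // 2. Test-only override, honoured only under VITEST and without the 1000 ms minimum.
  //    The variable name is hypothetical; the source does not name it.
  if (env["VITEST"]) {
    const fromTest = parseInRange(env["HYPOTHETICAL_TEST_HANDSHAKE_TIMEOUT_MS"], 1, MAX_TIMEOUT_MS);
    if (fromTest !== undefined) return fromTest;
  }

  // 3. Config file value (assumed to be range-checked already by the schema).
  const fromConfig = gatewayConfig?.handshakeTimeoutMs;
  if (typeof fromConfig === "number") return fromConfig;

  // 4. Built-in default.
  return DEFAULT_TIMEOUT_MS;
}

console.log(resolveHandshakeTimeoutMs({ OPENCLAW_HANDSHAKE_TIMEOUT_MS: "20000" }, { handshakeTimeoutMs: 5000 }));
```

Keeping the env layer lenient and the config layer strict matches the split described in the last bullet: a mistyped environment variable degrades to the next layer, while a bad config value is rejected earlier by schema validation.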

Files changed (9)

File | Change
src/config/types.gateway.ts | Add handshakeTimeoutMs?: number to GatewayConfig
src/gateway/handshake-timeouts.ts | Raise default to 15s, add validation helpers, extend getPreauthHandshakeTimeoutMsFromEnv with config support
src/config/zod-schema.ts | Add z.number().int().min(1000).max(120000).optional()
src/config/schema.labels.ts | Add label
src/config/schema.help.ts | Add help text
src/gateway/server/ws-connection.ts | Pass loadConfig().gateway into getPreauthHandshakeTimeoutMsFromEnv
src/gateway/server.auth.shared.ts | Re-export DEFAULT_PREAUTH_HANDSHAKE_TIMEOUT_MS and getPreauthHandshakeTimeoutMsFromEnv
src/config/config-misc.test.ts | 6 schema validation tests
src/gateway/server.auth.default-token.suite.ts | 6 priority/fallback/bounds tests
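
The zod constraint listed for src/config/zod-schema.ts accepts an optional integer between 1000 and 120000. A dependency-free sketch of the same rule is shown below for readers who want to see what is rejected; it is not the PR's validation code, and the function name is hypothetical.

```typescript
// Mirrors z.number().int().min(1000).max(120000).optional() without using zod.
// Throws on invalid input, similar in spirit to a hard schema error.
function checkHandshakeTimeoutMs(value: unknown): number | undefined {
  if (value === undefined) return undefined; // .optional()
  if (typeof value !== "number" || !Number.isInteger(value)) {
    throw new RangeError("gateway.handshakeTimeoutMs must be an integer");
  }
  if (value < 1000 || value > 120000) {
    throw new RangeError("gateway.handshakeTimeoutMs must be between 1000 and 120000");
  }
  return value;
}

console.log(checkHandshakeTimeoutMs(15000));
```

Values such as 999, 1500.5, or the string "15000" are rejected, while an omitted field is accepted and leaves the default in effect.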

Test plan

  • pnpm test -- src/config/config-misc.test.ts - 6 new schema tests pass
  • pnpm test -- src/gateway/server.auth.default-token.test.ts - 6 new priority tests pass
  • pnpm test -- src/config/schema.help.quality.test.ts - label/help parity verified
  • pnpm format - clean
  • pnpm tsgo - clean (no new errors; pre-existing msteams/telegram/skills errors unrelated)
  • pnpm lint - clean
  • pnpm check - all checks pass

AI Disclosure

  • AI-assisted: This PR was developed with Claude Code (issue analysis, implementation, tests, and review responses)
  • Testing level: Fully tested — 12 new unit tests covering schema validation, priority chain, fallback behavior, and bounds enforcement
  • Understanding: The author has reviewed and understands all changes; the design was iterated through multiple rounds of manual code review before implementation

🤖 Generated with Claude Code

Changed files

  • docs/.generated/config-baseline.json (modified, +28/-10)
  • docs/.generated/config-baseline.jsonl (modified, +3/-2)
  • src/config/config-misc.test.ts (modified, +65/-0)
  • src/config/schema.help.ts (modified, +2/-0)
  • src/config/schema.labels.ts (modified, +1/-0)
  • src/config/types.gateway.ts (modified, +8/-0)
  • src/config/zod-schema.ts (modified, +1/-0)
  • src/gateway/handshake-timeouts.test.ts (modified, +10/-2)
  • src/gateway/handshake-timeouts.ts (modified, +61/-10)
  • src/gateway/server.auth.default-token.suite.ts (modified, +116/-2)
  • src/gateway/server.auth.shared.ts (modified, +4/-1)
  • src/gateway/server/ws-connection.ts (modified, +5/-1)


Environment

  • OpenClaw: 2026.3.13 (61d171a)
  • Node.js: v22.22.1
  • OS: Ubuntu Linux (systemd service)
  • Gateway mode: local (loopback)

Steps to Reproduce

  1. Install openclaw, configure gateway in local mode
  2. Run openclaw cron list (or any CLI command that connects to the gateway)
  3. Observe the error:
gateway connect failed: Error: gateway closed (1000):
Error: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789

Note: openclaw gateway status (which uses RPC probe, not WebSocket handshake) works fine. Webchat also works fine.

Suggested Fix

  1. Increase the default handshake timeout from 3s to at least 15s (CLI cold start can take 12+ seconds)
  2. Make the timeout configurable via openclaw.json:
{
  "gateway": {
    "handshakeTimeoutMs": 15000
  }
}
  3. Allow env var override (without requiring VITEST):
const getHandshakeTimeoutMs = () => {
  // Env var override: checked first so it takes precedence over the config file
  const envValue = Number(process.env.OPENCLAW_HANDSHAKE_TIMEOUT_MS);
  if (Number.isFinite(envValue) && envValue > 0) return envValue;
  // Config file
  const configValue = config.gateway?.handshakeTimeoutMs;
  if (typeof configValue === 'number' && configValue > 0) return configValue;
  // Default
  return DEFAULT_HANDSHAKE_TIMEOUT_MS; // 15000
};

extent analysis

Fix Plan

To resolve the issue, we will implement the following steps:

  • Increase the default handshake timeout from 3s to at least 15s
  • Make the timeout configurable via openclaw.json
  • Allow env var override

Code Changes

We will update the src/gateway/server-constants.ts file to raise the default handshake timeout and add a function that resolves the timeout from the env var first, then the config file, then the default.

// src/gateway/server-constants.ts
export const DEFAULT_HANDSHAKE_TIMEOUT_MS = 15000;

// `config` is assumed to be the already-loaded openclaw.json contents.
const getHandshakeTimeoutMs = () => {
  // Env var override: checked first so it takes precedence over the config file
  const envValue = Number(process.env.OPENCLAW_HANDSHAKE_TIMEOUT_MS);
  if (Number.isFinite(envValue) && envValue > 0) return envValue;
  // Config file
  const configValue = config.gateway?.handshakeTimeoutMs;
  if (typeof configValue === 'number' && configValue > 0) return configValue;
  // Default
  return DEFAULT_HANDSHAKE_TIMEOUT_MS;
};

We will also update the openclaw.json file to include the handshake timeout configuration:

{
  "gateway": {
    "handshakeTimeoutMs": 15000
  }
}

Verification

To verify that the fix worked, we can run the openclaw cron list command and check that it no longer fails with the gateway closed (1000 normal closure): no close reason error.
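
If the check is to be repeated after each upgrade, the captured CLI output can be tested for the error text quoted in this issue. The helper below is a hypothetical sketch that only inspects a string. The marker indicates a code-1000 close without a reason, which is the symptom reported here but not proof that the handshake timer was the cause.

```typescript
// Error text quoted in this issue.
const HANDSHAKE_CLOSE_MARKER = "gateway closed (1000 normal closure): no close reason";

// Hypothetical helper: true if the captured CLI output contains the reported symptom.
function looksLikeHandshakeTimeout(cliOutput: string): boolean {
  return cliOutput.includes(HANDSHAKE_CLOSE_MARKER);
}

const failingOutput = [
  "gateway connect failed: Error: gateway closed (1000):",
  "Error: gateway closed (1000 normal closure): no close reason",
  "Gateway target: ws://127.0.0.1:18789",
].join("\n");

console.log(looksLikeHandshakeTimeout(failingOutput));
```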

Extra Tips

  • Make sure to update the openclaw.json file with the correct handshake timeout configuration.
  • If you want to override the handshake timeout using an env var, set the OPENCLAW_HANDSHAKE_TIMEOUT_MS env var before running the openclaw command.
