openclaw - ✅ (Solved) Fix: CLI WebSocket probe has no backoff on device-required rejection [1 pull request, 2 comments, 1 participant]

GitHub stats
openclaw/openclaw#63427 (fetched 2026-04-09 07:53:52)
Comments: 2 · Participants: 1 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1, referenced ×1

Fix Action

Fixed

PR fix notes

PR #63446: fix(gateway): short-circuit repeated device-required probes in probeGateway

Description (problem / solution / changelog)

Summary

When many short-lived processes probe the same unpaired gateway in a burst — e.g. openclaw status invoked by cron sessions, monitoring scripts, or any wrapper that re-spawns the CLI — each new process starts a fresh GatewayClient with no memory of prior rejections. Every probe opens a WebSocket, hits the DEVICE_IDENTITY_REQUIRED handshake rejection, and logs a handshake failed cause=device-required entry.

Reporter in #63427 observed 1127 rejections in 24h from 73 separate sessions on a single unpaired gateway.

Fixes #63427

Caller chain in production

openclaw status  (cli/daemon-cli/status.gather.ts:442)
  -> inspectGatewayRestart
     -> inspectGatewayPortHealth
        -> confirmGatewayReachable  (restart-health.ts:72)
           -> probeGateway          (gateway/probe.ts:43)

GatewayClient itself already has exponential backoff AND pauses reconnect on DEVICE_IDENTITY_REQUIRED, but each probe creates a fresh GatewayClient instance, so the backoff never accumulates across invocations. That's what makes this invisible to single-client logic.
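
To illustrate why per-client backoff cannot help here, the following is a minimal sketch, not the real GatewayClient. BackoffClient and both helper functions are hypothetical, and the 1s-to-30s schedule is borrowed from the issue's suggestion because the source does not state GatewayClient's actual backoff parameters.

```typescript
// Hypothetical stand-in for a client that backs off between its own reconnects.
class BackoffClient {
  private attempt = 0;

  // Delay before the next reconnect: 1s, 2s, 4s, ... capped at 30s (assumed schedule).
  nextDelayMs(): number {
    const delay = Math.min(1000 * Math.pow(2, this.attempt), 30000);
    this.attempt += 1;
    return delay;
  }
}

// One long-lived client: the attempt counter accumulates, so delays grow.
function delaysForSharedClient(failures: number): number[] {
  const client = new BackoffClient();
  return Array.from({ length: failures }, () => client.nextDelayMs());
}

// A fresh client per probe: every probe starts again from attempt 0.
function delaysForFreshClients(failures: number): number[] {
  return Array.from({ length: failures }, () => new BackoffClient().nextDelayMs());
}
```

With four failures, the shared client yields delays of 1000, 2000, 4000 and 8000 ms, while fresh clients yield 1000 ms every time, which is the pattern the caller chain above produces.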

Fix

Track recent device-required rejections per URL at module scope in src/gateway/probe.ts:

  • DEVICE_REQUIRED_FAILURE_THRESHOLD = 3
  • DEVICE_REQUIRED_TTL_MS = 5 * 60_000 (5 min window)

After 3 device-required rejections within the TTL window, subsequent probes for the same URL short-circuit with a synthetic rejected result (same code: 1008, reason: "device identity required", plus a hint explaining the short-circuit) without creating another WebSocket. Any successful probe clears the cache entry so a newly-paired gateway resumes normal probing immediately.
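
The bookkeeping described above could look roughly like the sketch below. Only the two constant names and their values come from the PR text. The entry shape, the function names, and the choice to measure the window from the first rejection are assumptions, not the contents of src/gateway/probe.ts.

```typescript
// Constant names and values follow the PR description; everything else is assumed.
const DEVICE_REQUIRED_FAILURE_THRESHOLD = 3;
const DEVICE_REQUIRED_TTL_MS = 5 * 60_000; // 5 min window

interface RejectionEntry {
  count: number;       // device-required rejections seen in the current window
  windowStart: number; // timestamp (ms) of the first rejection in the window
}

// Module-level state: it lives only as long as the current process.
const deviceRequiredCache = new Map<string, RejectionEntry>();

// Record one device-required rejection for a gateway URL.
function recordDeviceRequired(url: string, now: number = Date.now()): void {
  const entry = deviceRequiredCache.get(url);
  if (!entry || now - entry.windowStart >= DEVICE_REQUIRED_TTL_MS) {
    // No entry yet, or the old window expired: start a new window.
    deviceRequiredCache.set(url, { count: 1, windowStart: now });
    return;
  }
  entry.count += 1;
}

// True when the next probe for this URL should skip opening a WebSocket.
function shouldShortCircuit(url: string, now: number = Date.now()): boolean {
  const entry = deviceRequiredCache.get(url);
  if (!entry) return false;
  if (now - entry.windowStart >= DEVICE_REQUIRED_TTL_MS) {
    deviceRequiredCache.delete(url); // window expired, probe normally again
    return false;
  }
  return entry.count >= DEVICE_REQUIRED_FAILURE_THRESHOLD;
}

// Called after a successful probe so a newly paired gateway is probed normally.
function clearDeviceRequired(url: string): void {
  deviceRequiredCache.delete(url);
}
```

Passing `now` explicitly keeps the sketch deterministic in tests. Keying the Map by URL is what keeps one unpaired gateway from affecting probes to another.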

The cache layer sits inside probeGateway() itself, so every caller (restart-health.ts, lifecycle.ts, and any future callers) gets the same protection without duplicating the logic.

Why at this layer (not in confirmGatewayReachable)

An alternative would be to add "device" to the looksLikeAuthClose keyword list in restart-health.ts so device-required closes are treated as "reachable but unauthenticated". That fixes the semantic issue but does not reduce log noise: each probe still opens a WebSocket and still trips the server-side handshake failed log. The noise is the actual user pain in #63427, so the cache has to live at the probe boundary.

Tests

Added a describe("device-required short-circuit cache (#63427)") block in src/gateway/probe.test.ts with 5 cases:

  1. First 3 device-required probes hit the real GatewayClient mock (no premature short-circuit).
  2. 4th probe short-circuits without constructing a client, asserted by gatewayClientState.options === null after reset.
  3. Non-device-required closes (e.g. "pairing required") do NOT populate the cache even after 5 rejections; the 6th probe still hits the real client.
  4. A successful probe clears the cache so the next rejection cycle starts fresh on the same URL.
  5. Per-URL caches are independent: one unpaired gateway does not block probes to another.

Test helper __resetDeviceRequiredCacheForTests() is exported to reset the module-level cache between cases. It's namespaced with __ to mark it as internal / test-only.
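
The cases above can be exercised against a small stand-in like the one below. It is a simplified, synchronous sketch rather than the PR's test file. probe, FakeConnect, makeConnect, resetCacheForTests and demoFourProbes are all hypothetical names, and matching on the close reason string is an assumption about how a device-required rejection is recognized.

```typescript
const THRESHOLD = 3;
const TTL_MS = 5 * 60_000;
const DEVICE_REQUIRED_REASON = "device identity required";

type CloseInfo = { code: number; reason: string };
type ProbeOutcome =
  | { ok: true }
  | { ok: false; close: CloseInfo; shortCircuited: boolean };

// Stand-in for opening a WebSocket: null means success, otherwise the close info.
type FakeConnect = (url: string) => CloseInfo | null;

const rejections = new Map<string, { count: number; windowStart: number }>();

function resetCacheForTests(): void {
  rejections.clear();
}

function probe(url: string, connect: FakeConnect, now: number = Date.now()): ProbeOutcome {
  let entry = rejections.get(url);
  if (entry && now - entry.windowStart >= TTL_MS) {
    rejections.delete(url); // window expired
    entry = undefined;
  }
  if (entry && entry.count >= THRESHOLD) {
    // Synthetic rejection: no connection is attempted.
    return { ok: false, close: { code: 1008, reason: DEVICE_REQUIRED_REASON }, shortCircuited: true };
  }
  const close = connect(url);
  if (close === null) {
    rejections.delete(url); // success clears the entry
    return { ok: true };
  }
  if (close.reason === DEVICE_REQUIRED_REASON) {
    if (entry) entry.count += 1;
    else rejections.set(url, { count: 1, windowStart: now });
  }
  return { ok: false, close, shortCircuited: false };
}

// Fake connection that always returns `result` and counts how often it is called.
function makeConnect(result: CloseInfo | null) {
  let calls = 0;
  const connect: FakeConnect = () => {
    calls += 1;
    return result;
  };
  return { connect, calls: () => calls };
}

// Cases 1 and 2: three probes reach the fake connection, the fourth does not.
function demoFourProbes(): { calls: number; fourth: ProbeOutcome } {
  resetCacheForTests();
  const fake = makeConnect({ code: 1008, reason: DEVICE_REQUIRED_REASON });
  let last: ProbeOutcome = { ok: true };
  for (let i = 0; i < 4; i++) last = probe("ws://gateway.test", fake.connect, 0);
  return { calls: fake.calls(), fourth: last };
}
```

Counting calls on the fake connection plays the role that gatewayClientState.options === null plays in the PR's tests: both check that no client was constructed for the short-circuited probe.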

Notes

  • Credit to @djimit for the detailed RCA in the issue and follow-up comment; the updated RCA ("each cron session creates a fresh GatewayClient") is exactly right and drove this fix.
  • Reporter's patched dist/probe-tkuAqDVF.js is the compiled output of src/gateway/probe.ts — this PR fixes it at source.
  • The reporter's "bonus suggestion" of an unauthenticated /health HTTP endpoint is out of scope for this PR; the cache alone removes the log-noise pain without broadening the gateway attack surface.

Changed files

  • src/gateway/probe.test.ts (modified, +113/-1)
  • src/gateway/probe.ts (modified, +98/-0)


Problem

The OpenClaw CLI client probes the gateway WebSocket without exponential backoff. When the CLI lacks device pairing, the gateway rejects with cause: "device-required" and the CLI retries at ~1 request/second indefinitely.

Observed Behavior

  • 1,127 WebSocket connection rejections in 24 hours
  • All from client: "cli", mode: "probe", cause: "device-required"
  • Retry rate: ~1/sec in a tight loop with no delay
  • Pattern: continuous for hours until process exits

Log Sample

{"subsystem":"gateway/ws","cause":"device-required","handshake":"failed","durationMs":27,"lastFrameType":"req","lastFrameMethod":"connect","client":"cli","mode":"probe"}
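
Entries like the sample above can be tallied from a JSON-lines log to reproduce a count such as the 1,127 reported. This is a minimal sketch: the field names are taken from the sample, while the function name and the assumption that the log holds one JSON object per line are mine.

```typescript
// Fields used for matching, as they appear in the log sample.
interface HandshakeLogEntry {
  cause?: string;
  handshake?: string;
  client?: string;
  mode?: string;
}

// Count failed CLI probe handshakes rejected with cause "device-required".
function countDeviceRequiredProbes(lines: string[]): number {
  let count = 0;
  for (const line of lines) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // skip lines that are not valid JSON
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const entry = parsed as HandshakeLogEntry;
    if (
      entry.handshake === "failed" &&
      entry.cause === "device-required" &&
      entry.client === "cli" &&
      entry.mode === "probe"
    ) {
      count += 1;
    }
  }
  return count;
}
```

Running the same count before and after a fix gives a direct measure of whether the log noise actually dropped.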

Suggested Fix

Add exponential backoff to the CLI WebSocket retry loop:

const delay = Math.min(1000 * Math.pow(2, attempt), 30000); // 1s to 30s cap

Also add a max retry count (e.g., 10 attempts) then stop probing.
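
To make the suggested schedule concrete, the sketch below evaluates the formula for each attempt. The function names are illustrative only.

```typescript
// Delay before retry number `attempt` (0-based), using the formula above.
function backoffDelayMs(attempt: number): number {
  return Math.min(1000 * Math.pow(2, attempt), 30000);
}

// Full list of delays for a run that stops after `maxAttempts` retries.
function backoffSchedule(maxAttempts: number): number[] {
  return Array.from({ length: maxAttempts }, (_, attempt) => backoffDelayMs(attempt));
}
```

For 10 attempts this gives 1000, 2000, 4000, 8000 and 16000 ms, then 30000 ms five times, because 32000 ms exceeds the cap from the sixth retry on. The ten delays sum to 181,000 ms, so a capped run gives up after roughly three minutes of waiting.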

Bonus Suggestion

Add an unauthenticated /health HTTP GET endpoint as an alternative to WS probes for liveness checks. This avoids the WS auth handshake entirely for simple health checks.
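
A liveness route of this kind could be sketched as a pure request handler, as below. Everything here is hypothetical: OpenClaw's gateway is not known to expose such a route, and the response body shape and status codes are assumptions chosen for illustration.

```typescript
interface HealthResponse {
  status: number; // HTTP status code
  body: string;   // JSON-encoded response body
}

// Returns a response for /health, or null when the path belongs to another route.
function handleHealthRequest(method: string, path: string): HealthResponse | null {
  if (path !== "/health") return null;
  if (method !== "GET") {
    return { status: 405, body: JSON.stringify({ error: "method not allowed" }) };
  }
  // Liveness only: no auth and no gateway state is disclosed.
  return { status: 200, body: JSON.stringify({ status: "ok" }) };
}
```

A handler like this could be called from an HTTP server's request callback before any authenticated routes. Keeping the body to a fixed status string limits what an unauthenticated caller can learn.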

Environment

  • OpenClaw version: 2026.4.x
  • OS: Windows 10/11
  • Gateway mode: local

extent analysis

TL;DR

Implement exponential backoff in the CLI WebSocket retry loop to prevent indefinite retries when the gateway rejects the connection due to lack of device pairing.

Guidance

  • Add exponential backoff to the WebSocket retry loop using a formula like const delay = Math.min(1000 * Math.pow(2, attempt), 30000) to introduce a delay between retries.
  • Implement a max retry count (e.g., 10 attempts) to prevent indefinite retries.
  • Consider adding an unauthenticated /health HTTP GET endpoint as an alternative to WebSocket probes for liveness checks.
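  • Verify the fix by counting handshake failed entries with cause: "device-required" in the gateway log before and after the change; the count should fall well below the 1,127 per 24 hours reported.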

Example

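let attempt = 0;
const maxAttempts = 10;
const maxDelay = 30000; // cap in milliseconds

// `connect` stands in for the real WebSocket probe (an assumption; the
// original snippet left it out). It should resolve to true on a successful
// handshake and false on rejection.
async function retryWebSocketConnection(connect) {
  if (attempt >= maxAttempts) {
    // Stop probing after max attempts
    return;
  }

  if (await connect()) {
    attempt = 0; // success: a later failure starts again from the 1s delay
    return;
  }

  // Exponential backoff: 1s, 2s, 4s, ... capped at maxDelay
  const delay = Math.min(1000 * Math.pow(2, attempt), maxDelay);
  attempt++;

  // Retry WebSocket connection after delay
  setTimeout(() => retryWebSocketConnection(connect), delay);
}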

Notes

The attempt counter must be reset to 0 once a connection succeeds; otherwise a later failure resumes from the longest delay, or is never retried if the maximum was already reached. The maxAttempts and maxDelay values can be adjusted to the application's requirements.

Recommendation

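Apply the workaround by implementing exponential backoff and a max retry count in the CLI WebSocket retry loop, as this will prevent indefinite retries and reduce the load on the gateway. Note that, per PR #63446, GatewayClient already backs off within a single client instance; the rejections came from repeated probes that each created a fresh GatewayClient, so the merged fix instead caches device-required rejections per URL inside probeGateway.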
