openclaw: Fix Gateway WebSocket connection leak (CLOSE_WAIT/FIN_WAIT_2 zombie connections cause crashes) [1 comment, 2 participants]

GitHub issue: openclaw/openclaw#56215 (fetched 2026-04-08 01:43:29; 1 comment, 2 participants, 0 reactions)

GitHub Issue Report: Gateway Socket Connection Leak

Bug Title

Gateway WebSocket connections leak - CLOSE_WAIT/FIN_WAIT_2 zombie connections accumulate and cause crashes

Environment

  • OpenClaw version: 2026.3.24
  • Node.js version: v24.14.0
  • Platform: Windows 10 (x64)
  • Gateway port: 18789

Problem Description

The Gateway process accumulates zombie socket connections in the CLOSE_WAIT and FIN_WAIT_2 states over time. The Gateway does not close these connections properly, which leads to a resource leak and eventually to crashes.

Evidence

Connection State Analysis (observed on 2026-03-28)

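State | Count | Description
ESTABLISHED | 2 | Normal active connections
LISTENING | 2 | Gateway listening on IPv4 and IPv6
CLOSE_WAIT | 4 | Zombie: peer sent FIN, but the local socket was never closed
FIN_WAIT_2 | 4 | Zombie: local side sent FIN and got an ACK, but the peer never sent its own FIN
TIME_WAIT | 4 | Recently closed connections (normal)

Zombie connections (CLOSE_WAIT + FIN_WAIT_2): 8 of 14 non-listening connections, about 57%

Because the main OpenClaw process (PID 1424) connects from the same host, netstat lists both endpoints of each loopback connection, so the four CLOSE_WAIT and four FIN_WAIT_2 entries may be the two ends of the same four half-closed connections.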

Crash Pattern

Gateway crashes at irregular intervals (1-6 hours), requiring manual or automated restart.

Crash times observed on 2026-03-28:

  • 00:28, 01:22-01:23, 07:53, 09:00, 10:09, 13:06

Memory Behavior

  • Gateway memory fluctuates between 600-1500 MB
  • After restart: ~600 MB
  • After 1-2 hours: ~800-1400 MB
  • Not a memory leak per se, but connection accumulation correlates with crashes

Root Cause Analysis

This is a socket connection management bug in the Gateway:

  1. WebSocket clients (main OpenClaw process, PID 1424) request connection close
  2. Gateway WebSocket handler does not properly respond to close events
  3. Connections remain in CLOSE_WAIT/FIN_WAIT_2 states
  4. Zombie connections accumulate over time
  5. Eventually causes Gateway instability and crashes
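The TCP mechanics behind step 3 can be reproduced without the Gateway. The minimal sketch below uses Node's built-in `net` module rather than WebSocket, and `demoHalfClose` is a hypothetical name. The server enables `allowHalfOpen` so Node does not answer the client's FIN automatically; until the server calls `end()`, its socket sits in CLOSE_WAIT and the client's socket in FIN_WAIT_2.

```javascript
const net = require("node:net");

// Hypothetical demo of the CLOSE_WAIT mechanism with plain TCP sockets.
// allowHalfOpen: true stops Node from sending our FIN automatically
// when the client's FIN arrives.
function demoHalfClose(holdMs = 200) {
  return new Promise((resolve, reject) => {
    const server = net.createServer({ allowHalfOpen: true }, (sock) => {
      sock.resume(); // consume incoming data so 'end' can fire
      sock.on("end", () => {
        // The client's FIN has arrived, but our side is still open.
        // During this pause, netstat shows this socket in CLOSE_WAIT
        // and the client's socket in FIN_WAIT_2.
        const state = {
          readableEnded: sock.readableEnded, // true: peer finished sending
          writableEnded: sock.writableEnded, // false: we never called end()
        };
        setTimeout(() => {
          sock.end(); // the step a leaking server skips: close our side too
          server.close();
          resolve(state);
        }, holdMs);
      });
    });
    server.on("error", reject);
    server.listen(0, "127.0.0.1", () => {
      const { port } = server.address();
      // The client connects, then immediately initiates the close (sends FIN).
      const client = net.connect(port, "127.0.0.1", () => client.end());
      client.on("error", reject);
      client.resume();
    });
  });
}
```

Calling `demoHalfClose(10000)` and running `netstat -ano` during the pause should show both sockets of the pair. A server that never reaches its equivalent of `sock.end()` keeps such sockets indefinitely.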

Impact

  • Gateway crashes unpredictably (1-6 hour intervals)
  • Requires external watchdog process to restart
  • Users experience service interruption until restart completes

Reproduction Steps

  1. Start OpenClaw Gateway: openclaw gateway --port 18789
  2. Connect multiple clients (Discord, Feishu, etc.)
  3. Monitor connections: netstat -ano | findstr 18789
  4. Observe CLOSE_WAIT/FIN_WAIT_2 connections accumulating over time
  5. Gateway eventually crashes (no error output to stderr)
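Steps 3 and 4 can be automated. This is a minimal sketch that assumes English-locale Windows `netstat -ano` output with the columns Proto, Local Address, Foreign Address, State, and PID. `countStates`, `zombieRatio`, and `monitor` are hypothetical helpers. Unlike `findstr 18789`, which also matches any PID that contains those digits, `countStates` matches only the address columns.

```javascript
const { execFile } = require("node:child_process");

// Tally TCP states in Windows `netstat -ano` output for one port.
// Assumed columns: Proto, Local Address, Foreign Address, State, PID.
function countStates(netstatText, port) {
  const counts = {};
  const suffix = `:${port}`;
  for (const line of netstatText.split(/\r?\n/)) {
    const cols = line.trim().split(/\s+/);
    if (cols[0] !== "TCP" || cols.length < 5) continue; // skip headers, UDP
    const [, local, remote, state] = cols;
    // Loopback connections appear once per endpoint; both are counted,
    // as they would be by `findstr 18789`.
    if (!local.endsWith(suffix) && !remote.endsWith(suffix)) continue;
    counts[state] = (counts[state] || 0) + 1;
  }
  return counts;
}

// Share of non-listening entries stuck in CLOSE_WAIT or FIN_WAIT_2.
function zombieRatio(counts) {
  const total = Object.entries(counts)
    .filter(([state]) => state !== "LISTENING")
    .reduce((sum, [, n]) => sum + n, 0);
  const zombies = (counts.CLOSE_WAIT || 0) + (counts.FIN_WAIT_2 || 0);
  return total === 0 ? 0 : zombies / total;
}

// Hypothetical polling loop: log the counts for the port once per interval.
function monitor(port = 18789, intervalMs = 60000) {
  return setInterval(() => {
    execFile("netstat", ["-ano"], { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return console.error("netstat failed:", err.message);
      const counts = countStates(stdout, port);
      console.log(new Date().toISOString(), JSON.stringify(counts), zombieRatio(counts).toFixed(2));
    });
  }, intervalMs);
}
```

Calling `monitor()` on the Gateway host logs one line of counts per minute. You can then line up the CLOSE_WAIT/FIN_WAIT_2 trend with the crash times.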

Expected Behavior

Gateway should properly close WebSocket connections when:

  • Client requests close
  • Connection timeout occurs
  • Keep-alive fails
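For the keep-alive case, the usual approach is a ping/pong heartbeat. The server pings every connection on an interval and drops any connection that did not answer the previous ping. The sketch below separates that bookkeeping from any WebSocket library so it can be tested on its own. `HeartbeatTracker` and the connection ids are hypothetical.

```javascript
// Hypothetical, library-agnostic bookkeeping for a ping/pong heartbeat.
class HeartbeatTracker {
  constructor() {
    this.alive = new Map(); // connection id -> answered since last sweep?
  }

  add(id) {
    this.alive.set(id, true);
  }

  // Call from the connection's close handler so ids never linger.
  remove(id) {
    this.alive.delete(id);
  }

  // Call from the connection's pong handler.
  markPong(id) {
    if (this.alive.has(id)) this.alive.set(id, true);
  }

  // Call once per interval. Connections that missed the previous ping are
  // returned in toTerminate (and forgotten); the rest are returned in toPing
  // and marked as awaiting a pong.
  sweep() {
    const toTerminate = [];
    const toPing = [];
    for (const [id, answered] of this.alive) {
      if (answered) {
        this.alive.set(id, false);
        toPing.push(id);
      } else {
        this.alive.delete(id);
        toTerminate.push(id);
      }
    }
    return { toTerminate, toPing };
  }
}
```

On each interval tick, the caller would terminate every id in `toTerminate` and ping every id in `toPing`. A connection that stops answering is therefore dropped within two intervals.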

Workaround (Implemented)

  1. Gateway Guardian watchdog process (auto-restart within 30 seconds)
  2. Daily scheduled restart at 04:00 AM to clear zombie connections
  3. Error logging via stderr redirect: gateway.cmd 2>> gateway-error.log

Suggested Fix

Review WebSocket connection lifecycle handling in:

  • gateway-runtime-DkLKThnW.js
  • server-C8VdPOMv.js
  • WebSocket close event handlers

Ensure proper socket cleanup on:

  • Client-initiated close
  • Server-initiated close
  • Connection timeout/keep-alive failure
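A single connection can see several of these events. With the `ws` package, an `'error'` is followed by `'close'`, and a heartbeat timeout may race with both. Cleanup should therefore be idempotent. This is a minimal sketch: `runOnce` and `attachCleanup` are hypothetical helpers, and `conn` can be any EventEmitter that emits `close` and `error`.

```javascript
// Hypothetical helper: wraps fn so that only the first call runs it.
function runOnce(fn) {
  let done = false;
  return (...args) => {
    if (done) return false; // already cleaned up; ignore later paths
    done = true;
    fn(...args);
    return true;
  };
}

// Route every exit path of a connection (client close, server close,
// errors, keep-alive timeout) through one idempotent cleanup. `release`
// frees per-connection state such as timers, maps, and subscriptions.
function attachCleanup(conn, release) {
  const cleanup = runOnce((reason) => release(reason));
  conn.on("close", () => cleanup("close"));
  conn.on("error", () => cleanup("error"));
  return cleanup; // also call this from the timeout/heartbeat path
}
```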

Additional Information

  • No error output captured in stderr during crashes
  • System has ample memory (16 GB total, 7 GB free), so this is not memory exhaustion
  • Node.js heap limit: 4.28 GB (not exceeded)
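Inside a Node.js process, the built-in sources for the heap limit and memory figures are `v8.getHeapStatistics()` and `process.memoryUsage()`. This sketch assumes you can add a logging hook to the Gateway. If you run it in a separate process, it reports that process's own values, and the heap limit can differ if the Gateway is started with flags such as `--max-old-space-size`. `memorySnapshot` is a hypothetical name.

```javascript
const v8 = require("node:v8");
const os = require("node:os");

// Snapshot of the figures quoted above, in megabytes. Logged on an interval
// next to connection counts, it helps separate memory growth from socket leaks.
function memorySnapshot() {
  const { heap_size_limit, used_heap_size } = v8.getHeapStatistics();
  const toMB = (bytes) => Math.round(bytes / 1024 / 1024);
  return {
    heapLimitMB: toMB(heap_size_limit),
    heapUsedMB: toMB(used_heap_size),
    rssMB: toMB(process.memoryUsage().rss),
    freeSystemMB: toMB(os.freemem()),
    totalSystemMB: toMB(os.totalmem()),
  };
}
```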

Reporter: OpenClaw user
Date: 2026-03-28

Analysis

Fix Plan

To address the socket connection leak, the Gateway needs correct WebSocket connection lifecycle handling. The following steps should help resolve the issue:

  • Review and update the WebSocket close event handlers in gateway-runtime-DkLKThnW.js and server-C8VdPOMv.js to properly close connections.
  • Implement connection timeout and keep-alive failure handling to prevent zombie connections.

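Example sketch of WebSocket lifecycle handling, assuming the Gateway is built on the `ws` package:

// Sketch only: assumes the `ws` package (v8+). This is an illustrative
// standalone server; the real Gateway's server setup will differ.
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 18789 });
const HEARTBEAT_INTERVAL_MS = 30000; // e.g. 30 seconds

wss.on('connection', (ws) => {
  ws.isAlive = true;

  // The client answered our last ping, so the connection is alive.
  ws.on('pong', () => {
    ws.isAlive = true;
  });

  // Always attach an error listener. ws emits 'close' after 'error',
  // so resource cleanup belongs in the 'close' handler.
  ws.on('error', (error) => {
    console.error('WebSocket error:', error);
  });

  // Runs for both client-initiated and server-initiated closes. The socket
  // is already closed here, so release per-connection state (timers, maps,
  // subscriptions) instead of calling terminate().
  ws.on('close', () => {
    // release per-connection state here
  });
});

// Keep-alive: terminate sockets that did not answer the previous ping.
const heartbeat = setInterval(() => {
  for (const ws of wss.clients) {
    if (!ws.isAlive) {
      ws.terminate(); // destroy the TCP socket without a close handshake
      continue;
    }
    ws.isAlive = false;
    ws.ping();
  }
}, HEARTBEAT_INTERVAL_MS);

wss.on('close', () => clearInterval(heartbeat));
  • Wire these handlers into the connection-establishment code so that every accepted socket gets close handling and the keep-alive check.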

Verification

To verify the fix, follow these steps:

  1. Restart the Gateway with the fix applied.
  2. Connect multiple clients and monitor connections using netstat -ano | findstr 18789.
  3. Confirm that CLOSE_WAIT/FIN_WAIT_2 entries no longer accumulate over several hours. A few short-lived entries during normal closes are expected.
  4. Test timeout and keep-alive handling by simulating a client disconnection or network failure, and check that the Gateway drops the connection.
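For step 4, a client that completes the WebSocket handshake and then never answers pings can stand in for a dead peer. This is a minimal sketch using raw TCP from Node's standard library. `openSilentClient` is hypothetical, and it assumes the Gateway accepts an unauthenticated upgrade on path `/`. If the Gateway requires authentication, it may close the socket for that reason instead, which exercises a different close path.

```javascript
const net = require("node:net");
const crypto = require("node:crypto");

// GUID defined by RFC 6455 for deriving Sec-WebSocket-Accept from the key.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function expectedAccept(key) {
  return crypto.createHash("sha1").update(key + WS_GUID).digest("base64");
}

// Hypothetical test client: performs the WebSocket upgrade over raw TCP,
// then ignores every frame (pings included) and never sends a pong.
function openSilentClient(host, port, path = "/") {
  const key = crypto.randomBytes(16).toString("base64");
  const openedAt = Date.now();
  let sawResponse = false;
  const sock = net.connect(port, host, () => {
    sock.write(
      `GET ${path} HTTP/1.1\r\n` +
        `Host: ${host}:${port}\r\n` +
        "Upgrade: websocket\r\n" +
        "Connection: Upgrade\r\n" +
        `Sec-WebSocket-Key: ${key}\r\n` +
        "Sec-WebSocket-Version: 13\r\n\r\n"
    );
  });
  sock.on("data", (chunk) => {
    if (sawResponse) return; // drop later frames silently: no pong is sent
    sawResponse = true; // assumes the response headers arrive in one chunk
    const ok = chunk.toString("latin1").includes(expectedAccept(key));
    console.log(ok ? "upgrade accepted; going silent" : "handshake not accepted");
  });
  sock.on("error", (err) => console.error("socket error:", err.message));
  sock.on("close", () => {
    console.log(`socket closed after ${Date.now() - openedAt} ms`);
  });
  return sock;
}
```

With a 30-second ping interval, `openSilentClient("127.0.0.1", 18789)` should report the socket closing after roughly 30 to 60 seconds, and no CLOSE_WAIT entry for it should remain in netstat.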

Extra Tips

  • Cover the connection lifecycle with a test that opens and closes many connections, so regressions in socket cleanup are caught before release.
  • Log connection counts (open, closed, terminated by the keep-alive check) periodically, so a leak shows up in the logs well before it causes a crash.
  • Review the documentation of the WebSocket library the Gateway uses. For the `ws` package, the README describes a ping/pong heartbeat for detecting and closing broken connections.
