openclaw - ✅(Solved) Fix [Bug]: OpenClaw: Crash loop on plugin config reload (ECONNREFUSED on loopback port 18789) [1 pull requests, 1 participants]

openclaw • 2026-04-10 08:41:18

GitHub stats

openclaw/openclaw#64201 • Fetched 2026-04-11 06:15:56

Author: jpdeuster

Participants: jpdeuster

Timeline (top): labeled ×2, cross-referenced ×1

OpenClaw enters a crash loop for ~6 minutes (≈25 restarts at 15s intervals) whenever the plugins.allow or plugins.entries.* configuration is changed at runtime. The container self-recovers thanks to restart: unless-stopped, but during the loop the gateway is unreachable.

Environment

Image: ghcr.io/hostinger/hvps-openclaw:latest
Image revision: 69fa0d8 (created 2026-03-02)
Host OS: Ubuntu 24.04.4 LTS
Docker compose: v5.0.2
Reverse proxy: Traefik (HTTPS via Let's Encrypt)
Active plugins: telegram only

Error Message

Trigger event in logs (Berlin time, UTC+02:00)

Config change detected at 10:12:57 (plugins.allow modified — unused channel plugins removed). Reload deferred until in-flight ops complete, then SIGUSR1 sent at 10:17:17:

2026-04-10T10:12:57.766+02:00 [reload] config change detected; evaluating reload (plugins.allow)
2026-04-10T10:12:57.791+02:00 [reload] config change requires gateway restart (plugins.allow) — deferring until 2 operation(s), 1 reply(ies), 1 embedded run(s) complete
2026-04-10T10:13:58.876+02:00 [reload] config change detected; evaluating reload (plugins.entries.whatsapp.enabled, plugins.entries.discord.enabled, plugins.entries.slack.enabled, plugins.entries.nostr.enabled, plugins.entries.googlechat.enabled)
2026-04-10T10:17:17.891+02:00 [reload] all operations and replies completed; restarting gateway now
2026-04-10T10:17:17.893+02:00 [gateway] signal SIGUSR1 received
2026-04-10T10:17:17.895+02:00 [gateway] received SIGUSR1; restarting

Crash on every restart attempt

[gateway] Starting OpenClaw gateway...
node:events:497
      throw er; // Unhandled 'error' event
      ^
Error: connect ECONNREFUSED 127.0.0.1:18789
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
Emitted 'error' event on WebSocket instance at:
    at emitErrorAndClose (/hostinger/node_modules/ws/lib/websocket.js:1046:13)
    at ClientRequest.<anonymous> (/hostinger/node_modules/ws/lib/websocket.js:886:5)
    at ClientRequest.emit (node:events:519:28)
    at emitErrorEvent (node:_http_client:107:11)
    at Socket.socketErrorListener (node:_http_client:574:5)
    at Socket.emit (node:events:519:28)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 18789
}
Node.js v22.22.0

Container restart loop (`docker events`)

25 die/start cycles between 10:17 and 10:23 Berlin time (the `docker events` timestamps below are in UTC), ~15s apart:

2026-04-10T08:17:20Z die
2026-04-10T08:17:20Z start
2026-04-10T08:17:36Z die
2026-04-10T08:17:37Z start
2026-04-10T08:17:48Z die
2026-04-10T08:17:49Z start
... (21 more cycles)
2026-04-10T08:23:12Z die
2026-04-10T08:23:27Z start  ← finally stable
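As a quick sanity check on the reported cadence, the loop window and the average cycle length can be derived from the first `die` and the final `start` timestamps in the excerpt. This is only arithmetic on the figures quoted in the issue; the variable names are illustrative.

```javascript
// Timestamps copied from the `docker events` excerpt (UTC).
const firstDie = Date.parse('2026-04-10T08:17:20Z');
const finalStart = Date.parse('2026-04-10T08:23:27Z');
const cycles = 25; // die/start cycles reported in the issue

const windowSeconds = (finalStart - firstDie) / 1000;
const secondsPerCycle = windowSeconds / cycles;

console.log(`loop window: ${windowSeconds} s (~${(windowSeconds / 60).toFixed(1)} min)`);
console.log(`average cycle: ${secondsPerCycle.toFixed(1)} s`);
```

The window works out to 367 s, roughly 6 minutes, and about 14.7 s per cycle, which agrees with the "~15s apart" and "~6 minutes" figures in the report.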

Related symptom

openclaw security audit --deep reports the same loopback port as unreachable:

WARN  gateway.probe_failed  Gateway probe failed (deep)
  timeout
  Fix: Run "openclaw status --all" to debug connectivity/auth, then re-run "openclaw security audit --deep".

Root Cause

The deep security audit probe timeout appears to share the same root cause as the crash loop: the loopback port 18789 intermittently refuses connections or fails to respond to the local probe.

Fix Action

PR fix notes

PR #64255: fix: stop configReloader before plugin teardown to prevent crash loop

Repository: openclaw/openclaw
Author: EronFan
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/64255

Description (problem / solution / changelog)

Fix: plugin config reload crash loop (issue #64201)

Problem

Modifying plugins.allow or plugins.entries.* causes gateway crash loop (~25 restarts / 6 minutes). Root cause: ECONNREFUSED on loopback:18789 because the config reloader restart fires while the previous server port is still being torn down.

Root cause

In createGatewayCloseHandler, configReloader.stop() was called after plugin teardown (stopChannel, pluginServices.stop) and after the HTTP/WebSocket server close. This means:

1. Config watcher detects config change → fires restart
2. Old gateway process teardown starts (plugins stop, then configReloader.stop())
3. New gateway process starts → tries to bind 127.0.0.1:18789
4. Old process hasn't closed the port yet → ECONNREFUSED
5. New process crashes → restart loop
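One detail worth keeping in mind when reading these steps: in Node.js, ECONNREFUSED is reported by the connecting side when nothing is listening on the port, whereas a bind onto a port that is still held fails with EADDRINUSE. The stack trace in the issue is consistent with this, since the error is emitted on a WebSocket client instance. The sketch below uses only the `node:net` module to show both codes; the helper names are illustrative and are not part of OpenClaw.

```javascript
const net = require('node:net');

// Resolve with the error code of a failed connect(), or null on success.
function connectErrorCode(port, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    socket.once('connect', () => { socket.destroy(); resolve(null); });
    socket.once('error', (err) => resolve(err.code));
  });
}

// Resolve with the error code of a failed listen(), or null on success.
function bindErrorCode(port, host = '127.0.0.1') {
  return new Promise((resolve) => {
    const server = net.createServer();
    server.once('error', (err) => resolve(err.code));
    server.listen(port, host, () => server.close(() => resolve(null)));
  });
}

async function demo() {
  // Listen on port 0 to obtain a free ephemeral port for the experiment.
  const holder = net.createServer();
  await new Promise((resolve) => holder.listen(0, '127.0.0.1', resolve));
  const { port } = holder.address();

  const bindWhileHeld = await bindErrorCode(port);        // port still in use
  await new Promise((resolve) => holder.close(resolve));
  const connectAfterClose = await connectErrorCode(port); // nobody listening now
  return { bindWhileHeld, connectAfterClose };
}
```

On Linux, `demo()` is expected to resolve with `bindWhileHeld` set to `'EADDRINUSE'` and `connectAfterClose` set to `'ECONNREFUSED'`, which suggests the crash originates from a client connecting before the gateway listener is up rather than from the bind itself.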

Fix

Reorder createGatewayCloseHandler shutdown sequence so that before any plugin teardown runs:

1. configReloader.stop() — stop the config watcher first, preventing further restarts
2. Broadcast shutdown event to connected clients
3. Close all client sockets
4. Close HTTP and WebSocket servers (releasing port 18789)

Only then proceed with plugin/channel teardown (stopChannel, pluginServices.stop, etc.).
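The reordered sequence can be sketched as follows. This is a minimal illustration of the ordering described in the PR notes, not the actual `createGatewayCloseHandler` code: the dependency object, its field names, and the recording fakes are all hypothetical.

```javascript
// Hypothetical stand-ins: none of these objects are the real OpenClaw internals.
function createCloseHandler({ configReloader, clients, wss, httpServer, plugins }) {
  return async function close() {
    configReloader.stop();                                // 1. no further reload-triggered restarts
    for (const client of clients) client.send('shutdown'); // 2. notify connected clients
    for (const client of clients) client.close();          // 3. drop client sockets
    await wss.close();                                    // 4. release the listeners
    await httpServer.close();
    await plugins.stop();                                 // only now: plugin/channel teardown
  };
}

// Run the handler against recording fakes and return the observed call order.
async function recordShutdownOrder() {
  const calls = [];
  const note = (name) => () => { calls.push(name); };
  const close = createCloseHandler({
    configReloader: { stop: note('configReloader.stop') },
    clients: [{ send: note('client.send'), close: note('client.close') }],
    wss: { close: note('wss.close') },
    httpServer: { close: note('http.close') },
    plugins: { stop: note('pluginServices.stop') },
  });
  await close();
  return calls;
}
```

Recording the calls this way mirrors the kind of ordering assertion the PR's test describes: the reloader and the listeners must appear before any plugin teardown entry.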

Test

Added a test to server-close.test.ts, "stops the config reloader and closes listeners before plugin teardown continues", which verifies the shutdown order: configReloader.stop, wss.close, and http.close all occur before bonjourStop and pluginServices.stop.

Files changed

src/gateway/server-close.ts — reordered shutdown sequence
src/gateway/server-close.test.ts — added shutdown-order test

Changed files

extensions/ollama/openclaw.plugin.json (modified, +2/-1)
src/gateway/server-close.test.ts (modified, +64/-0)
src/gateway/server-close.ts (modified, +61/-57)

Bug type

Crash (process/app exits or hangs)

Steps to reproduce

1. Have a running OpenClaw instance with multiple plugins listed in plugins.allow (e.g. discord, slack, googlechat, nostr, whatsapp).
2. Edit /data/.openclaw/openclaw.json and remove unused plugins from plugins.allow and disable them in plugins.entries.*.
3. OpenClaw detects the config change and triggers an automatic gateway reload via SIGUSR1.
4. The reload sequence fails with an unhandled WebSocket error.

Expected behavior

- A WebSocket client connecting to the local gateway control port should wait for the gateway to be ready, or
- The error event should be handled gracefully (retry/backoff) rather than bubbling up as an unhandled exception that kills the process.
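A minimal sketch of the "wait for the gateway to be ready" option, assuming a plain TCP connect is an acceptable readiness signal. It uses only `node:net`; `probeOnce`, `waitForPort`, and the default delays are illustrative choices, not OpenClaw APIs or settings.

```javascript
const net = require('node:net');

// Try a single TCP connect; resolve true on success, false on error or timeout.
function probeOnce(port, host, timeoutMs) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (ok) => { socket.destroy(); resolve(ok); };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once('connect', () => finish(true));
    socket.once('error', () => finish(false));
  });
}

// Poll until the port accepts connections, doubling the delay up to maxDelayMs.
async function waitForPort(port, options = {}) {
  const {
    host = '127.0.0.1', retries = 5, delayMs = 100, maxDelayMs = 5000, timeoutMs = 1000,
  } = options;
  let delay = delayMs;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    if (await probeOnce(port, host, timeoutMs)) return true;
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay = Math.min(delay * 2, maxDelayMs);
    }
  }
  return false;
}
```

A caller could `await waitForPort(18789)` before opening the WebSocket and treat a `false` result as a recoverable condition to log and retry later, rather than letting the connection error terminate the process.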

Related Symptom: Deep Security Audit Probe Timeout

openclaw security audit --deep reports:

WARN gateway.probe_failed Gateway probe failed (deep) timeout

This appears to share the same root cause: the loopback port 18789 intermittently refuses connections or fails to respond to the local probe.

Actual behavior

After the reload signal, the process exits with an unhandled error event:

[gateway] received SIGUSR1; restarting
node:events:497
      throw er; // Unhandled 'error' event
      ^
Error: connect ECONNREFUSED 127.0.0.1:18789
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
Emitted 'error' event on WebSocket instance at:
    at emitErrorAndClose (/hostinger/node_modules/ws/lib/websocket.js:1046:13)
    at ClientRequest.<anonymous> (/hostinger/node_modules/ws/lib/websocket.js:886:5)
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 18789

Docker then restarts the container per its restart policy. The new instance hits the same race condition during startup and crashes again. After roughly 25 restart cycles (~6 minutes) the timing happens to align and the gateway comes up cleanly.

docker events confirms ~25 die/start cycles during the loop window.
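The "Unhandled 'error' event" line in the trace reflects standard Node.js EventEmitter behaviour: emitting `error` on an emitter with no `error` listener throws the error. The sketch below reproduces that with a bare `EventEmitter` standing in for the WebSocket instance; the function names are illustrative.

```javascript
const { EventEmitter } = require('node:events');

// Emit 'error' with no listener attached: EventEmitter throws the error itself.
function emitWithoutListener() {
  const emitter = new EventEmitter();
  try {
    emitter.emit('error', new Error('connect ECONNREFUSED 127.0.0.1:18789'));
    return 'no throw';
  } catch (err) {
    return `thrown: ${err.message}`;
  }
}

// Same emit with a listener attached: the error is delivered to the handler instead.
function emitWithListener() {
  const emitter = new EventEmitter();
  let seen = null;
  emitter.on('error', (err) => { seen = err.message; });
  emitter.emit('error', new Error('connect ECONNREFUSED 127.0.0.1:18789'));
  return `handled: ${seen}`;
}
```

In the gateway the throw happens inside an asynchronous socket callback with no surrounding `try`, so it becomes an uncaught exception and the process exits. Attaching an `error` listener to the WebSocket is what turns it into a recoverable condition.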

OpenClaw version

2026.4.9 (build 0512059)

Operating system

Ubuntu 24.04.4 LTS (kernel 6.8.0-107-generic, x86_64)

Install method

docker (image: ghcr.io/hostinger/hvps-openclaw:latest, revision 69fa0d8, image created 2026-03-02)

Model

openrouter/moonshotai/kimi-k2.5

Provider / routing chain

openclaw -> openrouter -> moonshotai/kimi-k2.5

Additional provider/model setup details

Default agent model: openrouter/moonshotai/kimi-k2.5 (configured in agents.defaults.model.primary)
11 agents registered (main, Router, Chef, Codee, Mail, Memory, Ops, PM, Search, Shares, Travel)
Mix of openrouter/moonshotai/kimi-k2.5, openrouter/anthropic/claude-sonnet-4-5, and openrouter/anthropic/claude-opus-4-6 per agent
Single channel: telegram (allowlist groups, pairing for DMs)
Reverse proxy: Traefik (HTTPS via Let's Encrypt) in front of port 60413
The bug is unrelated to model/provider — it occurs purely on the gateway control loop during a config reload, before any model call.

Impact and severity

Affected: any OpenClaw deployment that edits plugins.allow or plugins.entries.* while the gateway is running (observed on 2026.4.9 docker).

Severity: High during the loop window — gateway is unreachable, no Telegram messages can be sent or received. Self-recovers after ~6 minutes thanks to docker restart: unless-stopped.

Frequency: 1/1 reproducible — happened immediately after a single plugin config edit. Did not happen on cold container restarts before/after.

Consequence: ~6 minutes of downtime per config change, missed scheduled cron job runs that fall in that window, no observed data loss.

Additional information

Workaround

Avoid letting OpenClaw auto-reload after config changes. Instead, edit the config and immediately restart the container manually:

docker restart openclaw-jhps-openclaw-1

A clean cold-restart goes through without entering the crash loop.

Impact

- Severity: Medium — gateway is unreachable for several minutes after any plugin config change.
- Self-recovery: Yes (eventually), thanks to restart: unless-stopped.
- Data loss: None observed.

extent analysis

TL;DR

The most likely fix for the OpenClaw crash loop issue is to handle the unhandled 'error' event that occurs when the WebSocket connection to the gateway control port is refused, allowing the gateway to restart cleanly after configuration changes.

Guidance

Identify and handle the unhandled 'error' event in the WebSocket connection to prevent the process from exiting.
Implement a retry mechanism with backoff to handle temporary connection refusals to the gateway control port.
Consider adding a delay between sending the SIGUSR1 signal and restarting the gateway to ensure that all operations and replies are completed.
Verify that the gateway can restart cleanly after configuration changes by checking the logs for any error messages.

Example

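// Assumes the `ws` and `async-retry` packages are installed. The URL is inferred
// from the address and port in the logs; the real control endpoint may differ.
const WebSocket = require('ws');
const retry = require('async-retry');

const GATEWAY_URL = 'ws://127.0.0.1:18789';

// Open one connection; reject on failure instead of leaving 'error' unhandled.
function connectOnce(url) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    ws.once('open', () => resolve(ws));
    ws.once('error', reject);
  });
}

// Retry with exponential backoff while the gateway is still starting.
retry(() => connectOnce(GATEWAY_URL), {
  retries: 5,
  factor: 2,
  minTimeout: 1000,
  maxTimeout: 5000,
  onRetry: (error, attempt) => {
    console.error(`WebSocket connect failed (attempt ${attempt}):`, error.message);
  },
})
  .then((ws) => {
    // Keep a listener attached so later errors do not crash the process.
    ws.on('error', (error) => console.error('WebSocket error:', error));
  })
  .catch((error) => console.error('Gateway unreachable after retries:', error));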

Notes

The provided workaround of manually restarting the container after configuration changes can be used as a temporary solution, but it is not a permanent fix. The root cause of the issue needs to be addressed to prevent the crash loop from occurring.

Recommendation

Apply the workaround of manually restarting the container after configuration changes until a permanent fix is implemented to handle the unhandled 'error' event and prevent the crash loop.
