openclaw - ✅(Solved) Fix [Bug]: OpenClaw: Crash loop on plugin config reload (ECONNREFUSED on loopback port 18789) [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#64201 · Fetched 2026-04-11 06:15:56
Comments: 0 · Participants: 1 · Timeline: 3 · Reactions: 0
Timeline (top): labeled ×2, cross-referenced ×1

OpenClaw enters a crash loop for ~6 minutes (≈25 restarts at 15s intervals) whenever the plugins.allow or plugins.entries.* configuration is changed at runtime. The container self-recovers thanks to restart: unless-stopped, but during the loop the gateway is unreachable.

Environment

  • Image: ghcr.io/hostinger/hvps-openclaw:latest
  • Image revision: 69fa0d8 (created 2026-03-02)
  • Host OS: Ubuntu 24.04.4 LTS
  • Docker compose: v5.0.2
  • Reverse proxy: Traefik (HTTPS via Let's Encrypt)
  • Active plugins: telegram only

Error Message

Trigger event in logs (Berlin time, UTC+02:00)

Config change detected at 10:12:57 (plugins.allow modified — unused channel plugins removed). Reload deferred until in-flight ops complete, then SIGUSR1 sent at 10:17:17:

2026-04-10T10:12:57.766+02:00 [reload] config change detected; evaluating reload (plugins.allow)
2026-04-10T10:12:57.791+02:00 [reload] config change requires gateway restart (plugins.allow) — deferring until 2 operation(s), 1 reply(ies), 1 embedded run(s) complete
2026-04-10T10:13:58.876+02:00 [reload] config change detected; evaluating reload (plugins.entries.whatsapp.enabled, plugins.entries.discord.enabled, plugins.entries.slack.enabled, plugins.entries.nostr.enabled, plugins.entries.googlechat.enabled)
2026-04-10T10:17:17.891+02:00 [reload] all operations and replies completed; restarting gateway now
2026-04-10T10:17:17.893+02:00 [gateway] signal SIGUSR1 received
2026-04-10T10:17:17.895+02:00 [gateway] received SIGUSR1; restarting

Crash on every restart attempt

[gateway] Starting OpenClaw gateway...
node:events:497
      throw er; // Unhandled 'error' event
      ^
Error: connect ECONNREFUSED 127.0.0.1:18789
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
Emitted 'error' event on WebSocket instance at:
    at emitErrorAndClose (/hostinger/node_modules/ws/lib/websocket.js:1046:13)
    at ClientRequest.<anonymous> (/hostinger/node_modules/ws/lib/websocket.js:886:5)
    at ClientRequest.emit (node:events:519:28)
    at emitErrorEvent (node:_http_client:107:11)
    at Socket.socketErrorListener (node:_http_client:574:5)
    at Socket.emit (node:events:519:28)
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -111,
  code: 'ECONNREFUSED',
  syscall: 'connect',
  address: '127.0.0.1',
  port: 18789
}
Node.js v22.22.0

Container restart loop (docker events)

25 die/start cycles between 10:17 and 10:23, ~15s apart:

2026-04-10T08:17:20Z die
2026-04-10T08:17:20Z start
2026-04-10T08:17:36Z die
2026-04-10T08:17:37Z start
2026-04-10T08:17:48Z die
2026-04-10T08:17:49Z start
... (21 more cycles)
2026-04-10T08:23:12Z die
2026-04-10T08:23:27Z start  ← finally stable
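The gateway log uses Berlin time (UTC+02:00) while `docker events` reports UTC, so the two timelines are easier to compare once normalised. The sketch below uses only timestamps quoted in this report and checks the "~6 minutes, ~15s apart" figures. The variable names are illustrative.

```javascript
// Timestamps copied from the gateway log and the `docker events` output in this report.
const sigusr1 = new Date('2026-04-10T10:17:17.895+02:00'); // gateway received SIGUSR1
const firstDie = new Date('2026-04-10T08:17:20Z');         // first container exit
const stableStart = new Date('2026-04-10T08:23:27Z');      // final, stable start
const cycles = 25;                                          // die/start cycles reported

// Date stores an absolute instant, so values with different offsets compare directly.
const secondsToFirstDie = (firstDie - sigusr1) / 1000;
const loopSeconds = (stableStart - firstDie) / 1000;
const secondsPerCycle = loopSeconds / cycles;

console.log(sigusr1.toISOString());        // 2026-04-10T08:17:17.895Z
console.log(secondsToFirstDie.toFixed(1)); // 2.1
console.log(loopSeconds);                  // 367
console.log(secondsPerCycle.toFixed(1));   // 14.7
```

The first container exit follows the SIGUSR1 restart by about two seconds, and the loop lasts 367 seconds (just over 6 minutes), which averages out to roughly 15 seconds per cycle.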

Related symptom

openclaw security audit --deep reports the same loopback port as unreachable:

WARN  gateway.probe_failed  Gateway probe failed (deep)
  timeout
  Fix: Run "openclaw status --all" to debug connectivity/auth, then re-run "openclaw security audit --deep".

This appears to share the same root cause as the crash loop: the loopback port 18789 intermittently refuses connections or fails to respond to the local probe.

PR fix notes

PR #64255: fix: stop configReloader before plugin teardown to prevent crash loop

Description (problem / solution / changelog)

Fix: plugin config reload crash loop (issue #64201)

Problem

Modifying plugins.allow or plugins.entries.* causes a gateway crash loop (~25 restarts over about 6 minutes). Root cause: ECONNREFUSED on loopback port 18789, because the restart triggered by the config reloader fires while the previous server's port is still being torn down.

Root cause

In createGatewayCloseHandler, configReloader.stop() was called after plugin teardown (stopChannel, pluginServices.stop) and after the HTTP/WebSocket server close. This means:

  1. Config watcher detects config change → fires restart
  2. Old gateway process teardown starts (plugins stop, then configReloader.stop())
  3. New gateway process starts → tries to bind 127.0.0.1:18789
  4. Old process hasn't closed the port yet → ECONNREFUSED
  5. New process crashes → restart loop

Fix

Reorder the createGatewayCloseHandler shutdown sequence so that the following steps run before any plugin teardown:

  1. configReloader.stop() — stop the config watcher first, preventing further restarts
  2. Broadcast shutdown event to connected clients
  3. Close all client sockets
  4. Close HTTP and WebSocket servers (releasing port 18789)

Only then proceed with plugin/channel teardown (stopChannel, pluginServices.stop, etc.).
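The reordered sequence can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from src/gateway/server-close.ts: the `deps` object, `createCloseHandler`, `broadcastShutdown`, and the promise-returning `close()` stubs are hypothetical. Only the step order and the names configReloader.stop, stopChannel, and pluginServices.stop come from the PR description.

```javascript
// Hypothetical sketch of the shutdown order described in the PR notes.
function createCloseHandler(deps) {
  return async function close() {
    // 1. Stop the config watcher first so no further restart can be triggered.
    await deps.configReloader.stop();
    // 2. Tell connected clients that the gateway is shutting down.
    deps.broadcastShutdown();
    // 3. Close all client sockets.
    for (const socket of deps.clients) socket.close();
    // 4. Close the WebSocket and HTTP servers, releasing the listening port.
    await deps.wss.close();
    await deps.http.close();
    // Only then run plugin/channel teardown.
    await deps.stopChannel();
    await deps.pluginServices.stop();
  };
}

// Usage with recording stubs, to make the resulting order visible.
const calls = [];
const record = (name) => async () => { calls.push(name); };
const close = createCloseHandler({
  configReloader: { stop: record('configReloader.stop') },
  broadcastShutdown: () => { calls.push('broadcastShutdown'); },
  clients: [{ close: () => { calls.push('client.close'); } }],
  wss: { close: record('wss.close') },
  http: { close: record('http.close') },
  stopChannel: record('stopChannel'),
  pluginServices: { stop: record('pluginServices.stop') },
});
const done = close().then(() => {
  console.log(calls.join(' -> '));
  return calls;
});
```

Recording the call order with stubs is also roughly how a shutdown-order test can assert that the listeners close before plugin teardown begins.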

Test

Added a test to server-close.test.ts, "stops the config reloader and closes listeners before plugin teardown continues", which verifies the shutdown order: configReloader.stop, wss.close, and http.close all occur before bonjourStop and pluginServices.stop.

Files changed

  • src/gateway/server-close.ts — reordered shutdown sequence
  • src/gateway/server-close.test.ts — added shutdown-order test

Changed files

  • extensions/ollama/openclaw.plugin.json (modified, +2/-1)
  • src/gateway/server-close.test.ts (modified, +64/-0)
  • src/gateway/server-close.ts (modified, +61/-57)


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

OpenClaw enters a crash loop for ~6 minutes (≈25 restarts at 15s intervals) whenever the plugins.allow or plugins.entries.* configuration is changed at runtime. The container self-recovers thanks to restart: unless-stopped, but during the loop the gateway is unreachable.

Environment

  • Image: ghcr.io/hostinger/hvps-openclaw:latest
  • Image revision: 69fa0d8 (created 2026-03-02)
  • Host OS: Ubuntu 24.04.4 LTS
  • Docker compose: v5.0.2
  • Reverse proxy: Traefik (HTTPS via Let's Encrypt)
  • Active plugins: telegram only

Steps to reproduce

  1. Have a running OpenClaw instance with multiple plugins listed in plugins.allow (e.g. discord, slack, googlechat, nostr, whatsapp).
  2. Edit /data/.openclaw/openclaw.json and remove unused plugins from plugins.allow and disable them in plugins.entries.*.
  3. OpenClaw detects the config change and triggers an automatic gateway reload via SIGUSR1.
  4. The reload sequence fails with an unhandled WebSocket error.

Expected behavior

  • A WebSocket client connecting to the local gateway control port should wait for the gateway to be ready, or
  • The error event should be handled gracefully (retry/backoff) rather than bubbling up as an unhandled exception that kills the process.
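The "wait for the gateway to be ready" option could look roughly like the sketch below, which polls a TCP port with exponential backoff using only Node's built-in `net` module. The helper names `probe` and `waitForPort`, and the default attempt counts and delays, are assumptions for illustration and are not taken from OpenClaw.

```javascript
const net = require('node:net');

// Try one TCP connection; resolve true if accepted, false if refused or timed out.
function probe(port, host, timeoutMs) {
  return new Promise((resolve) => {
    const socket = net.connect({ port, host });
    const finish = (ok) => { socket.destroy(); resolve(ok); };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.on('connect', () => finish(true));
    socket.on('error', () => finish(false)); // e.g. ECONNREFUSED while the port is closed
  });
}

// Poll until the port accepts connections, doubling the delay after each failure.
async function waitForPort(
  port,
  { host = '127.0.0.1', attempts = 6, baseDelayMs = 100, timeoutMs = 500 } = {}
) {
  let delay = baseDelayMs;
  for (let i = 0; i < attempts; i++) {
    if (await probe(port, host, timeoutMs)) return true;
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delay));
      delay *= 2;
    }
  }
  return false;
}
```

A caller would run something like `waitForPort(18789)` and open the WebSocket only once it resolves to `true`. Because every probe socket has an 'error' listener, a refused connection becomes a `false` result instead of an unhandled exception.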

Related Symptom: Deep Security Audit Probe Timeout

openclaw security audit --deep reports:

WARN gateway.probe_failed Gateway probe failed (deep) timeout

This appears to share the same root cause: the loopback port 18789 intermittently refuses connections or fails to respond to the local probe.

Actual behavior

After the reload signal, the process exits with an unhandled error event:

[gateway] received SIGUSR1; restarting
node:events:497
      throw er; // Unhandled 'error' event
      ^
Error: connect ECONNREFUSED 127.0.0.1:18789
    at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
Emitted 'error' event on WebSocket instance at:
    at emitErrorAndClose (/hostinger/node_modules/ws/lib/websocket.js:1046:13)
    at ClientRequest.<anonymous> (/hostinger/node_modules/ws/lib/websocket.js:886:5)
  errno: -111, code: 'ECONNREFUSED', syscall: 'connect', address: '127.0.0.1', port: 18789
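The "Unhandled 'error' event" line reflects standard Node.js EventEmitter behaviour: emitting 'error' on an emitter with no 'error' listener throws the error. The sketch below reproduces that with a plain EventEmitter standing in for the WebSocket instance. The error object is constructed by hand to resemble the one in the trace.

```javascript
const { EventEmitter } = require('node:events');

// Hand-built stand-in for the error seen in the stack trace.
const refused = Object.assign(new Error('connect ECONNREFUSED 127.0.0.1:18789'), {
  code: 'ECONNREFUSED',
});

// Without an 'error' listener, emit() throws the error itself.
const unhandled = new EventEmitter();
let thrown = null;
try {
  unhandled.emit('error', refused);
} catch (err) {
  thrown = err; // with no try/catch around it, this would terminate the process
}

// With a listener, the same error is delivered to the handler and nothing is thrown.
const handled = new EventEmitter();
let received = null;
handled.on('error', (err) => { received = err.code; });
handled.emit('error', refused);

console.log(thrown === refused, received);
```

In the gateway the throw happens inside an I/O callback with nothing to catch it, so the process exits and Docker's restart policy takes over.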

Docker then restarts the container per its restart policy. The new instance hits the same race condition during startup and crashes again. After roughly 25 restart cycles (~6 minutes) the timing happens to align and the gateway comes up cleanly.

docker events confirms ~25 die/start cycles during the loop window.

OpenClaw version

2026.4.9 (build 0512059)

Operating system

Ubuntu 24.04.4 LTS (kernel 6.8.0-107-generic, x86_64)

Install method

docker (image: ghcr.io/hostinger/hvps-openclaw:latest, revision 69fa0d8, image created 2026-03-02)

Model

openrouter/moonshotai/kimi-k2.5

Provider / routing chain

openclaw -> openrouter -> moonshotai/kimi-k2.5

Additional provider/model setup details

  • Default agent model: openrouter/moonshotai/kimi-k2.5 (configured in agents.defaults.model.primary)
  • 11 agents registered (main, Router, Chef, Codee, Mail, Memory, Ops, PM, Search, Shares, Travel)
  • Mix of openrouter/moonshotai/kimi-k2.5, openrouter/anthropic/claude-sonnet-4-5, and openrouter/anthropic/claude-opus-4-6 per agent
  • Single channel: telegram (allowlist groups, pairing for DMs)
  • Reverse proxy: Traefik (HTTPS via Let's Encrypt) in front of port 60413
  • The bug is unrelated to model/provider — it occurs purely on the gateway control loop during a config reload, before any model call.

Logs, screenshots, and evidence

## Trigger event in logs (Berlin time, UTC+02:00)

  Config change detected at 10:12:57 (`plugins.allow` modified — unused channel
  plugins removed). Reload deferred until in-flight ops complete, then SIGUSR1
  sent at 10:17:17:

  2026-04-10T10:12:57.766+02:00 [reload] config change detected; evaluating reload (plugins.allow)
  2026-04-10T10:12:57.791+02:00 [reload] config change requires gateway restart (plugins.allow) — deferring until 2 operation(s), 1 reply(ies), 1 embedded run(s) complete
  2026-04-10T10:13:58.876+02:00 [reload] config change detected; evaluating reload (plugins.entries.whatsapp.enabled, plugins.entries.discord.enabled,
  plugins.entries.slack.enabled, plugins.entries.nostr.enabled, plugins.entries.googlechat.enabled)
  2026-04-10T10:17:17.891+02:00 [reload] all operations and replies completed; restarting gateway now
  2026-04-10T10:17:17.893+02:00 [gateway] signal SIGUSR1 received
  2026-04-10T10:17:17.895+02:00 [gateway] received SIGUSR1; restarting

  ## Crash on every restart attempt

  [gateway] Starting OpenClaw gateway...
  node:events:497
        throw er; // Unhandled 'error' event
        ^
  Error: connect ECONNREFUSED 127.0.0.1:18789
      at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1637:16)
  Emitted 'error' event on WebSocket instance at:
      at emitErrorAndClose (/hostinger/node_modules/ws/lib/websocket.js:1046:13)
      at ClientRequest.<anonymous> (/hostinger/node_modules/ws/lib/websocket.js:886:5)
      at ClientRequest.emit (node:events:519:28)
      at emitErrorEvent (node:_http_client:107:11)
      at Socket.socketErrorListener (node:_http_client:574:5)
      at Socket.emit (node:events:519:28)
      at emitErrorNT (node:internal/streams/destroy:170:8)
      at emitErrorCloseNT (node:internal/streams/destroy:129:3)
      at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
    errno: -111,
    code: 'ECONNREFUSED',
    syscall: 'connect',
    address: '127.0.0.1',
    port: 18789
  }
  Node.js v22.22.0

  ## Container restart loop (`docker events`)

  25 die/start cycles between 10:17 and 10:23, ~15s apart:

  2026-04-10T08:17:20Z die
  2026-04-10T08:17:20Z start
  2026-04-10T08:17:36Z die
  2026-04-10T08:17:37Z start
  2026-04-10T08:17:48Z die
  2026-04-10T08:17:49Z start
  ... (21 more cycles)
  2026-04-10T08:23:12Z die
  2026-04-10T08:23:27Z start  ← finally stable

  ## Related symptom

  `openclaw security audit --deep` reports the same loopback port as unreachable:

  WARN  gateway.probe_failed  Gateway probe failed (deep)
    timeout
    Fix: Run "openclaw status --all" to debug connectivity/auth, then re-run "openclaw security audit --deep".

Impact and severity

Affected: any OpenClaw deployment that edits plugins.allow or plugins.entries.* while the gateway is running (observed on 2026.4.9 docker).

Severity: High during the loop window — gateway is unreachable, no Telegram messages can be sent or received. Self-recovers after ~6 minutes thanks to docker restart: unless-stopped.

Frequency: 1/1 reproducible — happened immediately after a single plugin config edit. Did not happen on cold container restarts before/after.

Consequence: ~6 minutes of downtime per config change, missed scheduled cron job runs that fall in that window, no observed data loss.

Additional information

Workaround

Avoid letting OpenClaw auto-reload after config changes. Instead, edit the config and immediately restart the container manually:

docker restart openclaw-jhps-openclaw-1

A clean cold-restart goes through without entering the crash loop.

Impact

- Severity: Medium. The gateway is unreachable for several minutes after any plugin config change.
- Self-recovery: Yes (eventually), thanks to restart: unless-stopped.
- Data loss: None observed.

extent analysis

TL;DR

The most likely fix for the OpenClaw crash loop issue is to handle the unhandled 'error' event that occurs when the WebSocket connection to the gateway control port is refused, allowing the gateway to restart cleanly after configuration changes.

Guidance

  • Identify and handle the unhandled 'error' event in the WebSocket connection to prevent the process from exiting.
  • Implement a retry mechanism with backoff to handle temporary connection refusals to the gateway control port.
  • Consider adding a delay between sending the SIGUSR1 signal and restarting the gateway to ensure that all operations and replies are completed.
  • Verify that the gateway can restart cleanly after configuration changes by checking the logs for any error messages.

Example

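// Sketch only: assumes the `ws` and `async-retry` npm packages are installed.
const WebSocket = require('ws');
const retry = require('async-retry');

// Open one WebSocket and settle once it either connects or fails.
// The 'error' listener keeps ECONNREFUSED from surfacing as an
// unhandled 'error' event that kills the process.
function connectOnce(url) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);
    ws.on('open', () => resolve(ws));
    ws.on('error', (error) => {
      console.error('WebSocket error:', error.message);
      reject(error); // no effect if the promise has already resolved
    });
  });
}

// Retry with exponential backoff while the gateway port is not yet accepting
// connections. The URL is illustrative; 18789 is the loopback port from the trace.
retry(() => connectOnce('ws://127.0.0.1:18789'), {
  retries: 5,
  factor: 2,
  minTimeout: 1000,
  maxTimeout: 5000,
})
  .then(() => console.log('Connected to gateway'))
  .catch((error) => console.error('Gateway still unreachable:', error.message));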

Notes

The provided workaround of manually restarting the container after configuration changes can be used as a temporary solution, but it is not a permanent fix. The root cause of the issue needs to be addressed to prevent the crash loop from occurring.

Recommendation

Apply the workaround of manually restarting the container after configuration changes until a permanent fix is in place, such as the shutdown reordering in PR #64255 or explicit handling of the WebSocket 'error' event.
