openclaw - 💡(How to fix) Fix [Bug]: Gateway crashes on every lane task completion after hot-reload (schema drift in command-queue global singleton) [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#62959 · Fetched 2026-04-09 08:00:10
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 0
Timeline (top): closed ×1, commented ×1, labeled ×1

Gateway process throws an unhandled TypeError in notifyActiveTaskWaiters() on every lane=main task completion after a
hooks:loader hot-reload, causing all connected WebSocket clients to drop with code=1006.

Error Message

~/.openclaw/logs/gateway.err.log (repeats every 1–3 minutes; sample from 2026-04-08):

  2026-04-08T10:54:50.237+08:00 [diagnostic] lane task error: lane=main durationMs=7651 error="TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))"
  2026-04-08T10:54:50.239+08:00 [openclaw] Unhandled promise rejection: TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))
      at Array.from (<anonymous>)
      at notifyActiveTaskWaiters (file:///Users/<user>/.npm-global/lib/node_modules/openclaw/dist/command-queue-Cssp02gj.js:88:29)
      at file:///Users/<user>/.npm-global/lib/node_modules/openclaw/dist/command-queue-Cssp02gj.js:130:8
  2026-04-08T10:54:54.859+08:00 [hooks:loader] Loading managed hook code into the gateway process. Managed hooks are trusted local code.

Frequency in this log on 2026-04-08: 204 lines matching notifyActiveTaskWaiters.

Source under inspection (current bundle):

  dist/command-queue-Cssp02gj.js:42-48 (getQueueState() initializer)
      const COMMAND_QUEUE_STATE_KEY = Symbol.for("openclaw.commandQueueState");
      function getQueueState() {
          return resolveGlobalSingleton(COMMAND_QUEUE_STATE_KEY, () => ({
              gatewayDraining: false,
              lanes: new Map(),
              activeTaskWaiters: new Set(),
              nextTaskId: 1
          }));
      }

  dist/command-queue-Cssp02gj.js:86-89 (failing call site)
      function notifyActiveTaskWaiters() {
          const queueState = getQueueState();
          for (const waiter of Array.from(queueState.activeTaskWaiters)) ...
      }

  dist/global-singleton-vftIYBun.js:2-8 (singleton resolver; reuses an existing key as-is)
      function resolveGlobalSingleton(key, create) {
          const globalStore = globalThis;
          if (Object.prototype.hasOwnProperty.call(globalStore, key)) return globalStore[key];
          const created = create();
          globalStore[key] = created;
          return created;
      }

resolveGlobalSingleton does not re-initialize when the global key already exists. If an earlier bundle created the singleton without activeTaskWaiters, every subsequent caller in the new bundle reads undefined for that field. Array.from(undefined) then throws.
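The failure mode described above can be reproduced in isolation with a small sketch. It reuses the resolver shape quoted from the bundle, but the "older bundle" initializer and the `demo.commandQueueState` key are assumptions made up for illustration; the issue does not show what the earlier bundle's state actually looked like.

```javascript
// Same resolver shape as quoted from dist/global-singleton-vftIYBun.js.
function resolveGlobalSingleton(key, create) {
    const globalStore = globalThis;
    if (Object.prototype.hasOwnProperty.call(globalStore, key)) return globalStore[key];
    const created = create();
    globalStore[key] = created;
    return created;
}

// Hypothetical key for this demo only, so it cannot collide with a real gateway.
const DRIFT_DEMO_KEY = Symbol.for("demo.commandQueueState");

// ASSUMPTION: an "older bundle" whose state shape had no activeTaskWaiters.
function getQueueStateOld() {
    return resolveGlobalSingleton(DRIFT_DEMO_KEY, () => ({
        gatewayDraining: false,
        lanes: new Map(),
        nextTaskId: 1
    }));
}

// "Newer bundle": same key, but the initializer now includes activeTaskWaiters.
function getQueueStateNew() {
    return resolveGlobalSingleton(DRIFT_DEMO_KEY, () => ({
        gatewayDraining: false,
        lanes: new Map(),
        activeTaskWaiters: new Set(),
        nextTaskId: 1
    }));
}

getQueueStateOld();                       // old bundle creates the singleton first
const driftedState = getQueueStateNew();  // new bundle reuses it; its initializer never runs

let driftError = null;
try {
    Array.from(driftedState.activeTaskWaiters); // same expression as the failing call site
} catch (err) {
    driftError = err;
}
console.log(driftedState.activeTaskWaiters === undefined); // true
console.log(driftError instanceof TypeError);              // true
```

The new initializer is never invoked because the key already exists on `globalThis`, so the only way the missing field can appear is for some code path to add it after the fact.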

Root Cause

Affected: every WebSocket client connected to a hot-reloaded gateway (observed in openclaw-control-ui via Chrome 146 and via QuarkPC/6.6.5.788 on the same host).
Severity: High. The gateway process crashes and reloads on every lane task completion; the user-visible symptom is "disconnected (1006): no reason" immediately at the end of each model response, followed by a ~5s reconnect.
Frequency: deterministic on the affected process, on every lane=main task completion (observed every 1–3 minutes throughout 2026-04-08; 204 stack traces in a single day's gateway.err.log).
Consequence: every assistant turn ends in a forced reconnect; in-flight WebSocket state is lost; the root cause is invisible to clients (1006 carries no reason), so the symptom is easily misdiagnosed as a WebSocket or network issue.

Fix Action

Fix / Workaround

Local mitigation verified: after restarting the gateway (PID changed 99775 → 3326) and patching getQueueState() to fill missing fields on reuse, multiple lane=main tasks (sessions.list, chat.history, models.list, node.list, device.pair.list) completed cleanly with zero new notifyActiveTaskWaiters errors in gateway.err.log. Pre-restart frequency: ≥1 crash per 1–3 minutes. Post-fix frequency over the verification window: 0.


Suggested minimal patch (verified locally): make getQueueState self-heal so that reused singletons get any new fields filled in. The patch itself is listed under "Additional information" in the issue body.


Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

Gateway process throws an unhandled TypeError in notifyActiveTaskWaiters() on every lane=main task completion after a
hooks:loader hot-reload, causing all connected WebSocket clients to drop with code=1006.

Steps to reproduce

  1. Run openclaw gateway start (npm global install, LaunchAgent on macOS).
  2. Wait until ~/.openclaw/logs/gateway.err.log shows [hooks:loader] Loading managed hook code into the gateway process
    (i.e. the bundle has been reloaded into the running Node process at least once).
  3. Trigger any RPC that goes through lane=main — any webchat message, or any of sessions.list / chat.history /
    models.list / node.list / device.pair.list from a connected client.
  4. The task completes; the process then immediately logs the TypeError trace shown in "Logs" below, and every connected
    WebSocket client receives code=1006.

Synthetic minimum-version reproduction in isolation: NOT_ENOUGH_INFO.

Expected behavior

notifyActiveTaskWaiters() iterates queueState.activeTaskWaiters without throwing, regardless of whether the global queue-state singleton was created by the current bundle or carried over from an earlier bundle via hot-reload.

Actual behavior

notifyActiveTaskWaiters() throws TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator)) at
dist/command-queue-Cssp02gj.js:88:29, surfaced as an unhandled promise rejection. The crash repeats on every subsequent
lane=main task completion (observed every 1–3 minutes). Each crash is followed by [hooks:loader] Loading managed hook code into the gateway process and a Bonjour re-advertise, and every active WebSocket client is dropped with code=1006 reason=n/a.

OpenClaw version

2026.4.5

Operating system

macOS 26.3.1 (arm64, Mac mini M-series)

Install method

npm global (/Users/<user>/.npm-global/lib/node_modules/openclaw), Node.js v24.14.0, supervised via LaunchAgent ai.openclaw.gateway

Model

NOT_ENOUGH_INFO — bug is in the gateway's internal command-queue layer (lane task completion), reproduces independently of which model is active. During observation the active model was bailian/qwen3.5-plus.

Provider / routing chain

NOT_ENOUGH_INFO — bug is in the gateway's internal command-queue layer; it does not depend on any provider routing. The active route during observation was openclaw -> bailian (DashScope-compatible openai-completions endpoint, configured under models.providers.bailian).

Additional provider/model setup details

Not relevant — the failing code path (notifyActiveTaskWaiters) runs after every lane=main task regardless of model/provider. Reproduced with sessions.list, chat.history, models.list, node.list, and device.pair.list as well as model completions.

Logs, screenshots, and evidence

~/.openclaw/logs/gateway.err.log (repeats every 1–3 minutes; sample from 2026-04-08):

  2026-04-08T10:54:50.237+08:00 [diagnostic] lane task error: lane=main durationMs=7651 error="TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))"
  2026-04-08T10:54:50.239+08:00 [openclaw] Unhandled promise rejection: TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))
      at Array.from (<anonymous>)
      at notifyActiveTaskWaiters (file:///Users/<user>/.npm-global/lib/node_modules/openclaw/dist/command-queue-Cssp02gj.js:88:29)
      at file:///Users/<user>/.npm-global/lib/node_modules/openclaw/dist/command-queue-Cssp02gj.js:130:8
  2026-04-08T10:54:54.859+08:00 [hooks:loader] Loading managed hook code into the gateway process. Managed hooks are trusted local code.

Frequency in this log on 2026-04-08: 204 lines matching `notifyActiveTaskWaiters`.

Source under inspection (current bundle):

  dist/command-queue-Cssp02gj.js:42-48 — getQueueState() initializer
      const COMMAND_QUEUE_STATE_KEY = Symbol.for("openclaw.commandQueueState");
      function getQueueState() {
          return resolveGlobalSingleton(COMMAND_QUEUE_STATE_KEY, () => ({
              gatewayDraining: false,
              lanes: new Map(),
              activeTaskWaiters: new Set(),
              nextTaskId: 1
          }));
      }

  dist/command-queue-Cssp02gj.js:86-89 — failing call site
      function notifyActiveTaskWaiters() {
          const queueState = getQueueState();
          for (const waiter of Array.from(queueState.activeTaskWaiters)) ...
      }

  dist/global-singleton-vftIYBun.js:2-8 — singleton resolver (reuses existing key as-is)
      function resolveGlobalSingleton(key, create) {
          const globalStore = globalThis;
          if (Object.prototype.hasOwnProperty.call(globalStore, key)) return globalStore[key];
          const created = create();
          globalStore[key] = created;
          return created;
      }

resolveGlobalSingleton does not re-initialize when the global key already exists. If an earlier bundle created the singleton without `activeTaskWaiters`, every subsequent caller in the new bundle reads `undefined` for that field. `Array.from(undefined)` then throws.

Local mitigation verified: after restarting the gateway (PID changed 99775 → 3326) and patching getQueueState() to fill missing fields on reuse, multiple lane=main tasks (sessions.list, chat.history, models.list, node.list, device.pair.list) completed cleanly with zero new notifyActiveTaskWaiters errors in gateway.err.log. Pre-restart frequency: ≥1 crash per 1–3 minutes. Post-fix frequency over the verification window: 0.

Impact and severity

Affected: every WebSocket client connected to a hot-reloaded gateway (observed in openclaw-control-ui via Chrome 146 and via QuarkPC/6.6.5.788 on the same host).
Severity: High. The gateway process crashes and reloads on every lane task completion; the user-visible symptom is "disconnected (1006): no reason" immediately at the end of each model response, followed by a ~5s reconnect.
Frequency: deterministic on the affected process, on every lane=main task completion (observed every 1–3 minutes throughout 2026-04-08; 204 stack traces in a single day's gateway.err.log).
Consequence: every assistant turn ends in a forced reconnect; in-flight WebSocket state is lost; the root cause is invisible to clients (1006 carries no reason), so the symptom is easily misdiagnosed as a WebSocket or network issue.

Additional information

Last known good version: NOT_ENOUGH_INFO. First known bad version: 2026.4.5 (only version observed locally). Regression bisect: NOT_ENOUGH_INFO.

Note: gateway.log (the normal log) does NOT contain the TypeError or the 1006 storm — they only appear in gateway.err.log. Anyone debugging by tailing gateway.log will see nothing wrong while the process is in fact crashing on every task. Surfacing the underlying crash in either the main log or the Control UI would have shortened diagnosis significantly.
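One way to make the crash visible to someone tailing the main log is sketched below. This is not how OpenClaw's logging is implemented; `mainLogLines` and `errLogLines` are hypothetical in-memory stand-ins for whatever writes gateway.log and gateway.err.log, and `reportUnhandledRejection` is an illustrative name. Only `process.on("unhandledRejection", ...)` is a real Node.js API.

```javascript
// Hypothetical sinks standing in for the gateway.log and gateway.err.log writers.
const mainLogLines = [];
const errLogLines = [];

// Write full detail to the error log and a one-line pointer to the main log.
function reportUnhandledRejection(reason) {
    const detail = reason instanceof Error ? (reason.stack || reason.message) : String(reason);
    const stamp = new Date().toISOString();
    errLogLines.push(`${stamp} [openclaw] Unhandled promise rejection: ${detail}`);
    const firstLine = detail.split("\n")[0];
    mainLogLines.push(`${stamp} [openclaw] unhandled rejection (details in gateway.err.log): ${firstLine}`);
}

process.on("unhandledRejection", reportUnhandledRejection);

// Demo: deliver the event by hand instead of leaking a real rejection.
process.emit("unhandledRejection", new TypeError("undefined is not iterable"), Promise.resolve());
process.off("unhandledRejection", reportUnhandledRejection);

console.log(mainLogLines.length); // 1
console.log(errLogLines.length);  // 1
```

Note that in current Node.js versions a registered `unhandledRejection` listener replaces the default handling, so a real handler would still need to decide whether the process should exit.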

Suggested minimal patch (verified locally) — make getQueueState self-heal so reused singletons get any new fields filled in:

function getQueueState() {
    const state = resolveGlobalSingleton(COMMAND_QUEUE_STATE_KEY, () => ({
        gatewayDraining: false,
        lanes: new Map(),
        activeTaskWaiters: new Set(),
        nextTaskId: 1
    }));
    if (!state.lanes) state.lanes = new Map();
    if (!state.activeTaskWaiters) state.activeTaskWaiters = new Set();
    if (typeof state.nextTaskId !== "number") state.nextTaskId = 1;
    if (typeof state.gatewayDraining !== "boolean") state.gatewayDraining = false;
    return state;
}

A more general fix (versioned singleton key, or a hydration step in resolveGlobalSingleton): NOT_ENOUGH_INFO on whether maintainers prefer a per-singleton self-heal vs. a centralized migration helper.
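For comparison, the centralized option (a hydration step inside the resolver) might look roughly like the sketch below. `resolveGlobalSingletonHydrated` and `hydratedKeys` are hypothetical names, not part of the OpenClaw codebase, and the sketch assumes every singleton is a plain object whose `create()` callback has no side effects.

```javascript
// Keys already hydrated by THIS module instance. A freshly loaded bundle gets
// a fresh Set, so each reload re-checks a carried-over singleton once.
const hydratedKeys = new Set();

// Hypothetical variant of resolveGlobalSingleton that fills in missing fields.
function resolveGlobalSingletonHydrated(key, create) {
    const globalStore = globalThis;
    if (!Object.prototype.hasOwnProperty.call(globalStore, key)) {
        globalStore[key] = create();
        hydratedKeys.add(key);
        return globalStore[key];
    }
    const existing = globalStore[key];
    if (!hydratedKeys.has(key)) {
        // Build a default object and copy over only the fields the carried-over
        // singleton lacks; existing values are never overwritten.
        const defaults = create();
        for (const field of Object.keys(defaults)) {
            if (existing[field] === undefined) existing[field] = defaults[field];
        }
        hydratedKeys.add(key);
    }
    return existing;
}

// Demo with a hypothetical key and a stale state object that lacks activeTaskWaiters.
const HYDRATE_DEMO_KEY = Symbol.for("demo.hydratedQueueState");
globalThis[HYDRATE_DEMO_KEY] = {
    gatewayDraining: false,
    lanes: new Map([["main", {}]]),
    nextTaskId: 7
};

const hydratedState = resolveGlobalSingletonHydrated(HYDRATE_DEMO_KEY, () => ({
    gatewayDraining: false,
    lanes: new Map(),
    activeTaskWaiters: new Set(),
    nextTaskId: 1
}));

console.log(hydratedState.activeTaskWaiters instanceof Set); // true
console.log(hydratedState.nextTaskId);                       // 7
console.log(hydratedState.lanes.has("main"));                // true
```

This variant only fills in missing fields. It does not repair fields whose type changed between bundles, which is a case a versioned key would handle by discarding the old object instead.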

extent analysis

TL;DR

The most likely fix is to modify the getQueueState function to self-heal and fill in missing fields when reusing existing singletons.

Guidance

  • In command-queue-Cssp02gj.js, change getQueueState() so that it checks the returned state for missing fields (lanes, activeTaskWaiters, nextTaskId, gatewayDraining) and initializes any that are undefined.
  • Confirm that resolveGlobalSingleton returns the existing object unchanged when the key is already present, and that the patched getQueueState() returns a state object with all four fields set.
  • Test the patched getQueueState() by running openclaw gateway start, waiting for a hooks:loader reload, and triggering lane=main tasks; the TypeError should no longer appear in gateway.err.log.
  • Consider a more general fix, such as a versioned singleton key or a hydration step in resolveGlobalSingleton, to prevent similar schema drift in other singletons.
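Before testing against a live gateway, the self-heal behavior can be checked in a standalone script along these lines. The resolver and the patched getQueueState() are copied from the snippets in the issue; the `demo.selfHealCheck` key and the hand-built stale object are assumptions used to simulate a singleton left behind by an older bundle.

```javascript
// Resolver as quoted from dist/global-singleton-vftIYBun.js.
function resolveGlobalSingleton(key, create) {
    const globalStore = globalThis;
    if (Object.prototype.hasOwnProperty.call(globalStore, key)) return globalStore[key];
    const created = create();
    globalStore[key] = created;
    return created;
}

// Hypothetical key for this check only.
const SELF_HEAL_CHECK_KEY = Symbol.for("demo.selfHealCheck");

// Patched getQueueState() from the issue, pointed at the demo key.
function getQueueState() {
    const state = resolveGlobalSingleton(SELF_HEAL_CHECK_KEY, () => ({
        gatewayDraining: false,
        lanes: new Map(),
        activeTaskWaiters: new Set(),
        nextTaskId: 1
    }));
    if (!state.lanes) state.lanes = new Map();
    if (!state.activeTaskWaiters) state.activeTaskWaiters = new Set();
    if (typeof state.nextTaskId !== "number") state.nextTaskId = 1;
    if (typeof state.gatewayDraining !== "boolean") state.gatewayDraining = false;
    return state;
}

// ASSUMPTION: simulate a singleton created by an older bundle without activeTaskWaiters.
globalThis[SELF_HEAL_CHECK_KEY] = { gatewayDraining: false, lanes: new Map(), nextTaskId: 42 };

const healedState = getQueueState();
const healedWaiters = Array.from(healedState.activeTaskWaiters); // no longer throws

console.log(healedWaiters.length);                              // 0
console.log(healedState.nextTaskId);                            // 42
console.log(healedState === globalThis[SELF_HEAL_CHECK_KEY]);   // true
```

The check confirms that the patch repairs the shared object in place, so counters such as nextTaskId carried over from the earlier bundle are preserved.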

Example

function getQueueState() {
    const state = resolveGlobalSingleton(COMMAND_QUEUE_STATE_KEY, () => ({
        gatewayDraining: false,
        lanes: new Map(),
        activeTaskWaiters: new Set(),
        nextTaskId: 1
    }));
    if (!state.lanes) state.lanes = new Map();
    if (!state.activeTaskWaiters) state.activeTaskWaiters = new Set();
    if (typeof state.nextTaskId !== "number") state.nextTaskId = 1;
    if (typeof state.gatewayDraining !== "boolean") state.gatewayDraining = false;
    return state;
}

Notes

The provided patch has been verified locally, but it is unclear whether this is the preferred solution by the maintainers. A more general fix may be necessary to prevent similar issues in the future.

Recommendation

Apply the suggested minimal patch to the getQueueState function to self-heal and fill in missing fields when reusing existing singletons, as this has been verified to fix the issue locally.
