openclaw: Event loop blocked for ~97s during startup, startPluginServices causes 6min restart delay [2 comments, 2 participants]

openclaw/openclaw#72960 · Fetched 2026-04-28 06:29:30

Environment

  • OpenClaw version: 2026.4.25 (aa36ee6)
  • Node.js: v22.22.2
  • OS: Debian 12 (bookworm), Linux 6.1.0-34-amd64, x86_64
  • Enabled plugins: qqbot, telegram, memory-core, memory-wiki, active-memory, brave, deepseek, voyage
  • Agent model: cpam/glm-5.1 (default), deepseek/deepseek-v4-pro

Problem

After systemctl --user restart openclaw-gateway.service, it takes ~6 minutes 18 seconds from restart to the first /new reply being delivered. A normal restart should complete in under 30 seconds.

Root Cause Analysis

1. Event loop completely blocked for 97 seconds during startup

The gateway log file (/tmp/openclaw/openclaw-2026-04-28.log) shows a 97-second gap with ZERO log entries between channel initialization and the next batch of events:

| Minute | Log count | What happened |
|--------|-----------|---------------|
| 00:57 | 24 | Gateway ready, channels starting |
| 00:58 | 11 | QQBot/Telegram providers initialized, [telegram] webhook advertised |
| 00:59 | 0 | Complete silence: event loop blocked |
| 01:00 | 18 | All 18 queued events fire at once |

Last log before block:

2026-04-28T00:58:43.855+08:00 [telegram] webhook advertised to telegram on https://open.xinzhi.de/telegram-webhook

First log after block:

2026-04-28T01:00:20.348+08:00 [ws] handshake timeout conn=1e6d105e... peer=127.0.0.1:34124->127.0.0.1:18789 lastFrame=connect handshakeMs=96967
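The gap between these two lines can also be found mechanically. The following is a minimal sketch, assuming every log line starts with an ISO 8601 timestamp followed by a space; `findLargestLogGap` is a hypothetical helper written for this illustration, not part of OpenClaw.

```javascript
// Hypothetical helper: find the largest time gap between consecutive log lines.
// Assumes each line starts with an ISO 8601 timestamp followed by a space.
function findLargestLogGap(lines) {
  let previous = null;
  let largest = null;
  for (const line of lines) {
    const stamp = line.split(" ", 1)[0];
    const timeMs = Date.parse(stamp);
    if (Number.isNaN(timeMs)) continue; // skip lines without a leading timestamp
    if (previous !== null) {
      const gapMs = timeMs - previous.timeMs;
      if (largest === null || gapMs > largest.gapMs) {
        largest = { gapMs, before: previous.line, after: line };
      }
    }
    previous = { timeMs, line };
  }
  return largest; // null when fewer than two timestamped lines were seen
}

// Three lines quoted from the report above.
const sampleLogLines = [
  "2026-04-28T00:58:43.855+08:00 [telegram] webhook advertised to telegram on https://open.xinzhi.de/telegram-webhook",
  "2026-04-28T01:00:20.348+08:00 [ws] handshake timeout conn=1e6d105e... peer=127.0.0.1:34124->127.0.0.1:18789 lastFrame=connect handshakeMs=96967",
  "2026-04-28T01:00:20.723+08:00 [qqbot/native-approvals] connect error: gateway request timeout for connect",
];
console.log(findLargestLogGap(sampleLogLines).gapMs); // 96493
```

Run against the full log file, a helper like this would point at the last line written before the stall, which is usually the best starting point for locating the blocking call.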

2. Server-side WebSocket handshake timeout confirms event loop blocking

The configured default preauth handshake timeout is 10 seconds (DEFAULT_PREAUTH_HANDSHAKE_TIMEOUT_MS = 1e4 in client-BcOgCTuB.js), but the log shows handshakeMs=96967 (~97 seconds). This means:

  • The setTimeout(fn, 10000) callback couldn't fire until 97 seconds after it was armed
  • The connect frame arrived at the TCP level (lastFrame=connect), but the JS on('message') handler never ran
  • This is a clear indicator that the Node.js event loop was completely stalled

The handshake timeout code in question (server.impl-C1dgKTkE.js):

const handshakeTimeoutMs = getPreauthHandshakeTimeoutMsFromEnv(); // returns 10000
const handshakeTimer = setTimeout(() => {
    if (!client) {
        handshakeState = "failed";
        setCloseCause("handshake-timeout", {
            handshakeMs: Date.now() - openedAt,  // computed as 96967 instead of ~10000
            endpoint
        });
        logWsControl.warn(`handshake timeout conn=${connId}...`);
        close();
    }
}, handshakeTimeoutMs);
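
The same effect can be reproduced in isolation. This is a small sketch rather than OpenClaw code: `measureTimerDelay` is a hypothetical helper that arms a timer and then busy-waits, which stands in for whatever synchronous work the plugin service performs.

```javascript
// Hypothetical demo: a timer cannot fire while synchronous code holds the event loop.
function measureTimerDelay(timeoutMs, blockMs, onFired) {
  const armedAt = Date.now();
  setTimeout(() => onFired(Date.now() - armedAt), timeoutMs);

  // Simulate heavy synchronous work with a busy-wait.
  const blockUntil = Date.now() + blockMs;
  while (Date.now() < blockUntil) {
    // spin: no timers, I/O callbacks, or WebSocket handlers can run here
  }
}

measureTimerDelay(10, 100, (elapsedMs) => {
  console.log(`timer armed for 10ms fired after ${elapsedMs}ms`);
});
```

The reported elapsed time tracks the length of the blocking section rather than the configured timeout, which matches the pattern of handshakeMs=96967 against a 10000 ms setting.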

3. Location: startGatewaySidecars → startPluginServices phase

From the source code in server.impl-C1dgKTkE.js, the startGatewaySidecars function executes in this order:

  1. startChannels(): completes at 00:58:43 (QQBot + Telegram initialized) ✅
  2. startPluginServices(): this is where the event loop blocks 🔴
  3. setImmediate(() => startGatewayMemoryBackend(...)): never reached during the block
  4. Other sidecar startups: all deferred

The blocking happens inside startPluginServices() which iterates over plugin services and calls service.start(serviceContext) on each one. One of the plugin services (likely memory-core or memory-wiki) performs synchronous work that blocks the event loop.
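
One way to find out which service is responsible is to time each start() call separately. The sketch below assumes each service is an object with an `id` and a `start(serviceContext)` method that may return a promise; that shape, the function name `startServicesWithTiming`, and the log wording are assumptions for illustration, not the actual OpenClaw internals.

```javascript
// Hypothetical instrumentation: time each service start, separating the
// synchronous portion (which holds the event loop) from the total duration.
async function startServicesWithTiming(services, serviceContext, warnAboveMs = 100) {
  const timings = [];
  for (const service of services) {
    const startedAt = performance.now();
    const pending = service.start(serviceContext); // synchronous part runs here
    const syncMs = performance.now() - startedAt;
    await pending; // asynchronous part, if any
    const totalMs = performance.now() - startedAt;
    if (syncMs >= warnAboveMs) {
      console.warn(
        `[plugins] ${service.id}: start() held the event loop for ${Math.round(syncMs)}ms`
      );
    }
    timings.push({ id: service.id, syncMs, totalMs });
  }
  return timings;
}
```

A large `syncMs` identifies a service that blocks before its first await. Blocking work that happens later inside an async start() shows up only in `totalMs`, so this measurement narrows the search but does not replace profiling.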

4. Impact: QQBot native approvals cannot complete handshake

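The QQBot native approval handler creates an internal GatewayClient that connects to ws://127.0.0.1:18789. The connection is established and the client sends a connect frame, but the server cannot process the frame while the event loop is blocked. The kernel can complete the TCP handshake and buffer incoming bytes on its own, whereas the WebSocket upgrade is answered by the Node.js process; since handshakeMs=96967 is measured from openedAt, the server had accepted the connection at about 00:58:43.381, just before the block began. After 97 seconds: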

2026-04-28T01:00:20.723+08:00 [qqbot/native-approvals] connect error: gateway request timeout for connect
2026-04-28T01:00:20.730+08:00 gateway connect failed: Error: gateway request timeout for connect

5. Secondary issue: active-memory plugin timeout not enforced

Separately, the active-memory plugin has timeoutMs: 30000 configured, but ran for 65250ms (65 seconds):

2026-04-28T01:02:11.055+08:00 [plugins] active-memory: start timeoutMs=30000
2026-04-28T01:03:16.302+08:00 [plugins] active-memory: done status=timeout elapsedMs=65250 summaryChars=0

This blocked the user's /new session from being processed for an additional 65 seconds after the event loop recovered, plus a 32-second failover decision delay.
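
The report does not show why the 30-second limit was exceeded. As a general pattern, a deadline can be enforced by racing the task against a timer and signalling cancellation. The following is a minimal sketch under that assumption; `runWithTimeout` and its result shape are hypothetical and do not reflect how active-memory is implemented.

```javascript
// Hypothetical helper: resolve with { status: "timeout" } once timeoutMs elapses,
// and signal the task to stop via an AbortSignal.
function runWithTimeout(task, timeoutMs) {
  const controller = new AbortController();
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => {
      controller.abort(); // cooperative cancellation; the task must honor the signal
      resolve({ status: "timeout" });
    }, timeoutMs);
  });
  const work = Promise.resolve()
    .then(() => task(controller.signal))
    .then((value) => ({ status: "ok", value }));
  // Whichever settles first wins; the timer is always cleaned up.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Two limits apply to this pattern. The caller regains control at the deadline, but the underlying work continues unless it checks the signal. The deadline timer is also an ordinary timer, so it cannot fire while synchronous code is holding the event loop.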

Complete Timeline

| Time | Event | Duration |
|------|-------|----------|
| 00:57:38 | systemctl restart issued | |
| 00:57:56 | Gateway ready (10.8s core init) | +18s |
| 00:57:56 → 00:58:43 | Channel init (QQBot + Telegram) | +47s |
| 00:58:43 → 01:00:20 | Event loop blocked (startPluginServices) | +97s 🔴 |
| 01:00:20 → 01:01:11 | System recovers, QQBot WS connects | +51s |
| 01:02:11 → 01:03:16 | active-memory blocks (30s timeout ignored) | +65s 🔴 |
| 01:03:48 | Failover decision | +32s |
| 01:03:56 | User receives /new reply | Total: 6m18s |

Suggested Fix

  1. Identify the synchronous bottleneck in plugin service startups — the 97-second event loop block during startPluginServices() is the primary issue. One of the plugin services is doing heavy synchronous work (likely database/file initialization in memory-core or memory-wiki). This should be made async or deferred.

  2. Enforce active-memory timeout — the configured 30s timeout (timeoutMs) is not being respected; the plugin ran for 65 seconds.

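  3. Consider deferring non-critical plugin init: startPluginServices() and memory backends could be started with setImmediate or staggered to avoid blocking the event loop during the critical channel startup phase. Deferral alone only moves a synchronous block to a later tick, so the heavy work itself still needs to become asynchronous or be split into chunks.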
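
If the heavy work turns out to be a long loop over many items, one option is to process it in chunks and yield to the event loop between chunks. This is a sketch of that technique only; `processInChunks`, `yieldToEventLoop`, and the chunk size are hypothetical, and whether the technique fits depends on what memory-core or memory-wiki actually do at startup.

```javascript
// Hypothetical helper: let pending timers, I/O callbacks, and WebSocket frames run.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Process items in chunks, yielding to the event loop after each chunk.
async function processInChunks(items, handleItem, chunkSize = 100) {
  const results = [];
  for (let i = 0; i < items.length; i += chunkSize) {
    for (const item of items.slice(i, i + chunkSize)) {
      results.push(handleItem(item)); // still synchronous, but bounded per chunk
    }
    await yieldToEventLoop();
  }
  return results;
}
```

The total work is unchanged, but the longest uninterrupted stall is limited to one chunk, so a 10-second handshake timer or an incoming connect frame can be serviced between chunks. CPU-bound work that cannot be split is usually a better fit for a worker thread.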

extent analysis

TL;DR

The most likely fix involves identifying and addressing the synchronous bottleneck in plugin service startups, specifically within the startPluginServices() function, to prevent the 97-second event loop block.

Guidance

  • Review the startPluginServices() function to identify which plugin service is causing the synchronous blockage, focusing on memory-core or memory-wiki as likely candidates.
  • Modify the identified plugin service to perform its initialization asynchronously to prevent blocking the event loop.
  • Enforce the active-memory plugin's timeout configuration to prevent it from running beyond its specified limit, ensuring it does not contribute to additional delays.
  • Consider staggering or deferring the startup of non-critical plugins to further reduce the load on the event loop during critical phases like channel initialization.
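
To confirm a stall in a running gateway without reading logs by hand, a lightweight lag monitor can be left running during startup. The sketch below is an assumption-labeled illustration: `startLagMonitor`, its options, and the message text are hypothetical names, not an OpenClaw feature.

```javascript
// Hypothetical watchdog: a repeating timer that reports how late it fires.
function startLagMonitor({ intervalMs = 100, thresholdMs = 200, onLag = console.warn } = {}) {
  let expectedAt = Date.now() + intervalMs;
  let maxLagMs = 0;
  const timer = setInterval(() => {
    const now = Date.now();
    const lagMs = Math.max(0, now - expectedAt); // how late this tick is
    if (lagMs > maxLagMs) maxLagMs = lagMs;
    if (lagMs >= thresholdMs) onLag(`event loop lag ${lagMs}ms`);
    expectedAt = now + intervalMs;
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return {
    stop: () => clearInterval(timer),
    getMaxLagMs: () => maxLagMs,
  };
}
```

Starting the monitor before startPluginServices() and logging `getMaxLagMs()` afterwards would show whether the stall still occurs after a change. Node.js also ships `perf_hooks.monitorEventLoopDelay()`, which records the same kind of delay as a histogram.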

Example

An example of how to modify a synchronous initialization to an asynchronous one might involve using callbacks, promises, or async/await. For instance, if a plugin's start method is blocking:

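// Before: synchronous initialization; the event loop is blocked until start() returns
service.start = function (serviceContext) {
  // Heavy synchronous work here
  // ...
};

// After: asynchronous initialization; start() returns a promise
// asyncHeavyWork is a placeholder for the plugin's own non-blocking routine.
// Marking start() as async is not enough by itself: the heavy work must use
// non-blocking I/O, be split into chunks, or run in a worker thread.
service.start = async function (serviceContext) {
  const result = await asyncHeavyWork(serviceContext);
  return result;
};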

Notes

The exact modifications will depend on the specific code and requirements of the plugin services. It's crucial to ensure that any asynchronous changes do not introduce new issues, such as race conditions or unhandled promises.

Recommendation

Apply the workaround by modifying the plugin services to initialize asynchronously, focusing on the memory-core and memory-wiki plugins first, to address the primary cause of the event loop blockage. This should significantly reduce the restart time and improve the overall responsiveness of the system.
