openclaw - 💡(How to fix) Fix [Bug]: gateway restart blocks main thread for ~75s — `sidecars.channels` takes 45s, then a second ~28s freeze right after "gateway ready" [1 comments, 2 participants]

GitHub stats: openclaw/openclaw#74325 (fetched 2026-04-30 06:25:28)
Comments: 1 · Participants: 2 · Timeline: 3 · Reactions: 3
Timeline (top): cross-referenced ×2, commented ×1


Bug type

Behavior bug

Summary

After every `openclaw gateway restart` on 2026.4.26, the gateway is unresponsive for ~75 seconds total, in two distinct synchronous phases visible from logs and from a plugin-side event-loop-lag monitor:

  1. Pre-ready (~45s): between `starting channels and sidecars...` and `gateway ready`. Almost entirely inside `sidecars.channels` (confirmed by `OPENCLAW_GATEWAY_STARTUP_TRACE=1`).
  2. Post-ready (~28s): a single ~28-30s synchronous block immediately after `gateway ready` is logged.

The result is that any plugin or local client that establishes a WebSocket connection during this window hits the gateway's 30s WS handshake timeout, and a third-party plugin's full recovery cycle ends up at ~80s after a restart even though the plugin's own work takes a fraction of a second.

This is not the same as #61278 (no hooks configured here), #73655 (single-restart, not a long-running leak; no Manifest plugin; tiny session state), or #74135 (consistent and reproducible on every restart, not intermittent).

Steps to reproduce

  1. Install OpenClaw 2026.4.26 (npm global, default 8 bundled plugins: `acpx, bonjour, browser, device-pair, memory-core, openclaw-coclaw, phone-control, talk-voice`)
  2. `hooks: null` in `openclaw.json`; primary model is a CLI provider (`openai-codex/gpt-5.5`); no Gmail/internal hooks; ~3 sessions on disk (~34 MB total)
  3. `openclaw gateway restart`
  4. Watch the gateway log; optionally set `OPENCLAW_GATEWAY_STARTUP_TRACE=1` to get per-step timing

Expected behavior

`gateway ready` is logged within a few seconds after `http server listening`, and the main thread is responsive immediately after. The bonjour mDNS service should not be reported as `stuck in announcing`. Local WS handshakes from plugins should complete in milliseconds.

Actual behavior

Repeatable across every restart on this host:

  • `http server listening` fires at T+3.3s
  • `starting channels and sidecars...` immediately after
  • ~45 seconds of mostly-blocked main thread
  • `gateway ready` at T+49.5s
  • Another ~28-30 second main-thread freeze right after
  • During the second freeze, any inbound WS handshake (from local clients including plugins) sits half-open and is closed by the gateway's 30s WS handshake timeout (`[ws] handshake timeout conn=… closed before connect …`)
  • bonjour watchdog fires `service stuck in announcing for ~35000ms`

Per-step timing from `OPENCLAW_GATEWAY_STARTUP_TRACE=1`

startup trace: plugins.bootstrap                3015.7ms total=3015.7ms eventLoopMax=0.0ms
startup trace: runtime.early                    195.7ms  total=3229.9ms eventLoopMax=0.0ms
startup trace: http.bound                       97.1ms   total=3327.0ms eventLoopMax=0.0ms
startup trace: post-attach.update-sentinel      160.1ms  total=3488.8ms eventLoopMax=81.8ms
startup trace: sidecars.session-locks           13.5ms   total=3554.8ms eventLoopMax=10.2ms
startup trace: sidecars.gmail-watch             0.1ms    total=3556.2ms eventLoopMax=0.0ms
startup trace: sidecars.gmail-model             0.1ms    total=3557.4ms eventLoopMax=0.0ms
startup trace: sidecars.internal-hooks          0.1ms    total=3558.9ms eventLoopMax=0.0ms
startup trace: sidecars.channels                45481.6ms total=49042.0ms eventLoopMax=7725.9ms   ← problem
startup trace: sidecars.plugin-services         483.8ms  total=49529.1ms eventLoopMax=380.9ms
startup trace: sidecars.memory                  0.2ms    total=49530.6ms eventLoopMax=0.0ms
startup trace: sidecars.restart-sentinel        1.0ms    total=49532.5ms eventLoopMax=0.0ms
startup trace: sidecars.subagent-recovery       1.9ms    total=49535.3ms eventLoopMax=0.0ms
startup trace: sidecars.main-session-recovery   3.0ms    total=49541.0ms eventLoopMax=0.0ms
startup trace: sidecars.total                   46001.1ms total=49542.0ms eventLoopMax=0.0ms
startup trace: runtime.post-attach              46216.9ms total=49545.2ms eventLoopMax=0.0ms
startup trace: ready                            0.8ms    total=49546.0ms eventLoopMax=0.0ms
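
For longer traces, ranking the steps by wall time makes the dominant one easy to spot. The following is a minimal sketch, assuming the line format shown in the sample above. The regex, the helper names `parseStartupTrace` and `slowestSteps`, and the choice of which rows count as aggregates are illustrative assumptions, not part of OpenClaw.

```javascript
// Matches lines of the form:
//   startup trace: <step> <duration>ms total=<total>ms eventLoopMax=<max>ms
// The pattern is inferred from the sample output, not from OpenClaw's source.
const TRACE_LINE =
  /^startup trace:\s+(\S+)\s+([\d.]+)ms\s+total=([\d.]+)ms\s+eventLoopMax=([\d.]+)ms/;

function parseStartupTrace(text) {
  const steps = [];
  for (const line of text.split("\n")) {
    const m = TRACE_LINE.exec(line.trim());
    if (!m) continue; // skip unrelated log lines
    steps.push({
      step: m[1],
      durationMs: Number(m[2]),
      totalMs: Number(m[3]),
      eventLoopMaxMs: Number(m[4]),
    });
  }
  return steps;
}

// Assumption: these rows wrap other steps, so ranking them would double-count.
const AGGREGATES = new Set(["sidecars.total", "runtime.post-attach"]);

function slowestSteps(steps, limit = 3) {
  return steps
    .filter((s) => !AGGREGATES.has(s.step))
    .sort((a, b) => b.durationMs - a.durationMs)
    .slice(0, limit);
}

// A few lines copied from the trace in this issue.
const sample = `
startup trace: plugins.bootstrap                3015.7ms total=3015.7ms eventLoopMax=0.0ms
startup trace: sidecars.channels                45481.6ms total=49042.0ms eventLoopMax=7725.9ms   ← problem
startup trace: sidecars.plugin-services         483.8ms  total=49529.1ms eventLoopMax=380.9ms
startup trace: sidecars.total                   46001.1ms total=49542.0ms eventLoopMax=0.0ms
`;

console.log(slowestSteps(parseStartupTrace(sample)).map((s) => s.step));
```

On the sample lines, `sidecars.channels` ranks first, ahead of `plugins.bootstrap` and `sidecars.plugin-services`.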

The pre-ready block is overwhelmingly inside `sidecars.channels`: 45.5 seconds wall time, with the worst single synchronous chunk at ~7.7 seconds. The other sidecar steps are all sub-second.

In `startGatewaySidecars` (`server.impl-*.js`), this corresponds to:

await measureStartup(params.startupTrace, "sidecars.channels", async () => {
    if (!skipChannels) try {
        await prewarmConfiguredPrimaryModel({ cfg: params.cfg, log: params.log });
        await params.startChannels();
    } catch (err) {}
});

Likely suspects:

  • `prewarmConfiguredPrimaryModel`: may touch `ensureOpenClawModelsJson`, `resolveModel`, and a chain of dynamic imports. Even though our primary is a CLI provider (`isCliProvider(...) === true`), the `isConfiguredCliBackendPrimary` lookups and the parallel `Promise.all` of imports run unconditionally before the early return.
  • `startChannels`: runs all channels' `start()` in parallel (8 workers), yet something synchronous is clearly happening in the bundled `bonjour`/`browser`/`device-pair`/`memory-core`/`acpx`/`phone-control`/`talk-voice` plugins. The watchdog reports `service stuck in announcing for 35s`, which means mDNS callbacks could not run for 35s. That is consistent only with a busy main thread, not with async I/O alone.
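
Running `start()` calls "in parallel" on one thread does not help when a call does synchronous work, because an async function runs synchronously up to its first `await`. A minimal sketch of how that blocking prefix could be attributed per channel is below. `measureSyncPrefix`, `busyWait`, and the two stand-in channels are hypothetical and do not model the real plugins; sync work that happens after a later `await` is not captured by this approach.

```javascript
// Spin the CPU for `ms` milliseconds to simulate synchronous work.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {}
}

// Hypothetical helper: call start() and record how long the call itself holds
// the main thread before it hands back its promise.
function measureSyncPrefix(name, start) {
  const t0 = performance.now();
  let result;
  try {
    result = start();
  } catch (err) {
    result = Promise.reject(err); // surface sync throws from non-async start()
  }
  const syncMs = performance.now() - t0;
  return { name, syncMs, done: Promise.resolve(result) };
}

// Stand-ins for channel start() functions (assumptions for illustration only).
const channels = {
  blocking: async () => { busyWait(50); },
  yielding: async () => { await new Promise((r) => setTimeout(r, 50)); },
};

const timings = Object.entries(channels).map(([name, start]) =>
  measureSyncPrefix(name, start)
);
for (const { name, syncMs } of timings) {
  console.log(`${name}: start() held the main thread for ~${syncMs.toFixed(0)}ms`);
}
```

Both stand-ins take about 50ms to finish, but only the `blocking` one keeps the event loop from servicing other callbacks during that time.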

Post-ready ~28s block

The startup trace stops emitting at the `ready` mark (line: `if (name === "ready") eventLoopDelay?.disable();`), so the second freeze is invisible to it. We caught it from a plugin-side `setInterval(200)` lag monitor across multiple restarts, with values consistently at 27-31 seconds. The main thread does not become responsive until that freeze ends; only then does the gateway accept the first real WS upgrade. The source is unknown. Candidates are the continuation of `startGatewayPostAttachRuntime`, the `gateway_start` hook runner setup (even though we have no user hooks configured), or some lazy import that fires on first activity.

If useful, a similar flag such as `OPENCLAW_GATEWAY_STARTUP_TRACE_AFTER_READY=1` (or simply keeping the trace alive past `ready`) would let people pin this down precisely.
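
For anyone who wants to reproduce the post-ready measurement, a `setInterval`-based lag monitor along the lines of the one described above could look like the sketch below. It is an approximation, not the CoClaw plugin's actual code: the names `createLagMonitor` and `onLag`, the 1000ms threshold, and the injectable `now` clock are all assumptions made for illustration.

```javascript
// Sketch of an event-loop lag monitor: a timer that should fire every
// `intervalMs` reports how late it actually ran.
function createLagMonitor({
  intervalMs = 200,
  thresholdMs = 1000,
  now = () => performance.now(), // injectable so the logic can be tested
  onLag,
}) {
  let expectedAt = null;
  let timer = null;

  // One sampling step: compare the actual run time with the due time.
  function tick() {
    const t = now();
    if (expectedAt !== null) {
      const lagMs = t - expectedAt;
      if (lagMs >= thresholdMs) onLag(lagMs);
    }
    expectedAt = t + intervalMs;
  }

  return {
    tick,
    start() {
      expectedAt = now() + intervalMs;
      timer = setInterval(tick, intervalMs);
      timer.unref(); // do not keep the process alive just for monitoring
    },
    stop() {
      clearInterval(timer);
      timer = null;
      expectedAt = null;
    },
  };
}

const monitor = createLagMonitor({
  onLag: (ms) => console.warn(`event loop blocked for ~${Math.round(ms)}ms`),
});
monitor.start();
```

Because the timer callback can only run once the main thread is free, a single 28s synchronous block shows up as one late tick with a lag of roughly 28000ms, which matches the "single chunk" shape reported here.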

Cascading effect on plugins

A third-party channel plugin that opens a local gateway WS during `sidecars.plugin-services` (right after the ~45s block) sees:

  • TCP accept succeeds, but the gateway-side HTTP-upgrade handler can't run because the post-ready block is now in progress
  • 30s later, gateway-side `handshake timeout` fires and closes the connection
  • Plugin retries with backoff (5s typical)
  • Total: from `SIGTERM` to plugin actually serving RPCs, ~80 seconds

End-user clients connected via WebRTC see their data channel re-established ~5 seconds before the plugin's gateway WS is back, so any RPC they issue in that window fails with `gateway_not_ready` despite the channel looking healthy.

OpenClaw version

2026.4.26 (`be8c246`)

Operating system

Linux 6.6.87.2-microsoft-standard-WSL2 (Ubuntu on WSL2), Node.js v22.21.1, x86_64

Model

openai-codex/gpt-5.5 (CLI provider; not exercised by the blocking work)

Provider / routing chain

N/A

Install method

npm global

Logs, screenshots, and evidence

Per-step timings reproduced above. Plugin-side event-loop lag monitor across multiple restarts shows the same shape every time:

  • Pre-ready: chunked sync work totaling ~44s (sample chunks: 1762ms, 730ms, 6515ms, 4857ms, plus larger blocks)
  • Post-ready: a single chunk of 27.2s - 30.6s

  • bonjour: `service stuck in announcing for 35170ms` (consistent across restarts, ~35s)
  • `[ws] handshake timeout` on every local WS that opened during the freeze

Additional information

Repro is fully deterministic on this host: every `openclaw gateway restart` reproduces the same shape, with timing variance under a couple of seconds. The trace was captured with `OPENCLAW_GATEWAY_STARTUP_TRACE=1` set on the systemd unit; the trace itself has negligible overhead.

We're happy to run further instrumented restarts (e.g., logging which channel plugin's `start()` is occupying time, dumping `process._getActiveHandles()` mid-block, or testing `OPENCLAW_SKIP_CHANNELS=1` to confirm the bisection) if the maintainers want more data.
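
As one possible shape for the handle dump mentioned above, the sketch below summarizes active resources by type. It swaps in `process.getActiveResourcesInfo()`, a documented (experimental) Node.js API that returns an array of type names such as `Timeout` or `TTYWrap`, in place of the undocumented `process._getActiveHandles()`. The helper names `countByType` and `snapshotActiveResources` are hypothetical. A snapshot can only run when the main thread is free, so it shows the state between synchronous chunks rather than in the middle of one.

```javascript
// Count occurrences of each resource type, most frequent first.
function countByType(types) {
  const counts = new Map();
  for (const t of types) counts.set(t, (counts.get(t) ?? 0) + 1);
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// Log a labeled summary of whatever currently keeps the event loop alive.
function snapshotActiveResources(label, log = console.log) {
  const summary = countByType(process.getActiveResourcesInfo());
  log(`[active-resources] ${label}`, Object.fromEntries(summary));
  return summary;
}

snapshotActiveResources("startup");
```

Calling `snapshotActiveResources` from a timer at a few points during startup would give a coarse view of which resource types accumulate before and after the blocks.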


Reported by the CoClaw team. This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.

extent analysis

TL;DR

The OpenClaw gateway is unresponsive for about 75 seconds after a restart: roughly 45s of chunked synchronous work inside the sidecars.channels step, then a single ~28s block of unknown origin right after gateway ready. The most likely pre-ready suspects are prewarmConfiguredPrimaryModel and startChannels. The issue identifies no fix or confirmed workaround yet.

Guidance

  1. Investigate prewarmConfiguredPrimaryModel: This function is a likely cause of the pre-ready block, and its optimization could significantly reduce the unresponsive period.
  2. Analyze startChannels: The startChannels function runs all channels' start() in parallel, but something synchronous is happening, causing the main thread to freeze; identifying and addressing this issue could help.
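  3. Trace past ready: The startup trace stops at the ready mark, so the post-ready block is invisible to it. The issue proposes a flag such as OPENCLAW_GATEWAY_STARTUP_TRACE_AFTER_READY=1 (a suggestion, not an existing option) or keeping the trace alive past ready to pinpoint the cause.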
  4. Test OPENCLAW_SKIP_CHANNELS=1: This might confirm whether the issue is indeed related to the sidecars.channels step and help with bisection.
  5. Monitor event-loop lag: Continue using the plugin-side event-loop lag monitor to gather more data on the pre-ready and post-ready blocks.

Example

No specific code example is provided, as the issue requires further investigation into the prewarmConfiguredPrimaryModel and startChannels functions.

Notes

The issue was observed on OpenClaw 2026.4.26 in the described environment (Ubuntu on WSL2, Node.js v22.21.1); whether other versions or hosts are affected is not stated. Further investigation and data collection are necessary to determine the root cause and develop a comprehensive fix.

Recommendation

No workaround is confirmed. The practical next steps are to profile prewarmConfiguredPrimaryModel and startChannels for synchronous work, and to extend tracing past ready (for example via the proposed OPENCLAW_GATEWAY_STARTUP_TRACE_AFTER_READY=1) to identify the post-ready block.
