openclaw - 💡(How to fix) Fix [Bug]: Regression — channel sidecar startup again blocks for ~3 min after `ready` on v2026.4.25 (recurrence of #63450) [13 comments, 11 participants]

GitHub stats
Issue: openclaw/openclaw#72846 (fetched 2026-04-28 06:31:32)
Comments: 13
Participants: 11
Timeline events: 35
Reactions: 3
Timeline (top): commented ×13, cross-referenced ×9, subscribed ×6, labeled ×3


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

The bug fixed in #63450 / PR #63480 ("Gateway channel sidecar startup blocked by chat.history WS request, ~80–110 s delay since v2026.4.8", closed 2026-04-09) has returned in v2026.4.25 with a longer delay (~180 s instead of ~80–110 s). The original issue thread is locked to comments, so filing this as a separate regression report.

The symptom matches #63450's description nearly verbatim: between the `starting channels and sidecars...` log line and the moment channels (Browser control, Telegram provider, acpx runtime) actually start, the gateway sits silent for ~3 minutes. CLI WebSocket handshakes time out during the window, and any inbound Telegram message that arrives during it is queued and answered only ~2 minutes after it lands.

Steps to reproduce

  1. Install v2026.4.25 stable on Debian 11 Linux (Node 24.13.0 via NVM, npm-global install). The eager bundled-plugin postinstall (`OPENCLAW_EAGER_BUNDLED_PLUGIN_DEPS=1`) completed cleanly: `~/.openclaw/plugin-runtime-deps/openclaw-2026.4.25-<hash>/` is fully populated, so this is not on-demand plugin compilation.
  2. Restart the gateway (`systemctl --user restart openclaw-gateway` on a user systemd setup).
  3. Watch the gateway log. The `ready (N plugins: ...; XX.Xs)` line lands in ~10–12 s. The next non-noise log entry is ~3 minutes later.
  4. During the ~3-minute window, run any openclaw CLI command (e.g. `openclaw cron list`). It hangs, and after 30 s the gateway log shows `WARN handshake timeout conn=... peer=127.0.0.1:<ephemeral>`.
  5. Send a Telegram message to the bot during the window. The reply is sent ~2 minutes later (specifically, `telegram sendMessage ok` appears in the gateway log roughly 8 minutes after the restart).

Expected behavior

Per the resolution of #63450, channel sidecars should start within ~3 s of the `starting channels and sidecars...` log line, not ~3 minutes after it. PR #63480 fixed this regression at v2026.4.8; v2026.4.25 has reintroduced an equivalent or worse blocker.

Actual behavior — log evidence (two independent restarts on same machine)

Restart 1 — 2026-04-27 13:09:33Z

13:09:48.391  INFO  ready (10 plugins: acpx, active-memory, bonjour, browser, device-pair,
                    memory-core, memory-wiki, phone-control, talk-voice, telegram; 10.2s)
13:09:48.521  INFO  starting channels and sidecars...
13:09:48.733  INFO  loaded 4 internal hook handlers
                    ↑ ~3 min of silence ↑
13:12:53.xxx  WARN  handshake timeout conn=... peer=127.0.0.1:56754 -> 127.0.0.1:18789
13:12:59.xxx  INFO  Browser control listening on http://127.0.0.1:18791/ (auth=token)
13:12:59.xxx  INFO  [default] starting provider (@raywu07_bot)

Restart 2 — 2026-04-27 13:52:31Z

13:52:48.055  INFO  ready (10 plugins: ...; 11.6s)
13:52:48.190  INFO  starting channels and sidecars...
13:52:48.390  INFO  loaded 4 internal hook handlers
                    ↑ ~3 min of silence ↑
13:55:44.533  WARN  handshake timeout conn=... peer=127.0.0.1:32940 -> 127.0.0.1:18789
13:55:44.553  INFO  embedded acpx runtime backend registered
13:55:44.897  INFO  Browser control listening on http://127.0.0.1:18791/ (auth=token)
13:55:45.823  INFO  [default] starting provider (@raywu07_bot)
14:00:39.221  INFO  telegram sendMessage ok chat=... message=5593
                    ↑ first reply finally sent — 8 min after restart
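
The silent window in both restarts can be measured mechanically rather than by eye. The following is a minimal sketch, assuming the log lines keep the `HH:MM:SS.mmm  LEVEL  message` shape of the excerpts above. The function name `startup_gap_seconds` and the list of "channel is up" marker strings are illustrative choices taken from these excerpts, not part of OpenClaw. Redacted `xxx` milliseconds are treated as `.000`, and logs that cross midnight are not handled.

```python
import re
from datetime import datetime, timedelta
from typing import Optional

# Assumed line shape, taken from the excerpts above: "HH:MM:SS.mmm  LEVEL  message".
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2})\.(\d{3}|xxx)\s+(\w+)\s+(.*)$")

START_MARKER = "starting channels and sidecars"
# Messages that suggest a channel or sidecar has actually come up (from the excerpts).
UP_MARKERS = (
    "Browser control listening",
    "starting provider",
    "embedded acpx runtime backend registered",
)


def parse_time(hms: str, millis: str) -> datetime:
    """Parse a time of day; redacted 'xxx' milliseconds count as 000."""
    ms = 0 if millis == "xxx" else int(millis)
    return datetime.strptime(hms, "%H:%M:%S") + timedelta(milliseconds=ms)


def startup_gap_seconds(log_text: str) -> Optional[float]:
    """Seconds between the start marker and the first channel-up line, or None."""
    started = None
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # continuation lines and annotations carry no timestamp
        when = parse_time(match.group(1), match.group(2))
        message = match.group(4)
        if started is None and START_MARKER in message:
            started = when
        elif started is not None and any(m in message for m in UP_MARKERS):
            return (when - started).total_seconds()
    return None


RESTART_2 = """\
13:52:48.190  INFO  starting channels and sidecars...
13:52:48.390  INFO  loaded 4 internal hook handlers
13:55:44.533  WARN  handshake timeout conn=... peer=127.0.0.1:32940 -> 127.0.0.1:18789
13:55:44.553  INFO  embedded acpx runtime backend registered
13:55:44.897  INFO  Browser control listening on http://127.0.0.1:18791/ (auth=token)
"""

if __name__ == "__main__":
    print(f"gap: {startup_gap_seconds(RESTART_2):.3f} s")
```

Run against the Restart 2 excerpt, this reports a gap of roughly 176 s, consistent with the ~3 minutes described. A check like `startup_gap_seconds(log) > 3` could serve as a regression guard against the ~3 s expectation from #63450.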

Diagnostic notes

  • Telegram channel itself is healthy throughout the window: getMe, getMyCommands, getWebhookInfo all respond fast (<300 ms) over the Bot API. The bot is reachable; the gateway just hasn't started its provider client yet.
  • HTTP routes work fast: GET /health returns 200 in ~5 ms during the window. Only WebSocket handshakes (/__openclaw__/ws) hang.
  • No plugin error in the ready line — all 10 plugins reported loaded successfully.
  • OPENCLAW_EAGER_BUNDLED_PLUGIN_DEPS=1 does not help — the bundled-plugin install ran at upgrade time and the staged tree was complete by the time the gateway started. Whatever the channel-startup is waiting on, it's not on-disk plugin compilation.
  • Possibly related secondary symptom: the active-memory plugin's pre-reply sub-agent run takes ~45 s despite a configured `timeoutMs: 15000`. This is documented separately in #71127. If both symptoms share the same lock/queue holdup, fixing this regression should fix that secondary case too.
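
The contrast between the fast HTTP route and the hanging WebSocket handshake can be reproduced with a small probe. This is a rough sketch under stated assumptions: it assumes `/health` and `/__openclaw__/ws` are both served on 127.0.0.1:18789 (the port seen in the handshake-timeout log lines; the issue does not say which port serves `/health`), and it exercises only the HTTP Upgrade step of the WebSocket opening handshake. If the stall happens in an application-level handshake after the upgrade, or if the endpoint requires a token, this probe will not tell the full story. The function names are illustrative.

```python
import base64
import http.client
import os
import socket
import time
from typing import Optional, Tuple

# Assumption (not confirmed by the issue): both endpoints live on this address.
HOST, PORT = "127.0.0.1", 18789
WS_PATH = "/__openclaw__/ws"


def probe_http(host: str, port: int, path: str = "/health",
               timeout: float = 5.0) -> Tuple[Optional[int], float]:
    """Return (HTTP status or None on failure, elapsed seconds) for a plain GET."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        status = None
    finally:
        conn.close()
    return status, time.monotonic() - start


def build_upgrade_request(host: str, port: int, path: str) -> bytes:
    """Build a minimal RFC 6455 opening-handshake request."""
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}:{port}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
        "", "",
    ]
    return "\r\n".join(lines).encode("ascii")


def probe_ws_upgrade(host: str, port: int, path: str,
                     timeout: float = 5.0) -> Tuple[Optional[str], float]:
    """Return (first response line or None on timeout/failure, elapsed seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_upgrade_request(host, port, path))
            data = sock.recv(1024)  # blocks until data arrives or the timeout fires
        first_line = data.split(b"\r\n", 1)[0].decode("latin-1") or None
    except OSError:
        first_line = None
    return first_line, time.monotonic() - start


if __name__ == "__main__":
    print("http:", probe_http(HOST, PORT))
    print("ws:  ", probe_ws_upgrade(HOST, PORT, WS_PATH))
```

During the stall, the expectation from the notes above would be a quick `200` from the HTTP probe while the WebSocket side either times out or behaves differently; a healthy gateway would normally answer the upgrade with a `101` status line.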

Suspected cause (informed guess)

The v2026.4.25 release notes touch the runtime-context / chat-history path in several places:

  • "Heartbeat, cron, and exec wakeups submitted as transient runtime context (removed from visible transcripts)"
  • "Sessions separate reset freshness from store updatedAt (heartbeat/cron/exec no longer prevent daily/idle resets)"
  • "Embedded runtime context sent as hidden next-turn custom message (not visible user prompt)"
  • "Doctor repairs 2026.4.24 transcripts with duplicated prompt-rewrite branches"

If any of these reintroduced a sync chat.history read on the channel-startup path (which was the root cause #63480 originally fixed by deferring), the symptom and timing would match exactly. Worth comparing the channel-startup hook order against the pre-#63480 codepath.

Environment

| Item | Value |
| --- | --- |
| OpenClaw version | 2026.4.25 (stable) |
| OS | Debian 11 (AWS Bitnami) |
| Node | 24.13.0 (NVM) |
| Install | npm install -g openclaw@2026.4.25 --ignore-scripts + restore deps + eager bundled-plugin postinstall |
| Gateway service | systemd user-level |
| Plugins enabled | acpx, active-memory, bonjour (dormant via OPENCLAW_DISABLE_BONJOUR=1), browser, device-pair, memory-core, memory-wiki, phone-control, talk-voice, telegram |
| Hardening flags set | OPENCLAW_SERVICE_REPAIR_POLICY=external, OPENCLAW_DISABLE_BONJOUR=1, OPENCLAW_EAGER_BUNDLED_PLUGIN_DEPS=1 |

Mitigation suggestion

Two options worth considering, possibly in combination:

  1. Restore #63480's defer behavior — whatever code path is again synchronously waiting on chat.history (or an equivalent runtime-context build) during channel startup should be deferred or made async, the same way #63480 originally addressed it.
  2. Stop logging `ready (...)` until channels are actually up: the misleading `ready` log makes ops scripts (and humans) assume the gateway is usable when it isn't. Either delay the `ready` line by ~3 minutes (matches reality) or emit a separate `available` event after channels register.
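
Both options can be illustrated with a toy model. The sketch below is not OpenClaw code: every name in it is hypothetical, and a short `asyncio.sleep` stands in for whatever slow read the channel-startup path may be waiting on. It contrasts a startup that awaits the slow read before starting channels with one that defers the read to a background task, and it records a separate `available` event once channels are up.

```python
import asyncio
import time

# Everything here is hypothetical: the names and the simulated delay stand in
# for whatever the gateway really does during channel startup.
HISTORY_DELAY_S = 0.3  # stand-in for the multi-minute chat.history read


async def load_chat_history(events: list) -> None:
    await asyncio.sleep(HISTORY_DELAY_S)
    events.append("history loaded")


async def start_channels(events: list) -> None:
    events.append("channels started")


async def startup_blocking(events: list) -> None:
    """Suspected regressed shape: channel startup waits for the history read."""
    events.append("ready")
    await load_chat_history(events)
    await start_channels(events)
    events.append("available")


async def startup_deferred(events: list) -> asyncio.Task:
    """Deferred shape: the history read runs in the background."""
    events.append("ready")
    history = asyncio.create_task(load_chat_history(events))
    await start_channels(events)
    events.append("available")  # separate signal: channels are really up
    return history


async def main() -> None:
    for name, starter in (("blocking", startup_blocking),
                          ("deferred", startup_deferred)):
        events: list = []
        began = time.monotonic()
        pending = await starter(events)
        elapsed = time.monotonic() - began
        if pending is not None:
            await pending  # let the background read finish before reporting
        print(f"{name}: available after {elapsed:.2f}s, order={events}")


if __name__ == "__main__":
    asyncio.run(main())
```

In the blocking variant, `available` trails `ready` by the full read delay; in the deferred variant it follows almost immediately and the history arrives later. Whether the real gateway can safely start channels before that read completes is a question for the maintainers.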

Related

  • #63450 (closed by #63480 at v2026.4.8 → regression returns at v2026.4.25)
  • #71127 (stuck-session symptom — likely caused by the same blocker; v4.25 deterministic repro added in a comment there)
  • #51469 (CLI handshake timeout too short for cold-start — partially overlaps; their root cause is ESM compile, ours is the channel-startup blocker, but both manifest as CLI handshake timeout for the user)

Extent analysis

TL;DR

If the suspected cause is confirmed, the regression in v2026.4.25 can be fixed by restoring the defer behavior from #63480, which deferred the chat.history read during channel startup.

Guidance

  • Review the changes made in the v2026.4.25 release notes, specifically the modifications to the runtime-context and chat-history path, to identify the potential cause of the regression.
  • Compare the channel-startup hook order against the pre-#63480 codepath to determine if any changes reintroduced a synchronous chat.history read.
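  • As a stopgap on the reporting side, consider delaying the ready log line until channels are actually up, or emitting a separate available event after channels register. This keeps ops scripts from treating the gateway as usable too early; it does not shorten the delay itself.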
  • Investigate the possibility of making the chat.history read asynchronous during channel startup, similar to the fix in #63480.

Notes

The root cause of the regression is likely related to the changes made in the v2026.4.25 release, specifically the modifications to the runtime-context and chat-history path. The fix will depend on identifying and addressing the specific change that reintroduced the synchronous chat.history read.

Recommendation

Restore the defer behavior from #63480 in the channel-startup codepath so that the chat.history read (or whichever equivalent read has reappeared) runs asynchronously and channels can start promptly. This is a code change for maintainers rather than an operator-side workaround, and it presumes the suspected cause is confirmed first.
