openclaw - ✅(Solved) Fix [Bug]: [regression] Gateway channel sidecar startup blocked by chat.history WS request (~80s delay) since v2026.4.8 [1 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#63450 (fetched 2026-04-09 07:53:35)
Comments: 1 · Participants: 2 · Timeline events: 5 · Reactions: 1
Timeline (top): labeled ×2, commented ×1, cross-referenced ×1, subscribed ×1

Since v2026.4.8, the gateway takes ~110 seconds after logging "starting channels and sidecars..." before Telegram, browser control, and plugins actually come up. This is a regression from v2026.3.23-2, where channels were fully up within ~3 seconds of the gateway being ready.

Fix Action: Fixed

PR fix notes

PR #63480: fix(gateway): start channels before WebSocket handlers (#63450)

Description (problem / solution / changelog)

Summary

  • Problem: WebSocket RPC (chat.history, etc.) was enabled before startGatewaySidecars finished. chat.history performs large synchronous session-store and transcript reads on the main thread, which can block the event loop and delay Telegram, browser control, and plugin startup (see #63450).
  • Why it matters: Operators saw long gaps (about 110 seconds in #63450) after the "starting channels and sidecars..." log line while a pending chat.history completed; channels stayed down until then.
  • What changed: Run deferred channel-plugin reload (when applicable), then startGatewaySidecars, then attach WebSocket handlers. Gateway WebSocket upgrades already return 503 until wss connection listeners exist, so clients do not run heavy RPC during this window.
  • What did NOT change: Handler implementations, chat.history semantics, and Tailscale/update-check ordering after WS attach.
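
The ordering described in the summary can be sketched as follows. This is a minimal illustration and not the code from server.impl.ts: bootGateway, StartupSteps, and the step names are hypothetical, and only the sequence (deferred channel-plugin reload when applicable, then sidecars, then WebSocket handlers) comes from the PR description.

```typescript
// Hypothetical names throughout; only the ordering mirrors the PR description.
type Step = () => void | Promise<void>;

interface StartupSteps {
  reloadDeferredChannelPlugins?: Step; // present only when a reload is applicable
  startSidecars: Step;
  attachWsHandlers: Step;
}

async function bootGateway(steps: StartupSteps): Promise<void> {
  if (steps.reloadDeferredChannelPlugins) {
    await steps.reloadDeferredChannelPlugins();
  }
  // Channels and sidecars come up before any WebSocket RPC can be served.
  await steps.startSidecars();
  // Only now can clients reach handlers such as chat.history.
  await steps.attachWsHandlers();
}

// Usage: record the order in which the steps run.
const bootOrder: string[] = [];
bootGateway({
  reloadDeferredChannelPlugins: async () => { bootOrder.push("reload"); },
  startSidecars: async () => { bootOrder.push("sidecars"); },
  attachWsHandlers: () => { bootOrder.push("ws"); },
}).then(() => console.log(bootOrder.join(" -> ")));
```

Awaiting each step in turn is what guarantees that a slow RPC handler cannot run while sidecars are still starting.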

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #63450
  • Related #63450
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: Startup ordered WebSocket message handling before channels/sidecars. Synchronous file reads and JSON work in chat.history starved the event loop, so async channel startup did not complete until that work finished.
  • Missing detection / guardrail: Ordering was not covered by an integration test asserting WS attach vs channel start.
  • Contributing context (if known): Regression reported since v2026.4.8; consistent with sync loadSessionStore / readSessionMessages paths.
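
The starvation effect named in the root cause can be reproduced in isolation. The sketch below is not gateway code: blockFor is a hypothetical busy-wait standing in for synchronous file reads and JSON parsing, and the timer stands in for asynchronous channel startup.

```typescript
// Busy-wait that stands in for large synchronous reads and JSON work.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // Spin: nothing else on the event loop can run during this loop.
  }
}

// Schedules a zero-delay timer, then blocks; resolves with how late the timer fired.
function measureTimerDelay(blockMs: number): Promise<number> {
  return new Promise<number>((resolve) => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs); // runs to completion before the timer callback can fire
  });
}

measureTimerDelay(200).then((delayMs) => {
  console.log(`0 ms timer fired after about ${delayMs} ms`);
});
```

The timer is due immediately, yet it cannot fire until the synchronous work returns, which is the same shape as channel startup waiting on chat.history.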

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/server.minimal-channel-pin.test.ts, src/gateway/server.shared-token-session-rotation.test.ts (gateway startup smoke).
  • Scenario the test should lock in: Full gateway boot completes with channels and WS both healthy.
  • Why this is the smallest reliable guardrail: Ordering fix is structural; dedicated timing tests are flaky; existing gateway server tests still pass after reorder.
  • Existing test that already covers this (if any): Gateway tests above exercise startGatewayServer.
  • If no new test is added, why not: Behavior is startup sequencing; no stable hook exposed for a small unit test without refactoring server.impl.ts for injection.

User-visible / Behavior Changes

  • WebSocket clients may observe a slightly longer window where the gateway HTTP server is up but WebSocket upgrades return 503 (Gateway websocket handlers unavailable) until channels and sidecars have started; then RPC behaves as before.
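
A client that wants to tolerate the longer 503 window could retry the upgrade with backoff. The sketch below is an assumption about client design, not something the PR prescribes: connectWithRetry and UpgradeAttempt are hypothetical, and the attempt callback stands in for whatever WebSocket client is in use.

```typescript
// Hypothetical: resolves to the HTTP status of one upgrade attempt (101 on success).
type UpgradeAttempt = () => Promise<number>;

// Retries while the gateway answers 503; returns the number of attempts used.
async function connectWithRetry(
  attempt: UpgradeAttempt,
  maxAttempts = 5,
  baseDelayMs = 100,
): Promise<number> {
  for (let i = 1; i <= maxAttempts; i++) {
    const status = await attempt();
    if (status !== 503) return i;
    if (i < maxAttempts) {
      // Exponential backoff: baseDelayMs, then 2x, 4x, and so on.
      const delayMs = baseDelayMs * 2 ** (i - 1);
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error(`still 503 after ${maxAttempts} attempts`);
}

// Usage with a fake gateway that is unavailable for the first two attempts.
const fakeStatuses = [503, 503, 101];
connectWithRetry(async () => fakeStatuses.shift() ?? 101, 5, 10).then((attempts) => {
  console.log(`connected after ${attempts} attempts`);
});
```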

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux/macOS dev (CI covers the rest)
  • Runtime/container: Node 22+
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): N/A

Steps

  1. pnpm check
  2. pnpm test src/gateway/server.minimal-channel-pin.test.ts src/gateway/server.shared-token-session-rotation.test.ts

Expected

  • Tests pass; gateway startup ordering no longer allows WS RPC before channel/sidecar start.

Actual

  • pnpm check passed locally; scoped gateway tests passed.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets (repro described in #63450: chat.history duration matches channel startup delay)
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: Typecheck/lint (pnpm check); gateway server smoke tests listed above.
  • Edge cases checked: minimalTestGateway path unchanged (sidecars still skipped when that env is set).
  • What you did not verify: Full pnpm test suite and pnpm build (local pnpm build failed on unrelated acpx/runtime resolution in this workspace).

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: Slightly longer period where HTTP is listening but WebSocket upgrades return 503 until sidecars finish.
    • Mitigation: Same 503 behavior existed before attachGatewayWsHandlers; this only extends that window to cover channel startup, which is typically short relative to a stuck chat.history.

Changed files

  • CHANGELOG.md (modified, +2/-0)
  • src/gateway/server-http.ts (modified, +7/-4)
  • src/gateway/server.impl.ts (modified, +29/-25)
  • src/gateway/server.preauth-hardening.test.ts (modified, +1/-1)

Original issue report

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Since v2026.4.8, the gateway takes ~110 seconds after logging "starting channels and sidecars..." before Telegram, browser control, and plugins actually come up. This is a regression from v2026.3.23-2, where channels were fully up within ~3 seconds of the gateway being ready.

Environment

  • OpenClaw: v2026.4.8 (9ece252)
  • OS: Ubuntu (systemd user service)
  • Node: via linuxbrew
  • Channels: Telegram, Gmail watcher, browser control

Root cause (observed)

Using journalctl -o short-monotonic, the gap is fully explained by a chat.history WebSocket request that takes ~81 seconds to complete, during which channel sidecars are blocked from starting:

[190837.244] [gateway] starting channels and sidecars...
[190837.500] [hooks] loaded 4 internal hook handlers
             ← 110 seconds of silence →
[190947.524] [ws] ⇄ res ✓ chat.history 81075ms
[190947.532] [telegram] [default] starting provider
[190947.760] [plugins] embedded acpx runtime backend registered
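
The size of the silent gap can be checked directly from the monotonic timestamps. The sketch below uses the log lines quoted in the issue; monotonicSeconds and largestGap are hypothetical helper names, and the parser assumes every line starts with a bracketed seconds value as journalctl -o short-monotonic prints it.

```typescript
const logLines = [
  "[190837.244] [gateway] starting channels and sidecars...",
  "[190837.500] [hooks] loaded 4 internal hook handlers",
  "[190947.524] [ws] ⇄ res ✓ chat.history 81075ms",
  "[190947.532] [telegram] [default] starting provider",
  "[190947.760] [plugins] embedded acpx runtime backend registered",
];

interface Gap { seconds: number; before: string; after: string }

// Extracts the leading "[seconds.millis]" monotonic timestamp from a log line.
function monotonicSeconds(line: string): number {
  const match = /^\[(\d+\.\d+)\]/.exec(line);
  if (!match) throw new Error(`no timestamp in: ${line}`);
  return Number(match[1]);
}

// Finds the largest pause between consecutive lines, rounded to milliseconds.
function largestGap(lines: string[]): Gap {
  let best: Gap = { seconds: 0, before: "", after: "" };
  let prev: string | undefined;
  for (const line of lines) {
    if (prev !== undefined) {
      const delta = monotonicSeconds(line) - monotonicSeconds(prev);
      if (delta > best.seconds) {
        best = { seconds: Math.round(delta * 1000) / 1000, before: prev, after: line };
      }
    }
    prev = line;
  }
  return best;
}

const gap = largestGap(logLines);
console.log(`${gap.seconds}s of silence before: ${gap.after}`);
// The largest gap is 110.024 s, ending at the chat.history response line.
```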

In v2026.3.23-2, Telegram started within ~3 seconds of the "starting channels and sidecars..." log line.

Steps to reproduce

On OpenClaw v2026.4.8, restart the gateway.

Expected behavior

Channel sidecars (Telegram, browser control, plugins) should start independently of pending WebSocket requests. A slow or large chat.history response should not block the gateway startup sequence.

Actual behavior

A chat.history WebSocket request takes ~81 seconds to complete, and channel sidecars do not start until it finishes (see the journalctl -o short-monotonic excerpt under "Root cause (observed)").

OpenClaw version

2026.4.8 (9ece252)

Operating system

Ubuntu Server 24.04

Install method

npm global / systemd service

Model

n/a

Provider / routing chain

openclaw -> *

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

  • Session file for agent:main:main is 737KB — not excessively large
  • sessions.json index is 6.8MB (91 sessions × ~142KB each, dominated by skillsSnapshot fields)
  • The chat.history latency is consistent across restarts (~81s every time)
  • No errors in logs — just a silent block
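
To check how much of an index is taken up by one field, such as skillsSnapshot, a rough measurement like the one below may help. It is a sketch under assumptions: the real sessions.json schema is not shown in the issue, so the index shape (a map of session key to entry object) and the fieldShare helper are hypothetical, and serialized string length is used as an approximation of bytes.

```typescript
// Assumed shape: session key -> entry object. The real schema may differ.
type SessionIndex = Record<string, Record<string, unknown>>;

// Returns the fraction (0..1) of serialized entry size taken up by one field.
function fieldShare(index: SessionIndex, field: string): number {
  let totalChars = 0;
  let fieldChars = 0;
  for (const entry of Object.values(index)) {
    totalChars += JSON.stringify(entry).length;
    const value = entry[field];
    if (value !== undefined) {
      fieldChars += JSON.stringify(value).length;
    }
  }
  return totalChars === 0 ? 0 : fieldChars / totalChars;
}

// Usage with synthetic data; the session keys and fields here are made up.
const sampleIndex: SessionIndex = {
  "agent:main:main": { updatedAt: 1, skillsSnapshot: "x".repeat(5000) },
  "agent:main:other": { updatedAt: 2 },
};
const sharePercent = (fieldShare(sampleIndex, "skillsSnapshot") * 100).toFixed(1);
console.log(`skillsSnapshot share: ${sharePercent}%`);
```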

extent analysis

TL;DR

  • The gateway startup sequence can be improved by investigating and optimizing the chat.history WebSocket request that is currently blocking channel sidecars from starting.

Guidance

  • Review the chat.history WebSocket request to understand why it is taking ~81 seconds to complete and see if there are any optimization opportunities.
  • Consider implementing a timeout or asynchronous processing for the chat.history request to prevent it from blocking the startup of channel sidecars.
  • Investigate the sessions.json index size and the skillsSnapshot fields to determine if they are contributing to the slow chat.history response.
  • Check if there are any differences in the implementation or configuration of the chat.history request between versions v2026.3.23-2 and v2026.4.8 that could be causing the regression.
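
The "asynchronous processing" idea in the guidance can be illustrated by splitting long work into chunks and yielding to the event loop between them. This is a generic sketch, not what the merged PR did (the PR reordered startup instead); processInChunks is a hypothetical helper.

```typescript
// Runs `work` over `items` in chunks, yielding to the event loop after each chunk
// so that timers and I/O callbacks scheduled elsewhere get a chance to run.
async function processInChunks<T, R>(
  items: T[],
  chunkSize: number,
  work: (item: T) => R,
): Promise<R[]> {
  const results: R[] = [];
  for (let start = 0; start < items.length; start += chunkSize) {
    for (const item of items.slice(start, start + chunkSize)) {
      results.push(work(item));
    }
    // Yield: a zero-delay timer lets already-queued callbacks run first.
    await new Promise<void>((resolve) => setTimeout(resolve, 0));
  }
  return results;
}

// Usage: parse many small JSON strings without one long blocking stretch.
const rawEntries = ['{"n":1}', '{"n":2}', '{"n":3}'];
processInChunks(rawEntries, 2, (raw) => JSON.parse(raw) as { n: number }).then((parsed) => {
  console.log(parsed.map((entry) => entry.n).join(","));
});
```

Chunking only helps when the work can be split; a single large JSON.parse call still blocks for its full duration.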

Example

  • No code snippet is provided as the issue does not contain sufficient information about the implementation of the chat.history request.

Notes

  • The issue seems to be related to a specific regression in version v2026.4.8, and the fix may involve reverting or modifying changes made in this version.
  • The chat.history request latency is consistent across restarts, which suggests that the issue is not related to external factors such as network connectivity.

Recommendation

  • Prefer the fix in PR #63480, which starts channels and sidecars before attaching WebSocket handlers. Where that fix is not available, a timeout or asynchronous processing for the chat.history request may keep it from blocking the startup of channel sidecars.
