openclaw - ✅ (Solved) Fix Discord gateway never reaches ready state after stale-socket recovery [2 pull requests, 1 participant]

GitHub stats
openclaw/openclaw#58577 · Fetched 2026-04-08 02:00:46
Comments: 0 · Participants: 1 · Timeline: 5 · Reactions: 0
Timeline (top): cross-referenced ×2, closed ×1, locked ×1, subscribed ×1

PR fix notes

PR #58958: fix(discord): unify gateway reconnect ownership

Description (problem / solution / changelog)

This removes the Discord gateway reconnect split-brain between Carbon and OpenClaw.

OpenClaw now owns reconnect decisions end-to-end, treats Carbon reconnect exhaustion as transport noise instead of a lifecycle-fatal event, and reconnects from one controller for both close-driven recovery and stale-socket recovery.

Fixes #58764 Fixes #58577
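
The single-owner idea in this PR description can be illustrated with a minimal sketch. It is not the PR's code: `ReconnectController`, `ReconnectReason`, `request`, and `settle` are hypothetical names, and the real implementation in `provider.lifecycle.ts` may look quite different. The sketch only shows how close-driven and stale-socket recovery could funnel through one controller so that two reconnect attempts never run at once.

```typescript
// Hypothetical sketch: not OpenClaw's or Carbon's actual API.
type ReconnectReason = "socket-closed" | "stale-socket";

class ReconnectController {
  private inFlight: ReconnectReason | null = null;
  private readonly startReconnect: (reason: ReconnectReason) => void;

  constructor(startReconnect: (reason: ReconnectReason) => void) {
    this.startReconnect = startReconnect;
  }

  /** Returns true if this call started a reconnect, false if one was already running. */
  request(reason: ReconnectReason): boolean {
    if (this.inFlight !== null) {
      return false; // coalesce: a single owner runs a single attempt at a time
    }
    this.inFlight = reason;
    this.startReconnect(reason);
    return true;
  }

  /** Call once the new connection is ready or has definitively failed. */
  settle(): void {
    this.inFlight = null;
  }
}
```

With this shape, a stale-socket signal from the health monitor and a close event arriving at the same moment would trigger only one reconnect; the second request is ignored until `settle()` is called.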

Changed files

  • extensions/discord/src/gateway-logging.test.ts (modified, +4/-4)
  • extensions/discord/src/gateway-logging.ts (modified, +1/-1)
  • extensions/discord/src/monitor.gateway.test.ts (modified, +0/-37)
  • extensions/discord/src/monitor.gateway.ts (modified, +0/-6)
  • extensions/discord/src/monitor/gateway-handle.ts (modified, +1/-0)
  • extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (removed, +0/-497)
  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +74/-573)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +84/-32)
  • extensions/discord/src/monitor/provider.test.ts (modified, +2/-2)
  • src/agents/pi-hooks/context-pruning.test.ts (modified, +2/-0)

PR #59019: refactor(discord): let carbon own gateway reconnects

Description (problem / solution / changelog)

Fixes #58764 Fixes #58577

Carbon already owns the Discord gateway reconnect/resume state machine, so this removes the OpenClaw-side reconnect controller and hands reconnect behavior back to Carbon.

OpenClaw now only supervises startup readiness and lifecycle shutdown. It no longer reads gateway debug events as control flow, no longer suppresses reconnect exhaustion, and no longer carries the stale-socket/force-stop reconnect patch layer.
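
As a rough illustration of "supervising startup readiness" without owning reconnects, the sketch below tracks whether a ready signal arrived within a deadline. The class name, method names, and the injected clock values are assumptions made for the example; the PR does not describe its supervision logic at this level of detail.

```typescript
// Hypothetical sketch: names and the timeout policy are assumptions, not the PR's code.
type Readiness = "waiting" | "ready" | "timed-out";

class StartupReadinessSupervisor {
  private readyAt: number | null = null;
  private readonly startedAt: number;
  private readonly timeoutMs: number;

  constructor(startedAt: number, timeoutMs: number) {
    this.startedAt = startedAt;
    this.timeoutMs = timeoutMs;
  }

  /** Record the first ready signal; later signals are ignored. */
  markReady(now: number): void {
    if (this.readyAt === null) {
      this.readyAt = now;
    }
  }

  /** Report the current state without touching the connection itself. */
  status(now: number): Readiness {
    if (this.readyAt !== null) {
      return "ready";
    }
    return now - this.startedAt >= this.timeoutMs ? "timed-out" : "waiting";
  }
}
```

The supervisor only observes: it never closes or reopens the socket, which matches the description of leaving reconnect and resume behavior to Carbon. In this sketch a late ready signal still moves the state to "ready" after a timeout was reported; a real supervisor might treat that differently.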

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/discord/src/gateway-logging.test.ts (modified, +13/-8)
  • extensions/discord/src/gateway-logging.ts (modified, +3/-3)
  • extensions/discord/src/monitor.gateway.test.ts (modified, +4/-24)
  • extensions/discord/src/monitor.gateway.ts (modified, +0/-1)
  • extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (removed, +0/-497)
  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +142/-529)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +239/-27)
  • extensions/discord/src/monitor/provider.test.ts (modified, +2/-2)


Bug type

Regression

Beta release blocker

Yes

Summary

Discord channel never reaches "ready" state after a stale-socket recovery cycle, leaving the bot unable to send/receive messages. The gateway WebSocket connects and exchanges heartbeats, but Carbon's ready event never fires.

Steps to reproduce

  1. Run OpenClaw gateway with Discord channel enabled (using @buape/carbon)
  2. Wait for health-monitor to detect a stale Discord WebSocket ([health-monitor] restarting (reason: stale-socket))
  3. After the restart, Discord logs "client initialized as ...; awaiting gateway readiness" and never progresses
  4. All subsequent service restarts reproduce the same stuck state

Alternatively: any gateway restart after the initial stale-socket event reproduces this — the Discord channel never recovers.

Expected behavior

After a stale-socket detection and Discord provider restart, the Carbon client should:

  1. Connect to Discord gateway WebSocket
  2. Complete IDENTIFY handshake
  3. Receive READY + GUILD_CREATE events
  4. Emit the ready event so messages can be sent/received

This worked correctly on versions prior to ~2026.3.24.
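
The expected sequence can be modelled as a small state tracker over incoming gateway payloads. This is a sketch for reasoning about the stuck state, not Carbon's implementation: `Phase`, `advance`, and `finalPhase` are hypothetical names. The opcode values (0 for dispatch, 10 for HELLO) follow the Discord gateway protocol.

```typescript
// Hypothetical sketch of the handshake phases; not Carbon's code.
interface GatewayPayload {
  op: number;
  t?: string | null;
  d?: unknown;
}

type Phase = "awaiting-hello" | "awaiting-ready" | "ready";

const OP_DISPATCH = 0; // events such as READY and GUILD_CREATE
const OP_HELLO = 10; // first payload from the gateway; the client answers with IDENTIFY

function advance(phase: Phase, payload: GatewayPayload): Phase {
  if (phase === "awaiting-hello" && payload.op === OP_HELLO) {
    return "awaiting-ready";
  }
  if (phase === "awaiting-ready" && payload.op === OP_DISPATCH && payload.t === "READY") {
    return "ready";
  }
  return phase; // heartbeat ACKs and other payloads do not change the phase
}

function finalPhase(payloads: GatewayPayload[]): Phase {
  return payloads.reduce<Phase>(advance, "awaiting-hello");
}
```

In this model, a connection that only ever delivers HELLO and heartbeat ACKs to the tracker stays in "awaiting-ready" indefinitely, which mirrors the reported symptom of heartbeats flowing while the ready event never fires.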

Actual behavior

  • TCP connection to gateway.discord.gg succeeds (ESTAB, ~44KB received from Discord)
  • Discord heartbeats are exchanged (bytes_sent/bytes_received increase over time)
  • But the Carbon ready event never fires
  • Log stays permanently at: [discord] client initialized as ...; awaiting gateway readiness
  • Heartbeat module reports Unknown Channel because channel cache is never populated
  • All message delivery fails with Unknown Channel

OpenClaw version

Tested on 2026.3.24, 2026.3.28, and 2026.3.31-beta.1 — all exhibit the same behavior.

Operating system

Arch Linux x86_64, kernel 6.19.10-1-cachyos, 4GB RAM VPS (Shin)

Install method

npm install -g openclaw (system-wide, /usr/lib/node_modules/openclaw)

Model

volcengine/kimi-k2.5 (primary), but model is irrelevant — this is a channel/transport issue.

Provider / routing chain

NOT_ENOUGH_INFO (not model-related)

Logs, screenshots, evidence

Timeline (2026-04-01 JST)

03/31 21:21  [health-monitor] restarting (reason: stale-socket)  ← first occurrence
04/01 00:21  [health-monitor] restarting (reason: stale-socket)
04/01 04:16  [health-monitor] restarting (reason: stale-socket)
04/01 04:51  [health-monitor] restarting (reason: stale-socket)
             [discord] client initialized ... awaiting gateway readiness  ← STUCK
04/01 05:05  Service restart (auto-update to 2026.3.28)
             [discord] logged in to discord as ...                        ← login OK
             [heartbeat] failed: Unknown Channel                          ← but ready never fires

Network evidence (from ss -tnip)

The Discord gateway WebSocket IS connected and exchanging data:

ESTAB  162.43.76.147:44734 → 162.159.134.234:443
  bytes_sent:2612  bytes_received:44044  data_segs_out:12  data_segs_in:22

44KB received is consistent with HELLO + READY + GUILD_CREATE payloads. Heartbeat ACKs continue flowing. But Carbon never emits the ready event.
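
To compare socket counters across samples, the counter line from `ss -tnip` can be parsed into numbers. The helper below is a minimal sketch written for this page; `parseSsCounters` and `receivedMore` are hypothetical names, and the parser only handles `name:integer` pairs like the ones shown above.

```typescript
// Hypothetical helpers for reading ss counter lines; not part of OpenClaw.
function parseSsCounters(line: string): Record<string, number> {
  const counters: Record<string, number> = {};
  const pattern = /([a-z_]+):(\d+)/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(line)) !== null) {
    const key = match[1];
    const value = match[2];
    if (key !== undefined && value !== undefined) {
      counters[key] = Number(value);
    }
  }
  return counters;
}

/** True when the later sample shows more bytes received than the earlier one. */
function receivedMore(earlier: Record<string, number>, later: Record<string, number>): boolean {
  const before = earlier["bytes_received"];
  const after = later["bytes_received"];
  return before !== undefined && after !== undefined && after > before;
}
```

Two samples taken a minute apart with a growing `bytes_received` indicate that the transport is alive, so a missing ready event has to be explained above the TCP layer.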

Attempted fixes (none resolved)

  • IPv4-first DNS (--dns-result-order=ipv4first)
  • Disable IPv6 on eth0 (sysctl net.ipv6.conf.eth0.disable_ipv6=1)
  • Disable autoSelectFamily via preload script
  • Node.js v22 LTS downgrade (v22.22.2) — behavior changes slightly ("logged in" appears instead of "awaiting gateway readiness") but ready still never fires
  • Clean delivery queue
  • Multiple OpenClaw version changes (2026.3.24 ↔ 2026.3.28 ↔ 2026.3.31-beta.1)

Additional context

  • Bot is in 2 guilds, token is valid (REST API works, GET /users/@me returns correct bot info)
  • GET /gateway/bot returns session_start_limit.remaining: 976/1000
  • Direct WebSocket test to gateway.discord.gg using the bundled ws library works perfectly (receives HELLO within ms)
  • The remote relay (wss://r1.kanitama.dpdns.org) also fails to connect inside the gateway process, though it works when tested externally
  • Server has no global IPv6 route (only Tailscale fd7a:: prefix), causing ETIMEDOUT/ENETUNREACH on IPv6 connection attempts — Telegram's fetch handler recovers via sticky IPv4 fallback, but Carbon/gateway WS does not
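
The "sticky IPv4 fallback" mentioned in the last point can be sketched as follows. This is an assumption about what such a fallback might look like, not the Telegram handler's actual code: `StickyFamilySelector` and its methods are hypothetical, and only the two error codes named in the report are treated as IPv6 reachability failures.

```typescript
// Hypothetical sketch of a sticky IPv4 fallback; not OpenClaw's implementation.
type Family = 4 | 6;

const IPV6_UNREACHABLE_CODES = new Set<string>(["ETIMEDOUT", "ENETUNREACH"]);

class StickyFamilySelector {
  private ipv4Only = false;

  /** Address families to try, in order, for the next connection attempt. */
  next(): Family[] {
    return this.ipv4Only ? [4] : [6, 4];
  }

  /** After one IPv6 reachability failure, stop offering IPv6 for later attempts. */
  reportFailure(family: Family, code: string): void {
    if (family === 6 && IPV6_UNREACHABLE_CODES.has(code)) {
      this.ipv4Only = true;
    }
  }
}
```

The point of making the fallback sticky is that a host without a global IPv6 route pays the IPv6 timeout once rather than on every reconnect.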

Impact and severity

Critical — Discord channel is completely non-functional. The bot cannot send or receive any messages. Telegram continues to work. The issue persists across service restarts and version changes. Manual intervention (beyond restart) cannot recover the Discord channel.

The stale-socket → stuck-ready chain seems deterministic once triggered. No self-recovery has been observed over multiple hours.

extent analysis

TL;DR

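After a stale-socket recovery cycle, the Discord channel never reaches the "ready" state even though the gateway WebSocket is connected and heartbeats are flowing. Two linked PRs (#58958 and #59019) state that they fix this issue; both remove the split in reconnect ownership between Carbon and OpenClaw, though they assign that ownership to opposite sides.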

Guidance

  • Check the log for "[discord] client initialized as ...; awaiting gateway readiness" with no later ready message; that line staying last is the signature of this issue.
  • Confirm with ss -tnip that the connection to gateway.discord.gg is ESTAB and that bytes_sent/bytes_received keep increasing. If they do, the transport is healthy and the stall is in the path that should emit Carbon's ready event.
  • Treat Unknown Channel errors from the heartbeat module and from message delivery as a consequence of the unpopulated channel cache, not as a separate fault.
  • Do not expect DNS or IPv6 changes, a Node.js downgrade, or switching among 2026.3.24, 2026.3.28, and 2026.3.31-beta.1 to help; the report lists all of these as tried without success.

Example

No code example is provided. The evidence in the report is log and socket output, and the failure lies in how the OpenClaw gateway and Carbon handle reconnects rather than in a snippet that can be shown in isolation.

Notes

The issue is specific to the Discord channel; Telegram keeps working on the same host. A direct WebSocket test to gateway.discord.gg with the bundled ws library succeeds, which points away from the Discord API and the network and toward how the gateway process handles the connection. Persistence across service restarts and across 2026.3.24, 2026.3.28, and 2026.3.31-beta.1 is consistent with a regression in reconnect logic rather than a transient fault.

Recommendation

None of the workarounds tried in the report restored the channel. Since the issue is closed and both linked PRs state that they fix it, the practical path is to move to a build that includes the reconnect fix and then confirm that the log progresses past "awaiting gateway readiness".
