
openclaw - ✅(Solved) Fix Discord gateway never reaches ready state after stale-socket recovery [2 pull requests, 1 participant]

openclaw · 2026-03-31 21:52:30

GitHub stats

openclaw/openclaw#58577•Fetched 2026-04-08 02:00:46

Author

midasdf

Participants

midasdf

Timeline (top)

cross-referenced ×2, closed ×1, locked ×1, subscribed ×1

Discord channel never reaches "ready" state after a stale-socket recovery cycle, leaving the bot unable to send/receive messages. The gateway WebSocket connects and exchanges heartbeats, but Carbon's ready event never fires.

Root Cause

TCP connection to gateway.discord.gg succeeds (ESTAB, ~44KB received from Discord)
Discord heartbeats are exchanged (bytes_sent/bytes_received increase over time)
But the Carbon ready event never fires
Log stays permanently at: [discord] client initialized as ...; awaiting gateway readiness
Heartbeat module reports Unknown Channel because channel cache is never populated
All message delivery fails with Unknown Channel

PR fix notes

PR #58958: fix(discord): unify gateway reconnect ownership

Repository: openclaw/openclaw
Author: obviyus
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/58958

Description (problem / solution / changelog)

This removes the Discord gateway reconnect split-brain between Carbon and OpenClaw.

OpenClaw now owns reconnect decisions end-to-end, treats Carbon reconnect exhaustion as transport noise instead of a lifecycle-fatal event, and reconnects from one controller for both close-driven recovery and stale-socket recovery.

Fixes #58764 Fixes #58577

Changed files

extensions/discord/src/gateway-logging.test.ts (modified, +4/-4)
extensions/discord/src/gateway-logging.ts (modified, +1/-1)
extensions/discord/src/monitor.gateway.test.ts (modified, +0/-37)
extensions/discord/src/monitor.gateway.ts (modified, +0/-6)
extensions/discord/src/monitor/gateway-handle.ts (modified, +1/-0)
extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (removed, +0/-497)
extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +74/-573)
extensions/discord/src/monitor/provider.lifecycle.ts (modified, +84/-32)
extensions/discord/src/monitor/provider.test.ts (modified, +2/-2)
src/agents/pi-hooks/context-pruning.test.ts (modified, +2/-0)

PR #59019: refactor(discord): let carbon own gateway reconnects

Repository: openclaw/openclaw
Author: obviyus
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/59019

Description (problem / solution / changelog)

Fixes #58764 Fixes #58577

Carbon already owns the Discord gateway reconnect/resume state machine, so this removes the OpenClaw-side reconnect controller and hands reconnect behavior back to Carbon.

OpenClaw now only supervises startup readiness and lifecycle shutdown. It no longer reads gateway debug events as control flow, no longer suppresses reconnect exhaustion, and no longer carries the stale-socket/force-stop reconnect patch layer.
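
As an illustration of what "only supervises startup readiness" could look like, here is a minimal sketch. It is not the code from PR #59019: `ReadinessSupervisor`, its method names, and the 30-second timeout are hypothetical. The point is that the supervisor never reconnects on its own; it only reports whether the ready event arrived before a deadline.

```typescript
// Hypothetical sketch: names and timeout are illustrative, not OpenClaw's or Carbon's API.
type ReadinessState = "waiting" | "ready" | "timed-out";

class ReadinessSupervisor {
  private state: ReadinessState = "waiting";
  private readonly deadlineMs: number;

  constructor(startedAtMs: number, timeoutMs: number) {
    this.deadlineMs = startedAtMs + timeoutMs;
  }

  // Call this from the client's ready handler.
  markReady(): void {
    if (this.state === "waiting") this.state = "ready";
  }

  // Call periodically with the current time; it never triggers a reconnect itself.
  check(nowMs: number): ReadinessState {
    if (this.state === "waiting" && nowMs >= this.deadlineMs) {
      this.state = "timed-out";
    }
    return this.state;
  }
}

const stuck = new ReadinessSupervisor(0, 30_000);
const stuckEarly = stuck.check(10_000); // still inside the startup window
const stuckLate = stuck.check(30_000); // deadline reached without a ready event

const healthy = new ReadinessSupervisor(0, 30_000);
healthy.markReady();
const healthyLate = healthy.check(60_000); // stays "ready" after the deadline
```

Passing the clock in as an argument keeps the sketch deterministic. A real supervisor would decide what to do on "timed-out" (for example, log and shut down) while leaving reconnect and resume to Carbon.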

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/discord/src/gateway-logging.test.ts (modified, +13/-8)
extensions/discord/src/gateway-logging.ts (modified, +3/-3)
extensions/discord/src/monitor.gateway.test.ts (modified, +4/-24)
extensions/discord/src/monitor.gateway.ts (modified, +0/-1)
extensions/discord/src/monitor/provider.lifecycle.reconnect.ts (removed, +0/-497)
extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +142/-529)
extensions/discord/src/monitor/provider.lifecycle.ts (modified, +239/-27)
extensions/discord/src/monitor/provider.test.ts (modified, +2/-2)


Bug type

Regression

Beta release blocker

Yes

Summary

Steps to reproduce

Run OpenClaw gateway with Discord channel enabled (using @buape/carbon)
Wait for health-monitor to detect a stale Discord WebSocket ([health-monitor] restarting (reason: stale-socket))
After the restart, Discord logs "client initialized as ...; awaiting gateway readiness" and never progresses
All subsequent service restarts reproduce the same stuck state

Alternatively: any gateway restart after the initial stale-socket event reproduces this — the Discord channel never recovers.

Expected behavior

After a stale-socket detection and Discord provider restart, the Carbon client should:

1. Connect to Discord gateway WebSocket
2. Complete IDENTIFY handshake
3. Receive READY + GUILD_CREATE events
4. Emit the ready event so messages can be sent/received

This worked correctly on versions prior to ~2026.3.24.
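
The expected sequence can be sketched as a small tracker over inbound gateway payloads. This is a diagnostic sketch, not Carbon's implementation: `GatewayPayload`, `Stage`, and `furthestStage` are hypothetical names. The opcodes are Discord's documented values (0 = Dispatch, 10 = Hello, 11 = Heartbeat ACK). IDENTIFY is sent by the client, so it does not appear in an inbound trace.

```typescript
// Inbound payload shape, reduced to the two fields this sketch needs.
interface GatewayPayload {
  op: number; // 0 = Dispatch, 10 = Hello, 11 = Heartbeat ACK
  t?: string | null; // event name, only set for op 0 (Dispatch)
}

type Stage = "connected" | "hello" | "ready" | "guilds";

// Hypothetical helper: reports the furthest handshake stage seen in a payload trace.
function furthestStage(payloads: GatewayPayload[]): Stage {
  let stage: Stage = "connected";
  for (const p of payloads) {
    if (p.op === 10 && stage === "connected") stage = "hello";
    else if (p.op === 0 && p.t === "READY" && stage === "hello") stage = "ready";
    else if (p.op === 0 && p.t === "GUILD_CREATE" && stage === "ready") stage = "guilds";
  }
  return stage;
}

// Heartbeats flowing, but no READY dispatch ever arrives.
const heartbeatsOnly = furthestStage([{ op: 10 }, { op: 11 }, { op: 11 }]);

// Full handshake as described in the expected behavior.
const fullHandshake = furthestStage([
  { op: 10 },
  { op: 0, t: "READY" },
  { op: 0, t: "GUILD_CREATE" },
]);
```

A trace that stops at "hello" would mean Discord never sent READY, whereas a trace that reaches "guilds" while the client still reports "awaiting gateway readiness" would point at the client-side ready event instead.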

Actual behavior

TCP connection to gateway.discord.gg succeeds (ESTAB, ~44KB received from Discord)
Discord heartbeats are exchanged (bytes_sent/bytes_received increase over time)
But the Carbon ready event never fires
Log stays permanently at: [discord] client initialized as ...; awaiting gateway readiness
Heartbeat module reports Unknown Channel because channel cache is never populated
All message delivery fails with Unknown Channel

OpenClaw version

Tested on 2026.3.24, 2026.3.28, and 2026.3.31-beta.1 — all exhibit the same behavior.

Operating system

Arch Linux x86_64, kernel 6.19.10-1-cachyos, 4GB RAM VPS (Shin)

Install method

npm install -g openclaw (system-wide, /usr/lib/node_modules/openclaw)

Model

volcengine/kimi-k2.5 (primary), but model is irrelevant — this is a channel/transport issue.

Provider / routing chain

NOT_ENOUGH_INFO (not model-related)

Logs, screenshots, evidence

Timeline (2026-04-01 JST)

03/31 21:21  [health-monitor] restarting (reason: stale-socket)  ← first occurrence
04/01 00:21  [health-monitor] restarting (reason: stale-socket)
04/01 04:16  [health-monitor] restarting (reason: stale-socket)
04/01 04:51  [health-monitor] restarting (reason: stale-socket)
             [discord] client initialized ... awaiting gateway readiness  ← STUCK
04/01 05:05  Service restart (auto-update to 2026.3.28)
             [discord] logged in to discord as ...                        ← login OK
             [heartbeat] failed: Unknown Channel                          ← but ready never fires
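
A timeline like the one above can be summarized mechanically. The following is a minimal sketch; `summarizeLog` and `LogSummary` are hypothetical names, and the matched strings are taken from the log lines quoted in this issue rather than from any documented log format.

```typescript
interface LogSummary {
  staleSocketRestarts: number;
  lastDiscordLine: string | null;
  stuckAwaitingReady: boolean;
}

// Counts stale-socket restarts and checks whether the most recent [discord]
// line is still the "awaiting gateway readiness" message.
function summarizeLog(lines: string[]): LogSummary {
  let staleSocketRestarts = 0;
  let lastDiscordLine: string | null = null;
  for (const line of lines) {
    if (line.includes("[health-monitor] restarting (reason: stale-socket)")) {
      staleSocketRestarts += 1;
    }
    const idx = line.indexOf("[discord]");
    if (idx !== -1) lastDiscordLine = line.slice(idx).trim();
  }
  return {
    staleSocketRestarts,
    lastDiscordLine,
    stuckAwaitingReady:
      lastDiscordLine !== null && lastDiscordLine.includes("awaiting gateway readiness"),
  };
}

// Lines copied from the timeline in this issue, without the arrow annotations.
const sample = [
  "03/31 21:21  [health-monitor] restarting (reason: stale-socket)",
  "04/01 00:21  [health-monitor] restarting (reason: stale-socket)",
  "04/01 04:16  [health-monitor] restarting (reason: stale-socket)",
  "04/01 04:51  [health-monitor] restarting (reason: stale-socket)",
  "             [discord] client initialized ... awaiting gateway readiness",
];
const summary = summarizeLog(sample);
```

This only recognizes the one stuck signature. After the 05:05 restart the log shows "logged in to discord as ..." followed by a heartbeat failure, which this sketch would not flag.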

Network evidence (from `ss -tnip`)

The Discord gateway WebSocket IS connected and exchanging data:

ESTAB  162.43.76.147:44734 → 162.159.134.234:443
  bytes_sent:2612  bytes_received:44044  data_segs_out:12  data_segs_in:22

44KB received is consistent with HELLO + READY + GUILD_CREATE payloads. Heartbeat ACKs continue flowing. But Carbon never emits the ready event.
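
To check that the counters keep growing, the `ss -tnip` output can be sampled twice and compared. This is a minimal sketch: `parseSsCounters` and `isExchangingData` are hypothetical helpers, and the field names are assumed to match the `bytes_sent:` / `bytes_received:` format shown above.

```typescript
interface SocketCounters {
  bytesSent: number;
  bytesReceived: number;
}

// Extracts bytes_sent / bytes_received from `ss -tnip` output text.
// Returns null if either field is missing.
function parseSsCounters(text: string): SocketCounters | null {
  const sent = /bytes_sent:(\d+)/.exec(text);
  const received = /bytes_received:(\d+)/.exec(text);
  if (!sent || !received) return null;
  return { bytesSent: Number(sent[1]), bytesReceived: Number(received[1]) };
}

// True when both counters grew between two samples of the same socket.
function isExchangingData(before: SocketCounters, after: SocketCounters): boolean {
  return after.bytesSent > before.bytesSent && after.bytesReceived > before.bytesReceived;
}

// The counter line quoted in this issue.
const first = parseSsCounters(
  "  bytes_sent:2612  bytes_received:44044  data_segs_out:12  data_segs_in:22",
);
```

Growing counters only show that the TCP connection is alive. They do not show which gateway payloads were delivered, so they cannot confirm on their own that READY was received.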

Attempted fixes (none resolved)

IPv4-first DNS (--dns-result-order=ipv4first)
Disable IPv6 on eth0 (sysctl net.ipv6.conf.eth0.disable_ipv6=1)
Disable autoSelectFamily via preload script
Node.js v22 LTS downgrade (v22.22.2) — behavior changes slightly ("logged in" appears instead of "awaiting gateway readiness") but ready still never fires
Clean delivery queue
Multiple OpenClaw version changes (2026.3.24 ↔ 2026.3.28 ↔ 2026.3.31-beta.1)
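
For reference, the first and third attempts can be expressed in a single preload module. The reporter's actual preload script is not shown in the issue, so this is an assumed equivalent using Node.js built-ins (`dns.setDefaultResultOrder` and `net.setDefaultAutoSelectFamily`). Neither setting resolved the problem.

```typescript
import * as dns from "node:dns";
import * as net from "node:net";

// Prefer IPv4 (A record) results when resolving hostnames,
// the in-process counterpart of --dns-result-order=ipv4first.
dns.setDefaultResultOrder("ipv4first");

// Disable the autoSelectFamily default for new outbound sockets.
net.setDefaultAutoSelectFamily(false);
```

A module like this would have to load before the gateway opens any connections, for example through Node's `--import` or `--require` flags, depending on the module format and Node.js version in use.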

Additional context

Bot is in 2 guilds, token is valid (REST API works, GET /users/@me returns correct bot info)
GET /gateway/bot returns session_start_limit.remaining: 976/1000
Direct WebSocket test to gateway.discord.gg using the bundled ws library works perfectly (receives HELLO within ms)
The remote relay (wss://r1.kanitama.dpdns.org) also fails to connect inside the gateway process, though it works when tested externally
Server has no global IPv6 route (only Tailscale fd7a:: prefix), causing ETIMEDOUT/ENETUNREACH on IPv6 connection attempts — Telegram's fetch handler recovers via sticky IPv4 fallback, but Carbon/gateway WS does not

Impact and severity

Critical — Discord channel is completely non-functional. The bot cannot send or receive any messages. Telegram continues to work. The issue persists across service restarts and version changes. Manual intervention (beyond restart) cannot recover the Discord channel.

The stale-socket → stuck-ready chain seems deterministic once triggered. No self-recovery has been observed over multiple hours.

extent analysis

TL;DR

After a stale-socket recovery cycle, the Discord gateway WebSocket connects and exchanges heartbeats, but Carbon's ready event never fires, so the channel cannot send or receive messages. The linked fix that merged is PR #59019, which removes the OpenClaw-side reconnect controller and lets Carbon own gateway reconnect and resume.

Guidance

Confirm at the protocol level whether the gateway session completes: HELLO received, IDENTIFY sent, then READY and GUILD_CREATE dispatched. The byte counts in the issue are only indirect evidence that READY arrived.
Check whether the log stops at [discord] client initialized as ...; awaiting gateway readiness, or progresses to logged in to discord as ... followed by [heartbeat] failed: Unknown Channel. Both were observed, and in both cases ready never fired.
Compare against a connection made outside the gateway process. In the issue, a direct WebSocket test with the bundled ws library received HELLO within milliseconds, so the network path to gateway.discord.gg was working.
Test on a build that includes PR #59019 before spending time on DNS or IPv6 workarounds; none of those resolved the problem for the reporter.

Example

The issue does not include a minimal reproduction. The failure depends on the interaction between the OpenClaw gateway, Carbon, and the Discord gateway connection, so diagnosis relies on the logs and network state of the affected host.

Notes

Telegram is unaffected, and a direct WebSocket test to gateway.discord.gg succeeds from the same host. Together these point away from Discord itself and toward how the gateway connection lifecycle is handled inside the OpenClaw process. The stuck state persisted across service restarts and across 2026.3.24, 2026.3.28, and 2026.3.31-beta.1, which is consistent with both linked PRs targeting the reconnect logic rather than host configuration.

Recommendation

No workaround was confirmed in the issue: changes to DNS ordering, IPv6, the Node.js version, and the OpenClaw version all left the channel stuck. Prefer a build that contains the merged PR #59019. PR #58958, which instead made OpenClaw the single reconnect owner, was closed without merging.
