openclaw - 💡(How to fix) Fix Discord ingest lag of 100–400 s on stable connection persists after PR #68159 / 2026.4.1 reconnect-ownership change [1 participants]

openclaw · 2026-04-25 11:09:07

openclaw/openclaw#71546 • Fetched 2026-04-26 05:11:38

Author

apex-system

On openclaw 2026.4.23, single-account, stable wired network, the time between Discord delivering a DM and the agent runtime starting to process it is regularly 100–400 seconds. Agent processing + LLM call after that point is healthy at 5–7 s, so the delay is entirely in the message-ingest path. This persists after the recent Discord-lifecycle hardening that closed #56492 (PR #68159, commit 5adf9d2…081f45), #53132, and #51116.

Filing as a separate, narrow issue per @steipete's instruction in the #53132 close: "If a similar startup hang is still reproducible on a current build, please open a fresh issue with current logs." This is a different failure surface than #38596 (which is about the health-monitor restart loop) — here, the bot does NOT visibly restart; the lag happens inside Carbon's reconnect/RESUME flow on a connection that the gateway still considers up.

Summary

Environment


| Item | Value |
|---|---|
| OpenClaw | 2026.4.23 (`a979721` per `--help` and `status --json runtimeVersion`) |
| `@buape/carbon` | 0.16.0 (latest on npm; pinned exactly by `@openclaw/discord` plugin) |
| `discord-api-types` | `^0.38.47` |
| `ws` | `^8.20.0` |
| OS | macOS 15.6.1 (arm64) |
| Node (gateway runtime) | 22.22.2 |
| Discord setup | 1 bot account, 1 guild, 1 user, MESSAGE_CONTENT intent enabled |
| Network | wired Ethernet, 9–31 ms ping to `gateway.discord.gg`, 0% packet loss |
| Other channels | telegram disabled (`channels.telegram.enabled: false`), mDNS off (`discovery.mdns.mode: "off"`), TTS preflight off (`messages.tts.enabled: false`) |
| Service | launchd `ai.openclaw.gateway`, `gateway.controlUi.allowInsecureAuth: true` (separate concern) |

Reproduction

1. Configure a single Discord bot, restart the gateway, and wait for `[gateway] ready (6 plugins …)` followed by `[discord] logged in to discord as <id> (OpenClaw)`.
2. Leave the bot idle.
3. Observe gateway.log: `[discord] gateway: Gateway websocket closed: 1000` followed by `Gateway reconnect scheduled in <~1000ms> (close, resume=true)`. This happens every 3–5 minutes on idle.
4. Send a DM to the bot at any point.
5. Most of the time the bot replies in 5–10 s. Periodically (when the inbound message lands during a reconnect window or right around a `did not reach READY within 30000ms` event), the reply takes 100–400 s.

Evidence — agent trajectory log

From ~/.openclaw/agents/main/sessions/<sid>.trajectory.jsonl, with the user message_id decoded from the embedded Discord snowflake (Discord epoch 1420070400000):

seq=  1 2026-04-25T07:25:22.028Z session.started
seq=  4 2026-04-25T07:25:23.309Z prompt.submitted
seq=  5 2026-04-25T07:25:26.783Z model.completed   reply="Hi."
seq=  7 2026-04-25T07:25:26.898Z session.ended

user message_id 1497497072411218070 → Discord-snowflake-time 2026-04-25T07:18:44.213Z
=> Discord→openclaw INGEST lag = 397.8 s
   Agent processing total      =   4.9 s
   Model latency only          =   3.5 s
   END-TO-END                  = 402.6 s
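
The snowflake decoding behind the figures above can be reproduced in a few lines. This is a minimal sketch: the function names are illustrative, and the only inputs are the message_id, the Discord epoch, and the session.started timestamp quoted in the excerpt.

```python
from datetime import datetime, timedelta, timezone

DISCORD_EPOCH_MS = 1420070400000  # Discord epoch in Unix ms (2015-01-01T00:00:00Z)
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(snowflake: int) -> datetime:
    """Return the UTC creation time encoded in a Discord snowflake.

    The bits above the low 22 hold milliseconds since the Discord epoch.
    """
    ms_since_unix_epoch = (snowflake >> 22) + DISCORD_EPOCH_MS
    # Integer milliseconds go through timedelta to avoid float rounding.
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_unix_epoch)


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z', as in the trajectory log."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


sent_at = snowflake_to_datetime(1497497072411218070)
started_at = parse_utc("2026-04-25T07:25:22.028Z")
ingest_lag_s = (started_at - sent_at).total_seconds()

print(sent_at.isoformat(timespec="milliseconds"))  # 2026-04-25T07:18:44.213+00:00
print(f"ingest lag = {ingest_lag_s:.1f} s")        # ingest lag = 397.8 s
```

The same two calls can be applied to any other message_id and session.started pair from a trajectory file to check whether a slow reply was slow in ingest or in the agent.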

Three consecutive idle-bot tests on the same session, same wired network:

| User msg | Discord ts (UTC) | session.started | Δ ingest | Δ model | Δ end-to-end |
|---|---|---|---|---|---|
| `hi` | 07:18:44.213 | 07:25:22.028 | 397.8 s | 3.5 s | 402.6 s |
| `Pingping` | 07:42:45.606 | 07:44:25.532 | 99.9 s | 5.7 s | 106.5 s |
| `Double ping` | 07:51:03.549 | 07:53:13.547 | 130.0 s | 6.7 s | 137.3 s |

The agent + LLM segment is consistently 5–10 s. The 100–400 s headline number is entirely in the segment between Discord's gateway and openclaw's DiscordMessageListener enqueuing into the agent runtime.

Correlated stuck session warning while the inbound message sits in queue:

[diagnostic] stuck session: sessionId=unknown
  sessionKey=agent:main:discord:direct:<userid>
  state=processing age=501s queueDepth=1

queueDepth=1 with a pending message that does eventually get answered → this is buffering / replay during reconnect, not permanent loss (so distinct from #51116's user-facing claim that messages are "lost" — they're delayed, not dropped).

WS-close cadence (4-hour window, single account, otherwise idle)

$ awk '/2026-04-25T15:27:21/{flag=1} flag' gateway.log \
  | grep -oE "Gateway websocket closed: [0-9]+|reconnect scheduled.*\((zombie|invalid-session|close)" \
  | sort | uniq -c | sort -rn

  23 Gateway websocket closed: 1000
   8 reconnect scheduled in <ms>ms (zombie
   8 reconnect scheduled in <ms>ms (invalid-session
   1 Gateway websocket closed: 1006
   ... (close-resume reconnect events)
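
For anyone who wants the same tally without hand-normalising the reconnect delay, here is a rough Python equivalent of the pipeline above. It reuses the two patterns from the grep expression. The function name and the sample lines are illustrative: the lines are assembled from the log strings quoted in this issue, with made-up delay values, and are not copied from a real gateway.log.

```python
import re
from collections import Counter
from typing import Iterable

# Same two patterns as the grep -oE expression above.
CLOSE_RE = re.compile(r"Gateway websocket closed: (\d+)")
RECONNECT_RE = re.compile(r"reconnect scheduled.*\((zombie|invalid-session|close)")


def tally_gateway_events(lines: Iterable[str]) -> Counter:
    """Count WS close codes and reconnect reasons, ignoring the delay value."""
    counts: Counter = Counter()
    for line in lines:
        close = CLOSE_RE.search(line)
        if close:
            counts[f"closed {close.group(1)}"] += 1
            continue
        reconnect = RECONNECT_RE.search(line)
        if reconnect:
            counts[f"reconnect ({reconnect.group(1)})"] += 1
    return counts


# Illustrative input only; delays are invented for the example.
sample = [
    "[discord] gateway: Gateway websocket closed: 1000",
    "[discord] gateway: Gateway reconnect scheduled in 1000ms (close, resume=true)",
    "[discord] gateway: Gateway websocket closed: 1000",
    "[discord] gateway: Gateway reconnect scheduled in 1013ms (close, resume=true)",
    "[discord] gateway: Gateway websocket closed: 1006",
]

for event, n in tally_gateway_events(sample).most_common():
    print(n, event)
```

Keying the counter on the reason alone means reconnects with different delays fall into one bucket, which is what the `<ms>` placeholder in the output above stands for.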

Concrete close timestamps over a single ~26-min window post-restart:

15:33:12  closed 1000  (3:14 after login)
15:39:46  closed 1000  (3:28 after re-init)
15:44:21  zombie reconnect
15:49:57  closed 1000
15:53:10  closed 1000
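
The spacing between those five events can be computed directly. A small sketch, assuming all timestamps fall on the same day; note that these are gaps between logged events, not connection lifetimes, because the re-init times are not part of the excerpt.

```python
from datetime import datetime
from typing import List


def gaps_seconds(timestamps: List[str]) -> List[int]:
    """Return the gap in seconds between consecutive HH:MM:SS timestamps."""
    times = [datetime.strptime(t, "%H:%M:%S") for t in timestamps]
    return [int((later - earlier).total_seconds())
            for earlier, later in zip(times, times[1:])]


events = ["15:33:12", "15:39:46", "15:44:21", "15:49:57", "15:53:10"]
gaps = gaps_seconds(events)

print(gaps)                   # [394, 275, 336, 193]
print(sum(gaps) / len(gaps))  # 299.5
```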

Every WS close triggers Carbon's RESUME flow, and inbound messages received during the reconnect window get buffered. RESUME usually succeeds within ~1 s. When it fails, the log shows `discord gateway opened but did not reach READY within 30000ms` (the timeout is defined in dist/extensions/discord/provider-Bc1Lm79N.js:5897 as DISCORD_GATEWAY_RUNTIME_READY_TIMEOUT_MS = 3e4), the channel exits, and the outer `auto-restart attempt 1/10 in 5s` cycle kicks in, adding 30–60 s of unavailability per failed RESUME.

Network is not the cause

$ ping -c 3 gateway.discord.gg
9.319/10.380/11.440 ms (0% loss)

Discord-side reachability is fine. CPU on the gateway process is 0.0% per ps -o %cpu during these episodes; RSS 57 MB; no event-loop stall.

What I tried — A/B downgrade to Carbon 0.15.0 (does not work)

Hypothesis: Carbon 0.16.0 (released 2026-04-16) regressed RESUME vs 0.15.0 (2026-04-10). Tested by:

1. `npm pack @buape/carbon@0.15.0`, then atomic swap into dist/extensions/discord/node_modules/@buape/carbon.
2. Patched the @buape/carbon pin in dist/extensions/discord/package.json to "0.15.0".
3. SIGUSR1 restart.

Result: the bot reaches `[discord] client initialized as <id> (OpenClaw); awaiting gateway readiness` and never proceeds to `logged in to discord`. `openclaw status --json` shows `channelSummary: []` for the entire test window. No errors in gateway.err.log. Reverted cleanly to 0.16.0 via backup; the symptom described in this issue resumed exactly as before.

So 0.15.0 is incompatible with the current @openclaw/discord plugin code path; not a viable downgrade.

Related issues

#38596 OPEN — health-monitor restart-loop circuit-breaker bypass. Same family of symptoms but different surface (visible restart loop vs. silent in-Carbon reconnect-with-buffering). Re-commenting there with the same trajectory data to un-stale.
#51116 CLOSED 2026-04-25 — "WS disconnects every ~10 min, messages lost." Closed as obsolete on main. The "messages lost" framing was too strong — this issue documents that messages are delayed, not lost, on current main, but the user-facing latency is still in the hundreds of seconds.
#56492 CLOSED 2026-04-24 — Carbon Client constructor IDENTIFY race. Fixed by PR #68159. The login race no longer happens for me; this issue is about post-login churn, which #68159 doesn't touch.
#53132 CLOSED 2026-04-24 — multi-account awaiting gateway readiness hang. Different class (multi-account); fixed in 2026.3.24+. Single-account doesn't hang on login on current main.
#43468 CLOSED 2026-03-15 — the (correctly rejected) "remove Carbon" issue. Not asking for that here.

Asks

Concrete and narrow:

1. Is the 3–5 min close-1000 cadence on idle considered baseline behavior on current main, or a regression worth investigating? A reproducer on a clean install would settle this; happy to provide more environment detail.
2. What's the right place in the code path for a buffered-message catch-up after RESUME? Per Discord's gateway spec, after the client sends Resume (opcode 6) the gateway replays the missed dispatches and then sends a RESUMED event, so Carbon's Client.events should surface those missed events. If openclaw's DiscordMessageListener is being recreated during the inner reconnect, those replayed events would never reach the agent runtime, which would explain the 100–400 s lag matching the reconnect window exactly. Worth checking whether the listener is preserved across Carbon's internal reconnects.
3. Would a chat.history-style catch-up on every successful RESUME (re-fetching the last N messages on each affected channel since last_message_id) be in scope as a defensive backstop, even if the underlying buffering issue is fixed? It's the same ask #51116 made, and it'd be a small, opt-in change behind a config flag.
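
To make the catch-up ask concrete, here is a language-neutral sketch of the logic, written in Python for brevity. Everything in it is hypothetical: `ResumeCatchUp`, `fetch_after`, and `deliver` are illustrative names, not Carbon or openclaw APIs. In practice `fetch_after` could wrap Discord's Get Channel Messages REST endpoint, which accepts `after` and `limit` query parameters, and `on_resumed` would be called from whatever hook fires on a successful RESUME.

```python
from typing import Callable, Dict, List, Set

Message = Dict[str, str]  # hypothetical shape: {"id", "channel_id", "content"}


class ResumeCatchUp:
    """Deduplicating backstop that re-fetches messages after a gateway RESUME.

    fetch_after and deliver are injected so the sketch stays independent of
    Carbon and openclaw; both are assumptions, not real APIs.
    """

    def __init__(
        self,
        fetch_after: Callable[[str, int, int], List[Message]],
        deliver: Callable[[Message], None],
        limit: int = 50,
    ) -> None:
        self._fetch_after = fetch_after
        self._deliver = deliver
        self._limit = limit
        self._last_seen: Dict[str, int] = {}  # channel_id -> newest message id
        self._delivered: Set[int] = set()     # unbounded here; cap it in real use

    def on_message(self, msg: Message) -> bool:
        """Deliver a message once; return False if it was already delivered."""
        msg_id = int(msg["id"])
        if msg_id in self._delivered:
            return False
        self._delivered.add(msg_id)
        channel = msg["channel_id"]
        self._last_seen[channel] = max(self._last_seen.get(channel, 0), msg_id)
        self._deliver(msg)
        return True

    def on_resumed(self) -> int:
        """Re-fetch each known channel after a RESUME; return messages recovered."""
        recovered = 0
        for channel, last_id in list(self._last_seen.items()):
            missed = self._fetch_after(channel, last_id, self._limit)
            for msg in sorted(missed, key=lambda m: int(m["id"])):
                if self.on_message(msg):
                    recovered += 1
        return recovered


# Usage with an in-memory stand-in for the channel history.
history = [
    {"id": "100", "channel_id": "dm-1", "content": "hi"},
    {"id": "101", "channel_id": "dm-1", "content": "Pingping"},
    {"id": "102", "channel_id": "dm-1", "content": "Double ping"},
]


def fake_fetch_after(channel_id: str, after_id: int, limit: int) -> List[Message]:
    newer = [m for m in history
             if m["channel_id"] == channel_id and int(m["id"]) > after_id]
    return newer[:limit]


delivered: List[Message] = []
catch_up = ResumeCatchUp(fake_fetch_after, delivered.append)
catch_up.on_message(history[0])     # arrives live over the gateway
recovered = catch_up.on_resumed()   # 101 and 102 were missed during the reconnect
print(recovered, [m["content"] for m in delivered])
```

Because delivery goes through the same deduplicating `on_message` path, a message that is both replayed by the gateway and re-fetched over REST reaches the agent only once. One limitation of the sketch: a channel that has never delivered a message has no last-seen id, so it is not covered.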

Logs / artifacts available on request

/tmp/openclaw/openclaw-2026-04-25.log (full ndjson trace, ~250 KB at time of writing)
~/.openclaw/agents/main/sessions/<sid>.trajectory.jsonl (the trajectory excerpts above came from here)
~/.openclaw/logs/gateway.log, gateway.err.log
The Carbon 0.15.0 A/B test result (post-restart awaiting gateway readiness hang) reproducible on demand.

Happy to upload as gist links if you'd prefer — let me know format you'd like.

extent analysis

TL;DR

The 100–400 s delay between Discord delivering a DM and the agent runtime starting to process it appears to arise during Carbon's reconnect/RESUME flow. The 3–5 minute close-1000 cadence on idle needs investigating as the likely trigger, and a buffered-message catch-up after RESUME could mitigate the delay in the meantime.

Guidance

Investigate if the 3-5 minute close-1000 cadence on idle is a regression worth fixing, as it may be causing the delay.
Check if the DiscordMessageListener is being recreated during Carbon's internal reconnects, which could prevent replayed events from reaching the agent runtime.
Consider implementing a chat.history-style catch-up on every successful RESUME to re-fetch the last N messages on each affected channel since last_message_id as a defensive backstop.
Review the Carbon Client constructor and IDENTIFY race fix (PR #68159) to ensure it doesn't affect the post-login churn.
Analyze the gateway.log and trajectory.jsonl files to understand the reconnect and buffering behavior.

Example

No code snippet is provided as the issue requires further investigation and analysis of the code path.

Notes

The issue is specific to the OpenClaw 2026.4.23 version and the Carbon 0.16.0 library. The A/B test with Carbon 0.15.0 showed incompatibility with the current @openclaw/discord plugin code path.

Recommendation

Apply a workaround by implementing a chat.history-style catch-up on every successful RESUME to mitigate the delay, while investigating the root cause of the 3-5 minute close-1000 cadence on idle.
