openclaw - 💡(How to fix) Fix Discord inbound worker hangs for 30 min on Anthropic API errors before timing out

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

During the Anthropic API outage on 2026-05-08 (~19:08–21:08 UTC), a single Discord inbound message in #general caused the discord channel's inbound worker to hang for the full 30‑minute timeout window. While hung, no further messages from that channel were processed.

Error Message

2026-05-08T21:55:16.264+00:00 error {"subsystem":"gateway/channels/discord"}

Root Cause

Root cause (suspected)

Code Example

2026-05-08T21:55:16.264+00:00 error {"subsystem":"gateway/channels/discord"}
discord inbound worker timed out after 1800 seconds
(channelId=1485426154323054622, messageId=1502421149768744980)
RAW_BUFFERClick to expand / collapse

Summary

During the Anthropic API outage on 2026-05-08 (~19:08–21:08 UTC), a single Discord inbound message in #general caused the discord channel's inbound worker to hang for the full 30‑minute timeout window. While hung, no further messages from that channel were processed.

Symptom

2026-05-08T21:55:16.264+00:00 error {"subsystem":"gateway/channels/discord"}
discord inbound worker timed out after 1800 seconds
(channelId=1485426154323054622, messageId=1502421149768744980)

User experience: messages posted in the affected Discord channel got no response. Outbound REST kept working, gateway WebSocket stayed connected, but no replies were generated. Status probe still reported running, connected, works — there was no surfaced signal that the channel was effectively broken.

Root cause (suspected)

The Anthropic API returned hangs/long-running errors during the outage. The claude‑runner inbound worker appears to wait the full 30‑minute hard timeout for the upstream LLM call to resolve, with no shorter per‑request deadline or circuit breaker.

Impact

  • One bad message blocks the entire channel's inbound queue for 30 min (assuming serialized per-channel processing).
  • openclaw channels status keeps reporting healthy state during the hang, so monitoring/alerting cannot detect the stall.

Suggested fixes

  1. Per-LLM-call timeout much shorter than the worker hard timeout (e.g. 90s) with retry + abort.
  2. Fast-fail on Anthropic 5xx / overloaded responses instead of long-poll waiting.
  3. Surface a "worker stalled" warning on channels status when in‑flight age exceeds N minutes.
  4. Consider parallelism > 1 per channel, or at least a bounded queue with drop/notify on overflow.

Environment

  • OpenClaw 2026.3.24
  • claude-runner plugin (local install at ~/.openclaw/extensions/claude-runner/)
  • Discord channel provider, single-bot multi-channel config

Reported by Karlien via Ginny (main agent).

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Discord inbound worker hangs for 30 min on Anthropic API errors before timing out