openclaw - 💡(How to fix) Discord inbound worker repeatedly times out after 1800s while gateway is still running [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#71948 · Fetched 2026-04-27 05:36:58
Comments: 1 · Participants: 2 · Timeline: 1 · Reactions: 0
Timeline (top): commented ×1

I'm seeing Discord replies fail with:

Discord inbound worker timed out.

In the service logs this appears as:

2026-04-26T10:04:19+08:00 [discord] inbound worker timed out after 1800 seconds (channelId=1497687704866000906, messageId=1497772780744347771)
2026-04-26T12:06:09+08:00 [discord] inbound worker timed out after 1800 seconds (channelId=1489624037758861415, messageId=1497803443287625800)

The gateway process itself remains active, but Discord inbound work appears to get stuck long enough to hit the 30 minute default timeout. Around the same periods I also see gateway/local worker health symptoms like websocket handshake timeouts, subagent announce timeouts, session locks, qmd timeouts, and context overflow recovery.
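
To line up each timeout with other events in the journal, it can help to pull the timestamp and ids out of the timeout lines. The following is a minimal sketch, assuming the log line format is exactly as quoted above; the function name `parse_timeout_line` and the `stuck_since` field are illustrative and not part of OpenClaw.

```python
import re
from datetime import datetime, timedelta

# Pattern for the "[discord] inbound worker timed out" lines quoted in this issue.
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\S+) \[discord\] inbound worker timed out after "
    r"(?P<seconds>\d+) seconds "
    r"\(channelId=(?P<channel_id>\d+), messageId=(?P<message_id>\d+)\)$"
)


def parse_timeout_line(line):
    """Return a dict for a timeout line, or None if the line does not match."""
    m = TIMEOUT_RE.match(line.strip())
    if m is None:
        return None
    timed_out_at = datetime.fromisoformat(m["ts"])
    seconds = int(m["seconds"])
    return {
        "timed_out_at": timed_out_at,
        "seconds": seconds,
        # Earliest moment the worker could have started, if the clock
        # began when the run was picked up (an assumption).
        "stuck_since": timed_out_at - timedelta(seconds=seconds),
        "channel_id": m["channel_id"],
        "message_id": m["message_id"],
    }
```

The `stuck_since` value gives the start of the window worth inspecting in the service logs for each timeout.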

Error Message

[ws] handshake timeout ... peer=127.0.0.1:...->127.0.0.1:18789
Subagent announce failed: Error: gateway timeout after 10000ms
[session-write-lock] releasing lock held for 66843ms / 71211ms / 97029ms
[memory] qmd embed failed ... timed out after 600000ms
[agent/embedded] [context-overflow-diag] ... Context overflow: estimated context size exceeds safe threshold during tool loop.

What I expected

A Discord inbound message should either complete, fail with the underlying agent/model error, or surface enough diagnostic context to tell which internal worker/session is stuck.

If this timeout is expected behavior, it would help if the user-facing Discord reply included the agent/session/run id or a clearer reason than just Discord inbound worker timed out.

What actually happens

The Discord channel gets a generic timeout reply after 1800 seconds.

Nearby logs show related pressure/errors:

[ws] handshake timeout ... peer=127.0.0.1:...->127.0.0.1:18789
Subagent announce failed: Error: gateway timeout after 10000ms
[session-write-lock] releasing lock held for 66843ms / 71211ms / 97029ms
[memory] qmd embed failed ... timed out after 600000ms
[agent/embedded] [context-overflow-diag] ... Context overflow: estimated context size exceeds safe threshold during tool loop.

There were also Discord gateway reconnect/session churn events in the same general window:

[discord] gateway error: Error: socket hang up
[discord] gateway: Gateway websocket closed: 1006
[discord] gateway: Gateway reconnect scheduled ... (invalid-session, resume=false)
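
One way to check whether these symptoms actually cluster before a timeout is to count them in the 1800-second window that ends at the timeout. The sketch below assumes every journal line starts with an ISO 8601 timestamp carrying a UTC offset, as the timeout lines do; the excerpts above elide their timestamps, so that is an assumption. The symptom substrings are taken from the excerpts, and the names `SYMPTOMS` and `symptoms_in_window` are illustrative.

```python
from datetime import datetime, timedelta

# Substrings taken from the log excerpts in this issue.
SYMPTOMS = {
    "ws_handshake": "[ws] handshake timeout",
    "subagent_announce": "Subagent announce failed",
    "session_lock": "[session-write-lock]",
    "qmd": "qmd embed failed",
    "context_overflow": "[context-overflow-diag]",
    "discord_gateway": "[discord] gateway",
}


def symptoms_in_window(lines, timeout_at, window_seconds=1800):
    """Count symptom lines logged in the window that ends at timeout_at.

    Assumes each line begins with an offset-aware ISO 8601 timestamp
    followed by a space; lines without one are skipped.
    """
    start = timeout_at - timedelta(seconds=window_seconds)
    counts = {name: 0 for name in SYMPTOMS}
    for line in lines:
        ts_text, _, rest = line.partition(" ")
        try:
            ts = datetime.fromisoformat(ts_text)
        except ValueError:
            continue  # no leading timestamp on this line
        if not (start <= ts <= timeout_at):
            continue
        for name, needle in SYMPTOMS.items():
            if needle in rest:
                counts[name] += 1
    return counts
```

A window with many session-lock or qmd lines and few gateway lines would point toward local stalls rather than Discord connectivity, though the counts alone do not establish cause.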

System/config context

  • OpenClaw: 2026.4.24 (46d2415)
  • OS: LMDE 6 / Debian kernel 6.1.0-44-amd64
  • Node: v24.15.0
  • npm: 11.12.1
  • Codex CLI: 0.120.0
  • Running as user systemd service: openclaw-gateway.service
  • Gateway mode: local loopback, port 18789
  • Discord enabled with multiple accounts/bots
  • Discord healthMonitor.enabled=false
  • Discord threadBindings.enabled=true
  • Discord threadBindings.spawnSubagentSessions=true
  • Discord threadBindings.spawnAcpSessions=true
  • Agent defaults:
    • contextTokens=120000
    • primary model openai-codex/gpt-5.5
    • timeoutSeconds=3600
    • subagents.maxConcurrent=5
    • subagents.maxChildrenPerAgent=5
    • subagents.announceTimeoutMs=300000
    • compaction reserve floor 24000
  • Discord inbound worker timeout appears to be using the default 1800000ms / 1800s; I did not find an explicit per-account channels.discord.accounts.<id>.inboundWorker.runTimeoutMs override in my config.
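
To confirm which timeout applies to a given account, the lookup described in the last bullet can be sketched as follows. This assumes the config has been loaded into a nested dict that mirrors the key path `channels.discord.accounts.<id>.inboundWorker.runTimeoutMs`; the loader, the function name `effective_run_timeout`, and the account id `"main"` in the usage line are hypothetical. The default of 1800000 ms is the value reported in this issue.

```python
DEFAULT_RUN_TIMEOUT_MS = 1_800_000  # default reported in this issue (1800 s)


def effective_run_timeout(config, account_id):
    """Return (timeout_ms, source) for one Discord account.

    `config` is assumed to be a plain nested dict; missing levels are
    treated as "no override".
    """
    account = (
        config.get("channels", {})
        .get("discord", {})
        .get("accounts", {})
        .get(account_id, {})
    )
    override = account.get("inboundWorker", {}).get("runTimeoutMs")
    if override is not None:
        return override, "account override"
    return DEFAULT_RUN_TIMEOUT_MS, "default"


# Usage with a hypothetical account id and no override configured:
timeout_ms, source = effective_run_timeout({"channels": {"discord": {}}}, "main")
```

With the values reported above, the 1800 s inbound worker limit is shorter than the agent `timeoutSeconds` of 3600, so the inbound limit would be reached first if both clocks start together; whether they do is not stated in the issue.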

At the time of inspection the service was still running but using substantial resources:

Tasks: 80
Memory: 6.5G
CPU: 3d+ accumulated
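
These figures can be sampled over time instead of read once. The sketch below queries the user-level unit named above with `systemctl --user show` and the standard systemd properties `TasksCurrent`, `MemoryCurrent`, and `CPUUsageNSec`. It assumes a systemd user session is available; the helper names are illustrative.

```python
import subprocess

UNIT = "openclaw-gateway.service"
PROPS = ["TasksCurrent", "MemoryCurrent", "CPUUsageNSec"]


def parse_show_output(text):
    """Parse `systemctl show` KEY=VALUE lines into ints (None if not numeric)."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # systemd prints "[not set]" when accounting is unavailable.
        result[key] = int(value) if value.isdigit() else None
    return result


def read_unit_stats(unit=UNIT):
    """Query the user-level unit; requires a running systemd user session."""
    cmd = ["systemctl", "--user", "show", unit]
    for prop in PROPS:
        cmd += ["-p", prop]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_show_output(out)
```

Logging the result of `read_unit_stats()` every few minutes would show whether memory and task counts climb in the period leading up to a timeout.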

Why I think this might be an OpenClaw issue

The timeout itself is documented/configured, but in practice it seems to be acting as the only visible failure mode for several possible internal stalls:

  • queued Discord inbound run stuck behind session locks
  • qmd embed/search/update timeouts
  • subagent announce timeouts
  • local gateway websocket handshake timeouts
  • context overflow recovery taking a long time or looping

It would be useful if the inbound worker timeout carried the underlying run/session state into the Discord error reply and logs, or if the queue could cancel/unblock the stuck worker more cleanly before the full 1800s elapses.
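
The idea of carrying run state into the timeout error can be illustrated with a small `asyncio` sketch. This is not OpenClaw's implementation; `PhaseTracker`, `InboundTimeout`, and `run_with_timeout` are hypothetical names, and the phase labels are whatever the worker chooses to record.

```python
import asyncio


class PhaseTracker:
    """Records what a run is currently waiting on."""

    def __init__(self):
        self.phase = "queued"

    def enter(self, phase):
        self.phase = phase


class InboundTimeout(Exception):
    """Raised when a run exceeds its time budget."""


async def run_with_timeout(work, timeout_s):
    """Run work(tracker); on timeout, report the phase it was stuck in.

    asyncio.wait_for cancels the underlying task when the timeout
    expires, so the stuck worker does not keep running.
    """
    tracker = PhaseTracker()
    try:
        return await asyncio.wait_for(work(tracker), timeout=timeout_s)
    except asyncio.TimeoutError:
        raise InboundTimeout(
            f"inbound worker timed out after {timeout_s} seconds "
            f"(waiting on: {tracker.phase})"
        ) from None
```

A worker would call `tracker.enter("session lock")`, `tracker.enter("qmd")`, and so on before each blocking step, so the last recorded phase ends up in the error text instead of a generic message.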

Possible improvement

When the inbound worker times out, include something like:

  • account id / agent id
  • session key
  • run id
  • queue depth
  • whether the agent was waiting on model, tool, qmd, session lock, or gateway connect
  • whether the timeout came from default inboundWorker.runTimeoutMs or an explicit account override

That would make this much easier to diagnose from Discord without digging through journal logs.
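
As an illustration of the fields listed above, a timeout reply could be assembled from a small record like the one below. This is a sketch only: the class `TimeoutDiagnostics`, its field names, and the reply wording are assumptions, not an existing OpenClaw structure.

```python
from dataclasses import dataclass


@dataclass
class TimeoutDiagnostics:
    account_id: str
    agent_id: str
    session_key: str
    run_id: str
    queue_depth: int
    waiting_on: str      # e.g. "model", "tool", "qmd", "session lock", "gateway connect"
    timeout_ms: int
    timeout_source: str  # "default" or "account override"


def format_reply(d):
    """Build a single-line reply suitable for posting back to the channel."""
    return (
        f"Discord inbound worker timed out after {d.timeout_ms // 1000}s "
        f"({d.timeout_source}); waiting on {d.waiting_on}; "
        f"account={d.account_id} agent={d.agent_id} "
        f"session={d.session_key} run={d.run_id} queue={d.queue_depth}"
    )
```

Keeping the reply to one line with `key=value` pairs makes it easy to copy from Discord and search for in the journal.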

extent analysis

TL;DR

The Discord inbound worker timeout may be caused by internal stalls, and including more diagnostic context in the error reply and logs could help identify the root cause.

Guidance

  • Review the system logs for patterns or correlations between the Discord inbound worker timeouts and other errors, such as session locks, qmd timeouts, and gateway reconnects.
  • Consider increasing the inboundWorker.runTimeoutMs value or implementing a more robust queue management system to handle stuck workers.
  • Investigate the resource usage of the OpenClaw service, as high CPU and memory usage may be contributing to the timeouts.
  • Examine the Discord account configuration, particularly the threadBindings settings, to ensure they are optimized for the workload.

Example

No specific code example is provided, as the issue appears to be related to configuration and system resource management rather than a specific code snippet.

Notes

The issue may be related to the complex interactions between the OpenClaw service, Discord gateway, and various timeouts. Further investigation is needed to determine the root cause.

Recommendation

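As a stopgap, set an explicit per-account channels.discord.accounts.<id>.inboundWorker.runTimeoutMs override. Raising the value only delays the generic timeout reply; it does not address the possible stalls the issue lists (session locks, qmd timeouts, subagent announce timeouts, gateway handshake timeouts, context overflow recovery), which still need to be diagnosed from the logs.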
