openclaw - ✅(Solved) Fix [Bug]: Regression in OpenClaw 2026.5.4 / 2026.5.3-1: gateway status handler stalls ~50s, event loop degraded, Discord heartbeat timeouts [1 pull request, 4 comments, 5 participants]

GitHub stats
openclaw/openclaw#77995 · Fetched 2026-05-06 06:18:06
Comments: 4 · Participants: 5 · Timeline: 9 · Reactions: 2
Timeline (top): commented ×4, cross-referenced ×2, labeled ×2, closed ×1

Environment:

  • Host: Linux, systemd user service
  • Gateway bind: loopback, 127.0.0.1:18789
  • Node runtime in logs: Node 22.22.2
  • Active Discord providers: Edward/default, Julia, Resilia
  • Last known good recovery target: openclaw@2026.5.2 + @openclaw/discord@2026.5.2

Summary: After updating from 2026.5.3-1 to 2026.5.4, the gateway started but became operationally unstable. Local RPC calls over loopback, especially status, stalled around 49-54s, causing client-side timeouts. health returned but reported degraded event loop/CPU. Discord probes produced /users/@me fetch timeouts and Discord gateway heartbeat ACK timeouts. Rolling back to 2026.5.3-1 did not fix the runtime issue. Rolling back further to 2026.5.2 restored acceptable behavior.

Version matrix:

  • 2026.5.4

    • openclaw health: returns, but reports Gateway event loop: degraded reasons=event_loop_utilization,cpu
    • gateway call status --timeout 15000 --json: client timeout
    • gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: client timeout
    • Gateway log still completed status after about 49.2s-50.1s
    • Fetch timeout examples: Discord /users/@me, timer delayed about 39.2s-40.4s, “likely event-loop starvation”
  • 2026.5.3-1

    • First issue after rollback: config written by 5.4 contained plugins.bundledDiscovery=compat; 5.3-1 rejected it with Unrecognized key: "bundledDiscovery"
    • After removing only that key, gateway started
    • openclaw health: returns, but still reports degraded event loop/CPU
    • gateway call status --timeout 15000 --json: client timeout
    • gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: client timeout
    • gateway call status --timeout 60000 --json: succeeds, but only after about 61.5s
    • Gateway log status durations observed around 48.6s-53.9s
    • Discord heartbeat ACK timeouts still occurred
  • 2026.5.2

    • Gateway started cleanly after boot wave
    • openclaw health: succeeds, no event-loop degradation reported
    • gateway call status --timeout 15000 --json: succeeds
    • Gateway log for single status: 7311ms
    • Earlier status checks: about 6435ms and 7396ms
    • channels status --probe --timeout 15000: Edward, Julia, Resilia all running, connected, works, audit ok
    • tasks.active=0, taskAudit.errors=0
    • Fresh CPU sample after tests: about 0.11 core over 20s

Expected: status over local loopback should complete within the configured timeout, ideally under 10-15s, without event-loop degradation or Discord heartbeat disruption.

Actual: In 2026.5.4 and 2026.5.3-1, status blocks for ~50s and causes degraded event-loop/CPU reports plus Discord heartbeat/fetch timeout symptoms. 2026.5.2 restores acceptable behavior.
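
The client-side checks in the version matrix compare a configured timeout against real wall time, and a small wrapper makes that comparison repeatable. The following is a minimal sketch, not part of OpenClaw: `time_command` is a hypothetical helper, and the demo at the bottom runs a stand-in command so the sketch works on any machine.

```python
import subprocess
import sys
import time


def time_command(argv, hard_limit_s):
    """Run argv and return (exit_code, wall_seconds); exit_code is None on a hard timeout."""
    start = time.monotonic()
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=hard_limit_s)
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return code, time.monotonic() - start


if __name__ == "__main__":
    # Stand-in command; swap in the real CLI invocation when testing a gateway.
    code, wall = time_command([sys.executable, "-c", "print('ok')"], hard_limit_s=30)
    print(f"exit={code} wall={wall:.2f}s")
```

To reproduce the report's measurement, the argument list would be something like `["openclaw", "gateway", "call", "status", "--timeout", "60000", "--json"]`. The `openclaw` prefix is an assumption, since the report quotes the command only as `gateway call status`. A wall time well above the gateway-logged duration (61.5s versus ~50s in the report) shows up directly in the second return value.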

Additional technical details / logs

All timestamps below are local time, CEST, on 2026-05-05.

Update interruption

During openclaw update from 2026.5.3-1 to 2026.5.4, the managed gateway was stopped:

17:25:32 [gateway] signal SIGTERM received
17:25:32 [gateway] received SIGTERM; shutting down

This likely interrupted an in-flight Discord response from Edward.

———

## 2026.5.4 symptoms

Gateway started, but the runtime degraded soon after startup.

Startup:

17:28:40 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 14.8s)

Early event-loop starvation:

17:28:46 [fetch-timeout] fetch timeout after 2500ms (elapsed 4671ms)
timer delayed 2171ms, likely event-loop starvation
operation=fetchWithTimeout url=https://registry.npmjs.org/openclaw/latest
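
The "timer delayed" figure in these lines is simply the elapsed time minus the configured timeout (4671ms - 2500ms = 2171ms). A minimal sketch for checking that arithmetic across a log, assuming the line format shown in this report; `timer_delay_ms` is a hypothetical helper name:

```python
import re

FETCH_TIMEOUT_RE = re.compile(
    r"fetch timeout after (?P<timeout>\d+)ms \(elapsed (?P<elapsed>\d+)ms\)"
)


def timer_delay_ms(log_line):
    """Return elapsed - timeout in ms for a [fetch-timeout] line, or None if it does not match."""
    match = FETCH_TIMEOUT_RE.search(log_line)
    if match is None:
        return None
    return int(match["elapsed"]) - int(match["timeout"])


line = "17:28:46 [fetch-timeout] fetch timeout after 2500ms (elapsed 4671ms)"
print(timer_delay_ms(line))  # 2171, matching "timer delayed 2171ms"
```

A delay that is a small fraction of the timeout points at ordinary jitter; a delay several times larger than the timeout means the timer callback could not run when it was due.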

Liveness warning:

17:29:05 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=5641.3
eventLoopDelayMaxMs=5641.3
eventLoopUtilization=1
cpuCoreRatio=1.223
active=1 waiting=0 queued=0
phase=channels.discord.start-account
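
Liveness warnings are plain `key=value` tokens, so they can be turned into structured data for comparison across versions. A minimal sketch, assuming the token layout shown above; `parse_liveness` is a hypothetical helper, and no thresholds are applied because the report does not state the gateway's own limits:

```python
def parse_liveness(fields_text):
    """Parse whitespace-separated key=value tokens into a dict with typed values."""
    parsed = {}
    for token in fields_text.split():
        if "=" not in token:
            continue
        key, _, raw = token.partition("=")
        if key == "reasons":
            parsed[key] = raw.split(",")
            continue
        try:
            parsed[key] = float(raw)
        except ValueError:
            parsed[key] = raw  # non-numeric values such as the phase name stay as text
    return parsed


warning = (
    "reasons=event_loop_delay,event_loop_utilization,cpu "
    "eventLoopDelayP99Ms=5641.3 eventLoopDelayMaxMs=5641.3 "
    "eventLoopUtilization=1 cpuCoreRatio=1.223 "
    "active=1 waiting=0 queued=0 phase=channels.discord.start-account"
)
info = parse_liveness(warning)
print(info["reasons"], info["eventLoopDelayP99Ms"], info["phase"])
```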

Discord heartbeat failure:

17:29:49 [discord] gateway error: Error: Gateway heartbeat ACK timeout

Slow status handler:

17:36:46 [ws] ⇄ res ✓ status 58615ms
17:36:46 [fetch-timeout] fetch timeout after 10000ms (elapsed 58644ms)
timer delayed 48644ms, likely event-loop starvation
operation=fetchWithTimeout url=https://discord.com/api/v10/users/@me

After controlled restart, issue persisted:

17:37:45 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 13.1s)
17:38:11 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=3911.2
eventLoopUtilization=0.953
cpuCoreRatio=1.175

Another slow status:

17:43:44 [ws] ⇄ res ✓ status 51135ms
17:43:44 [fetch-timeout] fetch timeout after 10000ms (elapsed 51231ms)
timer delayed 41231ms, likely event-loop starvation
url=https://discord.com/api/v10/users/@me

Client-side results:

- gateway call status --timeout 15000 --json: timeout
- gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: timeout
- Gateway still eventually completed the handler after ~49-58s.

External Discord API from the host was fast without auth:

- /gateway: ~0.14s
- /users/@me: HTTP 401 in ~0.27s

So this did not look like a plain network outage to Discord.
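
The external latency check can be repeated from the same host with the standard library. A minimal sketch, assuming unauthenticated GET requests and treating an HTTP error such as 401 as a valid (and timed) response; `time_get` and its `opener` parameter are hypothetical and exist only to keep the function testable without network access:

```python
import time
import urllib.error
import urllib.request


def time_get(url, timeout_s=10.0, opener=urllib.request.urlopen):
    """Return (http_status, seconds) for a GET; HTTP errors such as 401 still count as a response."""
    request = urllib.request.Request(url, headers={"User-Agent": "latency-check"})
    start = time.monotonic()
    try:
        with opener(request, timeout=timeout_s) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code
    return status, time.monotonic() - start
```

Calling `time_get("https://discord.com/api/v10/users/@me")` without a token should return a 401 quickly on a healthy network path. If that is fast while the gateway's own fetch to the same URL times out, the delay is more likely inside the gateway process than on the wire.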

———

## Rollback to 2026.5.3-1

Main package:

OpenClaw 2026.5.3-1 (2eae30e)

Discord plugin:

@openclaw/discord 2026.5.3

First rollback problem: config incompatibility caused by 5.4:

Gateway failed to start: Error: Invalid config at /home/mani/.openclaw/openclaw.json.
plugins: Unrecognized key: "bundledDiscovery"

The key was:

"plugins": {
  "bundledDiscovery": "compat"
}

After removing only that key, gateway started:

18:52:22 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 14.0s)
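
Removing only that key can be scripted so the rest of the config is left untouched. A minimal sketch, assuming `openclaw.json` is strict JSON (no comments or trailing commas) and that the gateway is stopped while the file is edited; `drop_bundled_discovery` is a hypothetical helper, not an OpenClaw command:

```python
import json
import shutil
from pathlib import Path


def drop_bundled_discovery(config_path):
    """Remove plugins.bundledDiscovery from a JSON config; return True if the file changed."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    plugins = config.get("plugins")
    if not isinstance(plugins, dict) or "bundledDiscovery" not in plugins:
        return False
    shutil.copy2(path, path.with_name(path.name + ".bak"))  # keep a backup first
    del plugins["bundledDiscovery"]
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

Note that rewriting the file this way normalizes its indentation; the `.bak` copy preserves the original bytes in case the formatting matters.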

But runtime issue persisted:

18:52:49 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=6350.2
eventLoopDelayMaxMs=6350.2
eventLoopUtilization=0.999
cpuCoreRatio=1.193

Slow status examples:

18:54:05 [ws] ⇄ res ✓ status 50099ms
18:54:05 [fetch-timeout] fetch timeout after 10000ms (elapsed 50135ms)
timer delayed 40135ms, likely event-loop starvation
url=https://discord.com/api/v10/users/@me
18:54:05 [discord] gateway error: Error: Gateway heartbeat ACK timeout

18:58:45 [ws] ⇄ res ✓ status 53877ms
18:58:45 [fetch-timeout] fetch timeout after 10000ms (elapsed 53888ms)
timer delayed 43888ms, likely event-loop starvation

18:59:58 [ws] ⇄ res ✓ status 50234ms
18:59:58 [fetch-timeout] fetch timeout after 10000ms (elapsed 50251ms)
timer delayed 40251ms, likely event-loop starvation
18:59:58 [discord] gateway error: Error: Gateway heartbeat ACK timeout

19:01:44 [ws] ⇄ res ✓ status 48657ms
19:01:44 [fetch-timeout] fetch timeout after 10000ms (elapsed 48671ms)
timer delayed 38671ms, likely event-loop starvation
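
The range of status durations can be pulled out of a log excerpt mechanically rather than read off by eye. A minimal sketch, assuming the `[ws] ... status <N>ms` line format shown above; `status_durations_ms` is a hypothetical helper name:

```python
import re

STATUS_RE = re.compile(r"\[ws\].*\bstatus (\d+)ms")


def status_durations_ms(log_text):
    """Collect the duration of every completed status response in a gateway log excerpt."""
    return [int(ms) for ms in STATUS_RE.findall(log_text)]


log_excerpt = """\
18:54:05 [ws] ⇄ res ✓ status 50099ms
18:58:45 [ws] ⇄ res ✓ status 53877ms
18:59:58 [ws] ⇄ res ✓ status 50234ms
19:01:44 [ws] ⇄ res ✓ status 48657ms
"""
durations = status_durations_ms(log_excerpt)
print(min(durations), max(durations))  # 48657 53877
```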

Status payload from 5.3-1 with --timeout 60000:

{
  "runtimeVersion": "2026.5.3-1",
  "eventLoop": {
    "degraded": true,
    "reasons": ["event_loop_utilization", "cpu"],
    "intervalMs": 50637,
    "delayP99Ms": 0,
    "delayMaxMs": 0,
    "utilization": 1,
    "cpuCoreRatio": 1.05
  },
  "tasks": {
    "active": 0,
    "byStatus": {
      "queued": 0,
      "running": 0
    }
  },
  "taskAudit": {
    "errors": 0,
    "warnings": 20
  }
}
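
A status payload like this can be reduced to a one-line verdict for side-by-side comparison between versions. A minimal sketch, assuming the field names shown in this payload and reading `intervalMs` as the sampling window, which is an interpretation rather than something the report states; `event_loop_summary` is a hypothetical helper that also tolerates a payload whose `eventLoop` is null or absent:

```python
import json


def event_loop_summary(status_json):
    """Summarize the eventLoop block of a status payload; handles a missing or null block."""
    payload = json.loads(status_json)
    version = payload.get("runtimeVersion")
    loop = payload.get("eventLoop")
    if not loop:
        return f"{version}: no event-loop data reported"
    state = "degraded" if loop.get("degraded") else "ok"
    reasons = ",".join(loop.get("reasons", []))
    window_s = loop.get("intervalMs", 0) / 1000
    return f"{version}: {state} ({reasons}) over {window_s:.1f}s"


sample = json.dumps({
    "runtimeVersion": "2026.5.3-1",
    "eventLoop": {
        "degraded": True,
        "reasons": ["event_loop_utilization", "cpu"],
        "intervalMs": 50637,
    },
})
print(event_loop_summary(sample))
```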

Client-side timeout behavior:

- status --timeout 15000: timeout
- status --timeout 30000 --params '{"includeChannelSummary":false}': timeout
- status --timeout 60000: succeeded, but wall time was ~61.5s

———

## Rollback to 2026.5.2

Main package:

OpenClaw 2026.5.2 (8b2a6e5)

Discord plugin:

@openclaw/discord 2026.5.2

Startup:

19:42:11 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 37.6s)
19:42:13 [gateway] ready

There were still startup/boot-wave warnings, but they settled:

19:43:58 [diagnostic] liveness warning:
reasons=event_loop_delay
eventLoopDelayP99Ms=2158
eventLoopDelayMaxMs=5859.4
eventLoopUtilization=0.817
cpuCoreRatio=0.839
active=1 waiting=0 queued=0

After boot wave, stability improved:

{
  "active": 0,
  "waiting": 0,
  "queued": 0,
  "eventLoopUtilization": 0.039,
  "cpuCoreRatio": 0.058
}

Health:

openclaw health
Discord: configured
Agents: main (default), belzebub, balbina, bernadeta, julia, resilia

No event-loop degradation was reported by health.

Status timings on 5.2:

19:47:12 [ws] ⇄ res ✓ status 6435ms
19:47:19 [ws] ⇄ res ✓ status 7396ms
19:48:06 [ws] ⇄ res ✓ status 7311ms

Status payload on 5.2:

{
  "runtimeVersion": "2026.5.2",
  "eventLoop": null,
  "tasks": {
    "active": 0,
    "byStatus": {
      "queued": 0,
      "running": 0
    }
  },
  "taskAudit": {
    "errors": 0,
    "warnings": 20
  }
}

Channel probe on 5.2:

Gateway reachable.
- Discord default: enabled, configured, running, connected, bot:@Edward, works, audit ok
- Discord julia: enabled, configured, running, connected, bot:@Julia, works, audit ok
- Discord resilia: enabled, configured, running, connected, bot:@Resilia, works, audit ok

CPU samples:

- 5.3-1 during degraded state: up to ~1.0 core
- 5.2 after boot wave: ~0.01 core over 15s
- 5.2 after status/channel tests: ~0.11 core over 20s
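
A "cores over N seconds" figure like these can be derived from two readings of a process's accumulated CPU time. A minimal sketch for Linux, assuming the gateway's PID is known and that the reporter's figures mean CPU seconds per wall-clock second (the report does not say how they were sampled); both function names are hypothetical:

```python
import os
import time


def cpu_seconds(pid):
    """Return user+system CPU seconds consumed so far by pid, read from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat", encoding="utf-8") as handle:
        # The command name is parenthesized and may contain spaces, so split after it.
        fields = handle.read().rsplit(")", 1)[1].split()
    utime_ticks, stime_ticks = int(fields[11]), int(fields[12])
    return (utime_ticks + stime_ticks) / os.sysconf("SC_CLK_TCK")


def core_ratio(cpu_before_s, cpu_after_s, wall_s):
    """CPU seconds used per wall-clock second, i.e. the average number of busy cores."""
    return (cpu_after_s - cpu_before_s) / wall_s
```

Sampling `cpu_seconds(pid)`, waiting 20 seconds with `time.sleep(20)`, and sampling again gives the inputs for `core_ratio`. For example, 2.2 CPU seconds consumed over 20 wall seconds corresponds to roughly 0.11 core.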

## Interpretation

The failure appears tied to the gateway status path and/or Discord account probing inside that path. In 2026.5.4 and 2026.5.3-1, status blocks the event loop for ~49-58s and delays timers badly enough to trigger Discord fetch timeouts and heartbeat ACK timeouts. In 2026.5.2, the same operational setup completes status in ~6-8s and does not report event-loop degradation after startup.
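
The mechanism described here, synchronous work delaying timers on a single-threaded event loop, can be reproduced in miniature. The sketch below uses Python's `asyncio` as an analog of the Node.js event loop; it is an illustration of the effect, not OpenClaw code, and `measure_timer_lateness` is a hypothetical name:

```python
import asyncio
import time


async def measure_timer_lateness(timer_s, blocking_s):
    """Schedule a timer, block the loop synchronously, and return how late the timer fired."""
    loop = asyncio.get_running_loop()
    fired = loop.create_future()
    scheduled_at = time.monotonic()
    loop.call_later(timer_s, lambda: fired.set_result(time.monotonic()))
    time.sleep(blocking_s)  # stands in for synchronous work inside a request handler
    fired_at = await fired
    return (fired_at - scheduled_at) - timer_s


lateness = asyncio.run(measure_timer_lateness(timer_s=0.05, blocking_s=0.3))
print(f"timer fired {lateness * 1000:.0f}ms late")
```

The timer is due after 50ms but cannot run until the blocking call returns, so it fires roughly 250ms late. Scaled up, a ~50s blocking handler would delay a 10s fetch timeout by about 40s, which matches the "timer delayed" values in the logs.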


Steps to reproduce

Update to 2026.5.4.

Expected behavior

No load and no timeouts a couple of minutes (or hours) after the version update, when CPU usage in the VM is 0%.

Actual behavior

Extremely slow reactions or responses from the OpenClaw bot. Codex detected a critical timeout and did not even suggest testing crons or any of my usual workload before fixing or rolling back.

OpenClaw version

2026.5.4, 2026.5.3-1

Operating system

DEBIAN_VERSION_FULL=13.4 on RPI

Install method

npm

Model

codex 5.5 high

Provider / routing chain

N/A: the issue occurs without any load, before the work chain starts.

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Root Cause

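First rollback problem: a config incompatibility caused by 5.4. The config written by 5.4 contained plugins.bundledDiscovery=compat, which 5.3-1 rejected with Unrecognized key: "bundledDiscovery". Removing that key let the gateway start but did not fix the status slowdown, which is addressed separately by PR #78028 (Avoid duplicate status inspection work).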

Fix Action

Fixed

PR fix notes

PR #78028: Avoid duplicate status inspection work

Description (problem / solution / changelog)

Fixes #77995.

Summary

  • combine task registry/audit status inspection into one reconciliation pass
  • keep plain text openclaw status from building an unused channel summary
  • preserve status --all, JSON fast path, and explicit gateway status behavior
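
The first bullet describes folding two inspections of the same task data into one traversal. The sketch below illustrates that general idea only; it is not the PR's code, and the `Task` shape, the field names, and the staleness rule are all hypothetical:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Task:
    task_id: str
    status: str   # hypothetical values: "queued", "running", "done"
    age_s: float  # seconds since the task was last updated


def inspect_tasks(tasks, stale_after_s=3600.0):
    """Build status counts and audit findings in a single pass over the task list."""
    by_status = Counter()
    warnings = []
    for task in tasks:
        by_status[task.status] += 1
        # Audit rule (hypothetical): a running task with no recent update is suspicious.
        if task.status == "running" and task.age_s > stale_after_s:
            warnings.append(f"{task.task_id}: running but idle for {task.age_s:.0f}s")
    return {"byStatus": dict(by_status), "warnings": warnings}


report = inspect_tasks([
    Task("a", "queued", 5.0),
    Task("b", "running", 7200.0),
    Task("c", "running", 10.0),
])
print(report)
```

Computing the summary and the audit in separate functions would walk the registry twice per status call; sharing one loop roughly halves that cost when the per-task work is the expensive part.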

Verification

  • PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/test-projects.mjs src/commands/status.scan.test.ts src/commands/status.summary.test.ts src/tasks/task-registry.test.ts --maxWorkers=1
  • git diff --check
  • PATH="/tmp/openclaw-pnpm-shim:$PATH" node scripts/check-changed.mjs

Changed files

  • src/commands/status.scan.test.ts (modified, +5/-2)
  • src/commands/status.scan.ts (modified, +1/-0)
  • src/commands/status.summary.test.ts (modified, +46/-32)
  • src/commands/status.summary.ts (modified, +1/-2)
  • src/tasks/task-registry.maintenance.ts (modified, +13/-3)
  • src/tasks/task-registry.test.ts (modified, +38/-0)



Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Environment:

  • Host: Linux, systemd user service
  • Gateway bind: loopback, 127.0.0.1:18789
  • Node runtime in logs: Node 22.22.2
  • Active Discord providers: Edward/default, Julia, Resilia
  • Last known good recovery target: [email protected] + @openclaw/[email protected]

Summary: After updating from 2026.5.3-1 to 2026.5.4, the gateway started but became operationally unstable. Local RPC calls over loopback, especially status, stalled around 49-54s, causing client-side timeouts. health returned but reported degraded event loop/CPU. Discord probes produced /users/@me fetch timeouts and Discord gateway heartbeat ACK timeouts. Rolling back to 2026.5.3-1 did not fix the runtime issue. Rolling back further to 2026.5.2 restored acceptable behavior.

Version matrix:

  • 2026.5.4

    • openclaw health: returns, but reports Gateway event loop: degraded reasons=event_loop_utilization,cpu
    • gateway call status --timeout 15000 --json: client timeout
    • gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: client timeout
    • Gateway log still completed status after about 49.2s-50.1s
    • Fetch timeout examples: Discord /users/@me, timer delayed about 39.2s-40.4s, “likely event-loop starvation”
  • 2026.5.3-1

    • First issue after rollback: config written by 5.4 contained plugins.bundledDiscovery=compat; 5.3-1 rejected it with Unrecognized key: "bundledDiscovery"
    • After removing only that key, gateway started
    • openclaw health: returns, but still reports degraded event loop/CPU
    • gateway call status --timeout 15000 --json: client timeout
    • gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: client timeout
    • gateway call status --timeout 60000 --json: succeeds, but only after about 61.5s
    • Gateway log status durations observed around 48.6s-53.9s
    • Discord heartbeat ACK timeouts still occurred
  • 2026.5.2

    • Gateway started cleanly after boot wave
    • openclaw health: succeeds, no event-loop degradation reported
    • gateway call status --timeout 15000 --json: succeeds
    • Gateway log for single status: 7311ms
    • Earlier status checks: about 6435ms and 7396ms
    • channels status --probe --timeout 15000: Edward, Julia, Resilia all running, connected, works, audit ok
    • tasks.active=0, taskAudit.errors=0
    • Fresh CPU sample after tests: about 0.11 core over 20s

Expected: status over local loopback should complete within the configured timeout, ideally under 10-15s, without event-loop degradation or Discord heartbeat disruption.

Actual: In 2026.5.4 and 2026.5.3-1, status blocks for ~50s and causes degraded event-loop/CPU reports plus Discord heartbeat/fetch timeout symptoms. 2026.5.2 restores acceptable behavior.

Additional technical details / logs

All timestamps below are local time, CEST, on 2026-05-05.

Update interruption

During openclaw update from 2026.5.3-1 to 2026.5.4, the managed gateway was stopped:

17:25:32 [gateway] signal SIGTERM received
17:25:32 [gateway] received SIGTERM; shutting down

This likely interrupted an in-flight Discord response from Edward.

———

## 2026.5.4 symptoms

Gateway started, but the runtime degraded soon after startup.

Startup:

17:28:40 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 14.8s)

Early event-loop starvation:

17:28:46 [fetch-timeout] fetch timeout after 2500ms (elapsed 4671ms)
timer delayed 2171ms, likely event-loop starvation
operation=fetchWithTimeout url=https://registry.npmjs.org/openclaw/latest

Liveness warning:

17:29:05 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=5641.3
eventLoopDelayMaxMs=5641.3
eventLoopUtilization=1
cpuCoreRatio=1.223
active=1 waiting=0 queued=0
phase=channels.discord.start-account

Discord heartbeat failure:

17:29:49 [discord] gateway error: Error: Gateway heartbeat ACK timeout

Slow status handler:

17:36:46 [ws] ⇄ res ✓ status 58615ms
17:36:46 [fetch-timeout] fetch timeout after 10000ms (elapsed 58644ms)
timer delayed 48644ms, likely event-loop starvation
operation=fetchWithTimeout url=https://discord.com/api/v10/users/@me

After controlled restart, issue persisted:

17:37:45 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 13.1s)
17:38:11 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=3911.2
eventLoopUtilization=0.953
cpuCoreRatio=1.175

Another slow status:

17:43:44 [ws] ⇄ res ✓ status 51135ms
17:43:44 [fetch-timeout] fetch timeout after 10000ms (elapsed 51231ms)
timer delayed 41231ms, likely event-loop starvation
url=https://discord.com/api/v10/users/@me

Client-side results:

- gateway call status --timeout 15000 --json: timeout
- gateway call status --timeout 30000 --params '{"includeChannelSummary":false}' --json: timeout
- Gateway still eventually completed the handler after ~49-58s.

External Discord API from the host was fast without auth:

- /gateway: ~0.14s
- /users/@me: HTTP 401 in ~0.27s

So this did not look like a plain network outage to Discord.

———

## Rollback to 2026.5.3-1

Main package:

OpenClaw 2026.5.3-1 (2eae30e)

Discord plugin:

@openclaw/discord 2026.5.3

First rollback problem: config incompatibility caused by 5.4:

Gateway failed to start: Error: Invalid config at /home/mani/.openclaw/openclaw.json.
plugins: Unrecognized key: "bundledDiscovery"

The key was:

"plugins": {
  "bundledDiscovery": "compat"
}

After removing only that key, gateway started:

18:52:22 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 14.0s)

But runtime issue persisted:

18:52:49 [diagnostic] liveness warning:
reasons=event_loop_delay,event_loop_utilization,cpu
eventLoopDelayP99Ms=6350.2
eventLoopDelayMaxMs=6350.2
eventLoopUtilization=0.999
cpuCoreRatio=1.193

Slow status examples:

18:54:05 [ws] ⇄ res ✓ status 50099ms
18:54:05 [fetch-timeout] fetch timeout after 10000ms (elapsed 50135ms)
timer delayed 40135ms, likely event-loop starvation
url=https://discord.com/api/v10/users/@me
18:54:05 [discord] gateway error: Error: Gateway heartbeat ACK timeout

18:58:45 [ws] ⇄ res ✓ status 53877ms
18:58:45 [fetch-timeout] fetch timeout after 10000ms (elapsed 53888ms)
timer delayed 43888ms, likely event-loop starvation

18:59:58 [ws] ⇄ res ✓ status 50234ms
18:59:58 [fetch-timeout] fetch timeout after 10000ms (elapsed 50251ms)
timer delayed 40251ms, likely event-loop starvation
18:59:58 [discord] gateway error: Error: Gateway heartbeat ACK timeout

19:01:44 [ws] ⇄ res ✓ status 48657ms
19:01:44 [fetch-timeout] fetch timeout after 10000ms (elapsed 48671ms)
timer delayed 38671ms, likely event-loop starvation
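
The handler durations can be pulled out of the `[ws]` lines and summarized. This is a sketch that assumes the log format shown above; the pattern and the name `status_durations_ms` are illustrative.

```python
import re
from statistics import mean

# Matches completed status responses such as "[ws] ⇄ res ✓ status 50099ms".
STATUS_RE = re.compile(r"\[ws\] ⇄ res ✓ status (\d+)ms")


def status_durations_ms(log_text: str) -> list:
    """Return every status handler duration found in the log, in ms."""
    return [int(ms) for ms in STATUS_RE.findall(log_text)]


log_text = """\
18:54:05 [ws] ⇄ res ✓ status 50099ms
18:58:45 [ws] ⇄ res ✓ status 53877ms
18:59:58 [ws] ⇄ res ✓ status 50234ms
19:01:44 [ws] ⇄ res ✓ status 48657ms
"""
durations = status_durations_ms(log_text)
print(min(durations), max(durations), mean(durations))
```

For the four samples above this gives a range of roughly 48.7s to 53.9s, well beyond the 15s and 30s client timeouts used in the tests.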

Status payload from 5.3-1 with --timeout 60000:

{
  "runtimeVersion": "2026.5.3-1",
  "eventLoop": {
    "degraded": true,
    "reasons": ["event_loop_utilization", "cpu"],
    "intervalMs": 50637,
    "delayP99Ms": 0,
    "delayMaxMs": 0,
    "utilization": 1,
    "cpuCoreRatio": 1.05
  },
  "tasks": {
    "active": 0,
    "byStatus": {
      "queued": 0,
      "running": 0
    }
  },
  "taskAudit": {
    "errors": 0,
    "warnings": 20
  }
}

Client-side timeout behavior:

- status --timeout 15000: timeout
- status --timeout 30000 --params '{"includeChannelSummary":false}': timeout
- status --timeout 60000: succeeded, but wall time was ~61.5s

———

## Rollback to 2026.5.2

Main package:

OpenClaw 2026.5.2 (8b2a6e5)

Discord plugin:

@openclaw/discord 2026.5.2

Startup:

19:42:11 [gateway] http server listening (3 plugins: discord, memory-core, memory-wiki; 37.6s)
19:42:13 [gateway] ready

There were still startup/boot-wave warnings, but they settled:

19:43:58 [diagnostic] liveness warning:
reasons=event_loop_delay
eventLoopDelayP99Ms=2158
eventLoopDelayMaxMs=5859.4
eventLoopUtilization=0.817
cpuCoreRatio=0.839
active=1 waiting=0 queued=0

After boot wave, stability improved:

{
  "active": 0,
  "waiting": 0,
  "queued": 0,
  "eventLoopUtilization": 0.039,
  "cpuCoreRatio": 0.058
}

Health:

openclaw health
Discord: configured
Agents: main (default), belzebub, balbina, bernadeta, julia, resilia

No event-loop degradation was reported by health.

Status timings on 5.2:

19:47:12 [ws] ⇄ res ✓ status 6435ms
19:47:19 [ws] ⇄ res ✓ status 7396ms
19:48:06 [ws] ⇄ res ✓ status 7311ms

Status payload on 5.2:

{
  "runtimeVersion": "2026.5.2",
  "eventLoop": null,
  "tasks": {
    "active": 0,
    "byStatus": {
      "queued": 0,
      "running": 0
    }
  },
  "taskAudit": {
    "errors": 0,
    "warnings": 20
  }
}
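
Note that the two payloads differ in shape: 5.3-1 returns an `eventLoop` object, while 5.2 returns `"eventLoop": null`. A script comparing versions should treat `null` as "not reported" rather than as "healthy". A minimal sketch, assuming only the fields visible in the payloads above (`event_loop_state` is a hypothetical helper):

```python
import json


def event_loop_state(payload: dict) -> str:
    """Summarize the eventLoop field of a status payload."""
    event_loop = payload.get("eventLoop")
    if event_loop is None:
        return "not reported"  # 5.2 returns null here
    if event_loop.get("degraded"):
        reasons = ", ".join(event_loop.get("reasons", []))
        return f"degraded ({reasons})"
    return "ok"


payload_5_3_1 = json.loads(
    '{"runtimeVersion": "2026.5.3-1", "eventLoop": {"degraded": true,'
    ' "reasons": ["event_loop_utilization", "cpu"]}}'
)
payload_5_2 = json.loads('{"runtimeVersion": "2026.5.2", "eventLoop": null}')

for payload in (payload_5_3_1, payload_5_2):
    print(payload["runtimeVersion"], event_loop_state(payload))
```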

Channel probe on 5.2:

Gateway reachable.
- Discord default: enabled, configured, running, connected, bot:@Edward, works, audit ok
- Discord julia: enabled, configured, running, connected, bot:@Julia, works, audit ok
- Discord resilia: enabled, configured, running, connected, bot:@Resilia, works, audit ok

CPU samples:

- 5.3-1 during degraded state: up to ~1.0 core
- 5.2 after boot wave: ~0.01 core over 15s
- 5.2 after status/channel tests: ~0.11 core over 20s

## Interpretation

The failure appears tied to the gateway status path and/or Discord account probing inside that path. In 2026.5.4 and 2026.5.3-1, status blocks the event loop for ~49-58s and delays timers badly enough to trigger Discord fetch timeouts and heartbeat ACK timeouts. In 2026.5.2, the same operational setup completes status in ~6-8s and does not report event-loop degradation after startup.
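
The mechanism can be illustrated outside OpenClaw. The sketch below is an analogy in Python's `asyncio`, not the gateway's Node code: a synchronous call holds a single-threaded event loop, so a short timer fires late by roughly the length of the blocking work. The durations are arbitrary and chosen to keep the demo quick.

```python
import asyncio
import time


async def measure_timer_delay(timer_s: float, block_s: float) -> float:
    """Return how late a timer fired (seconds) while the loop was blocked."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    fired_at = loop.create_future()
    # Schedule a timer, analogous to a fetch timeout or a heartbeat deadline.
    loop.call_later(timer_s, lambda: fired_at.set_result(loop.time()))
    # Synchronous work: nothing else on the loop can run until this returns.
    time.sleep(block_s)
    fired = await fired_at
    return (fired - start) - timer_s


delay = asyncio.run(measure_timer_delay(timer_s=0.05, block_s=0.3))
print(f"timer delayed {delay * 1000:.0f}ms")
```

Scaled up, a handler that holds the loop for ~50s would hold back a 10s fetch timeout and the Discord heartbeat in the same way, which matches the pattern in the logs.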

### Steps to reproduce

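Update from 2026.5.3-1 to 2026.5.4, start the gateway, and call status over loopback (for example, gateway call status --timeout 15000 --json).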

### Expected behavior

No sustained load and no timeouts in the minutes (or hours) after a version update, while the VM is otherwise idle (CPU at 0%).

### Actual behavior

Extremely slow reactions and responses from the OpenClaw bot. Codex detected a critical timeout and did not even suggest testing crons or any of my usual workload before fixing or rolling back.

### OpenClaw version

2026.5.4, 2026.5.3-1

### Operating system

DEBIAN_VERSION_FULL=13.4 on RPI

### Install method

npm

### Model

codex 5.5 high

### Provider / routing chain

N/A. The issue occurs without any load, before the work chain starts.

### Additional provider/model setup details

_No response_

### Logs, screenshots, and evidence

```shell
```

### Impact and severity

_No response_

### Additional information

_No response_

extent analysis

TL;DR

The issue can likely be resolved by rolling back to version 2026.5.2, as it is the last known good version where the gateway started cleanly and the status call completed within the expected timeframe without event-loop degradation or Discord heartbeat disruption.

Guidance

  • Roll back to version 2026.5.2 to verify if the issue is resolved, as it has been identified as the last known good version.
  • Investigate the changes made in versions 2026.5.3-1 and 2026.5.4 to identify the root cause of the event-loop degradation and Discord heartbeat timeouts.
  • Check the configuration files for any incompatible changes introduced in version 2026.5.4, such as the bundledDiscovery key, and remove or modify them as necessary.
  • Monitor the system's CPU usage and event-loop utilization after rolling back to version 2026.5.2 to ensure that the issue is fully resolved.

Example

No code snippet is provided as the issue seems to be related to version compatibility and configuration rather than a specific code error.

Notes

The issue appears to be specific to versions 2026.5.3-1 and 2026.5.4, and rolling back to version 2026.5.2 may not be a permanent solution. Further investigation is needed to identify the root cause and implement a fix for the newer versions.

Recommendation

Apply the workaround of rolling back to version 2026.5.2, as it has been verified to resolve the issue, and then investigate the changes made in the newer versions to implement a permanent fix.
