openclaw - 💡(How to fix) Fix 2026.4.27: Gateway main event loop blocks at 100% CPU on Telegram message due to repeated filesystem/runtime-deps work [3 comments, 4 participants]

openclaw, 2026-04-29 09:05:40

openclaw/openclaw#74240 • Fetched 2026-04-30 06:26:53

Timeline (top): commented ×3, cross-referenced ×2, closed ×1, mentioned ×1

After upgrading an OpenClaw Docker install to 2026.4.27, each Telegram message can trigger a gateway main-thread CPU spike. During the spike the gateway becomes partially unresponsive: /healthz times out, Control UI websocket requests stall/disconnect, and Telegram replies are delayed.

The model run itself is fast. The delay appears to happen in OpenClaw gateway processing around the Telegram run, not in Telegram API or Codex model latency.

Environment

OpenClaw: 2026.4.27
Commit: dcd665cd05
Runtime: Docker / docker compose
Node: v24.14.0
Host kernel: Linux 6.8.0-110-generic
Gateway command: node dist/index.js gateway --bind lan --port 18789
Model provider: openai-codex/gpt-5.5
Channel affected: Telegram direct chat
WhatsApp not linked during later tests

Observed behavior

When a Telegram message is sent:

- Gateway CPU jumps to ~100% or more
- /healthz times out for ~40-80s
- OpenClaw logs liveness warnings:
  - event_loop_delay
  - event_loop_utilization
  - cpu
- Telegram shows typing, then typing stops after TTL, and the answer arrives later
- Control UI can temporarily disconnect or stall

Example diagnostic log:

liveness warning: reasons=event_loop_delay interval=52s eventLoopDelayP99Ms=58.8 eventLoopDelayMaxMs=42077.3 eventLoopUtilization=0.831 cpuCoreRatio=0.866 active=1 waiting=0 queued=1

Another run:

liveness warning: reasons=event_loop_delay interval=82s eventLoopDelayMaxMs=56304.3 active=1 waiting=0 queued=1
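
For comparing several such warnings, the key=value fields can be pulled into a dictionary. The following is a minimal Python sketch, not part of OpenClaw: the function name `parse_liveness` is hypothetical, and it assumes that multiple reasons, if ever present, are comma-separated (the logs above only show a single reason).

```python
import re

def parse_liveness(line: str) -> dict:
    """Parse the key=value fields of a liveness warning line."""
    fields = {}
    for key, value in re.findall(r"(\w+)=(\S+)", line):
        if key == "reasons":
            # Assumption: multiple reasons would be comma-separated.
            fields[key] = value.split(",")
        elif key == "interval":
            fields[key] = float(value.rstrip("s"))  # seconds
        else:
            fields[key] = float(value)
    return fields

line = (
    "liveness warning: reasons=event_loop_delay interval=52s "
    "eventLoopDelayP99Ms=58.8 eventLoopDelayMaxMs=42077.3 "
    "eventLoopUtilization=0.831 cpuCoreRatio=0.866 active=1 waiting=0 queued=1"
)
info = parse_liveness(line)
print(f"max stall {info['eventLoopDelayMaxMs'] / 1000:.1f}s "
      f"of a {info['interval']:.0f}s window")
# prints: max stall 42.1s of a 52s window
```

In the first warning the P99 delay is 58.8 ms while the maximum is about 42 s, which points to a small number of very long stalls rather than uniformly slow loop iterations.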

Important finding

The Codex model call is not the slow part.

Example trace:

session.started   08:58:28.371Z
context.compiled  08:58:28.415Z
prompt.submitted  08:58:28.428Z
model.completed   08:58:30.165Z
session.ended     08:58:30.171Z

So the model completed in ~1.7s, but the gateway was blocked around the same message for much longer.
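
The ~1.7s figure can be checked directly from the trace timestamps. This is a small Python sketch with hypothetical helper names; it assumes all events fall on the same day, since the trace carries no date.

```python
from datetime import datetime

TRACE = """\
session.started   08:58:28.371Z
context.compiled  08:58:28.415Z
prompt.submitted  08:58:28.428Z
model.completed   08:58:30.165Z
session.ended     08:58:30.171Z
"""

def parse_trace(text):
    """Map each event name to its time of day."""
    events = {}
    for row in text.splitlines():
        name, stamp = row.split()
        events[name] = datetime.strptime(stamp.rstrip("Z"), "%H:%M:%S.%f")
    return events

def elapsed_ms(events, start, end):
    """Milliseconds between two named events."""
    return round((events[end] - events[start]).total_seconds() * 1000)

events = parse_trace(TRACE)
model_ms = elapsed_ms(events, "prompt.submitted", "model.completed")
session_ms = elapsed_ms(events, "session.started", "session.ended")
print(model_ms, session_ms)  # prints: 1737 1800
```

The whole session spans 1.8s, against event loop stalls of 42s and 56s in the liveness warnings.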

strace during spike

Host-side strace -f -c attached to the Node gateway process during the CPU spike showed heavy filesystem activity:

% time     seconds  usecs/call     calls    errors syscall
88.53   30.477732      170266       179        28 futex
 2.33    0.802426          22     36396           statx
 1.99    0.686575          19     35920           read
 1.87    0.644418          30     21316      3047 openat
 1.41    0.484498          26     18268           close
 0.84    0.289390          95      3024         3 unlink
 0.51    0.176517          58      3021           sendfile
 0.37    0.126377          20      6042           fstat
 0.34    0.116129          38      3026           chmod
 0.29    0.099074          32      3021           ftruncate
 0.23    0.078246          25      3021      3021 link
 0.19    0.065061          21      3021           utimensat
 0.18    0.063364          20      3024           lstat
 0.16    0.055842          18      3021           fchmod
 0.16    0.054501          17      3037      3035 mkdir
 0.16    0.053755          17      3021           fchown
 0.10    0.034548          11      3021      3021 copy_file_range
100.00  34.426566         221    155448     12173 total

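This suggests repeated filesystem materialization/cache/runtime-deps/plugin work during each Telegram message. Seven syscalls (sendfile, ftruncate, link, utimensat, fchmod, fchown, copy_file_range) were each called exactly 3021 times, which is consistent with roughly 3,000 files being re-copied during the sample; every link and copy_file_range call returned an error, and so did almost every mkdir call (3035 of 3037).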
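
To compare captures taken on different versions or plugin sets, the strace -c summary can be parsed mechanically. The sketch below is a hypothetical Python helper, not an OpenClaw tool; it reads the table above and reports which syscalls failed on every call and which call count recurs most often.

```python
from collections import Counter

SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
88.53   30.477732      170266       179        28 futex
 2.33    0.802426          22     36396           statx
 1.99    0.686575          19     35920           read
 1.87    0.644418          30     21316      3047 openat
 1.41    0.484498          26     18268           close
 0.84    0.289390          95      3024         3 unlink
 0.51    0.176517          58      3021           sendfile
 0.37    0.126377          20      6042           fstat
 0.34    0.116129          38      3026           chmod
 0.29    0.099074          32      3021           ftruncate
 0.23    0.078246          25      3021      3021 link
 0.19    0.065061          21      3021           utimensat
 0.18    0.063364          20      3024           lstat
 0.16    0.055842          18      3021           fchmod
 0.16    0.054501          17      3037      3035 mkdir
 0.16    0.053755          17      3021           fchown
 0.10    0.034548          11      3021      3021 copy_file_range
100.00  34.426566         221    155448     12173 total
"""

def parse_strace_summary(text):
    """Return one dict per syscall row; the header and total rows are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        # The errors column is blank when no call failed.
        errors = int(parts[4]) if len(parts) == 6 else 0
        rows.append({"syscall": parts[-1], "seconds": float(parts[1]),
                     "calls": int(parts[3]), "errors": errors})
    return rows

rows = parse_strace_summary(SUMMARY)
always_failing = [r["syscall"] for r in rows if r["errors"] == r["calls"]]
common_calls, n_syscalls = Counter(r["calls"] for r in rows).most_common(1)[0]
print(always_failing)            # prints: ['link', 'copy_file_range']
print(common_calls, n_syscalls)  # prints: 3021 7
```

The summary does not record which errno each failing call returned, so the reason for the link and copy_file_range failures cannot be read from this table alone.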

Why this looks like a regression

The system was usable before the update. After upgrading to 2026.4.27, the gateway now appears to perform expensive filesystem/plugin/runtime work during normal Telegram message processing.

There were plugin/runtime-deps changes in 2026.4.27, so this may be related to the new plugin runtime dependency/cache/materialization path.

Expected behavior

Handling a Telegram message should not block the gateway main event loop for tens of seconds, especially when the model call completes in 1-5 seconds.
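
The difference between blocking and non-blocking handling can be illustrated with a small analogy. The sketch below uses Python's asyncio rather than Node, is not OpenClaw code, and uses `time.sleep` as a stand-in for synchronous file copying. A heartbeat task records the longest gap between its wakeups, once with the work run inline on the event loop and once with the work moved to a worker thread.

```python
import asyncio
import time

async def max_loop_gap(work, interval=0.01):
    """Run `work` while a heartbeat records the longest gap between its wakeups."""
    loop = asyncio.get_running_loop()
    worst = 0.0
    running = True

    async def heartbeat():
        nonlocal worst
        last = loop.time()
        while running:
            await asyncio.sleep(interval)
            now = loop.time()
            worst = max(worst, now - last)
            last = now

    beat = asyncio.create_task(heartbeat())
    await asyncio.sleep(0.05)  # let the heartbeat start ticking
    await work()
    await asyncio.sleep(0.05)  # let the heartbeat observe any stall
    running = False
    await beat
    return worst

def slow_filesystem_work():
    time.sleep(0.2)  # stand-in for synchronous file copying

async def inline():
    slow_filesystem_work()  # runs on the event loop and blocks it

async def offloaded():
    await asyncio.to_thread(slow_filesystem_work)  # runs in a worker thread

inline_gap = asyncio.run(max_loop_gap(inline))
offloaded_gap = asyncio.run(max_loop_gap(offloaded))
print(f"inline: {inline_gap:.3f}s, offloaded: {offloaded_gap:.3f}s")
```

With the work inline, the heartbeat gap grows to at least the duration of the blocking call, which is the same kind of symptom the eventLoopDelayMaxMs field reports; with the work offloaded, the heartbeat keeps ticking at close to its normal interval.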

Actual behavior

The gateway main event loop blocks, healthchecks time out, Control UI becomes unstable, and Telegram replies are delayed.
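
To confirm the healthcheck symptom from outside the container, a timed probe can be pointed at the gateway. This is a minimal Python sketch: the `probe` helper is hypothetical, the host `127.0.0.1` is an assumption (the report only gives the port 18789 and the /healthz path), and it assumes a healthy gateway answers with a 2xx status.

```python
import http.client
import time
import urllib.request

def probe(url: str, timeout: float = 5.0):
    """Return (ok, seconds): whether the URL answered 2xx within the timeout."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            ok = 200 <= response.status < 300
    except (OSError, http.client.HTTPException):
        # Covers timeouts, refused connections, and non-2xx HTTP errors.
        ok = False
    return ok, time.monotonic() - started

# Assumed host; adjust to wherever the gateway port is published.
result = probe("http://127.0.0.1:18789/healthz", timeout=2.0)
print(result)
```

Running the probe in a loop while sending a Telegram message should show the window during which /healthz stops answering, which can then be lined up against the liveness warnings.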

Additional notes

openclaw doctor --fix completed.
Plugin registry: 70/116 enabled plugins indexed.
Doctor found orphan session transcript files, but the issue also reproduces in a fresh Telegram session with low context usage.
The active trajectory file grew to several MB, but context usage was reported as only ~9%, so context size alone does not explain the behavior.

extent analysis

TL;DR

The gateway's main event loop is blocked due to expensive filesystem operations, likely related to plugin/runtime-deps changes in the 2026.4.27 update.

Guidance

Investigate the plugin/runtime-deps changes in 2026.4.27 to identify potential causes of the expensive filesystem operations.
Review the strace output to understand the specific filesystem calls causing the bottleneck.
Consider disabling or optimizing plugins to reduce filesystem activity during Telegram message processing.
Monitor the gateway's performance with a smaller set of enabled plugins to isolate the issue.

Notes

The issue appears to be a regression introduced in the 2026.4.27 update, and the expensive filesystem operations are likely related to the new plugin runtime dependency/cache/materialization path.

Recommendation

As a workaround, disable plugins that are not needed to reduce filesystem activity during Telegram message processing. The root cause is likely related to the plugin/runtime-deps changes in the 2026.4.27 update.
