openclaw - 💡(How to fix) Fix v2026.4.26: Gateway busy-loops on bundled openai SDK directory walk; stops accepting connections [1 comments, 2 participants]

GitHub issue: openclaw/openclaw#73331 (fetched 2026-04-29 06:20:54)
Comments: 1 · Participants: 2 · Timeline: closed ×1, commented ×1


v2026.4.26: Gateway busy-loops on bundled openai SDK directory walk; stops accepting connections

Summary

After upgrading to v2026.4.26, the gateway boots, briefly serves /health, and within minutes becomes unresponsive while pegging one CPU core. strace shows a non-terminating directory walk over /usr/lib/node_modules/openclaw/node_modules/openai/resources/{audio,beta,chat,conversations,containers,evals,fine-tuning,graders,realtime,responses,skills,uploads,vector-stores,webhooks}, repeating in the same order indefinitely. The plugin runtime cache size freezes, so there is no forward progress. The listening socket's Recv-Q fills with queued connections that are never accept()ed.

Rolling back to v2026.4.24 restores normal operation.

Environment

  • OpenClaw: 2026.4.26 (be8c246), installed via sudo npm install -g openclaw@2026.4.26
  • Prior known-good: 2026.4.24 (cbcfdf6)
  • Node: 22.22.2
  • OS: Linux 6.8.0-106-generic (Debian-family, systemd-managed gateway)
  • Filesystem: ext4 on /dev/sda1, no exotic mounts
  • Plugins enabled: memory-core, memory-wiki, telegram, plus bundled defaults (acpx, bonjour, browser, device-pair, phone-control, talk-voice). qqbot pinned enabled: false. Feishu/whatsapp inherit gating from prior config (require >=2026.4.25).
  • One custom provider: models.providers.custom-opencode-go-extras proxying through https://opencode.ai/zen/go/v1 (OpenAI-compat endpoint).
  • 5 agents, 3 cron jobs, no MCP servers, no TTS configured.

Steps to reproduce

  1. From a known-good v2026.4.24 install, take a config backup: cp ~/.openclaw/openclaw.json ~/.openclaw/openclaw.json.bak.
  2. sudo npm install -g openclaw@2026.4.26
  3. openclaw doctor --fix — minor changes only on this host: removed retired agents.defaults.llm.idleTimeoutSeconds block and archived 1 orphan transcript. Pre-staged ~157 MB of plugin runtime deps under ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-<hash>/.
  4. sudo systemctl restart openclaw
  5. Wait until log shows [gateway] http server listening (3 plugins: memory-core, memory-wiki, telegram; 160.4s) and [gateway] ready. curl http://127.0.0.1:18789/health returns 200 in ~30ms.
  6. Continue waiting 1–3 minutes. /health starts timing out; openclaw cron list returns gateway timeout after 30000ms.
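The transition between steps 5 and 6 can be watched with a small polling loop. This is a minimal sketch, assuming curl is installed and the gateway listens on 127.0.0.1:18789 as in step 5; the function name `probe_health` and the 5-second timeout are illustrative choices, not part of OpenClaw.

```bash
# Hypothetical helper: probe the gateway health endpoint once.
# Prints a UTC timestamp and the HTTP status (000 when no response arrived),
# and returns success only for HTTP 200.
probe_health() {
  local url="${1:-http://127.0.0.1:18789/health}" code
  code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url" 2>/dev/null)"
  printf '%s %s\n' "$(date -u +%H:%M:%S)" "${code:-000}"
  [ "$code" = "200" ]
}

# Usage (not run here): poll every 10 seconds and stop at the first failure.
#   while probe_health; do sleep 10; done
```

The timestamp of the first non-200 line gives a rough measure of how long the gateway stayed responsive after the ready log line.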

Expected behavior

Gateway remains responsive after the [gateway] ready log line, accepts incoming WebSocket connections from the local CLI, and continues to background-stage plugin runtime dependencies without stalling the event loop.

Actual behavior

  • /health and CLI WS calls intermittently fail, then permanently time out.
  • ss -ltnp shows Recv-Q > 0 on the listening socket — connections queued, never accepted.
  • ps: gateway PID at ~55–112% CPU, growing CPU-time, climbing RSS (peaked at 1.3 GB during the incident).
  • Plugin runtime cache size frozen at 369 MB across 10+ second observations — write activity occurs but appears to overwrite the same files.
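The Recv-Q observation above can be checked without reading the full ss table. The sketch below assumes the standard `ss -ltn` column layout (State, Recv-Q, Send-Q, Local Address:Port, ...) and the port 18789 from the reproduction steps; `recvq_for_port` is a hypothetical helper name.

```bash
# Hypothetical helper: read `ss -ltn` output on stdin and print the Recv-Q
# value of every LISTEN socket bound to the given port.
recvq_for_port() {
  local port="$1"
  awk -v port="$port" '$1 == "LISTEN" && $4 ~ (":" port "$") { print $2 }'
}

# Usage (not run here):
#   ss -ltn | recvq_for_port 18789
```

A value that stays above 0 across repeated runs is consistent with connections being queued by the kernel but never accepted by the process.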

Diagnostic evidence

strace -p <pid> -c (4-second sample)

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 21.98    0.117403         598       196           statx
 19.93    0.106469         335       317           read
 16.80    0.089729         509       176           openat
 16.15    0.086274          89       969           access
 12.22    0.065258         372       175           close
  2.62    0.013981        1165        12        12 link
  2.40    0.012822         320        40           fstat
  1.94    0.010346         862        12           unlink
  1.73    0.009221         614        15        14 mkdir
  1.19    0.006361         530        12        12 fchown
  0.90    0.004833         402        12           chmod
  0.75    0.003990         332        12           fchmod
  0.71    0.003809         317        12           copy_file_range
  0.56    0.002972         247        12           ftruncate
...
total    0.534178         263      2030        38

Heavy directory traversal (statx, openat, read, access) plus per-iteration write ops (link, unlink, mkdir, chmod, copy_file_range, ftruncate). The write counts repeat on each sample — not draining.
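To compare samples, the write-side calls in a `strace -c` summary can be totalled with a short filter. This is a sketch only: it assumes the summary layout shown above (call count in the fourth column, syscall name in the last), and the set of syscalls treated as "write-side" is simply the one named in this report. `write_calls` is a hypothetical helper name.

```bash
# Hypothetical helper: read a `strace -c` summary on stdin and print the
# total number of calls to the write-side syscalls named in this report.
write_calls() {
  awk '$NF ~ /^(link|unlink|mkdir|chmod|fchmod|fchown|copy_file_range|ftruncate)$/ { sum += $4 }
       END { print sum + 0 }'
}

# Usage (not run here), with one saved summary per sample:
#   write_calls < sample1.txt
#   write_calls < sample2.txt
```

Similar totals in back-to-back samples would match the report's observation that the write work repeats rather than drains.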

strace -p <pid> -e trace=openat (representative slice)

openat(AT_FDCWD, "node_modules/openai/resources/beta/threads/runs", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/beta/realtime", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/beta/chatkit", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/audio", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/webhooks", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/vector-stores", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
... [continues with skills, responses, realtime, graders, fine-tuning, evals, conversations, containers, chat, beta]
... then re-enters from the top:
openat(AT_FDCWD, "node_modules/openai/resources/beta/threads/runs", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
openat(AT_FDCWD, "node_modules/openai/resources/beta/realtime", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 36
...

The walk traverses the same 14 top-level subdirectories of openai/resources/ (plus their nested beta/*, chat/*, containers/*, evals/*, fine-tuning/*) in identical order, repeating without termination.
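The repetition can be quantified from a saved trace by counting how often each directory is opened. The sketch below assumes the openat line format shown above, with the path as the first double-quoted string; `repeated_dirs` is a hypothetical helper name.

```bash
# Hypothetical helper: read `strace -e trace=openat` output on stdin and
# list every path opened more than once, highest count first.
repeated_dirs() {
  awk -F'"' '/openat\(/ && NF >= 3 { count[$2]++ }
             END { for (p in count) if (count[p] > 1) print count[p], p }' |
    sort -k1,1nr -k2
}

# Usage (not run here):
#   repeated_dirs < openat-trace.txt | head
```

If the walk is looping, the counts for the openai/resources directories should keep growing with the length of the capture instead of settling at a small number.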

Gateway log (relevant)

2026-04-28T12:26:41.871+07:00 [gateway] loading configuration…
2026-04-28T12:26:42.235+07:00 [gateway] starting...
2026-04-28T12:29:20.193+07:00 [gateway] starting HTTP server...
2026-04-28T12:29:22.682+07:00 [gateway] agent model: custom-opencode-go-extras/deepseek-v4-flash
2026-04-28T12:29:22.925+07:00 [gateway] http server listening (3 plugins: memory-core, memory-wiki, telegram; 160.4s)
2026-04-28T12:29:23.752+07:00 [gateway] ready
2026-04-28T12:29:24.080+07:00 [telegram] [default] starting provider (@AntonioCOCBot)
2026-04-28T12:29:25.694+07:00 [telegram] menu text exceeded the conservative 5700-character payload budget; shortening descriptions to keep 57 commands visible.
[no further log entries — gateway hangs while telegram polling is in flight]

No error or warning is emitted while the gateway hangs. Health-monitor's 60s startup grace and 300s interval pass silently.

Suspected root cause

The release notes for v2026.4.26 include several plugin-discovery / manifest-cache changes that could plausibly cause this. Candidates:

  • "Follow symlinked plugin directories in global and workspace plugin roots" (Plugins/Discovery)
  • "Reuse manifest records already loaded for bundled web provider candidate discovery" (Plugins/Web)
  • "Resolve runtime manifest-contract plugin owners from one plugin index" (Plugins/Contracts)
  • "Reuse one manifest registry pass while resolving bundled document and web-content extractor plugins" (Plugins/Extractors)
  • "Stage bundled plugin runtime dependencies before Gateway startup" (Plugins/Install)

The OpenAI SDK's resources/ tree has many cross-importing barrel files (index.js files that re-export sibling and child resources). A discovery walker that enters this tree from node_modules/openai and naively follows each barrel's exports would re-discover the same files via different paths, never converging unless the walker dedupes by realpath or by (dev, inode).
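The deduplication idea mentioned above can be illustrated with a small walker that keys each directory on its (device, inode) pair. This is a shell sketch of the general technique, not OpenClaw's discovery code (which is not shown in this report); it assumes GNU `stat` as found on the Debian-family host described above, and the names `walk_once` and `SEEN_DIRS` are hypothetical.

```bash
# Hypothetical sketch: walk a tree, following symlinks, but visit each
# directory only once by remembering its "device:inode" key.
SEEN_DIRS=" "

walk_once() {
  local dir="$1" key entry
  # stat -L resolves symlinks, so two paths to one directory share a key.
  key="$(stat -L -c '%d:%i' -- "$dir" 2>/dev/null)" || return 0
  case "$SEEN_DIRS" in
    *" $key "*) return 0 ;;   # already visited via another path
  esac
  SEEN_DIRS="$SEEN_DIRS$key "
  printf '%s\n' "$dir"
  # Note: the * glob skips dot-directories; good enough for a sketch.
  for entry in "$dir"/*; do
    if [ -d "$entry" ]; then
      walk_once "$entry"
    fi
  done
}

# Usage (not run here):
#   walk_once /usr/lib/node_modules/openclaw/node_modules/openai/resources
```

Because the visited set is keyed on the resolved directory rather than on the path string, a symlink that points back up the tree ends the recursion instead of restarting it.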

The write-side syscalls (link, copy_file_range, mkdir, chmod) suggest the walker is also mirroring entries into dist/extensions/node_modules/openclaw/plugin-sdk/ (or a similar staging path). That mirror operation appears to create or overwrite files that are then re-visited on the next pass, sustaining the loop.

Workaround

Rollback to v2026.4.24:

sudo systemctl stop openclaw
sudo npm install -g openclaw@2026.4.24
# v4.26's install partially overwrites shared paths under
# ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.24-<hash>/dist/extensions/node_modules/openclaw/plugin-sdk/
# so the v4.24 cache must be rebuilt:
rm -rf ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.24-*
rm -rf ~/.openclaw/plugin-runtime-deps/openclaw-2026.4.26-*
sudo systemctl start openclaw
# Gateway runs `npm install` for bundled plugin deps and is ready in ~90–130s.

Notes

  • The [gateway] agent model: custom-opencode-go-extras/deepseek-v4-flash log line is unrelated — it appears at startup on both v4.24 and v4.26 simply because that custom provider is registered. It is not a model-resolution error.
  • Doctor's --fix change to meta.lastTouchedVersion: 2026.4.26 survives a v4.24 rollback; restoring the pre-upgrade config silences the subsequent "Config was last written by a newer OpenClaw" warnings.
  • I can attach a longer strace log or full gateway log if useful — held back to keep this report scannable.
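The config restore mentioned in the notes above can be scripted against the backup taken in step 1 of the reproduction. This is a minimal sketch, assuming the backup lives at ~/.openclaw/openclaw.json.bak and that the restore is done while the gateway is stopped; the function name and the `.post-upgrade` suffix are hypothetical.

```bash
# Hypothetical helper: restore the pre-upgrade config from the .bak copy,
# keeping the post-upgrade file alongside it for comparison.
restore_openclaw_config() {
  local dir="${1:-$HOME/.openclaw}"
  if [ ! -f "$dir/openclaw.json.bak" ]; then
    echo "no backup found in $dir" >&2
    return 1
  fi
  # Preserve the current file if there is one; ignore its absence.
  cp -p "$dir/openclaw.json" "$dir/openclaw.json.post-upgrade" 2>/dev/null || true
  cp -p "$dir/openclaw.json.bak" "$dir/openclaw.json"
}

# Usage (not run here):
#   sudo systemctl stop openclaw
#   restore_openclaw_config
#   sudo systemctl start openclaw
```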

Severity

High for any deployment that auto-upgrades to v2026.4.26. The gateway becomes unresponsive without emitting an error, which makes the failure mode hard to diagnose without strace.

extent analysis

TL;DR

Rolling back to v2026.4.24 temporarily resolves the issue, which appears to be caused by an infinite loop in the plugin discovery process introduced in v2026.4.26.

Guidance

  • The suspected root cause lies in the plugin-discovery and manifest-cache changes in v2026.4.26; the candidates listed in the report include following symlinked plugin directories and reusing already-loaded manifest records.
  • To verify the issue, monitor system resources (CPU, memory) and network connections after upgrading to v2026.4.26 and observe if the gateway becomes unresponsive.
  • The provided strace output indicates an infinite loop in directory traversal, which can be used to further diagnose the issue.
  • Rolling back to v2026.4.24 is only a temporary workaround; follow the upstream issue (openclaw/openclaw#73331) for a permanent fix.

Example

No specific code snippet is provided as the issue seems to be related to the internal workings of the OpenClaw plugin discovery mechanism. However, the strace output gives insight into the system calls being made, which could be useful for debugging.

Notes

The issue appears to be specific to v2026.4.26 of OpenClaw, and rolling back to v2026.4.24 restores normal operation. A permanent fix has to come from the OpenClaw developers.

Recommendation

Apply the workaround by rolling back to v2026.4.24 until a fixed version is released, as the current issue renders the gateway unresponsive without emitting an error, making it hard to diagnose without additional tools like strace.
