openclaw - 💡(How to fix) Fix Severe Event Loop Blocking During Agent Run Startup (~90s per turn) [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#75984 · Fetched 2026-05-03 04:43:36
Comments: 1 · Participants: 2 · Timeline events: 3 · Reactions: 2
Timeline (top): closed ×1, commented ×1, labeled ×1


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Environment

  • OpenClaw version: 2026.4.29
  • Node.js: v22.22.1
  • OS: Linux (Docker container with ample CPU and RAM)

Observed Behavior

Every agent run in gateway mode exhibits very long startup latency (~90 seconds) before streaming begins. The trace output splits this delay into two phases:

  • Startup stages (attempt‑dispatch): ~25 s total, but the auth and attempt‑dispatch spans each appear to take 12 s, even though auth key resolution and MCP server connections are fast in isolation.
  • Prep stages (stream‑ready): ~68 s total, with long durations reported for core-plugin-tools, bundle-tools, system-prompt, and stream-setup.

Liveness diagnostics show the Node.js event loop is completely blocked during these stages, with eventLoopDelayMaxMs > 30 s and CPU utilization pegged on a single core.
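
The kind of liveness numbers quoted above can be approximated with Node's built-in perf_hooks module. The following is a minimal sketch, not OpenClaw's own diagnostics code: `measureLoopHealth` and `busyWait` are hypothetical helpers, and the mapping of their output to `eventLoopDelayMaxMs` and `eventLoopUtilization` is an assumption.

```javascript
// Sketch: sample event loop delay and utilization with Node's built-in perf_hooks.
// How OpenClaw derives its own liveness fields is not shown in the issue (assumption).
const { monitorEventLoopDelay, performance } = require('node:perf_hooks');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: run `work`, then report how badly the loop stalled meanwhile.
async function measureLoopHealth(work) {
  const histogram = monitorEventLoopDelay({ resolution: 10 }); // sample every 10 ms
  histogram.enable();
  await sleep(50); // let the sampler take a few baseline readings first
  const eluStart = performance.eventLoopUtilization();

  await work();
  await sleep(50); // let the overdue sampling timer fire so the stall is recorded

  histogram.disable();
  const elu = performance.eventLoopUtilization(eluStart);
  return {
    delayMaxMs: histogram.max / 1e6, // histogram values are in nanoseconds
    utilization: elu.utilization,    // 1 means the loop was never idle
  };
}

// Stand-in for a blocking startup stage: spin the CPU for `ms` milliseconds.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* block the loop */ }
}
```

Calling `measureLoopHealth(async () => busyWait(200))` should report a maximum delay close to the length of the stall, which is the same signature the issue describes at a much larger scale.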

Root Cause Analysis

The long spans in the trace are not due to slow I/O but are side-effects of synchronous operations that block the event loop:

  1. Tool creation and plugin lazy loading. createOpenClawCodingTools instantiates all built-in, plugin, and skill tools synchronously. When a bundled plugin (e.g., acpx) is activated for the first time, it installs ~42 npm dependencies synchronously, blocking the event loop for tens of seconds.
  2. Provider runtime resolution. resolveProviderRuntimePlugin walks the plugin registry synchronously to resolve runtime providers, adding further delay.
  3. Because these operations block the event loop, subsequent awaited async operations (auth resolution, system prompt concatenation, SSE connections) cannot resume until the loop becomes free. Each stage therefore appears to take 10–25 s even though its actual computation takes <1 s.
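
The span inflation described in point 3 can be reproduced in isolation. This is a minimal sketch under stated assumptions: `fastAsyncStage` stands in for a quick stage such as auth resolution, and `blockFor` stands in for synchronous tool creation. Neither is OpenClaw code.

```javascript
// Sketch: a fast awaited step looks slow when synchronous work holds the event loop.
const { performance } = require('node:perf_hooks');

// Stand-in for a quick async stage (hypothetical); it needs only ~5 ms.
const fastAsyncStage = () => new Promise((resolve) => setTimeout(resolve, 5));

// Stand-in for synchronous tool creation or a synchronous dependency install.
function blockFor(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) { /* hold the event loop */ }
}

// Returns the duration a trace span around the fast stage would report.
async function tracedStage(blockMs) {
  const start = performance.now();
  const pending = fastAsyncStage(); // the timer is armed here
  blockFor(blockMs);                // its callback cannot run until this returns
  await pending;
  return performance.now() - start;
}
```

With `blockMs` set to 0 the span is a few milliseconds. With `blockMs` set to 200 the same stage reports at least 200 ms, although its own work is unchanged.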

Impact

This issue makes OpenClaw impractical for interactive use cases because every agent turn incurs ~90 s of startup overhead before any output is streamed.

Suggested Fixes

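  • Yield to the event loop between heavy synchronous operations in createOpenClawCodingTools and plugin loading using setImmediate(). (process.nextTick() is not a substitute: its callbacks run before the event loop continues, so pending I/O and timers stay blocked.)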
  • Pre-install plugin runtime dependencies at gateway startup (e.g., in a model-prewarm sidecar) instead of on first agent run.
  • Add finer-grained tracing in core-plugin-tools to expose which internal operations consume most time.
  • Cache tool objects across agent turns within a session so that tool creation and plugin loading run once per session.
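
The caching suggestion in the last bullet could look roughly like the sketch below. It assumes a per-session identifier is available and uses a placeholder `buildTools` callback for the expensive step. The names `getToolsForSession` and `endSession` are hypothetical and do not come from the OpenClaw codebase.

```javascript
// Sketch: build tools once per session and reuse the result on later turns.
// `buildTools` stands in for the expensive creation step (hypothetical).
const toolCache = new Map(); // sessionId -> Promise resolving to the tool list

function getToolsForSession(sessionId, buildTools) {
  let entry = toolCache.get(sessionId);
  if (!entry) {
    // Cache the promise, so concurrent turns share one in-flight build.
    entry = Promise.resolve().then(() => buildTools(sessionId));
    // Drop a failed build so the next turn can retry.
    entry.catch(() => {
      if (toolCache.get(sessionId) === entry) toolCache.delete(sessionId);
    });
    toolCache.set(sessionId, entry);
  }
  return entry;
}

// Call when a session ends so its tools can be garbage collected.
function endSession(sessionId) {
  toolCache.delete(sessionId);
}
```

Caching only moves the cost to the first turn of each session. The first build would still block unless it is also made asynchronous or pre-warmed.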

Additional Context

I reproduced this on a clean gateway with 21 workspace skills and default bundled plugins. Disabling unused plugins or reducing the number of skills reduced the total latency by ~10–15%, but the fundamental blocking remained because core tool creation and plugin activation are synchronous.

Steps to reproduce

  1. Configure an OpenClaw gateway with a large number of workspace skills (e.g., 15–21 skills) and the bundled plugins enabled.
  2. Send a message through the responses API (e.g., POST /v1/responses).
  3. Observe the trace logs for that run: startup stages (auth, attempt-dispatch) and prep stages (core-plugin-tools, bundle-tools, system-prompt, stream-setup) together take ~90 s before streaming begins.
  4. Check liveness diagnostics: eventLoopDelayMaxMs exceeds 30 s and CPU utilization is pegged on one core during startup.

Expected behavior

In normal operation, the OpenClaw agent should begin streaming responses within a second or so after a request is sent. Startup stages (auth, attempt-dispatch, core-plugin-tools, bundle-tools, system-prompt, stream-setup) should complete quickly without blocking the event loop. Metrics like eventLoopDelayMaxMs should remain below 100 ms, and the CPU utilization should not be pegged on a single core.

Actual behavior

When running an agent with many workspace skills, the first token of the response is not streamed until roughly 90 seconds after the request is made. Trace logs show that startup and preparation stages (auth, attempt-dispatch, core-plugin-tools, bundle-tools, system-prompt, stream-setup) each appear to take 10–23 s even though the individual operations are fast. Liveness warnings report that the event loop is fully blocked: eventLoopDelayMaxMs is often ~55 s and eventLoopUtilization is pegged near 1. CPU utilization is pegged on one core. Streaming proceeds normally after the delay.

OpenClaw version

2026.4.29

Operating system

Linux (Docker container with 96 CPU cores and 772 GB RAM)

Install method

docker

Model

minimax/text-01

Provider / routing chain

openclaw -> cloudflare-ai-gateway -> minimax

Additional provider/model setup details

A large number of skills causes event loop blocking. Observed with the default route openclaw -> cloudflare-ai-gateway -> minimax. Reproduced with other models (openrouter/anthropic/claude-opus-4 and anthropic/claude-sonnet-4.5) and on self-hosted/direct model connections. The config uses ~21 skills; the issue persists with 8+ skills.

Logs, screenshots, and evidence

See root cause analysis and observed behavior sections above for timings and liveness diagnostics. Sample liveness warning: `eventLoopDelayMaxMs=55532.6 eventLoopUtilization=1 cpuCoreRatio=1.023`. Startup stage timings were captured in the summary. Additional logs can be provided if required.

Impact and severity

Affected: All users running OpenClaw agents with many workspace skills (>15) and the default plugin set.
Severity: High (adds roughly 90 seconds of latency before any response is streamed, making interactive use impractical).
Frequency: Always reproduced with high skill counts (observed on every request with 21 skills).
Consequence: Agents appear unresponsive for over a minute before streaming starts, leading to user frustration and potential timeouts.

Additional information

First observed in version 2026.4.29 (may have started earlier). No known good version when running many skills. Issue appears to coincide with plugin lazy loading and synchronous tool creation. Workaround: reduce the number of workspace skills (e.g., <8) or pre‑load plugin runtime dependencies at startup.
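
The pre-load workaround depends on running the dependency install without blocking the gateway's event loop. The sketch below shows the general pattern with Node's asynchronous child_process API. The issue does not state the actual install command for OpenClaw plugins, so `commands` is a placeholder supplied by the caller, and `prewarm` is a hypothetical helper.

```javascript
// Sketch: run pre-warm commands in child processes without blocking the event loop.
const { execFile } = require('node:child_process');
const { promisify } = require('node:util');

const execFileAsync = promisify(execFile);

// `commands` is a list of [file, args] pairs chosen by the caller (placeholder;
// the real plugin install command is not given in the issue).
async function prewarm(commands) {
  let completed = 0;
  for (const [file, args] of commands) {
    // Unlike execFileSync, awaiting here leaves the loop free to serve requests.
    await execFileAsync(file, args);
    completed += 1;
  }
  return completed;
}
```

Run at gateway startup, a step like this would let the first agent run find its dependencies already installed, while health checks and other requests keep being served during the install.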

extent analysis

TL;DR

Yield to the event loop between heavy synchronous operations in createOpenClawCodingTools and plugin loading to reduce startup latency.

Guidance

  1. Implement asynchronous tool creation: Modify createOpenClawCodingTools to yield to the event loop between synchronous operations by awaiting a setImmediate() callback. process.nextTick() does not help here, because its callbacks run before the event loop continues.
  2. Pre-install plugin runtime dependencies: Run a model-prewarm sidecar at gateway startup to pre-install plugin dependencies, reducing the load time during agent runs.
  3. Add finer-grained tracing: Enhance tracing in core-plugin-tools to identify which internal operations consume the most time, helping to optimize performance.
  4. Cache tool objects: Cache tool objects across agent turns within a session to minimize the overhead of tool creation and plugin loading.
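
For the tracing item, a small span recorder is enough to show which sub-step dominates a stage. This is a sketch under stated assumptions: `createTracer` is a hypothetical helper and is not part of OpenClaw's tracing.

```javascript
// Sketch: record wall-clock durations for named sub-steps inside a stage.
const { performance } = require('node:perf_hooks');

// Hypothetical tracer, not OpenClaw's tracing API.
function createTracer() {
  const spans = [];
  return {
    spans,
    // Times `fn` (sync or async) and records the duration under `name`.
    async span(name, fn) {
      const start = performance.now();
      try {
        return await fn();
      } finally {
        spans.push({ name, ms: performance.now() - start });
      }
    },
    // Returns a copy of the recorded spans, slowest first.
    slowest() {
      return [...spans].sort((a, b) => b.ms - a.ms);
    },
  };
}
```

Spans around synchronous steps are the most trustworthy ones here, because a synchronous step runs to completion and cannot be inflated by other work blocking the loop.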

Example

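// Example of using setImmediate() to yield to the event loop.
// builtInTools and createTool are placeholders, as in the original sketch.
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

async function createOpenClawCodingTools() {
  // ...
  const tools = [];
  // Each tool is still created synchronously, but the await below lets
  // pending I/O and timers run before the next tool is created.
  for (const tool of builtInTools) {
    tools.push(createTool(tool));
    await yieldToEventLoop();
  }
  // ...
  return tools;
}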

Notes

The provided guidance assumes that the issue is primarily caused by synchronous operations blocking the event loop. However, the actual implementation details may vary, and additional optimizations might be necessary.

Recommendation

Apply the suggested workarounds, such as yielding to the event loop and pre-installing plugin dependencies, to reduce the startup latency and improve the overall performance of OpenClaw agents.
