openclaw - ✅ (Solved) Fix [Bug]: First-load RPC fanout: tts.status monopolizes event loop ~1.5s and applyPluginAutoEnable recomputes 8× per fanout [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#81355 · Fetched 2026-05-14 03:33:02
Comments: 1
Participants: 2
Timeline: 2
Reactions: 2
Timeline (top): commented ×1, cross-referenced ×1

On cold start, dashboard / UI clients issue 9–10 RPCs concurrently against the gateway. Two independent issues cause this fanout to take 1.3–2.7 s instead of completing in parallel:

  • (A) "tts.status" is declared async but contains zero await expressions. It runs ~1.5 s of synchronous code (TTS config resolution, provider scanning, plus a synchronous readFileSync inside readPrefs) before returning, monopolizing the event loop and starving every sibling handler on the same connection.
  • (B) applyPluginAutoEnable(...) is invoked 8 times per fanout with the same config object reference and the same process.env — ~75 ms × 8 ≈ 600 ms of redundant pure-CPU work.

Together these account for ~2.1 s of avoidable main-thread occupancy on every cold-start fanout. They are logically independent and can be addressed in separate PRs.

PR fix notes

PR #81389: perf(config): cache applyPluginAutoEnable result per config/env identity

Description (problem / solution / changelog)

Summary

Addresses Bug (B) from #81355.

applyPluginAutoEnable is a pure function called 8 times during a single dashboard cold-start fanout with the same (config, env) object references. Each call costs ~75 ms of synchronous CPU work, totaling ~600 ms of redundant computation.

Changes

Add a two-level WeakMap cache to applyPluginAutoEnable in src/config/plugin-auto-enable.apply.ts, keyed on (config, env) object identity:

  • When both config and env are present and their references match a cached entry, the cached result is returned immediately.
  • When either is missing (undefined), the function falls back to uncached computation (preserving existing behavior).
  • WeakMap keys ensure entries are garbage-collected automatically when config snapshots rotate.

The original computation logic is extracted into a private computeAutoEnable helper with no behavioral changes.

Impact

Per the issue's measurements: 7 of 8 calls per fanout become cache hits, saving ~525 ms of main-thread time.

Real behavior proof

Behavior addressed: applyPluginAutoEnable returns cached results for identical (config, env) references, eliminating redundant computation during dashboard fanout.

Real environment tested: Linux 6.17.0-22-generic (x64), Node.js v24.14.1, OpenClaw gateway with vitest v4.1.6.

Exact steps or command run after fix:

  1. Applied the patch to src/config/plugin-auto-enable.apply.ts
  2. Ran npx vitest run src/config/plugin-auto-enable.apply.test.ts --reporter=verbose on a real OpenClaw checkout
  3. Verified cache hit behavior and timing

Evidence after fix:

$ npx vitest run src/config/plugin-auto-enable.apply.test.ts --reporter=verbose

 ✓ applyPluginAutoEnable caching > returns the same result for the same config and env references  224ms
 ✓ applyPluginAutoEnable caching > recomputes when config reference changes  389ms
 ✓ applyPluginAutoEnable caching > works without config or env (no cache, no crash)  467ms
 ✓ applyPluginAutoEnable caching > cached calls are faster than uncached calls  212ms

 Test Files  1 passed (1)
      Tests  4 passed (4)

The timing test confirms: 7 cached calls complete faster than 1 uncached call. The identity check (result1 === result2) proves the exact same object is returned from cache.

Observed result after fix: Cache hits return immediately with O(1) WeakMap lookup. 7 cached calls + overhead < 1 uncached call time. Object identity is preserved across cached calls.

What was not tested: Full dashboard cold-start fanout latency measurement (requires instrumented gateway with hrtime probes as described in the issue). The cache correctness and performance benefit are verified via the tests above.

Testing

Added src/config/plugin-auto-enable.apply.test.ts with 4 tests:

  1. Same (config, env) references → returns identical (cached) result object
  2. Different config references → recomputes (cache miss)
  3. Missing config/env → no caching, no crash
  4. Timing proof: 7 cached calls complete faster than 1 uncached call
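
The first three cases can be illustrated with a small self-contained sketch. It is not the PR's test file: memoizeByIdentity and slowCompute are hypothetical stand-ins that only mirror the two-level WeakMap shape described in the issue, and a call counter replaces the timing assertion.

```typescript
// Hypothetical stand-in for the PR's cache: a two-level WeakMap keyed on
// the identity of two optional object arguments.
type Compute<A extends object, B extends object, R extends object> = (
  a: A | undefined,
  b: B | undefined,
) => R;

function memoizeByIdentity<A extends object, B extends object, R extends object>(
  compute: Compute<A, B, R>,
): Compute<A, B, R> {
  const cache = new WeakMap<A, WeakMap<B, R>>();
  return (a, b) => {
    // Without both keys there is nothing to key on: compute uncached.
    if (!a || !b) return compute(a, b);
    let inner = cache.get(a);
    if (!inner) {
      inner = new WeakMap<B, R>();
      cache.set(a, inner);
    }
    const hit = inner.get(b);
    if (hit) return hit;
    const result = compute(a, b);
    inner.set(b, result);
    return result;
  };
}

// Hypothetical expensive pure function; counts how often it really runs.
let computeCalls = 0;
const slowCompute = (config?: { plugins: string[] }, env?: Record<string, string>) => {
  computeCalls += 1;
  return { enabled: [...(config?.plugins ?? [])], hasEnv: env !== undefined };
};

const cached = memoizeByIdentity(slowCompute);
const config = { plugins: ["tts"] };
const env = { HOME: "/tmp" };

const first = cached(config, env);
const second = cached(config, env);              // same references: cache hit
const third = cached({ plugins: ["tts"] }, env); // equal value, new reference: miss
const fourth = cached(undefined, env);           // missing config: uncached path

console.log(first === second, first === third, fourth.enabled.length, computeCalls);
// true false 0 3
```

Counting invocations is a more stable signal than wall-clock timing, which can vary between runs on a loaded machine.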

Note: This PR addresses only Bug (B) from #81355. Bug (A) (tts.status event-loop blocking) is independent and can be addressed separately.

Changed files

  • src/config/plugin-auto-enable.apply.test.ts (added, +73/-0)
  • src/config/plugin-auto-enable.apply.ts (modified, +28/-0)


Bug type

Behavior bug

Steps to reproduce

  1. Fresh-start the gateway: openclaw gateway restart.
  2. Load the gateway dashboard (or any UI/MCP client that issues the standard read-only fanout).
  3. The dashboard typically issues: sessions.list, status, models.list, usage.cost, tts.status, channels.status, tools.catalog × N agents — all in the same WebSocket frame batch.
  4. Measure per-RPC latency in the client (or instrument the handlers with hrtime probes).
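
For step 4, a handler probe might look roughly like the sketch below. createProbe and its output format are hypothetical, loosely modelled on the trace shown later in this report. The clock is injected so the example is deterministic; in a real Node.js gateway a monotonic clock such as process.hrtime.bigint() would be preferable to Date.now().

```typescript
// Hypothetical probe helper: records named marks and reports each mark's
// offset from ENTER plus the delta from the previous mark.
type Mark = { label: string; atMs: number; deltaMs: number };

function createProbe(name: string, now: () => number = () => Date.now()) {
  const start = now();
  let last = start;
  const marks: Mark[] = [];
  return {
    marks,
    mark(label: string): void {
      const t = now();
      marks.push({ label, atMs: t - start, deltaMs: t - last });
      last = t;
    },
    report(): string[] {
      const lines = [`HND ${name} ENTER @0.0 ms`];
      for (const m of marks) {
        lines.push(
          `  TS after ${m.label} @${m.atMs.toFixed(1)} ms (+${m.deltaMs.toFixed(1)} ms)`,
        );
      }
      return lines;
    },
  };
}

// Usage with a fake clock (values in ms) so the run is repeatable.
const ticks = [0, 0.1, 198.8, 546.0];
let tickIndex = 0;
const probe = createProbe("tts.status", () => ticks[tickIndex++] ?? 0);
probe.mark("getRuntimeConfig");
probe.mark("resolveTtsConfig");
probe.mark("getTtsProvider");
console.log(probe.report().join("\n"));
```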

Expected behavior

  • Independent read-only RPCs should complete concurrently; no single handler should block sibling RPCs sharing the same connection.
  • Pure, deterministic helpers like applyPluginAutoEnable should not recompute the same answer 8 times for the same input within a single fanout.

Actual behavior

Measured cold-start fanout on v2026.5.7, gateway freshly restarted, single dashboard load:

Handler | RESP time
tts.status | 1566 ms
channels.status | 646 ms (handler ENTER deferred ~1.5 s after WS frame arrival)
models.list | 2177 ms
status | 2296 ms
usage.cost | 2592 ms
sessions.list | 2662 ms
tools.catalog × 3 | 186 / 216 / 224 ms (serialized, back-to-back)

Total wall time ~2.7 s. A main-thread heartbeat probe (5 ms setTimeout, alerts when the gap exceeds 80 ms) fires continuously across the entire 2.7 s window — the event loop never yields.
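
A heartbeat probe of the kind described above could be sketched as follows. The function names are hypothetical and this is not the reporter's instrumentation; only the 5 ms interval and 80 ms threshold come from the issue. The gap detection is kept as a pure function so it can be checked without real timers.

```typescript
// Pure part: given the timestamps at which heartbeat ticks actually fired,
// return the gaps that exceeded the stall threshold.
function findStalls(tickTimesMs: number[], thresholdMs: number): number[] {
  const stalls: number[] = [];
  for (let k = 1; k < tickTimesMs.length; k++) {
    const gap = tickTimesMs[k]! - tickTimesMs[k - 1]!;
    if (gap > thresholdMs) stalls.push(gap);
  }
  return stalls;
}

// Timer wiring (hypothetical helper): re-arms a short setTimeout and reports
// whenever the observed gap between two ticks exceeds the threshold.
// Returns a function that stops the probe.
function startHeartbeat(
  onStall: (gapMs: number) => void,
  intervalMs = 5,
  thresholdMs = 80,
): () => void {
  let last = Date.now();
  let timer: ReturnType<typeof setTimeout>;
  const tick = () => {
    const t = Date.now();
    if (t - last > thresholdMs) onStall(t - last);
    last = t;
    timer = setTimeout(tick, intervalMs);
  };
  timer = setTimeout(tick, intervalMs);
  return () => clearTimeout(timer);
}

// A 1566 ms hole between two ticks shows up as a single stall.
console.log(findStalls([0, 5, 10, 1576, 1581], 80)); // [ 1566 ]
```

In use, something like const stop = startHeartbeat((gap) => console.warn(`event loop stalled ${gap} ms`)) would run for the duration of the measurement, followed by stop().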


Bug (A) — tts.status handler synchronously blocks the event loop ~1.5 s

Source: src/gateway/server-methods/tts.ts:29

"tts.status": async ({ respond, context }) => {
  try {
    const cfg = context.getRuntimeConfig();
    const config = resolveTtsConfig(cfg);                        // ~200 ms
    const prefsPath = resolveTtsPrefsPath(config);
    const provider = getTtsProvider(config, prefsPath);          // ~347 ms (readPrefs → readFileSync)
    const persona = getTtsPersona(config, prefsPath);
    const autoMode = resolveTtsAutoMode({ config, prefsPath });
    const fallbackProviders = resolveTtsProviderOrder(provider, cfg)
      .slice(1)
      .filter((c) => isTtsProviderConfigured(config, c, cfg));   // ~905 ms (15 providers × isConfigured)
    const providerStates = listSpeechProviders(cfg).map(/* isConfigured per provider */); // ~114 ms
    respond(true, { /* ... */ });
  } catch (err) { /* ... */ }
}

The handler is async, but the body contains no await expression. Every helper invoked is synchronous; several call readFileSync (readPrefs in extensions/speech-core/runtime-api.ts) or do synchronous provider enumeration via isConfigured. The handler therefore executes ~1.5 s of synchronous CPU work and sync I/O on the event-loop thread before returning, and no other callback or microtask continuation can run during this window.
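
The effect can be reproduced in isolation. In the sketch below both handler names are hypothetical; the point is only that an async function without await runs to completion before an already-suspended sibling can resume.

```typescript
const order: string[] = [];

// Declared async, but with no await the whole body runs synchronously.
async function blockingHandler(): Promise<void> {
  order.push("blocking:enter");
  let sum = 0;
  for (let n = 0; n < 1_000_000; n++) sum += n; // stand-in for sync CPU / readFileSync
  order.push("blocking:return");
  void sum;
}

// A sibling that awaits: its continuation is queued as a microtask and
// cannot run until the current synchronous code has finished.
async function siblingHandler(): Promise<void> {
  order.push("sibling:enter");
  await Promise.resolve();
  order.push("sibling:resumed");
}

void siblingHandler();
void blockingHandler();

// Still in the same synchronous turn: the sibling has not resumed yet.
const snapshot = [...order];
console.log(snapshot); // [ 'sibling:enter', 'blocking:enter', 'blocking:return' ]
```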

Per-segment probe data (cold-start, gateway-restarted run):

HND tts.status ENTER             @0.0 ms
  TS after getRuntimeConfig      @0.1 ms
  TS after resolveTtsConfig      @198.8 ms    ← 199 ms
  TS after resolveTtsPrefsPath   @199.0 ms
  TS after getTtsProvider        @546.0 ms    ← 347 ms (readFileSync inside readPrefs)
  TS after getTtsPersona         @546.1 ms
  TS after resolveTtsAutoMode    @546.3 ms
  TS after fallbackProviders     @1451.3 ms   ← 905 ms (slowest segment)
  TS after providerStates        @1565.8 ms   ← 114 ms
HND tts.status RESP +1565.8 ms

Because tts.status enters its handler in the same tick as four sibling handlers (sessions.list, status, models.list, usage.cost) but never yields, all sibling handlers' awaits resolve only after tts.status returns. The dashboard's channels.status request, which arrived in the same WS frame batch, does not even enter its handler until 1.5 s after the others. This single handler accounts for the entire "front-block" segment of the cold-start fanout.

Suggested fixes (any subset would help, in roughly descending impact):

  1. Convert the synchronous I/O helpers to async (readPrefs → fs.promises.readFile) and await them, yielding several times during the handler's execution.
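  2. Parallelize isConfigured across providers (each call is independent of the others) via Promise.all. This only helps once isConfigured is itself asynchronous, since Promise.all over synchronous calls still runs them back-to-back on the same thread. The current .filter(...isTtsProviderConfigured) is the single largest segment (~900 ms across 15 providers).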
  3. Cache isConfigured(provider, cfg) for the lifetime of a single cfg reference — useful because both fallbackProviders and providerStates enumerate the same providers back-to-back.
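  4. Even as a stopgap, insert await Promise.resolve() between the heavy synchronous segments to let sibling handlers' queued microtasks interleave. Yielding a full macrotask (for example via setImmediate) would additionally let pending I/O callbacks and newly arrived frames run.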
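
Fixes 3 and 4 could be sketched as below. This is an illustration under assumptions, not the gateway's code: the IsConfigured signature, cacheIsConfigured, yieldToEventLoop and listConfigured are all hypothetical, and setTimeout(…, 0) is used for portability where setImmediate would be the more usual choice in Node.js.

```typescript
// Hypothetical signature; the real isTtsProviderConfigured takes different arguments.
type IsConfigured = (providerId: string, cfg: object) => boolean;

// Fix 3: memoize per cfg reference, so back-to-back enumerations share results.
function cacheIsConfigured(isConfigured: IsConfigured): IsConfigured {
  const byCfg = new WeakMap<object, Map<string, boolean>>();
  return (providerId, cfg) => {
    let perProvider = byCfg.get(cfg);
    if (!perProvider) {
      perProvider = new Map<string, boolean>();
      byCfg.set(cfg, perProvider);
    }
    const known = perProvider.get(providerId);
    if (known !== undefined) return known;
    const value = isConfigured(providerId, cfg);
    perProvider.set(providerId, value);
    return value;
  };
}

// Fix 4: yield a full macrotask between heavy segments so timers, I/O
// callbacks and sibling handlers get a turn.
const yieldToEventLoop = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

async function listConfigured(
  providerIds: string[],
  cfg: object,
  isConfigured: IsConfigured,
): Promise<string[]> {
  const configured: string[] = [];
  for (const id of providerIds) {
    if (isConfigured(id, cfg)) configured.push(id);
    await yieldToEventLoop(); // one provider check per event-loop turn
  }
  return configured;
}

// Usage: the second pass over the same cfg is served from the cache.
let rawChecks = 0;
const cachedCheck = cacheIsConfigured((id) => {
  rawChecks += 1;
  return id !== "missing";
});
const cfgSnapshot = {};
const pass1 = ["a", "b", "missing"].filter((id) => cachedCheck(id, cfgSnapshot));
const pass2 = ["a", "b", "missing"].filter((id) => cachedCheck(id, cfgSnapshot));
console.log(pass1, pass2, rawChecks); // [ 'a', 'b' ] [ 'a', 'b' ] 3
```

Yielding after every provider trades a little total latency for fairness; yielding only between the larger segments would be a reasonable middle ground.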

Bug (B) — applyPluginAutoEnable recomputes the same result 8× per fanout

Source: src/config/plugin-auto-enable.apply.ts:34

export function applyPluginAutoEnable(params: {
  config?: OpenClawConfig;
  env?: NodeJS.ProcessEnv;
  manifestRegistry?: PluginManifestRegistry;
}): PluginAutoEnableResult {
  const candidates = detectPluginAutoEnableCandidates(params);
  return materializePluginAutoEnableCandidates({
    config: params.config,
    candidates,
    env: params.env,
    manifestRegistry: params.manifestRegistry,
  });
}

The function is pure on its inputs (config, env, manifestRegistry). During one dashboard fanout, it is invoked 8 times across the read-only RPC paths:

Caller | Call count
channels.status (entry + getRuntimeSnapshot inside the handler) | 2
tools.catalog × 3 agents (each calls it twice via ensureStandalonePluginToolRegistryLoaded + resolvePluginTools) | 6
Total per fanout | 8

Identity check via WeakMap instrumentation on the inputs:

  • All 8 calls during a fanout receive the same config object reference: context.getRuntimeConfig() returns an identity-stable snapshot within a fanout window.
  • All 8 calls receive params.env === process.env (same identity).

So every call recomputes an answer that already exists. Single-call cost is ~75 ms (≈55 ms detect + ≈22 ms materialize), giving 8 × 75 ms ≈ 600 ms of redundant synchronous CPU per fanout.
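
An identity check of this kind can be sketched with a WeakMap that hands out a small id the first time an object is seen. The helper names and sample objects below are hypothetical; the issue does not show the actual instrumentation.

```typescript
// Hypothetical instrumentation: assign each distinct object an id on first
// sight, without keeping the object alive.
const ids = new WeakMap<object, number>();
let nextId = 1;

function identityOf(obj: object): number {
  let id = ids.get(obj);
  if (id === undefined) {
    id = nextId++;
    ids.set(obj, id);
  }
  return id;
}

// Records which (config, env) identities each call received.
const seenPairs: string[] = [];
function recordCall(config: object, env: object): void {
  seenPairs.push(`${identityOf(config)}:${identityOf(env)}`);
}

const runtimeConfig = { plugins: [] };
const processEnv = { PATH: "/usr/bin" }; // stand-in for process.env
for (let call = 0; call < 8; call++) recordCall(runtimeConfig, processEnv);
recordCall({ plugins: [] }, processEnv); // structurally equal, different identity

console.log(new Set(seenPairs.slice(0, 8)).size, seenPairs[8]); // 1 3:2
```

A single distinct pair across the eight calls is what makes an identity-keyed cache viable here; a structurally equal but freshly allocated config would miss it.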

Suggested fix — two-level WeakMap keyed on object identity:

const cache = new WeakMap<object, WeakMap<object, PluginAutoEnableResult>>();

export function applyPluginAutoEnable(params) {
  const config = params.config;
  const env = params.env;
  if (config && env) {
    let inner = cache.get(config);
    if (!inner) { inner = new WeakMap(); cache.set(config, inner); }
    const hit = inner.get(env);
    if (hit) return hit;
    const result = computeAutoEnable(params);
    inner.set(env, result);
    return result;
  }
  return computeAutoEnable(params);
}

Because both keys are WeakMap-able objects, entries are collected automatically when a new runtime config snapshot rotates in. manifestRegistry is identity-stable for the same config in our measurements, so the two-level key on (config, env) is sufficient; a single-level WeakMap<config, result> would also work in practice and is even simpler.

Measured hit rate on a real fanout: 7 of 8 calls become cache hits, saving ~525 ms.

OpenClaw version

2026.5.7 (commit eeef486449)

Operating system

WSL2 (Ubuntu 24.04 on Windows 11), Node.js v22.21.1

Model

N/A

Provider / routing chain

N/A

Install method

npm install -g openclaw (running as a systemd user service)

Logs, screenshots, and evidence

All latency numbers above come from hrtime probes inserted at the handler call sites in a freshly restarted gateway during a single dashboard load. No sensitive paths or credentials are included.

Additional information

The two bugs compound: while tts.status holds the event loop for ~1.5 s, sibling handlers' lazy-import I/O (status → loadStatusSummaryRuntimeModule, models.list → loadModelsListCatalog, etc.) can resolve I/O in the background, but their resumed microtasks queue up behind tts.status. Once tts.status returns, the siblings all resolve nearly simultaneously and immediately encounter the redundant applyPluginAutoEnable work along the channels.status and tools.catalog paths.

Estimated impact of fixing both bugs (extrapolated from the probe data, not measured under a patched build):

  • Fix (A) alone: cold-start fanout total drops from ~2.7 s to ~1.2 s (siblings can finally overlap).
  • Fix (A) + (B): drops to ~500–700 ms.
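
One way to arrive at figures in these ranges is plain subtraction of the measured segments. The derivation below is an assumption about how the estimate was extrapolated, using only numbers quoted in the issue; the variable names are illustrative.

```typescript
// Figures from the issue, in milliseconds.
const totalFanoutMs = 2700;    // measured cold-start wall time
const ttsStatusBlockMs = 1500; // synchronous tts.status segment
const perCallMs = 75;          // one applyPluginAutoEnable call
const callsPerFanout = 8;

const afterFixA = totalFanoutMs - ttsStatusBlockMs;    // 1200
const savedByCache = perCallMs * (callsPerFanout - 1); // 525 (7 cache hits)
const redundantTotal = perCallMs * callsPerFanout;     // 600
const afterBothLow = afterFixA - redundantTotal;       // 600
const afterBothHigh = afterFixA - savedByCache;        // 675

console.log(afterFixA, afterBothLow, afterBothHigh); // 1200 600 675
```

Both results fall inside the quoted ~500–700 ms range; actual savings depend on how much of the remaining work overlaps once the event loop is free.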

These two issues are logically independent — they share only the surface symptom ("dashboard cold start feels slow"), not their root cause. We are happy to split them into separate issues if that better fits OpenClaw's triage workflow.


Reported by the CoClaw team. This issue was discovered while developing @coclaw/openclaw-coclaw, a CoClaw channel plugin for OpenClaw.
