openclaw: ACP startup sidecars saturate event loop on installs with many sessions (identity-reconcile + session-locks 450-460 s wall, eventLoopDelayP99 6 min)

Summary

Gateway startup measures sidecars.acp.identity-reconcile at >450 seconds (7.5 minutes) and sidecars.session-locks at >463 seconds (7.7 minutes) on installs with a large number of persisted ACP sessions. During this window the diagnostic emits liveness warning: ... eventLoopDelayP99Ms=360240 eventLoopDelayMaxMs=360240 eventLoopUtilization=1 — the gateway event loop is effectively non-responsive for ~6 minutes. New prompts, channel events, and active session work all queue behind this serial startup work.

The proposed direction: bound startup-phase work with explicit concurrency caps, move per-session reconcile off the awaited startup path (background worker / queued), and add an SLO on sidecars.acp.identity-reconcile and sidecars.session-locks so wall-time growth triggers an alert before it saturates the event loop.

Environment

  • OpenClaw 2026.5.20 (e510042) — npm install at ~/.local/lib/node_modules/openclaw
  • Node 25.8.1, macOS 25.3.0 (arm64)
  • Install with ~90 persisted ACP sessions across main/ops/ashley/indexer/chatgpt/chief-of-staff/agents-orchestrator agents
  • ACP backend: acpx; runtime: codex + claude

Reproduction

Approximate (full reproduction needs the same session fanout as our install):

  1. Install OpenClaw on a host with acp.enabled: true and accumulate many ACP sessions across multiple agents (~50+).
  2. Restart the gateway (launchctl kickstart -k gui/$(id -u)/ai.openclaw.gateway).
  3. Within minutes of startup, observe liveness warning events with recentPhases=sidecars.acp.identity-reconcile:NNN ms,…sidecars.session-locks:MMM ms where NNN, MMM are 100s of seconds and eventLoopDelayP99Ms ≥ 30s.
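
Step 3 can be checked mechanically rather than by eye. The following is a minimal sketch, assuming each liveness-warning entry carries a recentPhases=name:NNNms,... field in the shape quoted in the log evidence in this report. The helper names parseRecentPhases and slowPhases are hypothetical and are not part of OpenClaw.

```typescript
interface PhaseDuration {
  phase: string;
  ms: number;
}

// Hypothetical helper: pulls "name:NNNms" pairs out of the recentPhases field
// of one liveness-warning entry (the entry may span several lines).
function parseRecentPhases(entry: string): PhaseDuration[] {
  const key = "recentPhases=";
  const start = entry.indexOf(key);
  if (start === -1) return [];
  let value = entry.slice(start + key.length);
  // Stop at the next "key=" field (for example "work=") so later fields are ignored.
  const nextField = value.search(/\s[A-Za-z]\w*=/);
  if (nextField !== -1) value = value.slice(0, nextField);

  const phases: PhaseDuration[] = [];
  const pair = /([\w.-]+):(\d+)ms/g;
  let m: RegExpExecArray | null;
  while ((m = pair.exec(value)) !== null) {
    phases.push({ phase: m[1] as string, ms: Number(m[2]) });
  }
  return phases;
}

// Hypothetical helper: keeps only phases at or above the given threshold.
function slowPhases(entry: string, thresholdMs: number): PhaseDuration[] {
  return parseRecentPhases(entry).filter((p) => p.ms >= thresholdMs);
}
```

Calling slowPhases(entry, 30000) on the entry quoted in this report would be expected to return the two sidecar phases and drop the channels.discord.* ones.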

Error / log evidence

From /Users/agent/.openclaw/logs/gateway.log.20260522-180005, 2026-05-22 15:21:41 ICT:

liveness warning: reasons=event_loop_delay,event_loop_utilization
  interval=364s eventLoopDelayP99Ms=360240.4 eventLoopDelayMaxMs=360240.4
  eventLoopUtilization=1 cpuCoreRatio=0.71
  active=1 waiting=0 queued=0
  recentPhases=sidecars.acp.identity-reconcile:450139ms,
               channels.discord.is-configured:0ms,
               channels.discord.runtime:0ms,
               channels.discord.approval-bootstrap:0ms,
               channels.discord.start-account-handoff:4ms,
               sidecars.session-locks:463310ms
  work=[active=agent:ops:discord:channel:...(processing/tool_call,q=1,age=361s last=tool:bash:started)]

The diagnostic-phase tracker (diagnostic-phase-D1Ieo0f0.js) records these as wall-clock durations of withDiagnosticPhase / measureStartup calls, so the IIFE genuinely took 450 s to settle. The matching summary line, acp startup identity reconcile (renderer=...): checked=0 resolved=0 failed=0, is absent from the log (it is logged only if checked > 0), which places the 450 s upstream of the inner loop. The time almost certainly went to the chained dynamic import() + getAcpSessionManager().reconcilePendingSessionIdentities() + the in-loop withSessionActor() awaits, all blocked behind the saturated event loop while other work runs.
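
For readers who want to see comparable numbers outside the gateway, here is a minimal sketch that samples event-loop delay and utilization with Node's perf_hooks module. It is not how OpenClaw computes eventLoopDelayP99Ms or eventLoopUtilization (the source does not show that code). The function name sampleEventLoop and the returned field names are assumptions made for illustration.

```typescript
import { monitorEventLoopDelay, performance } from "node:perf_hooks";
import { setTimeout as sleep } from "node:timers/promises";

interface LoopSample {
  delayP99Ms: number;
  delayMaxMs: number;
  utilization: number;
}

// Hypothetical helper: observes the event loop for roughly `windowMs` and
// reports delay percentiles (converted from nanoseconds) and utilization.
async function sampleEventLoop(windowMs: number): Promise<LoopSample> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  const before = performance.eventLoopUtilization();
  histogram.enable();
  // If the loop is blocked, this timer fires late and the histogram records the delay.
  await sleep(windowMs);
  histogram.disable();
  // Passing `before` yields the utilization delta for this window only.
  const elu = performance.eventLoopUtilization(before);
  return {
    delayP99Ms: histogram.percentile(99) / 1e6,
    delayMaxMs: histogram.max / 1e6,
    utilization: elu.utilization,
  };
}
```

A utilization close to 1 together with a P99 delay in the hundreds of seconds, as in the log above, means timers and I/O callbacks were waiting for almost the entire sampling window.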

Root cause

src/gateway/server-startup-post-attach.ts:520-540:

if (params.cfg.acp?.enabled) {
  void (async () => {
    await waitForAcpRuntimeBackendReady({ backendId: params.cfg.acp?.backend });
    const [{ getAcpSessionManager }, { ACP_SESSION_IDENTITY_RENDERER_VERSION }] =
      await Promise.all([
        import("../acp/control-plane/manager.js"),
        import("../acp/runtime/session-identifiers.js"),
      ]);
    const result = await getAcpSessionManager().reconcilePendingSessionIdentities({
      cfg: params.cfg,
    });
    // …
  })().catch(/* swallow */);
}

reconcilePendingSessionIdentities (src/acp/control-plane/manager.core.ts:230-296) is a sequential for ... of acpSessions loop, each iteration await-ing withSessionActor() → ensureRuntimeHandle() → reconcileRuntimeSessionIdentifiers() → writeSessionMeta():

for (const session of acpSessions) {
  // …
  await this.withSessionActor(session.sessionKey, async () => {
    const { runtime, handle, meta } = await this.ensureRuntimeHandle({});
    const reconciled = await this.reconcileRuntimeSessionIdentifiers({});
    // …
  });
}

Each iteration can do runtime status fetch + disk I/O. With ~50-90 sessions and a contended event loop (other agents actively running tools at the same time), the wall-clock blows out to multiple minutes.

sidecars.session-locks (server-startup-post-attach.ts:550-584) is similarly a sequential for (const sessionsDir of sessionDirs) { await cleanStaleLockFiles({…}) } over all per-agent session directories. Same shape; same growth profile.
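
To make the shared shape concrete for the session-locks sweep, here is a minimal sketch under stated assumptions. runBounded is a hypothetical generic helper, and the body of sweepDirs uses a short timer as a stand-in for cleanStaleLockFiles, whose real signature is not shown in this report. With limit = 1 the helper behaves like the sequential loop described above; a larger limit lets I/O waits overlap.

```typescript
import { setTimeout as sleep } from "node:timers/promises";

// Hypothetical helper: runs `work` over `items` with at most `limit` calls in flight.
// If one call rejects, the returned promise rejects while other workers keep draining,
// so real per-item work should handle its own errors.
async function runBounded<T>(
  items: readonly T[],
  limit: number,
  work: (item: T) => Promise<void>,
): Promise<void> {
  let next = 0;
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const item = items[next++] as T;
      await work(item);
    }
  };
  const workerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
}

// Demo only: the 5 ms sleep stands in for the disk I/O of a real lock-file sweep.
async function sweepDirs(
  dirs: readonly string[],
  limit: number,
): Promise<{ processed: number; maxInFlight: number }> {
  let inFlight = 0;
  let maxInFlight = 0;
  let processed = 0;
  await runBounded(dirs, limit, async () => {
    inFlight += 1;
    maxInFlight = Math.max(maxInFlight, inFlight);
    await sleep(5);
    inFlight -= 1;
    processed += 1;
  });
  return { processed, maxInFlight };
}
```

Overlapping only helps while the per-directory work is waiting on I/O. If the event loop is already saturated by CPU-bound work, more concurrency does not shorten the sweep.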

Suggested fix

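  1. Bound concurrency in reconcilePendingSessionIdentities with an explicit limit (e.g. 4-8 parallel withSessionActor calls). Wall-clock time still grows linearly with session count, but it is cut by up to the concurrency factor when the per-session work is dominated by I/O waits: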

    const CONCURRENCY = 8;
    const queue = [...acpSessions];
    await Promise.all(
      Array.from({ length: CONCURRENCY }, async () => {
        while (queue.length) {
          const session = queue.shift();
          if (!session) return;
          // existing per-session work here
        }
      }),
    );

  2. Yield to the event loop between iterations even in sequential mode, for example by awaiting setImmediate() or scheduler.yield() from node:timers/promises (the latter is marked experimental in the Node.js docs). This lets other work (Discord events, in-flight tool calls) progress during the reconcile sweep.

  3. Move to a true background worker thread for the disk-scan + runtime-status portions, communicating back via the existing diagnostic event bus. The control-plane state mutation still has to land on the main loop, but the I/O does not.

  4. Add SLO/warning when sidecars.acp.identity-reconcile or sidecars.session-locks durations exceed e.g. 30 s, with the count of sessions inspected attached. A warning that fires before saturation is more useful than a liveness warning that fires during saturation.
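
Items 2 and 4 can be sketched together: a sequential sweep that yields after every item and emits one warning once a wall-clock budget is exceeded, with the inspected count attached. This is a sketch under stated assumptions, not the gateway's code. sweepWithBudget, BudgetReport, and the warn callback are hypothetical names, and the injectable now parameter exists only so the behaviour can be exercised without waiting.

```typescript
import { setImmediate as yieldToLoop } from "node:timers/promises";

interface BudgetReport {
  phase: string;
  elapsedMs: number;
  inspected: number;
  total: number;
}

// Hypothetical helper: processes items one at a time, yields to the event loop
// after each item, and calls `warn` once when elapsed time exceeds `budgetMs`.
async function sweepWithBudget<T>(
  phase: string,
  items: readonly T[],
  work: (item: T) => Promise<void>,
  budgetMs: number,
  warn: (report: BudgetReport) => void,
  now: () => number = Date.now,
): Promise<number> {
  const startedAt = now();
  let inspected = 0;
  let warned = false;
  for (const item of items) {
    await work(item);
    inspected += 1;
    const elapsedMs = now() - startedAt;
    if (!warned && elapsedMs > budgetMs) {
      warned = true;
      warn({ phase, elapsedMs, inspected, total: items.length });
    }
    // Let timers, I/O callbacks, and other queued work run before the next item.
    await yieldToLoop();
  }
  return inspected;
}
```

The budget is checked only after an item completes, so a single item that never settles would not trigger the warning. A timer-based check would be needed to cover that case.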

Workaround

  • Prune terminal ACP session metadata so reconcilePendingSessionIdentities has less to inspect (related: #82414, #72013).
  • Restart the gateway at low-traffic hours so the 7-minute event-loop bubble doesn't queue user prompts.

Severity

P2 — gateway is functionally unresponsive for several minutes after restart on installs with normal accumulated session history. Stacks badly with #84076/#82640-class incidents: a wedged bash tool call from one session sits in recovery=none while the event loop is busy reconciling identities for unrelated sessions. Also surfaces as "going dark" / "did not respond" complaints from users typing during the window.

Related

  • #72013 — ACP startup identity reconcile warns on terminal one-shot sessions (open; about noise, not about wall time)
  • #82414 — reconcilePendingSessionIdentities counts vanished-backer sessions as "failed" indefinitely; no prune path (closed)
  • #40566 — ACP startup: identity reconcile runs before acpx backend ready (closed)
  • #73655 — Gateway leak triad on plugin restart: Manifest EADDRINUSE retry loop, signal-handler accumulation, sync I/O on session JSONL → WS handshake starvation (closed; same family of event-loop-starvation root causes)
  • #78402 — Gateway repeatedly closes connections (1000/1005/1006) due to event-loop starvation caused by stuck tool call (closed; symptomatic, different root)
