openclaw: sessions.usage and usage.cost still cause 16–48s event-loop stalls on 2026.5.27 (regression on #82773)



Hi OpenClaw team. Thank you for the background-refresh work in #82778. We're running 2026.5.27 and seeing significant event-loop stalls from sessions.usage and usage.cost despite the fix being included in our version.

Environment:

  • OpenClaw 2026.5.27
  • ~1,600 session transcript files across agents
  • Host: Linux 6.17, 32 GB RAM, Node.js 25.9.0
  • Gateway heap limit: 4 GB (--max-old-space-size=4096)

Observations:

The Control UI triggers usage.cost + sessions.usage on every webchat connection. After the 30-second cache expires, the next request performs a full O(n) scan of all session files:

RPC              Warm cache (post-restart)   Cold cache
usage.cost       156 ms                      15,972–16,770 ms
sessions.usage   1,535 ms                    24,364–48,456 ms

Combined: roughly 40–65 seconds of event-loop stall per cold UI connection (the sum of the two cold-cache ranges, ~65 seconds in the worst case). During these scans, openclaw doctor health probes time out at 3s and all gateway RPCs are blocked.
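To check that stalls like these come from a blocked event loop rather than from slow I/O, a small lag probe can run inside the affected process. The following is a minimal sketch that assumes nothing about OpenClaw internals: startLagProbe and busyWait are hypothetical helper names, and busyWait only simulates a synchronous scan. The probe records how late a repeating timer fires, and a late timer means the thread was monopolized.

```javascript
// Hypothetical helper, not part of OpenClaw.
// A timer scheduled every `intervalMs` fires late when the event loop is blocked;
// the probe tracks the largest delay it has observed.
function startLagProbe(intervalMs = 100) {
  let last = performance.now();
  let maxLagMs = 0;
  const timer = setInterval(() => {
    const now = performance.now();
    const lag = now - last - intervalMs; // time beyond the expected interval
    if (lag > maxLagMs) maxLagMs = lag;
    last = now;
  }, intervalMs);
  return {
    // Stops the probe and returns the worst lag seen, in milliseconds.
    stop() {
      clearInterval(timer);
      return maxLagMs;
    },
  };
}

// Stand-in for a synchronous scan: spins on the main thread for `ms` milliseconds.
function busyWait(ms) {
  const end = performance.now() + ms;
  while (performance.now() < end) {
    /* spin */
  }
}

// Demo: block the loop for ~300 ms and report the lag the probe observed.
const demoProbe = startLagProbe(20);
setTimeout(() => busyWait(300), 50);
setTimeout(() => {
  console.log(`max event-loop lag: ${demoProbe.stop().toFixed(0)} ms`);
}, 600);
```

If a probe like this reported lag in the tens of seconds while usage.cost or sessions.usage was running, that would be consistent with the health-probe timeouts described above.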

Cache behavior:

COST_USAGE_CACHE_TTL_MS is hardcoded at 30 seconds (3e4 in usage-CGq21urE.js line 71). The background-refresh architecture from #82778 is present, but the short TTL causes frequent re-scans. With ~1,600 session files, even a "background" refresh of the full set is expensive enough to starve the event loop during the initial readdir + file-stat + first-message-parse pass (discoverAllSessions in session-cost-usage-7DA3wtDY.js).
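The interaction between a short TTL and a background refresh can be illustrated with a small sketch. It assumes a stale-while-revalidate style cache, which the issue does not confirm is how #82778 works. createRefreshingCache, loadFn, and the injected clock are hypothetical names for illustration only.

```javascript
// Sketch only, not OpenClaw code: a TTL cache that serves the cached value
// and kicks off a refresh in the background once the TTL has expired.
function createRefreshingCache(loadFn, ttlMs, now = () => Date.now()) {
  let value;
  let loadedAt = -Infinity; // nothing loaded yet
  let inFlight = null;      // de-duplicates concurrent refreshes

  function refresh() {
    if (!inFlight) {
      inFlight = Promise.resolve()
        .then(loadFn)
        .then((v) => {
          value = v;
          loadedAt = now();
          return v;
        })
        .finally(() => {
          inFlight = null;
        });
    }
    return inFlight;
  }

  return {
    async get() {
      // Cold cache: the caller has to wait for the full load.
      if (loadedAt === -Infinity) return refresh();
      // Expired: start a refresh but do not wait for it.
      if (now() - loadedAt >= ttlMs) refresh().catch(() => {});
      // Serve the cached (possibly stale) value immediately.
      return value;
    },
  };
}
```

With ttlMs at 30 seconds, any request arriving more than 30 seconds after the last load starts another full load. The caller gets a fast answer, but if loadFn does long synchronous work, that work still occupies the event loop while it runs, which would match the stalls reported here.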

Repro steps:

  1. Accumulate ~1,600 session files across multiple agents.
  2. Open the Control UI (webchat) and wait >30 seconds.
  3. Reconnect or refresh the UI.
  4. Observe usage.cost taking 16s+ and sessions.usage taking 24–48s in gateway logs.

Possible mitigations:

  • Increase COST_USAGE_CACHE_TTL_MS (e.g., 300s) so the full scan fires less often.
  • Consider whether the initial discoverAllSessions readdir+parse can be truly non-blocking (offloaded from the event loop), since even "background" refresh appears to monopolize the thread during the discovery phase.
  • Add a configurable TTL so operators with large session directories can tune this without bundle patches.
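
Two of these mitigations can be sketched briefly. This is an illustration under stated assumptions, not OpenClaw's implementation: the environment variable OPENCLAW_COST_USAGE_CACHE_TTL_MS and the helpers resolveTtlMs and scanInBatches are hypothetical. Only the 30-second default is taken from the report above.

```javascript
// Default mirrors the hardcoded value reported in the issue (3e4 ms).
const DEFAULT_TTL_MS = 30_000;

// Reads the TTL from a hypothetical environment variable and falls back to
// the default when the variable is unset, non-numeric, or not positive.
function resolveTtlMs(env = process.env) {
  const raw = Number(env.OPENCLAW_COST_USAGE_CACHE_TTL_MS);
  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_TTL_MS;
}

// Processes items in small batches and yields to the event loop between
// batches, so timers, health probes, and other RPCs get a chance to run
// while a long scan is in progress.
async function scanInBatches(items, handleItem, batchSize = 25) {
  const results = [];
  for (let i = 0; i < items.length; i += batchSize) {
    for (const item of items.slice(i, i + batchSize)) {
      results.push(await handleItem(item));
    }
    // setImmediate runs after pending I/O callbacks, so queued work can proceed.
    await new Promise((resolve) => setImmediate(resolve));
  }
  return results;
}
```

Batching only caps each stall at the cost of one batch, and the parsing itself still runs on the main thread. Moving the scan into a worker (for example with Node's worker_threads module) would take it off the event loop entirely, at the cost of more plumbing.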

Related: #82773, #82778, #76650

Thanks for the great work on OpenClaw!
