openclaw - 💡(How to fix) sessions.usage and usage.cost still cause 16–48s event-loop stalls on 2026.5.27 (regression on #82773)

openclaw 2026-05-29 05:03:24

Fix Action

Fix / Workaround

Possible mitigations:

Increase COST_USAGE_CACHE_TTL_MS (e.g., 300s) so the full scan fires less often.
Consider whether the initial discoverAllSessions readdir+parse can be truly non-blocking (offloaded from the event loop), since even "background" refresh appears to monopolize the thread during the discovery phase.
Add a configurable TTL so operators with large session directories can tune this without bundle patches.
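
The configurable-TTL idea could look roughly like the sketch below. The environment variable OPENCLAW_COST_USAGE_CACHE_TTL_MS, the helper name, and the clamp bounds are all hypothetical and are not part of OpenClaw; only the 30-second default comes from the issue.

```javascript
// Hypothetical sketch: OpenClaw does not expose this env var or helper.
const DEFAULT_COST_USAGE_CACHE_TTL_MS = 30_000; // the hardcoded 3e4 reported in the issue
const MIN_TTL_MS = 1_000; // assumed lower bound, guards against a zero/negative TTL
const MAX_TTL_MS = 3_600_000; // assumed upper bound of one hour

function resolveCostUsageCacheTtlMs(env = process.env) {
  const raw = env.OPENCLAW_COST_USAGE_CACHE_TTL_MS;
  // Unset or blank: keep today's behaviour.
  if (raw === undefined || raw.trim() === "") return DEFAULT_COST_USAGE_CACHE_TTL_MS;
  const parsed = Number(raw);
  // Non-numeric input falls back to the default instead of disabling the cache.
  if (!Number.isFinite(parsed)) return DEFAULT_COST_USAGE_CACHE_TTL_MS;
  // Clamp so a typo cannot produce a scan on every request or a cache that never expires.
  return Math.min(MAX_TTL_MS, Math.max(MIN_TTL_MS, Math.floor(parsed)));
}
```

With this shape, an operator with a large session directory would set the variable to 300000 to get the 300s TTL suggested above, while unset or malformed values keep the current 30-second behaviour.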

Hi OpenClaw team. Thank you for the background-refresh work in #82778. We're running 2026.5.27 and seeing significant event-loop stalls from sessions.usage and usage.cost despite the fix being included in our version.

Environment:

OpenClaw 2026.5.27
~1,600 session transcript files across agents
Host: Linux 6.17, 32 GB RAM, Node.js 25.9.0
Gateway heap limit: 4 GB (--max-old-space-size=4096)

Observations:

The Control UI triggers usage.cost + sessions.usage on every webchat connection. After the 30-second cache expires, the next request performs a full O(n) scan of all session files:

| RPC | Warm cache (post-restart) | Cold cache |
| --- | --- | --- |
| `usage.cost` | 156ms | 15,972–16,770ms |
| `sessions.usage` | 1,535ms | 24,364–48,456ms |

Combined: ~65 seconds of event-loop stall per cold UI connection. During these scans, openclaw doctor health probes time out at 3s and all gateway RPCs are blocked.
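
To confirm that these are genuine event-loop stalls rather than slow I/O, the delay can be measured independently of the RPC timings. The sketch below uses Node's built-in `perf_hooks.monitorEventLoopDelay`; the helper name, the threshold, and the reporting interval are illustrative assumptions, and nothing here is part of the OpenClaw gateway.

```javascript
const { monitorEventLoopDelay } = require("node:perf_hooks");

// Hypothetical helper: periodically reports the worst event-loop delay seen.
function startStallMonitor({
  resolutionMs = 20, // sampling resolution of the histogram
  thresholdMs = 1000, // only report delays at least this long (assumed value)
  intervalMs = 5000, // how often to check and reset the histogram
  onStall = console.warn,
} = {}) {
  const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
  histogram.enable();

  const timer = setInterval(() => {
    const maxMs = histogram.max / 1e6; // histogram values are in nanoseconds
    if (maxMs >= thresholdMs) {
      onStall(`event loop was blocked for ~${Math.round(maxMs)} ms`);
    }
    histogram.reset();
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring

  // Returns a function that stops the monitor.
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

If the stalls described above are real, a monitor like this running inside the gateway process should report delays in the 16–48s range shortly after a cold-cache `usage.cost` or `sessions.usage` call completes.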

Cache behavior:

COST_USAGE_CACHE_TTL_MS is hardcoded at 30 seconds (3e4 in usage-CGq21urE.js line 71). The background-refresh architecture from #82778 is present, but the short TTL causes frequent re-scans. With ~1,600 session files, even a "background" refresh of the full set is expensive enough to starve the event loop during the initial readdir + file-stat + first-message-parse pass (discoverAllSessions in session-cost-usage-7DA3wtDY.js).
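
One way a discovery pass can avoid monopolizing the thread is to process files in small batches and yield to the event loop between batches. The following is a minimal sketch of that technique under stated assumptions: `scanSessionFiles`, the `parseFile` callback, and the batch size are hypothetical, and this is not how `discoverAllSessions` is actually implemented.

```javascript
const fs = require("node:fs/promises");
const path = require("node:path");
const { setImmediate: yieldToEventLoop } = require("node:timers/promises");

// Hypothetical sketch: scan a directory in batches so other work can interleave.
// `parseFile` is a caller-supplied async function (assumed, not an OpenClaw API).
async function scanSessionFiles(dir, parseFile, batchSize = 25) {
  const names = await fs.readdir(dir);
  const results = [];
  for (let i = 0; i < names.length; i += batchSize) {
    const batch = names.slice(i, i + batchSize);
    // Files within one batch are processed concurrently.
    const parsed = await Promise.all(
      batch.map((name) => parseFile(path.join(dir, name)))
    );
    results.push(...parsed);
    // Let pending timers, health probes, and other RPCs run before the next batch.
    await yieldToEventLoop();
  }
  return results;
}
```

Yielding bounds each blocking slice to roughly one batch, which is the point of the design. It does not help if parsing a single file is itself CPU-heavy; in that case moving the parse into a `worker_threads` worker would be the more likely fix.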

Repro steps:

1. Accumulate ~1,600 session files across multiple agents.
2. Open the Control UI (webchat) and wait >30 seconds.
3. Reconnect or refresh the UI.
4. Observe usage.cost taking 16s+ and sessions.usage taking 24–48s in gateway logs.

Related: #82773, #82778, #76650

Thanks for the great work on OpenClaw!
