openclaw - 💡(How to fix) Fix Gateway sessionStore maintenance synchronously blocks event loop for 30-60s, causes GC starvation and OOM [1 comments, 2 participants]

openclaw • 2026-04-27 13:29:53

GitHub stats

openclaw/openclaw#72826•Fetched 2026-04-28 06:31:52

Author

jared-rebel

Participants

jared-rebel

steipete

Timeline (top)

closed ×1 • commented ×1 • cross-referenced ×1

Error Message

The OpenClaw gateway exhibits monotonic heap growth and crashes with the V8 fatal error "Reached heap limit Allocation failed - JavaScript heap out of memory" roughly every 90 minutes on a single-host install running ~50 cron jobs.

Symptoms

Heap/RSS grows linearly at ~11 MB/min until OOM.

V8 stacks at the moment of OOM:

Runtime_CreateDataProperty
  → DescriptorArray::CopyUpTo
    → Map::CopyAddDescriptors
      → JSObject::DefineOwnPropertyIgnoreAttributes

i.e. the GC is starving while object descriptor maps grow without compaction.

Event-loop blocks of 30–60 seconds are observed every ~2 minutes during heavy cron periods.

Trigger

The trigger is session.maintenance.maxEntries being reached on <agent>/sessions/sessions.json. Once the store is at the cap, every new session ingestion triggers a synchronous maintenance pass:

1. JSON.parse of the (now ~1.3–1.5 MB) sessions.json
2. structuredClone of all entries
3. Sort by updatedAt
4. Trim to cap
5. Synchronous write back to disk

This whole pipeline runs on the main event loop. With per-run-isolated cron sessions, a single agent (e.g. haiku-utility) can produce 1.5–2k new session entries per day, hitting the cap continuously and firing this pass every couple of minutes.
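To make the cost concrete, the pass described above can be sketched roughly as follows. This is an illustrative reconstruction from the issue text, not OpenClaw's actual code: `SessionEntry`, `SessionStore`, `maintainSync`, and the assumption that the store is a JSON object keyed by session key are all hypothetical.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape; the real store's fields are not given in the issue.
interface SessionEntry {
  sessionId: string;
  updatedAt: number;
}
type SessionStore = Record<string, SessionEntry>;

// Every step is synchronous, so nothing else on the event loop
// (timers, sockets, other cron runs) can proceed until this returns.
function maintainSync(storePath: string, maxEntries: number): number {
  const store: SessionStore = JSON.parse(readFileSync(storePath, "utf8")); // 1. parse
  const copy = structuredClone(store); // 2. clone all entries
  const newestFirst = Object.entries(copy).sort(
    ([, a], [, b]) => b.updatedAt - a.updatedAt, // 3. sort by updatedAt
  );
  const kept = newestFirst.slice(0, maxEntries); // 4. trim to cap
  writeFileSync(storePath, JSON.stringify(Object.fromEntries(kept))); // 5. sync write
  return newestFirst.length - kept.length; // number of entries dropped
}

// Usage: build a throwaway store with 60 entries and trim it to 50.
const demoPath = join(mkdtempSync(join(tmpdir(), "sessions-")), "sessions.json");
const demoStore: SessionStore = {};
for (let i = 0; i < 60; i++) {
  demoStore[`key-${i}`] = { sessionId: `s${i}`, updatedAt: i };
}
writeFileSync(demoPath, JSON.stringify(demoStore));
const dropped = maintainSync(demoPath, 50);
console.log(dropped); // 10
```

Because the function only returns after the write completes, its full duration is time during which the gateway cannot service anything else.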

Frequency observed

On one host:

- 55 cap-hits in 112 minutes (one cap-hit ≈ one full sync maintenance pass)
- Lock durations (per-pass event-loop block, measured from process.hrtime around the maintenance call): 36 s, 58 s, 60 s, …
- 5,176 <sessionId>.jsonl files in ~/.openclaw/agents/haiku-utility/sessions/ accumulated over 3 days, while sessions.json was capped at 50 entries throughout.

When the loop stalls for that long, GC cycles stall too. The heap cannot compact descriptor arrays for newly instantiated session/handler objects, and the process eventually OOMs. Restarting the gateway reclaims memory; the cycle then repeats.
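The lock durations listed above were reportedly measured with process.hrtime around the maintenance call. A minimal sketch of that kind of measurement is shown here, using `process.hrtime.bigint()`. The helper name `timeSync` and the busy-wait stand-in for the maintenance call are hypothetical.

```typescript
// Wrap any synchronous function and report how long it held the event loop.
function timeSync<T>(label: string, fn: () => T): { result: T; ms: number } {
  const start = process.hrtime.bigint(); // monotonic clock, in nanoseconds
  const result = fn();
  const ms = Number(process.hrtime.bigint() - start) / 1e6;
  console.log(`${label} blocked the event loop for ${ms.toFixed(1)} ms`);
  return { result, ms };
}

// Stand-in for the maintenance call: spin for roughly durationMs.
function busyWait(durationMs: number): number {
  const end = Date.now() + durationMs;
  let spins = 0;
  while (Date.now() < end) spins++;
  return spins;
}

const timed = timeSync("maintenance (simulated)", () => busyWait(50));
```

Since the wrapped function is synchronous, the measured wall time equals the time the loop was unavailable to every other task.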

Root cause (summarized)

session.maintenance is a synchronous, in-loop, full-file rewrite hot path that fires on every cap-hit. With high cron throughput producing fresh sessions, that path is hit far more often than it was apparently designed for.

Suggested fixes (any one would unblock; combination is ideal)

1. Chunk maintenance via setImmediate / async iteration. Yield the loop between parse/sort/trim/write phases so other work (channel ingress, scheduling, GC) can interleave.
2. Move the maintenance pass to a worker thread. The store on disk is the canonical state; the worker can read, compute the new trimmed set, and atomically swap the file via rename(2).
3. Async I/O with backpressure. Replace the sync read/write with fsPromises and a single in-flight promise per agent so concurrent ingestions queue rather than each kicking off another full pass.
4. Coalesce cap-hits. When the store is over cap, schedule one maintenance pass on a debounced timer (e.g. 5 s) rather than running per-ingestion. Multiple ingestions inside the debounce window share the same pass.
5. Bound the data structure. A streaming/append-only log + periodic compaction would avoid full-file rewrites altogether.
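Several of these ideas compose naturally. The sketch below combines a debounced timer (coalescing), a single in-flight promise, async file I/O, and an atomic swap via rename. It is an assumption-laden illustration, not OpenClaw's implementation: `CoalescedMaintenance`, `trimStore`, the `.tmp` suffix, and the store shape are hypothetical. Note that JSON.parse and the sort remain synchronous; only the file I/O and the scheduling change.

```typescript
import { readFile, rename, writeFile } from "node:fs/promises";

// Hypothetical entry shape; only updatedAt is needed for trimming.
interface SessionEntry {
  updatedAt: number;
}
type SessionStore = Record<string, SessionEntry>;

// Pure trim step: keep the maxEntries most recently updated sessions.
function trimStore(store: SessionStore, maxEntries: number): SessionStore {
  const newestFirst = Object.entries(store).sort(
    ([, a], [, b]) => b.updatedAt - a.updatedAt,
  );
  return Object.fromEntries(newestFirst.slice(0, maxEntries));
}

class CoalescedMaintenance {
  passesRun = 0;
  private timer: ReturnType<typeof setTimeout> | undefined;
  private inFlight: Promise<void> = Promise.resolve();
  private readonly storePath: string;
  private readonly maxEntries: number;
  private readonly debounceMs: number;

  constructor(storePath: string, maxEntries: number, debounceMs = 5000) {
    this.storePath = storePath;
    this.maxEntries = maxEntries;
    this.debounceMs = debounceMs;
  }

  // Call on every cap-hit. Calls inside the debounce window share one pass.
  request(): void {
    if (this.timer !== undefined) return; // a pass is already scheduled
    this.timer = setTimeout(() => {
      this.timer = undefined;
      // Chain onto the previous pass so at most one runs at a time.
      this.inFlight = this.inFlight
        .then(() => this.runPass())
        .catch((err: unknown) => console.error("session maintenance failed:", err));
    }, this.debounceMs);
  }

  private async runPass(): Promise<void> {
    const store: SessionStore = JSON.parse(await readFile(this.storePath, "utf8"));
    const tmpPath = `${this.storePath}.tmp`;
    await writeFile(tmpPath, JSON.stringify(trimStore(store, this.maxEntries)));
    await rename(tmpPath, this.storePath); // atomic swap on the same filesystem
    this.passesRun++;
  }
}
```

With one instance per agent store, ingestion code would call `request()` whenever the store is over cap; at the reported rate of roughly one cap-hit every two minutes, a 5 s window mainly protects against bursts, while the in-flight chain prevents overlapping rewrites.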

Workarounds applied locally

- Persistent session keys for cron runs. sessionTarget: "session:cron-<slug>" instead of sessionTarget: "isolated" for 49 of 50 enabled non-one-shot agentTurn crons. This collapses ~1.7k sessions/day per heavy agent down to ~1 session per cron, dramatically reducing cap-hit frequency.
  - Note: sessionTarget: "isolated" always forces a new sessionId per run regardless of sessionKey (forceNew: input.job.sessionTarget === "isolated" in server.impl-*.js). Folks who set sessionKey expecting reuse are silently getting per-run sessions.
- Drop session.maintenance.maxEntries from 50 to 25. Smaller working set means each maintenance pass is faster and the file size stays smaller.
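The effect of the first workaround can be illustrated with a small model of the quoted forceNew rule. Only `kind`, `sessionTarget`, and the `forceNew` expression come from the issue; the `CronJob` interface, the helper functions, and the slug `daily-digest` are hypothetical.

```typescript
// Hypothetical job shape, reduced to the two fields named in the issue.
interface CronJob {
  kind: "agentTurn";
  sessionTarget: string; // "isolated" or "session:<key>"
}

// Mirrors the expression quoted in the issue:
//   forceNew: input.job.sessionTarget === "isolated"
function forcesNewSession(job: CronJob): boolean {
  return job.sessionTarget === "isolated";
}

// Number of distinct sessions a job creates over `runs` executions under that rule.
function sessionsCreated(job: CronJob, runs: number): number {
  return forcesNewSession(job) ? runs : Math.min(runs, 1);
}

const isolatedJob: CronJob = { kind: "agentTurn", sessionTarget: "isolated" };
const persistentJob: CronJob = {
  kind: "agentTurn",
  sessionTarget: "session:cron-daily-digest", // hypothetical slug
};

const runsPerDay = 24 * 60; // a job firing once per minute
console.log(sessionsCreated(isolatedJob, runsPerDay)); // 1440
console.log(sessionsCreated(persistentJob, runsPerDay)); // 1
```

Under this model, a once-per-minute isolated job alone would exceed a 50-entry cap in under an hour, whereas the persistent variant keeps adding to a single entry.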

Reproduction

Easiest minimal repro: schedule a kind: agentTurn cron with sessionTarget: "isolated" running every minute on one agent. Within a day the agent's sessions.json will hit the cap and you can observe the sync maintenance pass via --prof or by attaching clinic flame.
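As a lighter-weight alternative to --prof or clinic flame (a substitution, not something the issue itself used), Node's built-in `monitorEventLoopDelay` from `perf_hooks` can flag long stalls from inside the process. The sketch below assumes you can load a small module into the gateway process; `watchEventLoop`, `toMs`, and the threshold values are hypothetical.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Histogram values are reported in nanoseconds; convert to milliseconds.
function toMs(ns: number): number {
  return ns / 1e6;
}

// Sample event-loop delay and warn whenever the worst stall in the last
// window exceeds thresholdMs. Returns a function that stops the watcher.
function watchEventLoop(thresholdMs: number, windowMs = 10000): () => void {
  const histogram = monitorEventLoopDelay({ resolution: 20 });
  histogram.enable();
  const timer = setInterval(() => {
    const maxMs = toMs(histogram.max);
    if (maxMs > thresholdMs) {
      console.warn(`event loop stalled for up to ${maxMs.toFixed(0)} ms`);
    }
    histogram.reset(); // start a fresh window
  }, windowMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

Calling `watchEventLoop(1000)` at startup would log any window in which the loop was blocked for more than a second. The report is itself delivered by a timer, so it appears only after the blocking pass has finished.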

Impact

Single-host installs with even moderate cron usage (≤ 50 jobs, mix of intervals from minutes to days) appear to hit this consistently. The combination of "sync I/O on hot loop" + "cap-hit every couple of minutes" is enough to pin GC and cause OOM on a 4 GB host within ~90 minutes.

Happy to share heap snapshots / cron config / sessions.json sample if it helps.

extent analysis

TL;DR

Implement one of the suggested fixes, such as chunking maintenance via setImmediate or moving the maintenance pass to a worker thread, to prevent synchronous maintenance passes from blocking the event loop and causing heap growth.
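A minimal sketch of the setImmediate approach, under the same assumptions about the store shape as the issue describes (a JSON file of entries with an updatedAt field). `yieldToLoop` and `maintainChunked` are hypothetical names. Each phase is still synchronous on its own, so this only lets other work run between phases; it does not shorten an individual parse or sort.

```typescript
import { mkdtempSync, writeFileSync } from "node:fs";
import { readFile, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical entry shape; only updatedAt is needed for trimming.
interface SessionEntry {
  updatedAt: number;
}

// Resolve on a later turn of the event loop so pending I/O callbacks
// and timers can run between maintenance phases.
function yieldToLoop(): Promise<void> {
  return new Promise((resolve) => setImmediate(resolve));
}

async function maintainChunked(storePath: string, maxEntries: number): Promise<number> {
  const raw = await readFile(storePath, "utf8"); // async read
  await yieldToLoop();
  const entries = Object.entries(JSON.parse(raw) as Record<string, SessionEntry>);
  await yieldToLoop();
  entries.sort(([, a], [, b]) => b.updatedAt - a.updatedAt); // newest first
  const kept = entries.slice(0, maxEntries);
  await yieldToLoop();
  await writeFile(storePath, JSON.stringify(Object.fromEntries(kept))); // async write
  return entries.length - kept.length; // number of entries dropped
}

// Usage: trim a throwaway three-entry store down to two entries.
const chunkedDemoPath = join(mkdtempSync(join(tmpdir(), "sessions-")), "sessions.json");
writeFileSync(
  chunkedDemoPath,
  JSON.stringify({ a: { updatedAt: 1 }, b: { updatedAt: 2 }, c: { updatedAt: 3 } }),
);
const droppedCount = maintainChunked(chunkedDemoPath, 2);
droppedCount.then((n) => console.log(`dropped ${n} entries`));
```

If concurrent ingestions can call this function, it still needs a guard (such as a single in-flight promise) so that two passes do not interleave their reads and writes.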

Guidance

Identify the most suitable fix from the suggested options, considering the specific use case and performance requirements.
Implement the chosen fix, ensuring that it is properly tested and validated to prevent regressions.
Monitor the system's performance and heap growth after applying the fix to verify its effectiveness.
Consider combining multiple fixes for optimal results, as suggested in the issue.

Example

The issue itself does not include code for any of the suggested fixes; the only code-like material it provides is the V8 stack trace under Symptoms and the forceNew expression quoted in the workaround note.

Notes

The issue highlights the importance of asynchronous I/O and event loop management in preventing heap growth and crashes. The suggested fixes aim to address the root cause of the problem, but may require additional testing and validation to ensure their effectiveness.

Recommendation

Implement one of the suggested fixes, such as chunking maintenance via setImmediate or moving the maintenance pass to a worker thread, so that maintenance no longer blocks the event loop. This directly addresses the identified root cause, whereas the local workarounds (persistent cron session keys, a lower maxEntries) only reduce how often the pass fires.
