openclaw - Fix Gateway heap OOM: cron runtime contexts retain sessions.json and skillsSnapshot.prompt [1 pull request]

Fix Action

Fixed

Observed on OpenClaw 2026.5.7 during gateway uptime on 2026-05-21.

Problem

  • The gateway process crashed because the V8 heap was exhausted; it was not killed by the kernel OOM killer.
  • Heap snapshots showed ~2.0 GiB retained by ~113k copies of the skill prompt string, an average of roughly 18 KiB per copy.
  • The retainer chain runs through skillsSnapshot.prompt -> skillsSnapshot -> session entry -> session store -> cronSession.store -> cron run context.
  • The on-disk sessions.json footprint was much smaller (~4.7 MiB of prompt text in the main store), so this is runtime retention, not just durable store size.
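One way a small on-disk store can turn into many in-memory copies is sketched below. It assumes, and the report does not confirm this, that each run parses its own copy of the store and that the parsed graph stays reachable from the run context. All names here (`SessionEntry`, `liveContexts`, `startRun`, `retainedPromptChars`) are hypothetical and only mirror the retainer chain described above.

```typescript
// Hypothetical shapes for illustration; not the actual OpenClaw types.
interface SessionEntry {
  skillsSnapshot: { prompt: string };
}
type SessionStore = Record<string, SessionEntry>;

// Stand-in for sessions.json: three sessions that carry the same prompt text.
const prompt = "skill instructions ".repeat(10);
const onDisk = JSON.stringify({
  s1: { skillsSnapshot: { prompt } },
  s2: { skillsSnapshot: { prompt } },
  s3: { skillsSnapshot: { prompt } },
});

// Assumption: every run parses the store and its context is never released.
const liveContexts: { store: SessionStore }[] = [];

function startRun(): void {
  // JSON.parse builds a fresh object graph, so the prompt text can be
  // materialized again for every entry of every retained store.
  liveContexts.push({ store: JSON.parse(onDisk) as SessionStore });
}

// Sum the prompt characters reachable from all retained run contexts.
function retainedPromptChars(): number {
  let total = 0;
  for (const ctx of liveContexts) {
    for (const key of Object.keys(ctx.store)) {
      total += ctx.store[key].skillsSnapshot.prompt.length;
    }
  }
  return total;
}

for (let i = 0; i < 100; i++) startRun();
console.log(onDisk.length, retainedPromptChars());
```

The serialized store stays the same size no matter how many runs start, while the reachable prompt text grows with the number of retained contexts. That matches the gap between the ~4.7 MiB on disk and the ~2.0 GiB in the heap.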

What existing maintenance already does

  • OpenClaw session maintenance and the archive job help with old session files and sessions.json pruning.
  • That is useful for disk pressure, but it does not release the live cron/subagent execution context that keeps the store graph reachable.

Recommended fix

  • Add a shallow completion finalizer for isolated cron/subagent runs.
  • Release cronSession.store after the final durable write.
  • Clear the run context from agent-events after completion.
  • Keep the finalizer shallow and O(1), with no deep traversal and no hot-path disk rewrite.
  • Keep backward compatibility for existing sessions.json entries.
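The recommended steps could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code from the linked pull request: `CronRunContext`, `runContexts` (standing in for the agent-events registry), `registerRun`, `finalizeRun`, and `runIsolatedCron` are hypothetical names, and the `persist` callback stands in for the final durable write.

```typescript
// Hypothetical shapes; the real OpenClaw types are not shown in this report.
interface SessionEntry {
  skillsSnapshot?: { prompt: string };
}
type SessionStore = Record<string, SessionEntry>;

interface CronRunContext {
  runId: string;
  cronSession: { store: SessionStore | null };
}

// Stand-in for the agent-events registry of live run contexts.
const runContexts = new Map<string, CronRunContext>();

function registerRun(ctx: CronRunContext): void {
  runContexts.set(ctx.runId, ctx);
}

// Shallow O(1) finalizer: drops two references and does no deep traversal
// and no disk I/O. Intended to run after the final durable write.
function finalizeRun(ctx: CronRunContext): void {
  ctx.cronSession.store = null;   // release the store graph
  runContexts.delete(ctx.runId);  // clear the run context from the registry
}

async function runIsolatedCron(
  ctx: CronRunContext,
  work: (ctx: CronRunContext) => Promise<void>,
  persist: (store: SessionStore) => Promise<void>,
): Promise<void> {
  registerRun(ctx);
  try {
    await work(ctx);
    if (ctx.cronSession.store !== null) {
      await persist(ctx.cronSession.store); // final durable write
    }
  } finally {
    // Runs on success and on failure, so a failed run does not pin the store.
    finalizeRun(ctx);
  }
}
```

Because the finalizer only nulls one field and deletes one map key, its cost does not depend on the number of sessions in the store. It also leaves the sessions.json format untouched, so existing entries stay readable.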

Why this matters

  • The failure recurred twice in one day, so disk cleanup cadence alone is not enough.
  • The right fix is runtime lifecycle cleanup, which should reduce GC pressure and avoid latency spikes.

Validation performed locally

  • Focused tests passed in the isolated worktree for the cleanup path and existing model-switch regression coverage.
