openclaw - 💡(How to fix) Fix Gateway heap pressure from oversized session-store hydration

Gateway heap pressure from oversized session-store hydration

Summary

The gateway memory pressure we were seeing is not primarily caused by the old resolvedSkills compounding bug. The live pressure comes from oversized session-store hydration and cache retention:

  • sessions.json is large enough to matter by itself.
  • The runtime was keeping parsed session-store object graphs and immutable snapshots in memory after load.
  • Startup and recovery paths repeatedly touch the same large session-store data during bootstrap, restore, and maintenance.

That combination makes the gateway retain far more of the session tree than it needs to stay functional.

Scope note: this issue write-up is limited to the session-store updates in this branch. It explicitly excludes the separate bootstrap-cache patch and the separate webgate patch from the other branch.

Evidence

  • The previous crash was a V8 heap OOM, not a native crash.
  • The on-disk session trees are multi-gigabyte:
    • /root/.openclaw/agents/main/sessions is about 2.6G
    • /root/.openclaw/agents/main/sessions_quarantine is about 2.9G
  • The store load path reads and parses the whole session store before normalization and cache publication.
  • The store cache was not size-aware, so oversized stores stayed resident as parsed object graphs. It needs a size limit.

Root Cause

The core bug is unbounded runtime retention of large session-store objects and snapshots. The bootstrap-cache retention fix is a separate issue and should be tracked independently.

Fix Direction

  1. Keep oversized session-store objects out of the live object and snapshot caches.
  2. Preserve the on-disk store as the source of truth so large stores can still be reloaded on demand.
  3. Add regression coverage that proves:
    • oversized stores do not stay cached in memory,
    • repeated loads fall back to disk,
    • cache reads with an oversized size hint invalidate stale runtime caches.
  4. If long-run monitoring still shows upward drift, follow up on restart-recovery and quarantine scanning.
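The first three items above can be sketched as a size-aware cache. This is a minimal illustration, not the code in store-cache.ts: `SizeAwareStoreCache`, `StoreSource`, `loadStore`, and the `maxCacheableSize` threshold are hypothetical names, and the size hint is the raw string length rather than an exact byte count.

```typescript
// Hypothetical sketch; none of these names come from the actual patch.
type SessionStore = Record<string, unknown>;

// Abstracts the on-disk store so the sketch stays self-contained.
interface StoreSource {
  size(path: string): number; // cheap size hint, e.g. from a stat call
  read(path: string): string; // full raw contents
}

class SizeAwareStoreCache {
  private readonly entries = new Map<string, SessionStore>();
  private readonly maxCacheableSize: number;

  constructor(maxCacheableSize: number) {
    this.maxCacheableSize = maxCacheableSize;
  }

  // Caches the store only when it is within the limit.
  // An oversized store evicts any stale entry for the same path.
  set(path: string, store: SessionStore, sizeHint: number): boolean {
    if (sizeHint > this.maxCacheableSize) {
      this.entries.delete(path);
      return false;
    }
    this.entries.set(path, store);
    return true;
  }

  // Returns the cached store or null.
  // A read with an oversized size hint invalidates the cached entry.
  get(path: string, sizeHint?: number): SessionStore | null {
    if (sizeHint !== undefined && sizeHint > this.maxCacheableSize) {
      this.entries.delete(path);
      return null;
    }
    return this.entries.get(path) ?? null;
  }

  entryCount(): number {
    return this.entries.size;
  }
}

// Disk stays the source of truth: oversized stores are re-read and
// re-parsed on every load instead of being kept resident.
function loadStore(
  path: string,
  source: StoreSource,
  cache: SizeAwareStoreCache,
): SessionStore {
  const sizeHint = source.size(path);
  const cached = cache.get(path, sizeHint);
  if (cached !== null) {
    return cached;
  }
  const parsed = JSON.parse(source.read(path)) as SessionStore;
  cache.set(path, parsed, sizeHint);
  return parsed;
}
```

With this shape, two loads of an oversized store return equal payloads but different object references, and the cache entry count for that store stays at 0. Those are the properties the regression coverage in item 3 is meant to prove.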

Files Changed

  • /root/.openclaw/openclaw-fork-rebase/src/config/sessions/store-cache.ts
  • /root/.openclaw/openclaw-fork-rebase/src/config/sessions.cache.test.ts

Verification Notes

Runtime proof showed:

  • after save, runtime cache entries stayed at 0
  • repeated loads returned equal payloads but different references
  • direct oversized cache reads returned null
  • runtime snapshot and serialized cache counts remained 0 for the oversized store

The live gateway also stayed healthy and recovered back down after spikes during normal use. The conservative internal critical warnings still fired at times, but the post-GC floor remained bounded in the observed window and did not show the earlier runaway crash pattern.
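One way to turn "the post-GC floor remained bounded" into a repeatable check is to compare the lowest post-GC heap reading early in a run with the lowest reading late in the run. The helper below is a hypothetical sketch: `checkHeapFloor`, its window size, and the 10% tolerance are illustrative assumptions, not part of the patch. In Node.js the samples could be `process.memoryUsage().heapUsed` readings taken right after a forced collection (`global.gc()` is available when the process is started with `--expose-gc`).

```typescript
// Hypothetical helper; names and the default tolerance are assumptions.
interface HeapFloorReport {
  earlyFloor: number; // lowest sample in the first window, in bytes
  lateFloor: number; // lowest sample in the last window, in bytes
  drifting: boolean; // true when the late floor exceeds the early floor by more than the tolerance
}

function checkHeapFloor(
  samples: number[],
  windowSize: number,
  tolerance: number = 0.1,
): HeapFloorReport {
  if (windowSize <= 0 || samples.length < windowSize * 2) {
    throw new Error("need at least two full windows of samples");
  }
  // The floor of a window is its minimum: spikes are ignored on purpose.
  const earlyFloor = Math.min(...samples.slice(0, windowSize));
  const lateFloor = Math.min(...samples.slice(-windowSize));
  return {
    earlyFloor,
    lateFloor,
    drifting: lateFloor > earlyFloor * (1 + tolerance),
  };
}
```

Comparing minimums rather than averages keeps short spikes from registering as drift, which matches the observed pattern of spikes followed by recovery.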

User observation: heavy-load behavior also seemed to improve noticeably after the patch, with the gateway feeling snappier during normal use. That is currently anecdotal and should be treated as an observation, not a measured result.

What has not yet been verified:

  • stability over an overnight soak
  • behavior after a full quarantine/restart-recovery cleanup pass with this specific patch

Both can be run later if the long-run floor starts drifting.
