openclaw - ✅(Solved) Fix [Bug]: SessionManager.fileEntries grows unbounded in memory, causing 1.4GB+ heap growth in long-running sessions [1 pull request, 2 comments, 2 participants]

openclaw/openclaw#58802 (fetched 2026-04-08 02:32:30)

There's a slow leak hiding in the persistence layer that most people are working around without knowing what's causing it.

SessionManager.fileEntries is an in-memory array that mirrors every line written to the JSONL session transcript. Every message, tool call, tool result, and metadata entry gets pushed to it. Nothing ever removes them. Over hours of normal use, this array quietly accumulates hundreds of thousands of objects, eventually holding 1.4GB+ of retained heap. V8 can't collect any of it because SessionManager holds a live reference to the array for the lifetime of the process.

The tricky part: the LLM context window stays bounded. replaceMessages() keeps Agent._state.messages at a manageable size during compaction. So from the model's perspective, everything looks fine. But underneath, fileEntries is a parallel data structure in the persistence layer that compaction never touches. Two layers, only one with a ceiling.

This is likely the root cause behind multiple reported memory and session-bloat issues, including #6190, #13758, #2254, and #20910, which all share the same symptom pattern. It's a contributing factor in several others (see Related Issues below).

Root Cause

SessionManager.fileEntries (source: badlogic/pi-mono, packages/coding-agent/src/core/session-manager.ts) is a parallel data structure to Agent._state.messages. While Agent._state.messages is bounded by replaceMessages() (called during compaction, keeping the LLM context window at a manageable size), SessionManager.fileEntries is an unbounded in-memory mirror of the JSONL transcript that is never pruned. Neither native compaction nor context engine compaction touches it. This affects all OpenClaw deployments with long-running sessions, regardless of configuration.

Every appendMessage(), appendAssistantMessage(), tool result write, and metadata entry pushes to this array. The data is simultaneously persisted to the JSONL file on disk, but the in-memory array is never pruned and is retained for the lifetime of the SessionManager. For permanent sessions (main DM, social agent channels), the SessionManager lives for the lifetime of the process, so fileEntries grows without bound until the gateway restarts.
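
The two-layer mismatch can be illustrated with a minimal sketch. The names `ToyAgentState`, `ToySessionManager`, and `simulate` are hypothetical stand-ins, not the pi-mono implementation. The only thing taken from the report is the shape of the problem: compaction replaces one array with a shorter one, while the transcript mirror is only ever appended to.

```typescript
// Hypothetical stand-ins for illustration; not the real pi-mono classes.
type Entry = { id: string; type: string; content: string };

class ToyAgentState {
  messages: Entry[] = [];
  // Compaction swaps in a shorter list, so this array stays bounded.
  replaceMessages(next: Entry[]): void {
    this.messages = next;
  }
}

class ToySessionManager {
  // In-memory mirror of the JSONL transcript: appended to, never pruned.
  fileEntries: Entry[] = [];
  appendEntry(entry: Entry): void {
    this.fileEntries.push(entry); // the on-disk write is omitted here
  }
}

function simulate(turns: number, compactEvery: number) {
  const state = new ToyAgentState();
  const sm = new ToySessionManager();
  for (let i = 0; i < turns; i++) {
    const entry: Entry = { id: `e${i}`, type: "message", content: "x" };
    state.messages.push(entry);
    sm.appendEntry(entry);
    // Simulated compaction: keep only the 10 most recent messages.
    if (state.messages.length >= compactEvery) {
      state.replaceMessages(state.messages.slice(-10));
    }
  }
  return { contextSize: state.messages.length, mirrorSize: sm.fileEntries.length };
}

const result = simulate(1000, 50);
console.log(result);
```

In this toy run the context size never exceeds the compaction threshold, while the mirror holds one object per appended entry for as long as the manager is alive.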

Fix Action

Workaround

Periodic gateway restarts clear all SessionManager instances and their fileEntries arrays. A watchdog cron that kills the gateway child process at a configurable RSS threshold (e.g., 2GB) provides automated mitigation:

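# VmRSS is reported in kB (2097152 kB = 2 GB); head -n1 guards against multiple matching PIDs.
PID=$(pgrep -f openclaw-gateway | head -n1)
RSS=$(awk '/VmRSS/{print $2}' "/proc/$PID/status" 2>/dev/null)
[ "${RSS:-0}" -gt 2097152 ] && sudo systemctl restart openclaw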

Reducing --max-old-space-size from the default to match the watchdog threshold (e.g., 2048) forces V8 to run more aggressive GC on the genuinely temporary allocations (per-turn context assembly, HTTP response buffers), which slows the growth rate but does not stop it.

PR fix notes

PR #2749: fix: cap in-memory fileEntries array to prevent unbounded heap growth

Description (problem / solution / changelog)

Problem

SessionManager.fileEntries is a private FileEntry[] that mirrors every JSONL session entry in memory. It grows via push() in _appendEntry() and is never pruned. In long-running gateway sessions (9+ hours of normal use), this array silently accumulates thousands of entries containing full message bodies, tool results, and metadata, causing heap growth of 1GB+ with no upper bound.

The on-disk JSONL transcript is append-only and that's correct. But the in-memory mirror has no reason to retain the full history. Most read paths either use the byId Map for tree traversal or only need recent entries. Compaction summarizes older context for the LLM but never touches fileEntries. Two layers of session state, only one with a ceiling.

Heap snapshot analysis of a production session showed 99.5% of 250K+ retained message objects tracing back through fileEntries -> SessionManager -> AgentSession.

Full investigation with V8 heap snapshots, retainer analysis, and related issue survey: openclaw/openclaw#58802

Fix

Add a configurable sliding window cap (maxFileEntries, default 1000) on the in-memory fileEntries array. After each _appendEntry() and after setSessionFile() bulk loads, if the array exceeds the limit, the oldest entries (after the header at index 0) are spliced out and evicted from byId, labelsById, and labelTimestampsById.

The on-disk JSONL file is append-only and unaffected. Full session history is preserved on disk. Only the in-memory representation is bounded.
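
A minimal sketch of the sliding-window idea described above, under stated assumptions. `BoundedEntries` and its methods are hypothetical names, and the real `SessionManager` also evicts from `labelsById` and `labelTimestampsById`, which are omitted here. The PR description does not say whether the header counts toward the cap; this sketch assumes it does not.

```typescript
// Illustrative sketch only; not the code from PR #2749.
interface FileEntry { id: string; type: string }

class BoundedEntries {
  private fileEntries: FileEntry[] = [];
  private byId = new Map<string, FileEntry>();
  private maxFileEntries: number;

  constructor(header: FileEntry, maxFileEntries = 1000) {
    this.maxFileEntries = maxFileEntries;
    this.fileEntries.push(header); // header stays at index 0
    this.byId.set(header.id, header);
  }

  append(entry: FileEntry): void {
    this.fileEntries.push(entry);
    this.byId.set(entry.id, entry);
    this.pruneIfNeeded();
  }

  private pruneIfNeeded(): void {
    // Assumption: the cap applies to entries after the header.
    const excess = this.fileEntries.length - 1 - this.maxFileEntries;
    if (excess <= 0) return;
    // Remove the oldest non-header entries and drop them from the id index.
    const evicted = this.fileEntries.splice(1, excess);
    for (const e of evicted) this.byId.delete(e.id);
  }

  getEntry(id: string): FileEntry | undefined {
    return this.byId.get(id);
  }

  entries(): readonly FileEntry[] {
    return this.fileEntries;
  }
}

const bounded = new BoundedEntries({ id: "header", type: "session" }, 3);
for (let i = 0; i < 10; i++) bounded.append({ id: `e${i}`, type: "message" });
console.log(bounded.entries().map((e) => e.id));
```

Evicting from the id index in the same step matters: if the map kept references to spliced-out entries, the objects would stay reachable and the cap would not free any memory.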

What changes

  • New _pruneIfNeeded() private method, called after _appendEntry() and after setSessionFile() loads entries from disk
  • New maxFileEntries private field, set via constructor (default: DEFAULT_MAX_FILE_ENTRIES = 1000)
  • Static factory methods (create, open, continueRecent, inMemory, forkFrom) accept optional maxFileEntries parameter
  • DEFAULT_MAX_FILE_ENTRIES exported for downstream configuration

What does NOT change

  • On-disk JSONL format and append-only persistence behavior
  • newSession() and createBranchedSession() (they replace fileEntries entirely)
  • Public API shape (getEntries(), getHeader(), getBranch(), buildSessionContext(), etc.)
  • All existing test behavior (876 tests pass, 0 failures)

Why 1000 as the default

  • A typical interactive turn produces 2-4 entries (user message, assistant message, optional tool calls). 1000 entries covers ~250-500 turns, well beyond what any single LLM context window can hold.
  • Compaction typically fires at 50K-200K tokens, which maps to ~100-400 entries. 1000 provides generous headroom above the compaction window.
  • OpenClaw's session.maintenance.maxEntries defaults to 500 for the session index file, so 1000 for the in-memory transcript mirror is consistent.
  • At ~1-10KB per entry, 1000 entries = 1-10MB of retained heap per SessionManager. Compared to unbounded growth toward 1GB+, that is a hard ceiling small enough to go unnoticed.
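
The arithmetic behind these bullets can be checked with a short calculation. It uses only figures quoted in this page (the per-turn and per-entry ranges from the list above, and the 73 SessionManager instances reported in the issue's heap snapshot). Treating 1KB as 1000 bytes is a simplification, and the variable names are illustrative.

```typescript
// Back-of-envelope bounds; all inputs are approximate figures quoted in the text.
const cap = 1000; // DEFAULT_MAX_FILE_ENTRIES
const entriesPerTurn = { min: 2, max: 4 };
const bytesPerEntry = { min: 1_000, max: 10_000 }; // "~1-10KB per entry"

const turnsCovered = {
  min: cap / entriesPerTurn.max, // 250 turns at 4 entries per turn
  max: cap / entriesPerTurn.min, // 500 turns at 2 entries per turn
};

const retainedMB = {
  min: (cap * bytesPerEntry.min) / 1_000_000, // 1 MB
  max: (cap * bytesPerEntry.max) / 1_000_000, // 10 MB
};

// The cap is a per-instance field, so the total scales with the number of
// live SessionManager instances (73 in the issue's grown snapshot).
const instances = 73;
const worstCaseTotalMB = instances * retainedMB.max; // 730 MB if every window were full

console.log(turnsCovered, retainedMB, worstCaseTotalMB);
```

The last figure is an upper bound under the assumption that every instance fills its window with 10KB entries, which the report does not claim; it is included only to show that the ceiling is per session, not per process.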

Trade-offs

  • Branch navigation to very old entries: If an entry has been pruned from memory, getEntry(id) returns undefined and getBranch(id) produces a truncated path. This only affects TUI users navigating to entries older than the window. The JSONL file on disk retains full history. Callers that need old entries can reload via loadEntriesFromFile().
  • getTree() shows fewer branches: Only entries within the window appear in the tree view. Disk has full history.
  • External access: OpenClaw's session-manager-init.ts accesses sm.fileEntries directly (bypassing TypeScript private). The array is still a plain FileEntry[] with the same shape, just bounded. OpenClaw's existing pattern of resetting sm.fileEntries = [header] continues to work.
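
For callers affected by the first trade-off, one possible pattern is to consult the bounded in-memory index first and fall back to the JSONL file on a miss. The sketch below does not use the real `loadEntriesFromFile()`, whose signature is not shown here; `findEntryOnDisk` and `getEntryWithFallback` are hypothetical helpers, and the sketch assumes every transcript line is a JSON object with an `id` field.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface FileEntry { id: string; type: string }

// Hypothetical helper: scan the JSONL transcript for a single entry by id.
function findEntryOnDisk(path: string, id: string): FileEntry | undefined {
  for (const line of readFileSync(path, "utf8").split("\n")) {
    if (line.trim() === "") continue; // skip the trailing blank line
    const entry = JSON.parse(line) as FileEntry;
    if (entry.id === id) return entry;
  }
  return undefined;
}

// Try the bounded in-memory index first, then fall back to the file.
function getEntryWithFallback(
  byId: Map<string, FileEntry>,
  path: string,
  id: string,
): FileEntry | undefined {
  return byId.get(id) ?? findEntryOnDisk(path, id);
}

// Demo: "old" has been pruned from memory but is still in the transcript.
const dir = mkdtempSync(join(tmpdir(), "session-"));
const file = join(dir, "session.jsonl");
const onDisk: FileEntry[] = [
  { id: "old", type: "message" },
  { id: "new", type: "message" },
];
writeFileSync(file, onDisk.map((e) => JSON.stringify(e)).join("\n") + "\n");

const inMemory = new Map<string, FileEntry>([["new", { id: "new", type: "message" }]]);
const recovered = getEntryWithFallback(inMemory, file, "old");
console.log(recovered);
```

A linear scan is acceptable for occasional lookups of old entries; a caller that navigates old history frequently would probably want to load the whole file once instead.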

Related issues

  • openclaw/openclaw#13758: Gateway accumulates memory over long sessions (1.9GB RSS after 13h). Comment by echoVic identifies SessionManager caching as likely primary cause.
  • openclaw/openclaw#6190: Session log growing and bot hanging up (master issue for session bloat).
  • openclaw/openclaw#4948: Multiple in-memory caches grow unbounded (same class of bug, fixed).
  • openclaw/openclaw#17820: Cron runs never clean up agent-events Maps (same pattern, ~68MB/hr growth).
  • openclaw/openclaw#51031: Tool calls hang after compaction; shows sessionManager.appendMessage and pendingState map divergence.
  • openclaw/openclaw#24800: Auto-compaction not triggered during tool-use loops.
  • openclaw/openclaw#33553: Feature request for configurable sliding window to cap conversation history.

Testing

  • 15 new tests in test/session-manager-pruning.test.ts covering:
    • Cap enforcement and header preservation
    • Most-recent-entries retention
    • Pruned entry eviction from byId
    • Leaf accessibility after pruning
    • newSession/createBranchedSession after pruning
    • Compaction within capped sessions
    • Persisted sessions (in-memory pruned, disk retains full JSONL)
    • getBranch, getTree, buildSessionContext after pruning
    • Minimum cap (1 entry)
    • Default cap applied when unspecified
  • All 876 existing tests pass with zero changes

Changed files

  • packages/coding-agent/src/core/session-manager.ts (modified, +67/-12)
  • packages/coding-agent/test/session-manager-pruning.test.ts (added, +254/-0)


Bug type

Memory leak (process RSS grows until OOM or watchdog restart)

Environment

  • OpenClaw v2026.3.24
  • Node v24.14.0
  • Ubuntu 24.04, 16GB RAM
  • Telegram channel, 7 agents (1 main + 5 social + 1 coding subagent)
  • 5 custom plugins (mem0 fork, LCM fork, sticky-context, privacy-guardrail, read-guardrail)
  • Context engine plugin with ownsCompaction: true (LCM/lossless-claw fork)

Steps to reproduce

  1. Run OpenClaw gateway with any channel for 6+ hours with moderate tool use (web_search, web_fetch, exec)
  2. Monitor the gateway child process RSS: cat /proc/$(pgrep -f openclaw-gateway)/status | grep VmRSS
  3. Observe RSS growing from ~300-400MB baseline to 1.5-2.5GB+ over the session
  4. Growth is activity-correlated: heavy tool-call bursts cause rapid spikes, quiet periods show minimal (but nonzero) growth

Note: systemctl show openclaw --property=MainPID returns the parent process (~124MB, stable). The actual gateway child process must be found via pgrep -f openclaw-gateway.
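
For anyone who prefers to monitor from a script rather than the shell, here is a small sketch of parsing the `VmRSS` line. It relies on the Linux `/proc/<pid>/status` format, where the value is given in kB. `parseVmRssKb` and `exceedsThreshold` are hypothetical helper names, and the sample text is made up for the demo (its RSS value corresponds to 2,419 MB).

```typescript
// Parse the VmRSS line from the text of /proc/<pid>/status (value is in kB).
function parseVmRssKb(statusText: string): number | undefined {
  const match = /^VmRSS:\s+(\d+)\s+kB$/m.exec(statusText);
  return match ? Number(match[1]) : undefined;
}

// Hypothetical threshold check; the default limit is 2 GB expressed in kB.
function exceedsThreshold(statusText: string, limitKb = 2 * 1024 * 1024): boolean {
  return (parseVmRssKb(statusText) ?? 0) > limitKb;
}

// Made-up sample of the relevant part of a status file.
const sample = "Name:\tnode\nVmPeak:\t 3000000 kB\nVmRSS:\t 2477056 kB\nThreads:\t11\n";
console.log(parseVmRssKb(sample), exceedsThreshold(sample));
```

In a real monitor the status text would come from reading `/proc/<pid>/status` for the gateway child process, not the systemd parent.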

Evidence: V8 heap snapshots with retainer analysis

We chased this across three debugging sessions over several days before taking heap snapshots. The usual suspects (plugin re-registration, hook runners, registry caches) all came back clean. What finally told the story was comparing two V8 heap snapshots of the same gateway process (PID 4158913) using --heapsnapshot-signal=SIGUSR2:

| Snapshot | Uptime | RSS | Heap file size | Node count |
| --- | --- | --- | --- | --- |
| Baseline | +2 min | 387 MB | 183 MB | 1,779,901 |
| Grown | +7 hr | 2,419 MB | 747 MB | 10,863,005 |

Delta: +9,083,104 nodes, +1,429 MB self-size growth.

What's growing

| Constructor | Instance delta | Size delta | % of growth |
| --- | --- | --- | --- |
| Object | +1,837,480 | +112 MB | 7.8% |
| message | +250,604 | +6 MB | (self-size small, but messages reference large string content) |
| heap number | +1,483,368 | +24 MB | 1.7% |
| Array | +455,828 | +15 MB | 1.0% |
| anthropic-messages | +121,601 | +5 MB | 0.4% |
| toolCall | +99,294 | +2 MB | (nested in messages) |
| toolResult | +98,925 | +3 MB | (nested in messages) |
| assistant | +121,453 | +4 MB | (nested in messages) |
| Large strings (>50KB) | +2,047 | +229 MB | 16.0% |

Large strings include raw web_search HTML results (40+ MB in 102 strings), base64 encoded images (25+ MB), and LLM response content. Separately, the session store duplicates system prompt/skills text per session entry (47 MB across 4,495 copies in skillsSnapshot fields). That's a distinct waste from the fileEntries issue but compounds the overall footprint.

What's NOT growing

We checked every suspect we could think of. None of them are the cause.

| Suspect | Instance delta | Size delta | Verdict |
| --- | --- | --- | --- |
| Plugin instances (OSSProvider) | +68 | +4 KB | Not the cause |
| LcmContextEngine | +68 | +10 KB | Not the cause |
| hookRunner | 0 | 0 | Singleton, stable |
| typedHooks | 0 | 0 | Stable |
| registryCache | 0 | 0 | Stable |
| Session store cache | N/A | ~2.7 MB | Not the cause |

Plugin re-registration (+68 instances over 7 hours) contributes less than 20 KB total, 0.001% of the 1.4 GB growth.

The retainer chain

This is what made the root cause unambiguous. We sampled 200 instances across message, anthropic-messages, and toolResult from different positions in the heap. 99.5% share the identical retainer pattern:

string:"message" (or "toolResult", "anthropic-messages")
  → Object (message entry with .type, .role, .api properties)
    → Array[N] (fileEntries array, indices up to 12,000+)
      → SessionManager.fileEntries
        → AgentSession.sessionManager

Every retained object traces back through fileEntries to SessionManager to AgentSession. One array. One field. That's the leak.

Scale

  • 73 SessionManager instances in the grown snapshot (7 agents, but each accumulates multiple sessions: cron jobs with isolatedSession: true create unique session keys, plus direct/group/subagent sessions)
  • 20 largest fileEntries arrays hold ~11,500 entries each
  • Estimated 100-400 MB in the top 20 sessions alone
  • Combined with string content referenced by the entries: ~1.4 GB total

Why existing mitigations don't help

OpenClaw v2026.2.23 added disk-side session maintenance (session.maintenance), and pi-coding-agent has context compaction (Dec 2025). These address disk storage and LLM context respectively. Neither addresses the in-memory fileEntries array:

  • replaceMessages(): Bounds Agent._state.messages (the LLM context window). Does not touch SessionManager.fileEntries (the persistence layer).
  • contextPruning (cache-ttl mode): Trims tool results from the in-memory context before LLM calls. Does not rewrite JSONL history. Does not touch fileEntries.
  • session.maintenance (pruneAfter, maxEntries): Prunes the sessions.json store (metadata). Does not prune in-memory arrays of active sessions.
  • --max-old-space-size: Forces V8 to GC more aggressively, but these objects are reachable (live references from SessionManager). GC cannot collect them regardless of pressure.
  • session?.dispose() in cron/subagent teardown: Works correctly for short-lived sessions. But for permanent sessions (main DM, social agent channels), dispose() is never called because the session runs for the lifetime of the process.

Every mitigation that exists targets a different layer. The persistence layer's in-memory mirror falls through all of them.

Additional amplifier: ownsCompaction disables native compaction globally

When a context engine registers with ownsCompaction: true, the gateway disables Pi's internal auto-compaction trigger (proactive mid-run compaction) via setCompactionEnabled(false). This is global, not per-session. The overflow recovery path still calls contextEngine.compact(), but for sessions the context engine ignores (cron sessions, social agent sessions via ignoreSessionPatterns), the engine returns { compacted: false, reason: "session excluded" } and the gateway does not fall back to native compaction. Both compaction paths (proactive and overflow) end up doing nothing for ignored sessions, which means their fileEntries arrays grow without any check.

Suggested fix

Cap or eliminate the in-memory JSONL mirror in SessionManager (badlogic/pi-mono, packages/coding-agent/src/core/session-manager.ts). The data is already on disk in the JSONL file. Options:

  1. Don't hold fileEntries in memory after write. If no code path reads from the in-memory array after appending (needs verification), switch to append-only disk writes with on-demand JSONL parsing when reads are needed.

  2. Cap fileEntries with a sliding window. When the array exceeds N entries (e.g., 200), shift from the front. Older entries remain on disk in the JSONL file. This is the least disruptive change if reads from fileEntries do occur.

  3. Periodic trim to match context window size. After each turn, trim fileEntries to match the agent's bounded message count (e.g., the last 200 entries). This keeps the in-memory mirror useful for recent lookups without unbounded growth.

Additionally: when a context engine declines to compact an ignored session (compacted: false), the gateway should fall back to native compaction rather than giving up. This prevents unbounded growth in sessions the context engine doesn't manage.
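
The suggested fallback could look roughly like the sketch below. All of the shapes (`CompactResult`, `ContextEngine`, `NativeCompact`, `compactWithFallback`) and the `cron:` session-key prefix are assumptions made for illustration, and the calls are synchronous to keep the example small; the real gateway and context-engine APIs may differ and are likely asynchronous.

```typescript
// Hypothetical shapes; not the actual OpenClaw or pi-mono interfaces.
interface CompactResult { compacted: boolean; reason?: string }
interface ContextEngine { compact(sessionKey: string): CompactResult }
type NativeCompact = (sessionKey: string) => CompactResult;

function compactWithFallback(
  engine: ContextEngine,
  nativeCompact: NativeCompact,
  sessionKey: string,
): CompactResult {
  const result = engine.compact(sessionKey);
  if (result.compacted) return result;
  // The engine declined (for example "session excluded"), so do not give up:
  // hand the session to native compaction instead.
  return nativeCompact(sessionKey);
}

// Demo engine that ignores cron sessions, as an ignoreSessionPatterns rule might.
const pickyEngine: ContextEngine = {
  compact: (key) =>
    key.startsWith("cron:")
      ? { compacted: false, reason: "session excluded" }
      : { compacted: true },
};

const nativeCalls: string[] = [];
const native: NativeCompact = (key) => {
  nativeCalls.push(key);
  return { compacted: true, reason: "native" };
};

const managed = compactWithFallback(pickyEngine, native, "dm:main");
const ignored = compactWithFallback(pickyEngine, native, "cron:nightly");
console.log(managed, ignored, nativeCalls);
```

Only the ignored session reaches the native path here, so sessions the engine does manage are not compacted twice.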

Related issues

Master/parent issues

  • #6190: "Session Log growing and bot hanging up" (common ancestor for session bloat reports; #2254 and #6650 dup to this)

Same symptom (RSS growth from session accumulation), no prior heap analysis

  • #13758: 1.9GB + 69% CPU after 13h. Note: commenter echoVic identified session-manager-cache.ts as the likely primary cause, which is the right neighborhood. Our contribution is pinpointing the specific field (fileEntries) and mechanism (unbounded in-memory JSONL mirror) via heap snapshot retainer analysis. Closed as dup of #6413.
  • #6413: "Gateway process massive virtual memory leak (22GB+)." Parent of #13758 via dup chain, but was closed by stale bot without a fix. Reported VIRT growth, not RSS.
  • #24689: 6GB peak in 26 min (mixed causes reported, including Docker health check misconfiguration in one case; closed by stale bot)
  • #2254: Large session files, 396KB single tool result in JSONL (disk-side of the same problem; closed as dup of #6190)
  • #20910: 7,069 messages, 20MB transcript, death spiral (extreme case of unbounded growth; open/stale)

Same class of bug (unbounded in-memory structures)

  • #4948: Multiple in-memory caches grow unbounded (fixed via PR #11093; same CWE-400 pattern)
  • #17820: Cron runs never clean up agent-events Maps (~68 MB/hr, reaching 1-2GB over 10-12 hours; fixed)

Related (address content size or compaction gaps, not fileEntries directly)

  • #6650: Tool result session bloat proposal (addresses content size but not array growth; closed as dup of #6190)
  • #24800: Auto-compaction not triggered in tool-use loops (fixed via PR #29371; explains why compaction fails to bound growth in tool-heavy sessions)
  • #43767: Heartbeat loads unbounded session history (open; another entry point to same problem)

Not fileEntries (different root causes, included for completeness)

  • #42662: 3GB in 7 min crash loop. Root cause confirmed as Node.js v24 regression, resolved by downgrading to Node v22. Shares OOM symptoms but is a different bug.
  • #43193: Cron/subagent session accumulation. Documented expected behavior (audit trail), not a bug. Addressed by --delete-after-run flag and sessionRetention config.

extent analysis

TL;DR

The most likely fix for the memory leak issue is to cap or eliminate the in-memory JSONL mirror in SessionManager by implementing a sliding window or periodic trimming to match the context window size.

Guidance

  1. Verify the root cause: Confirm that the SessionManager.fileEntries array is the source of the memory leak by analyzing V8 heap snapshots and retainer analysis.
  2. Implement a sliding window: Limit the size of the fileEntries array by shifting older entries from the front when the array exceeds a certain size (e.g., 200 entries).
  3. Periodic trimming: Trim the fileEntries array to match the agent's bounded message count (e.g., the last 200 entries) after each turn to prevent unbounded growth.
  4. Fall back to native compaction: When a context engine declines to compact an ignored session, the gateway should fall back to native compaction to prevent unbounded growth.

Example

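// Example of implementing a sliding window for the fileEntries array.
// Assumes index 0 holds the session header, which must survive pruning.
class SessionManager {
  // ...
  fileEntries = [];
  maxFileEntries = 200;

  appendMessage(message) {
    // ...
    this.fileEntries.push(message);
    const excess = this.fileEntries.length - this.maxFileEntries;
    if (excess > 0) {
      // Drop the oldest entries after the header. A plain shift() would
      // remove the header itself. Any id-keyed index (e.g. byId) should
      // evict the same entries so they become collectable.
      this.fileEntries.splice(1, excess);
    }
  }
}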

Notes

  • The provided workaround of periodic gateway restarts can help mitigate the issue but does not address the root cause.
  • Reducing --max-old-space-size can slow the growth rate but does not stop it.
  • The fix should be applied to the SessionManager class in the badlogic/pi-mono repository (packages/coding-agent/src/core/session-manager.ts).

Recommendation

Cap the fileEntries array with a sliding window or periodic trimming so that it can no longer grow without bound. PR #2749 takes the sliding-window approach with a configurable maxFileEntries (default 1000). Until that fix is deployed, the watchdog restart described under Workaround limits the impact.
