openclaw - ✅(Solved) Fix Subagent sessions accumulate in memory, causing 3GB+ RSS [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#69628 · Fetched 2026-04-22 07:49:53
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): cross-referenced ×1

Root Cause

  • Every sessions_spawn(mode="run") creates a new session file
  • Gateway loads ALL session history into memory (JSON parsed, 3-5x disk size)
  • 10 hours of subagent work can produce 500+ session files (281MB disk → ~1.5GB RAM)
  • No automatic cleanup for completed subagent sessions

Fix Action

Fixed

PR fix notes

PR #69887: fix: archive completed run-mode subagents

Description (problem / solution / changelog)

Problem

Completed sessions_spawn(mode="run") subagents were not being reaped reliably after they finished. That left completed child sessions and transcripts in the live session store, which contributes directly to the session accumulation and RSS growth reported in #69632 / #69628.

There were two gaps in the cleanup path:

  • run-mode subagents with cleanup: "keep" did not consistently carry archive timing, so they could miss the normal archive sweep path
  • post-completion announce cleanup could fail when embedded-run state or chat.history lagged behind persisted transcript state, leaving cleanupCompletedAt unset and blocking keep-mode sweep

What this changes

  • assign archive timing to all run-mode subagents, including cleanup: "keep"
  • only sweep keep-mode runs after cleanup bookkeeping has actually completed
  • during stale post-completion cleanup, cap embedded-run settle waits and recover output from the persisted transcript when live gateway history has not caught up yet
  • add regression coverage for archive bookkeeping and transcript-based cleanup recovery
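
The transcript-based recovery described in the third change can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `TranscriptEntry`, `capSettleWaitMs`, `lastAssistantText`, and `recoverOutput` are hypothetical names, and the real logic in src/agents/subagent-announce-output.ts may be structured differently.

```typescript
// Hypothetical entry shape; the real transcript format is not shown in the PR notes.
interface TranscriptEntry {
  role: "user" | "assistant" | "tool";
  text: string;
}

/** Cap a settle wait so stale post-completion cleanup cannot block indefinitely. */
function capSettleWaitMs(requestedMs: number, capMs: number): number {
  return Math.max(0, Math.min(requestedMs, capMs));
}

/** Return the last non-empty assistant message, or undefined if there is none. */
function lastAssistantText(entries: readonly TranscriptEntry[]): string | undefined {
  for (let i = entries.length - 1; i >= 0; i--) {
    const entry = entries[i];
    if (entry && entry.role === "assistant" && entry.text.trim() !== "") {
      return entry.text;
    }
  }
  return undefined;
}

/** Prefer live gateway history; fall back to the persisted transcript when live lags. */
function recoverOutput(
  liveHistory: readonly TranscriptEntry[],
  persistedTranscript: readonly TranscriptEntry[],
): { text: string; source: "live" | "transcript" } | undefined {
  const live = lastAssistantText(liveHistory);
  if (live !== undefined) return { text: live, source: "live" };
  const persisted = lastAssistantText(persistedTranscript);
  if (persisted !== undefined) return { text: persisted, source: "transcript" };
  return undefined;
}
```

The point of the fallback is that cleanup can finish, and record its completion, even when the live history has not caught up with what is already on disk.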

Why this fixes the reported issue

This makes completed run-mode subagents actually transition out of the live session store within the configured archive window instead of accumulating indefinitely.

In other words: this PR fixes a subagent reaping gap. It does not redesign session storage or lazy loading; it only makes sure finished run-mode child sessions get archived once delivery/cleanup is done.
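
The sweep rule described above (archive timing on every run-mode subagent, and keep-mode runs swept only after cleanup bookkeeping has completed) can be sketched as below. Only `cleanup: "keep"` and `cleanupCompletedAt` come from the PR notes. The `SubagentRun` shape, the `archiveAtMs` field, the `"delete"` and `"session"` values, and the function names are assumptions made for illustration.

```typescript
// Hypothetical shapes; the real registry types are not shown in the PR notes.
interface SubagentRun {
  id: string;
  mode: "run" | "session"; // "session" is an assumed non-run mode
  cleanup: "keep" | "delete"; // "delete" is an assumed alternative to "keep"
  archiveAtMs?: number; // assumed field name for the archive timing
  cleanupCompletedAt?: number; // set once announce/cleanup bookkeeping finishes
}

/** Give every run-mode subagent an archive deadline, including cleanup: "keep". */
function withArchiveTiming(run: SubagentRun, nowMs: number, archiveAfterMs: number): SubagentRun {
  if (run.mode !== "run") return run;
  return { ...run, archiveAtMs: nowMs + archiveAfterMs };
}

/** Sweepable once the deadline has passed; keep-mode also needs finished bookkeeping. */
function isSweepable(run: SubagentRun, nowMs: number): boolean {
  if (run.archiveAtMs === undefined || nowMs < run.archiveAtMs) return false;
  if (run.cleanup === "keep") return run.cleanupCompletedAt !== undefined;
  return true;
}

/** Split the live store into runs that stay live and runs to archive. */
function sweep(
  runs: readonly SubagentRun[],
  nowMs: number,
): { live: SubagentRun[]; archived: SubagentRun[] } {
  const live: SubagentRun[] = [];
  const archived: SubagentRun[] = [];
  for (const run of runs) {
    (isSweepable(run, nowMs) ? archived : live).push(run);
  }
  return { live, archived };
}
```

In this sketch a keep-mode run whose deadline has passed but whose `cleanupCompletedAt` is still unset stays in the live set, which mirrors the blocking behavior the PR describes and explains why the transcript-based recovery matters.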

Testing

  • pnpm check:changed --staged
  • pnpm repro:subagent run 69632
  • pnpm test src/agents/subagent-announce.test.ts
  • pnpm test src/agents/subagent-registry.archive.e2e.test.ts
  • pnpm test src/agents/subagent-registry-lifecycle.test.ts

Fixes #69632
Fixes #69628

Changed files

  • src/agents/subagent-announce-output.ts (modified, +39/-1)
  • src/agents/subagent-announce.test.ts (modified, +249/-0)
  • src/agents/subagent-announce.ts (modified, +19/-3)
  • src/agents/subagent-registry-lifecycle.test.ts (modified, +1/-0)
  • src/agents/subagent-registry-lifecycle.ts (modified, +12/-0)
  • src/agents/subagent-registry-run-manager.ts (modified, +2/-10)
  • src/agents/subagent-registry.archive.e2e.test.ts (modified, +77/-5)
  • src/agents/subagent-registry.ts (modified, +3/-0)

RAW_BUFFER

Problem

Gateway RSS grows to 3GB+ due to all session history being loaded into memory.

Root Cause

  • Every sessions_spawn(mode="run") creates a new session file
  • Gateway loads ALL session history into memory (JSON parsed, 3-5x disk size)
  • 10 hours of subagent work can produce 500+ session files (281MB disk → ~1.5GB RAM)
  • No automatic cleanup for completed subagent sessions

Evidence

  • 896 session files, 281MB on disk
  • sessions.json index file: 45MB
  • Gateway RSS: ~3.2GB
  • Main contributor: session data JSON parsing (~1-1.5GB)
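
As a rough sanity check on these figures, the 3-5x expansion factor from the root cause can be applied to the on-disk size. The helper below is only illustrative arithmetic; `estimateParsedSizeMb` is a made-up name, and the factor is the issue author's estimate rather than a measured constant.

```typescript
/** Estimate in-memory size of parsed JSON from its on-disk size and an expansion range. */
function estimateParsedSizeMb(
  diskMb: number,
  minFactor = 3,
  maxFactor = 5,
): { lowMb: number; highMb: number } {
  return { lowMb: diskMb * minFactor, highMb: diskMb * maxFactor };
}

const estimate = estimateParsedSizeMb(281);
console.log(estimate); // { lowMb: 843, highMb: 1405 }
```

The resulting range of roughly 0.8 to 1.4 GB is in line with the ~1-1.5GB attributed to session data parsing.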

Steps to Reproduce

  1. Run 100+ subagent tasks over several hours
  2. Observe gateway RSS growing continuously
  3. Restart gateway: RSS drops but grows again as sessions are reloaded

Suggested Fixes

  1. Auto-cleanup completed subagent sessions after result is delivered
  2. Lazy-load session history (only load when needed)
  3. Session TTL config (e.g., prune completed sessions older than 1 hour)
  4. Streaming JSON parser instead of full parse into memory
  5. Move session storage to SQLite with pagination
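
Suggestion 3 (a session TTL) could look something like the sketch below. It is only an illustration of the idea: `SessionRecord`, its fields, and `partitionByTtl` are hypothetical, and OpenClaw's actual session index format is not shown in the issue.

```typescript
// Hypothetical record shape; not OpenClaw's real session index format.
interface SessionRecord {
  id: string;
  status: "running" | "completed";
  completedAtMs?: number;
}

/** Split sessions into those to keep and those whose completion is older than the TTL. */
function partitionByTtl(
  sessions: readonly SessionRecord[],
  nowMs: number,
  ttlMs: number,
): { keep: SessionRecord[]; prune: SessionRecord[] } {
  const keep: SessionRecord[] = [];
  const prune: SessionRecord[] = [];
  for (const session of sessions) {
    const expired =
      session.status === "completed" &&
      session.completedAtMs !== undefined &&
      nowMs - session.completedAtMs > ttlMs;
    (expired ? prune : keep).push(session);
  }
  return { keep, prune };
}
```

Running sessions are never pruned in this sketch, and a completed session without a timestamp is kept rather than guessed at.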

Environment

  • OpenClaw latest
  • macOS arm64, 16GB RAM
  • Node.js v23.11.0

extent analysis

TL;DR

Implementing a session cleanup mechanism, such as auto-cleanup of completed subagent sessions or using a session TTL, can help mitigate the growing Gateway RSS issue.

Guidance

  • Investigate the feasibility of implementing a session TTL (time-to-live) configuration to automatically prune completed sessions older than a specified time frame (e.g., 1 hour) to reduce memory usage.
  • Consider lazy-loading session history, loading sessions only when needed, to decrease the amount of data loaded into memory at once.
  • Evaluate the use of a streaming JSON parser to parse session data in chunks, rather than loading the entire dataset into memory, to reduce memory consumption.
  • Review the suggested fixes and prioritize them based on implementation complexity and potential impact on the Gateway RSS growth issue.
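
The lazy-loading idea in the guidance can be illustrated with a small bounded cache that loads a session only on first access and evicts the least recently used entry. This is a generic sketch, not OpenClaw code: `LazySessionStore` and the `load` callback are hypothetical, and a real loader would read and parse a session file.

```typescript
/** Loads values on demand and keeps at most maxEntries of them in memory (LRU). */
class LazySessionStore<T> {
  private readonly cache = new Map<string, T>();
  private readonly load: (id: string) => T;
  private readonly maxEntries: number;

  constructor(load: (id: string) => T, maxEntries: number) {
    this.load = load;
    this.maxEntries = maxEntries;
  }

  get(id: string): T {
    if (this.cache.has(id)) {
      const hit = this.cache.get(id) as T;
      // Map iterates in insertion order, so re-inserting marks the entry as most recent.
      this.cache.delete(id);
      this.cache.set(id, hit);
      return hit;
    }
    const value = this.load(id);
    this.cache.set(id, value);
    if (this.cache.size > this.maxEntries) {
      // The first key in iteration order is the least recently used one.
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    return value;
  }

  get size(): number {
    return this.cache.size;
  }
}
```

With this shape, resident memory is bounded by `maxEntries` rather than by the total number of session files, at the cost of reloading evicted sessions when they are requested again.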

Example

The issue itself contains no code snippets that could be modified or used as a reference.

Notes

The effectiveness of these suggestions may vary depending on the specific requirements and constraints of the system, such as the need for historical session data and the frequency of subagent task execution.

Recommendation

PR #69887 is listed as the fix for this issue. On builds that do not yet include it, apply a workaround such as a session cleanup mechanism (for example, pruning completed subagent sessions), since that is the most immediate way to limit Gateway RSS growth.
