openclaw - ✅(Solved) Fix Sub-agent session store entries persist after archiveAfterMinutes sweep (sessions.delete failure silently swallowed) [1 pull requests, 3 comments, 4 participants]

openclaw/openclaw#49000 (fetched 2026-04-08 00:49:54)

When agents.defaults.subagents.archiveAfterMinutes triggers a sweep, the subagent run record is deleted from runs.json but the corresponding session store entry in sessions.json persists indefinitely if the sessions.delete gateway RPC fails.

Root Cause

In subagent-registry.ts, sweepSubagentRuns():

  1. The run record is deleted from the in-memory map, and that deletion is persisted to disk
  2. sessions.delete is called via callGateway()
  3. The call is wrapped in try/catch {} that silently swallows all errors

If the gateway is temporarily unavailable (restart, timeout, connection issue), the session store entry is never cleaned up, and there's no retry or reconciliation mechanism.
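
The failure mode can be reproduced in isolation. The following is a minimal sketch, not the code in subagent-registry.ts: the `Gateway` type, the `RunRecord` shape, the `{ key }` payload, and the `subagent:a` key format are all assumptions made for illustration. It only mirrors the order of operations described above.

```typescript
// Hypothetical stand-ins: the real registry and gateway client are not shown in the issue.
type Gateway = (method: string, params: { key: string }) => Promise<void>;

interface RunRecord {
  childSessionKey: string;
}

// Mirrors the buggy order: delete the run record first, then attempt
// sessions.delete and swallow any failure.
async function buggySweep(
  runs: Map<string, RunRecord>,
  callGateway: Gateway,
): Promise<void> {
  // Snapshot the entries so deleting while looping is unambiguous.
  for (const [runId, run] of Array.from(runs.entries())) {
    runs.delete(runId); // from here on, nothing references the session
    try {
      await callGateway("sessions.delete", { key: run.childSessionKey });
    } catch {
      // Swallowed: no log, no retry, and the run record is already gone.
    }
  }
}

// Simulates one sweep against an unavailable gateway and reports what is left.
async function demoOrphan(): Promise<{ runsLeft: number; sessionsLeft: number }> {
  const sessions = new Set<string>(["subagent:a"]); // simulated session store
  const runs = new Map<string, RunRecord>([
    ["run-1", { childSessionKey: "subagent:a" }],
  ]);
  const downGateway: Gateway = async () => {
    throw new Error("gateway unavailable");
  };
  await buggySweep(runs, downGateway);
  return { runsLeft: runs.size, sessionsLeft: sessions.size };
}
```

After one sweep the run map is empty while the simulated session entry remains, which is the orphan state described in this issue.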

Fix Action

Fixed

PR fix notes

PR #49004: fix(subagent-registry): prevent orphaned session store entries after sweep

Description (problem / solution / changelog)

Problem

Fixes #49000

When sweepSubagentRuns() runs, it:

  1. Deletes the run record from the in-memory map and persists that deletion to disk
  2. Calls sessions.delete via callGateway()
  3. Wraps the call in try/catch {} that silently swallows all errors

If the gateway is temporarily unavailable (restart, timeout, connection issue), the run record is gone but the sessions.json entry persists forever with no retry path.

On a system with archiveAfterMinutes: 720 we observed 5 orphaned session store entries aged 3–40 hours with no corresponding runs.json record.

Changes

Fix 1 — sweepSubagentRuns(): delete run record only after successful sessions.delete

Move subagentRuns.delete(runId) to after the sessions.delete call succeeds. On failure, log a warning and continue to the next entry — the run record is preserved and the next sweep interval will retry automatically.

Before (buggy): the run record is removed first, then sessions.delete is called, and any error is silently swallowed.

After (fixed): sessions.delete is called first; on error, a warning is logged and the loop continues (retry on the next interval); the run record is removed only on success.
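
The reordering can be sketched as below. This is an illustration under stated assumptions rather than the PR's diff: `Gateway`, `RunRecord`, the `{ key }` payload, and the `sweepOnce` name are hypothetical, and the archive-age check that decides which runs are due for sweeping is omitted.

```typescript
// Hypothetical stand-ins for the real gateway client and run record shape.
type Gateway = (method: string, params: { key: string }) => Promise<void>;

interface RunRecord {
  childSessionKey: string;
}

// One sweep pass: delete the session first, and drop the run record only
// when that call succeeded. Returns how many run records were removed.
async function sweepOnce(
  runs: Map<string, RunRecord>,
  callGateway: Gateway,
  warn: (message: string, err: unknown) => void = console.warn,
): Promise<number> {
  let removed = 0;
  for (const [runId, run] of Array.from(runs.entries())) {
    try {
      await callGateway("sessions.delete", { key: run.childSessionKey });
    } catch (err) {
      // Keep the run record so the next sweep interval retries this entry.
      warn(`sessions.delete failed for ${runId}; will retry on next sweep`, err);
      continue;
    }
    runs.delete(runId);
    removed += 1;
  }
  return removed;
}
```

Because a failed entry keeps its run record, the sweep interval itself acts as the retry loop and no extra retry library is needed.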

Fix 2 — reconcileOrphanedSessionStoreEntries(): startup cleanup of pre-existing orphans

New async function called from initSubagentRegistry(). On startup, after run records are restored from disk, it:

  1. Builds the set of known childSessionKey values from runs.json
  2. Scans all session store paths for entries matching the subagent session key pattern (isSubagentSessionKey)
  3. Any entry with a subagent key but no corresponding runs.json record is deleted via sessions.delete

This handles the backlog of orphans that accumulated before this fix was deployed, and serves as a safety net for any future edge cases.
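
The three startup steps can be sketched as follows. This is a simplified illustration, not the PR's implementation: the session store is reduced to a flat list of keys, the key predicate is passed in as a parameter because the real isSubagentSessionKey pattern is not shown here, and `Gateway`, `RunRecord`, the `{ key }` payload, and the `reconcileOrphans` name are assumptions.

```typescript
// Hypothetical stand-ins for the real gateway client and run record shape.
type Gateway = (method: string, params: { key: string }) => Promise<void>;

interface RunRecord {
  childSessionKey: string;
}

// Deletes subagent session entries that no run record refers to.
// Returns the keys that were actually deleted.
async function reconcileOrphans(
  runs: Map<string, RunRecord>,
  sessionKeys: string[],
  isSubagentKey: (key: string) => boolean,
  callGateway: Gateway,
): Promise<string[]> {
  // Step 1: the set of session keys that restored run records still reference.
  const known = new Set(
    Array.from(runs.values()).map((run) => run.childSessionKey),
  );
  const deleted: string[] = [];
  // Step 2: consider only session entries that look like subagent sessions.
  for (const key of sessionKeys) {
    if (!isSubagentKey(key) || known.has(key)) continue;
    // Step 3: delete the orphan; a failure is logged and never blocks startup.
    try {
      await callGateway("sessions.delete", { key });
      deleted.push(key);
    } catch (err) {
      console.warn(`could not delete orphaned session ${key}`, err);
    }
  }
  return deleted;
}
```

Non-subagent sessions are skipped before the orphan check, so a session that never had a run record is left alone.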

Testing

  • TypeScript type-check passes (tsc --noEmit: clean)
  • Both changes are narrow and consistent with the existing pattern in resolveSubagentRunOrphanReason() (same store loading, same key resolution)
  • reconcileOrphanedSessionStoreEntries wraps all store reads and gateway calls in try/catch so startup is never blocked or broken

Changed files

  • src/agents/subagent-registry.ts (modified, +109/-9)

Expected Behavior

Session store entries should be reliably cleaned up when a subagent run is swept.

Actual Behavior

Session store entries accumulate indefinitely. On a system with archiveAfterMinutes: 720, we observed 5 orphaned session store entries aged 3-40 hours with no corresponding runs.json record.

Suggested Fix

  1. Either move subagentRuns.delete() to after successful sessions.delete, or implement retry
  2. Add reverse orphan reconciliation in reconcileOrphanedRestoredRuns() — scan sessions.json for subagent-keyed entries that have no matching runs.json record, and clean them up on startup

Environment

  • OpenClaw 2026.3.13
  • macOS arm64
  • archiveAfterMinutes: 720

Reproduction

  1. Set agents.defaults.subagents.archiveAfterMinutes to a small value (e.g., 5)
  2. Spawn several subagents (mode: "run")
  3. Wait for sweep interval (60s)
  4. Restart gateway while sweep is in progress, OR trigger gateway restart via config change during sweep window
  5. Observe: runs.json entries cleaned up, but sessions.json entries persist

extent analysis

Fix Plan

To fix the issue, we will implement a retry mechanism for the sessions.delete call and add a reverse orphan reconciliation in reconcileOrphanedRestoredRuns().

Step 1: Implement Retry Mechanism

We will use a library like async-retry to implement a retry mechanism for the sessions.delete call.

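import retry from 'async-retry';

// In subagent-registry.ts, sweepSubagentRuns() function.
// `id` and `runId` come from the elided sweep loop; the payload shape is illustrative.
async function sweepSubagentRuns() {
  // ...
  try {
    // Retry transient gateway failures: up to 3 retries, 1-5 s between attempts
    await retry(async () => {
      await callGateway('sessions.delete', { subagentId: id });
    }, {
      retries: 3,
      minTimeout: 1000,
      maxTimeout: 5000,
    });
    // Remove the run record only after the session entry is gone
    subagentRuns.delete(runId);
  } catch (error) {
    // Keep the run record so the next sweep retries; log instead of swallowing
    console.warn('sessions.delete failed; will retry on next sweep:', error);
  }
  // ...
}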

Step 2: Add Reverse Orphan Reconciliation

We will add a reverse orphan reconciliation in reconcileOrphanedRestoredRuns() to scan sessions.json for subagent-keyed entries that have no matching runs.json record, and clean them up on startup.

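// In subagent-registry.ts, reconcileOrphanedRestoredRuns() function.
// loadSessions() and loadRuns() are placeholder loaders; the payload shape is illustrative.
async function reconcileOrphanedRestoredRuns() {
  const sessions = await loadSessions();
  const runs = await loadRuns();
  // Session keys that a run record still references
  const knownKeys = new Set(Object.values(runs).map((run) => run.childSessionKey));

  // for...of rather than forEach, so each await runs inside this async function
  for (const subagentId of Object.keys(sessions)) {
    // Skip non-subagent sessions and sessions that still have a run record
    if (!isSubagentSessionKey(subagentId) || knownKeys.has(subagentId)) continue;
    try {
      // Delete the orphaned session store entry
      await callGateway('sessions.delete', { subagentId });
    } catch (error) {
      // A failed cleanup must not block startup
      console.warn('Failed to delete orphaned session:', error);
    }
  }
}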

Verification

To verify that the fix worked, you can follow these steps:

  • Set agents.defaults.subagents.archiveAfterMinutes to a small value (e.g., 5)
  • Spawn several subagents (mode: "run")
  • Wait for sweep interval (60s)
  • Restart gateway while sweep is in progress, OR trigger gateway restart via config change during sweep window
  • Observe: Both runs.json and sessions.json entries should be cleaned up

Extra Tips

Log failed deletions rather than swallowing them, so orphaned entries are visible when debugging. Keep retries bounded, as with retries: 3 above, so a sweep cannot stall indefinitely on an unavailable gateway.
