openclaw - ✅(Solved) Fix session.maintenance enforce maxEntries can evict pending subagent sessions before announce delivery [3 pull requests, 3 comments, 4 participants]

GitHub stats: openclaw/openclaw#81492 (fetched 2026-05-14 03:31:30)
Comments: 3, Participants: 4, Timeline events: 10, Reactions: 2
Timeline (top): commented ×3, cross-referenced ×3, closed ×1, labeled ×1

When session.maintenance.mode is set to "enforce" and the session store is over session.maintenance.maxEntries, OpenClaw can evict a just-completed/pending-delivery subagent session before the announce/result-freeze path reads the child output.

This causes the parent to receive a “completed successfully” subagent event with:

  • Result: (no output)
  • Stats: ... tokens 0 (in 0 / out 0)
  • session_id: unknown

even though the child actually ran, produced assistant text, and used nonzero tokens.

Error Message

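A Slack parent spawned a run-mode subagent with cleanup: "delete". The subagent completed successfully, but the delivered completion event reported Result: (no output), tokens 0 (in 0 / out 0), and session_id: unknown.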

Root Cause

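Per the linked PRs, enforced session maintenance built its preserve set from only activeSessionKey, so synthetic subagent child session rows stayed removable during maxEntries capping even though announce/result capture still needed them.

At minimum, a successful pending-delivery subagent should not be reported as delivered with (no output) and zero tokens merely because the session row was capped.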

Fix Action

Workaround

Raise session.maintenance.maxEntries, for example:

{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 2000
    }
  }
}

This avoids the immediate cap pressure but does not fix the lifecycle bug.

PR fix notes

PR #81496: fix(sessions): preserve pending subagent rows during maintenance

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: enforced session.maintenance.maxEntries could cap a just-finished subagent session row before the announce/result path read its output.
  • Why it matters: parent sessions could receive a successful completion handoff with (no output), session_id: unknown, and zero token stats even though the child produced a real result.
  • What changed: session maintenance now supports lifecycle-owned preserve-key providers, and the subagent registry registers active plus cleanup-pending child session keys until cleanup is complete.
  • What did NOT change (scope boundary): no archive transcript fallback, delivery retry policy, or session disk-budget policy changes are included here.
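
The "lifecycle-owned preserve-key provider" idea can be sketched as a small registry that maintenance consults on every pass. This is only an illustration under stated assumptions: registerPreserveKeysProvider, collectPreserveKeys, and the key strings other than the one shown in the PR's proof output are hypothetical names, not the PR's actual code.

```typescript
// Hypothetical sketch: these names are illustrative, not confirmed OpenClaw APIs.
type PreserveKeysProvider = () => readonly string[];

const preserveKeyProviders: PreserveKeysProvider[] = [];

/** Registers a provider and returns a function that unregisters it. */
function registerPreserveKeysProvider(provider: PreserveKeysProvider): () => void {
  preserveKeyProviders.push(provider);
  return () => {
    const index = preserveKeyProviders.indexOf(provider);
    if (index >= 0) preserveKeyProviders.splice(index, 1);
  };
}

/** Builds the preserve set for one maintenance pass. */
function collectPreserveKeys(activeSessionKey?: string): Set<string> {
  const keys = new Set<string>();
  if (activeSessionKey) keys.add(activeSessionKey);
  for (const provider of preserveKeyProviders) {
    for (const key of provider()) keys.add(key);
  }
  return keys;
}

// A registry-like owner exposes its live child keys; the array is read at call time.
const pendingChildKeys: string[] = ["agent:main:subagent:pending-proof"];
const unregister = registerPreserveKeysProvider(() => pendingChildKeys);

const whilePending = collectPreserveKeys("parent-session");
pendingChildKeys.length = 0; // cleanup finished: the child key is no longer contributed
const afterCleanup = collectPreserveKeys("parent-session");
unregister();
```

Because the provider is called on each pass rather than snapshotted at registration, preservation ends as soon as the owner stops reporting a key, which matches the stated scope of "until cleanup is complete".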

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #81492
  • Related #
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.

  • Behavior or issue addressed: subagent child session rows remain available to announce/result capture while active or cleanup-pending, even when entry-count maintenance is enforcing a saturated cap.
  • Real environment tested: local Linux checkout on Node 24 / pnpm 11.1.0.
  • Exact steps or command run after this patch: pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): real production-module terminal proof, no Vitest and no mocks:
$ tmpdir=$(mktemp -d); PROOF_DIR="$tmpdir" OPENCLAW_SESSION_CACHE_TTL_MS=0 pnpm exec tsx -e '<production saveSessionStore proof>'
[sessions/store] capped session entry count
{
  "keys": [
    "agent:main:subagent:pending-proof"
  ],
  "preservedPendingSubagent": true,
  "cappedFreshDm": true
}
  • Observed result after fix: with enforced maxEntries: 1, the lifecycle-preserved pending subagent row remained in the real session store while the unpreserved fresh DM row was capped.
  • What was not tested: live Slack/macOS reproduction from the original report was not rerun in this environment.
  • Before evidence (optional but encouraged): source-visible root cause matched the issue report: capping previously only preserved the active writer key, not pending subagent child keys.

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause: enforced session maintenance built its preserve set from only activeSessionKey, while subagent announce/result capture later depended on the child session row still existing.
  • Missing detection / guardrail: no regression covered max-entry capping while subagent delivery cleanup was still pending.
  • Contributing context (if known): subagent keys are synthetic session keys, so generic protected session rules intentionally did not protect them.
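
A minimal model of the root cause, assuming capping removes the oldest unpreserved rows first (the eviction order, the SessionRow shape, and the keysToCap helper are assumptions made for illustration):

```typescript
// Hypothetical model of entry-count capping; row shape and eviction order are assumptions.
interface SessionRow {
  key: string;
  updatedAt: number;
}

/** Returns the keys removed to bring the store down to maxEntries, oldest first. */
function keysToCap(
  rows: readonly SessionRow[],
  maxEntries: number,
  preserve: ReadonlySet<string>,
): string[] {
  const excess = rows.length - maxEntries;
  if (excess <= 0) return [];
  return rows
    .filter((row) => !preserve.has(row.key))
    .sort((a, b) => a.updatedAt - b.updatedAt)
    .slice(0, excess)
    .map((row) => row.key);
}

const sessionRows: SessionRow[] = [
  { key: "agent:main:subagent:child", updatedAt: 100 }, // finished, announce still pending
  { key: "dm:fresh", updatedAt: 200 },
  { key: "parent-session", updatedAt: 300 }, // the active writer
];

// Before: only the active writer key is preserved, so the child row is evicted.
const removedBefore = keysToCap(sessionRows, 2, new Set(["parent-session"]));

// After: the pending child key is preserved too, so the DM row is capped instead.
const removedAfter = keysToCap(
  sessionRows,
  2,
  new Set(["parent-session", "agent:main:subagent:child"]),
);
```

In this model the only difference between the two calls is the preserve set, which is the seam the fix extends.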

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts.
  • Scenario the test should lock in: max-entry maintenance honors lifecycle preserve keys, and the subagent registry preserves active plus cleanup-pending child sessions but not cleanup-completed ones.
  • Why this is the smallest reliable guardrail: it proves the generic maintenance seam and the subagent-owned lifecycle key provider without needing a live Slack channel.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

Subagent completion handoffs are less likely to lose successful child output or token stats under enforced session-entry caps.

Diagram (if applicable)

Before:
[subagent finishes] -> [maxEntries capping removes child row] -> [announce reads no output]

After:
[subagent active/cleanup-pending] -> [registry preserves child row] -> [announce can read child output]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux local checkout
  • Runtime/container: Node 24, pnpm 11.1.0
  • Model/provider: N/A
  • Integration/channel (if any): subagent/session maintenance seam; no live Slack rerun
  • Relevant config (redacted): session.maintenance.mode = "enforce", low maxEntries in regression test

Steps

  1. Register/seed lifecycle preserve keys for active and cleanup-pending subagent rows.
  2. Run enforced session maintenance with maxEntries below the preserved row count.
  3. Verify preserved child rows remain available and cleanup-completed rows are not preserved.

Expected

  • Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Actual

  • Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: real production-module session-store cap proof; focused regression tests; targeted formatting; production core typecheck.
  • Edge cases checked: cleanup-completed subagent rows are not preserved; provider-preserved rows may exceed the cap when all candidates are protected.
  • What you did not verify: live Slack/macOS reproduction.

Additional validation notes:

  • pnpm exec oxfmt --check --threads=1 src/config/sessions/store-maintenance-runtime.ts src/config/sessions/store.ts src/agents/subagent-registry.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
  • Real production-module terminal proof above passed.
  • pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed after rebasing onto latest upstream/main.
  • pnpm tsgo:core passed before the rebase.
  • pnpm lint:core is currently blocked by unrelated existing lint errors in src/gateway/sessions-patch.ts, src/plugins/registry.ts, and src/plugins/registry.runtime-config.test.ts.
  • pnpm tsgo:core:test is currently blocked by unrelated existing type errors in src/plugins/registry.runtime-config.test.ts.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: preserving lifecycle-critical subagent rows can temporarily exceed maxEntries.
    • Mitigation: preservation ends after subagent cleanup completes, and only active or cleanup-pending child keys are contributed by the subagent registry.
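
The size of the overshoot can be worked out with simple arithmetic. The helper below is a sketch, assuming preserved rows are never removed, every other row is removable, and the preserved count does not exceed the total; entriesAfterCap is a hypothetical name.

```typescript
/**
 * Entry count after one capping pass, assuming preserved rows are never removed
 * and every other row is removable (preserved <= total).
 */
function entriesAfterCap(total: number, maxEntries: number, preserved: number): number {
  if (total <= maxEntries) return total; // under the cap: nothing to remove
  return Math.max(maxEntries, preserved); // preserved rows set a floor
}

const normalCase = entriesAfterCap(503, 500, 3); // 500: removable rows absorb the excess
const saturatedCase = entriesAfterCap(4, 2, 3); // 3: only one row is removable
const allProtected = entriesAfterCap(3, 1, 3); // 3: nothing can be removed
```

Under these assumptions the store exceeds maxEntries only while the preserved count itself is above the cap, and by exactly that difference.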

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/subagent-registry.test.ts (modified, +47/-1)
  • src/agents/subagent-registry.ts (modified, +33/-0)
  • src/config/sessions/store-maintenance-runtime.ts (modified, +41/-0)
  • src/config/sessions/store.pruning.integration.test.ts (modified, +22/-0)
  • src/config/sessions/store.ts (modified, +7/-4)

PR #81498: fix: preserve pending subagent sessions during maintenance

Description (problem / solution / changelog)

Summary

  • Problem: session.maintenance entry capping could evict a subagent child session while its completion result was still needed for final announce delivery.
  • Why it matters: completed subagents could lose their transcript/session row before frozenResultText or pending final delivery state was consumed, causing parent delivery to fall back to empty output.
  • What changed: added a session-maintenance preserve-key provider and registered active/pending subagent child sessions with both write-time and load-time maintenance.
  • What did NOT change (scope boundary): no channel-specific delivery behavior, Slack API behavior, or session cap defaults changed.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #81492
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: enforced session-store maintenance keeps an active/pending subagent child session while removing an older unprotected row.
  • Real environment tested: local OpenClaw checkout at /home/ubuntu/codes/ai-hpc/openclaw, built CLI OpenClaw 2026.5.12-beta.1 (0d8e8be), Linux, Node 22.
  • Exact steps or command run after this patch: node --import tsx --input-type=module -e <production maintenance probe using saveSessionStore/loadSessionStore and registerSessionMaintenancePreserveKeysProvider>
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): console output copied from the real local run:
[sessions/store] capped session entry count
OpenClaw real maintenance probe
storePath=/tmp/openclaw-real-maintenance-eJkQlX/sessions.json
keys=agent:main:subagent:pending-real-proof,recent-a
protectedChildPresent=true
oldRemovablePresent=false
entryCount=2
  • Observed result after fix: the protected subagent child key remained in the real session store, the old removable entry was removed, and the store was capped to two entries.
  • What was not tested: live Slack bot interaction; this bug is in backend session maintenance and subagent lifecycle coordination, independent of Slack transport.
  • Before evidence (optional but encouraged): the new regression cases model the previous eviction window where a pending subagent child key was not part of maintenance preservation.

Root Cause (if applicable)

  • Root cause: session maintenance only preserved the currently active session key and durable external conversation keys, so synthetic subagent child sessions remained removable during maxEntries capping even while the subagent registry still needed them.
  • Missing detection / guardrail: no coverage existed for capped maintenance with an active or pending-delivery subagent child session.
  • Contributing context (if known): subagent completion and announce delivery are asynchronous relative to session-store maintenance.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/config/sessions/store.pruning.test.ts, src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts
  • Scenario the test should lock in: pending runtime-protected subagent child sessions survive entry-count capping on save and explicit load maintenance.
  • Why this is the smallest reliable guardrail: the bug is in maintenance selection, not in a specific channel adapter.
  • Existing test that already covers this (if any): none found.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Pending subagent completion sessions are retained until their completion is announced or cleanup finishes, even if this temporarily reduces how aggressively session maintenance can cap entries.

Diagram (if applicable)

Before:
[subagent completes] -> [maintenance caps store] -> [child session evicted] -> [announce lacks result]

After:
[subagent active/pending] -> [registry exposes preserve key] -> [maintenance skips child session] -> [announce can use result]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux
  • Runtime/container: Node 22 local checkout
  • Model/provider: N/A
  • Integration/channel (if any): N/A; backend session maintenance path
  • Relevant config (redacted): session.maintenance.mode: "enforce", saturated maxEntries

Steps

  1. Seed a real temporary session store above maxEntries.
  2. Register a runtime maintenance preserve-key provider for a subagent child session key.
  3. Run production saveSessionStore with enforce maintenance and reload with production loadSessionStore.

Expected

  • The pending subagent child session is preserved and removable older entries are capped first.

Actual

  • The pending subagent child session was preserved; the old removable entry was capped.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Manual real-run output is included in the Real behavior proof section. Supplemental automated verification:

pnpm vitest run src/config/sessions/disk-budget.test.ts src/config/sessions/store.pruning.test.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts

Result: Test Files 5 passed (5); Tests 84 passed (84).

After the lint assertion update:

pnpm vitest run src/agents/subagent-registry.test.ts

Result: Test Files 2 passed (2); Tests 34 passed (34).

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: write-time capping, explicit load-time capping, disk-budget entry eviction, disk-aware subagent registry key selection for active and pending-delivery runs, and a real production-module console probe.
  • Edge cases checked: all removable candidates protected can temporarily exceed cap; completed cleanup runs are not preserved.
  • What you did not verify: live Slack bot interaction, because the fix is channel-independent backend lifecycle coordination.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: session stores may temporarily retain more entries when all removable entries are active/pending subagent sessions.
    • Mitigation: preservation only applies until subagent completion delivery/cleanup finishes; normal capping still applies to removable entries.

Changed files

  • src/agents/subagent-registry-maintenance.ts (added, +48/-0)
  • src/agents/subagent-registry.test.ts (modified, +70/-0)
  • src/agents/subagent-registry.ts (modified, +4/-0)
  • src/config/sessions/cleanup-service.ts (modified, +5/-0)
  • src/config/sessions/disk-budget.test.ts (modified, +37/-0)
  • src/config/sessions/disk-budget.ts (modified, +4/-0)
  • src/config/sessions/store-load.ts (modified, +10/-2)
  • src/config/sessions/store-maintenance-preserve.ts (added, +43/-0)
  • src/config/sessions/store.pruning.integration.test.ts (modified, +33/-0)
  • src/config/sessions/store.pruning.test.ts (modified, +51/-0)
  • src/config/sessions/store.ts (modified, +4/-3)

PR #81505: fix(sessions): preserve pending subagent rows during maintenance

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

  • Problem: enforced session.maintenance.maxEntries could cap a just-finished subagent session row before the announce/result path read its output.
  • Why it matters: parent sessions could receive a successful completion handoff with (no output), session_id: unknown, and zero token stats even though the child produced a real result.
  • What changed: session maintenance now supports lifecycle-owned preserve-key providers, and the subagent registry registers active plus cleanup-pending child session keys until cleanup is complete.
  • CI cleanup: also carries small latest-main guard fixes needed for the required lint/type/contract shards: plugin registry config-scope test cleanup, Matrix test helper moved under test-support, and release journey fixture lint cleanup.
  • What did NOT change (scope boundary): no archive transcript fallback, delivery retry policy, or session disk-budget policy changes are included here.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #81492
  • Supersedes #81496
  • Related #
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.

  • Behavior or issue addressed: subagent child session rows remain available to announce/result capture while active or cleanup-pending, even when entry-count maintenance is enforcing a saturated cap.
  • Real environment tested: local Linux checkout on Node 24 / pnpm 11.1.0.
  • Exact steps or command run after this patch: pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts.
  • Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): real production-module terminal proof, no Vitest and no mocks:
$ tmpdir=$(mktemp -d); PROOF_DIR="$tmpdir" OPENCLAW_SESSION_CACHE_TTL_MS=0 pnpm exec tsx -e '<production saveSessionStore proof>'
[sessions/store] capped session entry count
{
  "keys": [
    "agent:main:subagent:pending-proof"
  ],
  "preservedPendingSubagent": true,
  "cappedFreshDm": true
}
  • Observed result after fix: with enforced maxEntries: 1, the lifecycle-preserved pending subagent row remained in the real session store while the unpreserved fresh DM row was capped.
  • What was not tested: live Slack/macOS reproduction from the original report was not rerun in this environment.
  • Before evidence (optional but encouraged): source-visible root cause matched the issue report: capping previously only preserved the active writer key, not pending subagent child keys.

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

  • Root cause: enforced session maintenance built its preserve set from only activeSessionKey, while subagent announce/result capture later depended on the child session row still existing.
  • Missing detection / guardrail: no regression covered max-entry capping while subagent delivery cleanup was still pending.
  • Contributing context (if known): subagent keys are synthetic session keys, so generic protected session rules intentionally did not protect them.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts.
  • Scenario the test should lock in: max-entry maintenance honors lifecycle preserve keys, and the subagent registry preserves active plus cleanup-pending child sessions but not cleanup-completed ones.
  • Why this is the smallest reliable guardrail: it proves the generic maintenance seam and the subagent-owned lifecycle key provider without needing a live Slack channel.
  • Existing test that already covers this (if any): none.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

Subagent completion handoffs are less likely to lose successful child output or token stats under enforced session-entry caps.

Diagram (if applicable)

Before:
[subagent finishes] -> [maxEntries capping removes child row] -> [announce reads no output]

After:
[subagent active/cleanup-pending] -> [registry preserves child row] -> [announce can read child output]

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux local checkout
  • Runtime/container: Node 24, pnpm 11.1.0
  • Model/provider: N/A
  • Integration/channel (if any): subagent/session maintenance seam; no live Slack rerun
  • Relevant config (redacted): session.maintenance.mode = "enforce", low maxEntries in regression test

Steps

  1. Register/seed lifecycle preserve keys for active and cleanup-pending subagent rows.
  2. Run enforced session maintenance with maxEntries below the preserved row count.
  3. Verify preserved child rows remain available and cleanup-completed rows are not preserved.

Expected

  • Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Actual

  • Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

  • Verified scenarios: real production-module session-store cap proof; focused regression tests; targeted formatting; production core typecheck; architecture, lint, plugin contract, and affected Matrix test checks.
  • Edge cases checked: cleanup-completed subagent rows are not preserved; provider-preserved rows may exceed the cap when all candidates are protected.
  • What you did not verify: live Slack/macOS reproduction.

Additional validation notes:

  • Real production-module terminal proof above passed.
  • pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
  • pnpm tsgo:core passed.
  • pnpm check:architecture passed.
  • pnpm check:test-types passed.
  • pnpm lint --threads=8 passed before the replacement PR branch; pnpm lint:core passed after the latest amend.
  • pnpm test src/plugins/registry.runtime-config.test.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
  • pnpm test src/gateway/server.node-invoke-approval-bypass.test.ts src/gateway/server.sessions-send.test.ts passed after fixing the latest-main node-pairing test helper.
  • pnpm test:contracts:plugins passed.
  • pnpm test extensions/matrix/src/channel.setup.test.ts extensions/matrix/src/channel.directory.test.ts extensions/matrix/src/onboarding.test.ts extensions/matrix/src/onboarding.resolve.test.ts extensions/matrix/src/matrix/client.test.ts extensions/matrix/src/matrix/accounts.readiness.test.ts extensions/matrix/src/matrix/credentials.test.ts extensions/matrix/src/matrix/client/config.test.ts extensions/matrix/src/matrix/client/storage.test.ts extensions/matrix/src/matrix/monitor/handler.test.ts extensions/matrix/src/matrix/monitor/handler.group-history.test.ts extensions/matrix/src/matrix/monitor/handler.thread-root-media.test.ts extensions/matrix/src/matrix/monitor/handler.body-for-agent.test.ts extensions/matrix/src/matrix/monitor/handler.media-failure.test.ts passed.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: preserving lifecycle-critical subagent rows can temporarily exceed maxEntries.
    • Mitigation: preservation ends after subagent cleanup completes, and only active or cleanup-pending child keys are contributed by the subagent registry.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • extensions/matrix/src/channel.directory.test.ts (modified, +1/-1)
  • extensions/matrix/src/channel.setup.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/accounts.readiness.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/client.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/client/config.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/client/storage.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/credentials.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/monitor/handler.body-for-agent.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/monitor/handler.group-history.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/monitor/handler.media-failure.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/monitor/handler.test.ts (modified, +1/-1)
  • extensions/matrix/src/matrix/monitor/handler.thread-root-media.test.ts (modified, +1/-1)
  • extensions/matrix/src/onboarding.resolve.test.ts (modified, +1/-1)
  • extensions/matrix/src/onboarding.test.ts (modified, +1/-1)
  • extensions/matrix/src/test-support/test-runtime.ts (renamed, +2/-2)
  • scripts/e2e/lib/release-user-journey/clickclack-fixture.mjs (modified, +1/-1)
  • scripts/lib/config-boundary-guard.mjs (modified, +2/-0)
  • src/agents/subagent-registry.test.ts (modified, +47/-1)
  • src/agents/subagent-registry.ts (modified, +33/-0)
  • src/config/sessions/store-maintenance-runtime.ts (modified, +41/-0)
  • src/config/sessions/store.pruning.integration.test.ts (modified, +22/-0)
  • src/config/sessions/store.ts (modified, +7/-4)
  • src/gateway/server.node-invoke-approval-bypass.test.ts (modified, +20/-0)
  • src/gateway/sessions-patch.ts (modified, +1/-1)
  • src/plugins/registry.runtime-config.test.ts (modified, +9/-5)
  • src/plugins/registry.ts (modified, +1/-1)

Raw issue body

Summary

When session.maintenance.mode is set to "enforce" and the session store is over session.maintenance.maxEntries, OpenClaw can evict a just-completed/pending-delivery subagent session before the announce/result-freeze path reads the child output.

This causes the parent to receive a “completed successfully” subagent event with:

  • Result: (no output)
  • Stats: ... tokens 0 (in 0 / out 0)
  • session_id: unknown

even though the child actually ran, produced assistant text, and used nonzero tokens.

Environment

  • OpenClaw: 2026.5.7 (eeef486)
  • Host: macOS / Valhalla
  • Parent surface: Slack
  • Config:
{
  "session": {
    "dmScope": "per-channel-peer",
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 500
    }
  }
}

Raising maxEntries from 500 to 2000 made the observed failure stop occurring.

Observed behavior

A Slack parent spawned a run-mode subagent with cleanup: "delete".

The subagent completed successfully, but the delivered completion event said:

Result:
(no output)

Stats:
runtime 1m21s • tokens 0 (in 0 / out 0)

session_id: unknown

The child transcript later showed the child did produce a normal final assistant reply with sources and nonzero model usage. The transcript only remained as an archived/deleted file:

.../sessions/<session-id>-topic-<topic>.jsonl.deleted.<timestamp>

Gateway logs showed this ordering:

12:49:57 agent.wait completed
12:49:58 [sessions/store] capped session entry count
12:50:41 subagent_delivery_target fired
12:50:42 announce ran
12:51:15 explicit sessions.delete ran

So the session-store cap ran after the child completed but before announce delivery/result capture.

Expected behavior

Session maintenance should not evict active or pending-delivery subagent sessions before their result has been durably frozen or announced.

If the session store is over maxEntries, eviction should be lifecycle-aware:

  • protect active subagents
  • protect completed-but-not-announced subagents
  • protect sessions with pending final delivery
  • only evict sessions that are not needed for runtime delivery/visibility

At minimum, a successful pending-delivery subagent should not be reported as delivered with (no output) and zero tokens merely because the session row was capped.
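
The lifecycle-aware eviction described above can be sketched as a pure function. This is only an illustration under assumptions: `SessionEntry`, `CapResult`, and `capEntries` are hypothetical names, and ordering by an `updatedAt` timestamp is an assumed stand-in for whatever ordering the real cap uses. The one name taken from this issue is the idea of a `preserveKeys` set.

```typescript
// Hypothetical shapes for illustration; not OpenClaw's real types.
interface SessionEntry {
  key: string;
  updatedAt: number; // epoch millis of last activity (assumed ordering field)
}

interface CapResult {
  kept: SessionEntry[];
  evicted: SessionEntry[];
}

// Evict the oldest unprotected entries until the store fits maxEntries.
// Protected entries are never evicted, so the store may remain over the cap
// when there are not enough safe-to-remove candidates.
function capEntries(
  entries: SessionEntry[],
  maxEntries: number,
  preserveKeys: ReadonlySet<string>,
): CapResult {
  const overflow = entries.length - maxEntries;
  if (overflow <= 0) {
    return { kept: [...entries], evicted: [] };
  }
  // Oldest first, restricted to entries that are safe to remove.
  const candidates = entries
    .filter((e) => !preserveKeys.has(e.key))
    .sort((a, b) => a.updatedAt - b.updatedAt);
  const evicted = candidates.slice(0, overflow);
  const evictedKeys = new Set(evicted.map((e) => e.key));
  const kept = entries.filter((e) => !evictedKeys.has(e.key));
  return { kept, evicted };
}
```

With this shape, a pending child session that happens to be among the oldest entries is skipped, and the next-oldest unprotected entry is evicted instead.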

Actual behavior

With mode: "enforce" and a saturated maxEntries cap, session maintenance can remove the child session entry before announce/result capture. The announce code then cannot resolve the child session key, cannot read chat.history, and falls back to empty output/zero usage.

Why this looks like a bug

The docs say mode: "enforce" applies cleanup and maxEntries bounds sessions.json, so pruning itself is expected.

But subagent lifecycle docs imply completion delivery should prefer latest assistant text and push completion to the requester. That requires the child output to remain readable until delivery is complete.

The bug is not that old session entries are capped. The bug is that the cap is not aware of subagent lifecycle state.
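
One way to expose that lifecycle state to maintenance is to derive a set of protected session keys from the subagent run records. The sketch below is an assumption-laden illustration: the `SubagentRun` shape, `endedAt`, `terminalFailurePersisted`, `needsLiveSession`, and `collectPreserveKeys` are hypothetical. Only `expectsCompletionMessage` and `frozenResultText` are names that appear in this issue.

```typescript
// Hypothetical run record. Field names other than expectsCompletionMessage
// and frozenResultText are assumptions made for this sketch.
interface SubagentRun {
  childSessionKey: string;
  endedAt?: number; // undefined while the run is still active
  expectsCompletionMessage: boolean;
  frozenResultText?: string; // set once the child output has been captured
  terminalFailurePersisted?: boolean;
}

// True when the child session row must stay readable in the live store.
function needsLiveSession(run: SubagentRun): boolean {
  if (run.endedAt === undefined) return true; // still running
  if (!run.expectsCompletionMessage) return false; // nothing to deliver
  const captured =
    (run.frozenResultText ?? "").length > 0 ||
    run.terminalFailurePersisted === true;
  return !captured; // finished, but the result is not yet durable
}

// Build the set of session keys that maintenance should skip.
function collectPreserveKeys(runs: SubagentRun[]): Set<string> {
  const keys = new Set<string>();
  for (const run of runs) {
    if (needsLiveSession(run)) keys.add(run.childSessionKey);
  }
  return keys;
}
```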

Proposed fix

Implement lifecycle-aware eviction rather than raw count-based eviction for subagent sessions.

Recommended behavior:

  1. Add pending subagent child session keys to the maintenance preserveKeys set.

    • Preserve while run is active.
    • Preserve while expectsCompletionMessage=true and final delivery is pending.
    • Preserve until frozenResultText is non-empty or a terminal failure payload is persisted.
  2. Make result capture fail closed.

    • If child outcome is success and expectsCompletionMessage=true, do not mark delivery as successful with (no output) unless the child explicitly returned ANNOUNCE_SKIP.
    • If output capture fails because the session is missing, mark delivery deferred/capture_failed and retain evidence.
  3. Add archive fallback.

    • If live chat.history(sessionKey) fails, search the archived .jsonl.deleted.* / .jsonl.reset.* transcript for the child session id/key and extract latest assistant text + usage.
    • Use this fallback for announce payload and token stats.
  4. Improve session cap policy.

    • Current cap behavior appears to remove entries once the store exceeds maxEntries.
    • It should behave more like FIFO/LRU only among safe-to-remove sessions, never among active/pending-delivery sessions.
    • If all removable candidates are protected, exceed maxEntries temporarily rather than breaking delivery.
  5. Add instrumentation.

    • Log child task length, child session key, session id, requested/resolved model.
    • Log result-freeze source and captured text length.
    • Log announce selected payload source and length.
    • Log cleanup/cap decisions with reason and whether a subagent was active/pending.
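
Items 2 and 3 above (fail closed, then fall back to the archive) could be combined into a single decision step. The following is a minimal sketch, not the project's announce code: `ChildCapture`, `Delivery`, and `resolveDelivery` are hypothetical, and treating `ANNOUNCE_SKIP` as literal output text is an assumption about how the child signals it. How the archived transcript is located and parsed is left out.

```typescript
// Hypothetical types for this sketch; not OpenClaw's actual announce API.
interface ChildCapture {
  text: string;
  tokensIn: number;
  tokensOut: number;
}

type Delivery =
  | { status: "deliver"; source: "live" | "archive"; capture: ChildCapture }
  | { status: "skipped" }
  | { status: "capture_failed"; reason: string };

// Assumption: the child signals an intentional skip with this literal text.
const ANNOUNCE_SKIP = "ANNOUNCE_SKIP";

// Decide what to announce for a child that finished successfully.
// Prefers the live store, then the archived transcript, and refuses to
// report an empty success when neither source yields output.
function resolveDelivery(
  live: ChildCapture | undefined,
  archived: ChildCapture | undefined,
): Delivery {
  const candidates: Array<["live" | "archive", ChildCapture | undefined]> = [
    ["live", live],
    ["archive", archived],
  ];
  for (const [source, capture] of candidates) {
    if (capture === undefined) continue;
    const text = capture.text.trim();
    if (text === ANNOUNCE_SKIP) return { status: "skipped" };
    if (text.length > 0) return { status: "deliver", source, capture };
  }
  return { status: "capture_failed", reason: "child session output unavailable" };
}
```

A `capture_failed` result gives the caller something to retry or surface, instead of a success event carrying (no output) and zero tokens.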

Minimal repro

  1. Configure:
{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 500
    }
  }
}
  2. Create enough sessions to exceed the cap.

  3. From Slack, spawn a run-mode subagent with:

    • cleanup: "delete"
    • expectsCompletionMessage: true
    • a task that produces a normal final assistant reply
  4. Wait for completion.

  5. Observe that when [sessions/store] capped session entry count fires between child completion and announce, the parent may receive a successful completion with (no output) and zero tokens.

  6. Raise maxEntries to 2000 and repeat. The problem disappears because the child session remains in the live store long enough for announce.

Impact

This makes subagent delivery unreliable on busy agents with many Slack threads, cron sessions, heartbeat/system sessions, or frequent subagent runs. It is especially confusing because the runtime reports success while dropping the actual result and token accounting.

Workaround

Raise session.maintenance.maxEntries, for example:

{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 2000
    }
  }
}

This avoids the immediate cap pressure but does not fix the lifecycle bug.


