Session maintenance should not evict active or pending-delivery subagent sessions before their result has been durably frozen or announced.

If the session store is over `maxEntries`, eviction should be lifecycle-aware:

- protect active subagents
- protect completed-but-not-announced subagents
- protect sessions with pending final delivery
- only evict sessions that are not needed for runtime delivery/visibility

At minimum, a successful pending-delivery subagent should not be reported as delivered with `(no output)` and zero tokens merely because the session row was capped.

openclaw - ✅(Solved) Fix session.maintenance enforce maxEntries can evict pending subagent sessions before announce delivery [3 pull requests, 3 comments, 4 participants]

openclaw • 2026-05-13 17:49:35

openclaw/openclaw#81492 • Fetched 2026-05-14 03:31:30

When session.maintenance.mode is set to "enforce" and the session store is over session.maintenance.maxEntries, OpenClaw can evict a just-completed/pending-delivery subagent session before the announce/result-freeze path reads the child output.

This causes the parent to receive a “completed successfully” subagent event with:

Result: (no output)
Stats: ... tokens 0 (in 0 / out 0)
session_id: unknown

even though the child actually ran, produced assistant text, and used nonzero tokens.

Error Message

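A Slack parent spawned a run-mode subagent with cleanup: "delete". The completion event delivered to the parent reported Result: (no output), tokens 0 (in 0 / out 0), and session_id: unknown.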

Root Cause

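Enforced session maintenance did not include pending subagent child session keys in its preserve set, so the child session row stayed removable during maxEntries capping. The cap ran after the child completed but before announce delivery/result capture, so the announce path could no longer read the child output.

At minimum, a successful pending-delivery subagent should not be reported as delivered with (no output) and zero tokens merely because the session row was capped.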

Fix Action

Workaround

Raise session.maintenance.maxEntries, for example:

{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 2000
    }
  }
}

This avoids the immediate cap pressure but does not fix the lifecycle bug.

PR fix notes

PR #81496: fix(sessions): preserve pending subagent rows during maintenance

Repository: openclaw/openclaw
Author: CaptainTimon
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/81496

Description (problem / solution / changelog)

Summary

Describe the problem and fix in 2–5 bullets:

Problem: enforced session.maintenance.maxEntries could cap a just-finished subagent session row before the announce/result path read its output.
Why it matters: parent sessions could receive a successful completion handoff with (no output), session_id: unknown, and zero token stats even though the child produced a real result.
What changed: session maintenance now supports lifecycle-owned preserve-key providers, and the subagent registry registers active plus cleanup-pending child session keys until cleanup is complete.
What did NOT change (scope boundary): no archive transcript fallback, delivery retry policy, or session disk-budget policy changes are included here.
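The preserve-key provider idea described in this summary can be sketched roughly as follows. This is a minimal illustration only: `registerPreserveKeysProvider`, `collectPreserveKeys`, and the provider shape are hypothetical names chosen for the example, not the actual OpenClaw functions or signatures.

```typescript
// Hypothetical sketch: these names and shapes are illustrative, not OpenClaw's real API.
type PreserveKeysProvider = () => string[];

const preserveKeyProviders: PreserveKeysProvider[] = [];

/** Registers a provider and returns a function that unregisters it. */
function registerPreserveKeysProvider(provider: PreserveKeysProvider): () => void {
  preserveKeyProviders.push(provider);
  return () => {
    const index = preserveKeyProviders.indexOf(provider);
    if (index !== -1) preserveKeyProviders.splice(index, 1);
  };
}

/** Builds the preserve set for one maintenance pass. */
function collectPreserveKeys(activeSessionKey?: string): Set<string> {
  const keys = new Set<string>();
  // The active writer key was already preserved before the fix.
  if (activeSessionKey) keys.add(activeSessionKey);
  // Lifecycle owners (for example a subagent registry) contribute their own keys.
  for (const provider of preserveKeyProviders) {
    for (const key of provider()) keys.add(key);
  }
  return keys;
}
```

Keeping the provider as a callback means the lifecycle owner decides, at the moment maintenance runs, which keys still matter, so the maintenance code does not need to know anything about subagents.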

Linked Issue/PR

Closes #81492
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count. Be mindful of private information like IP addresses, API keys, phone numbers, non-public endpoints, or other private details when providing evidence.

Behavior or issue addressed: subagent child session rows remain available to announce/result capture while active or cleanup-pending, even when entry-count maintenance is enforcing a saturated cap.
Real environment tested: local Linux checkout on Node 24 / pnpm 11.1.0.
Exact steps or command run after this patch: pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): real production-module terminal proof, no Vitest and no mocks:

$ tmpdir=$(mktemp -d); PROOF_DIR="$tmpdir" OPENCLAW_SESSION_CACHE_TTL_MS=0 pnpm exec tsx -e '<production saveSessionStore proof>'
[sessions/store] capped session entry count
{
  "keys": [
    "agent:main:subagent:pending-proof"
  ],
  "preservedPendingSubagent": true,
  "cappedFreshDm": true
}

Observed result after fix: with enforced maxEntries: 1, the lifecycle-preserved pending subagent row remained in the real session store while the unpreserved fresh DM row was capped.
What was not tested: live Slack/macOS reproduction from the original report was not rerun in this environment.
Before evidence (optional but encouraged): source-visible root cause matched the issue report: capping previously only preserved the active writer key, not pending subagent child keys.

Root Cause (if applicable)

For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write N/A. If the cause is unclear, write Unknown.

Root cause: enforced session maintenance built its preserve set from only activeSessionKey, while subagent announce/result capture later depended on the child session row still existing.
Missing detection / guardrail: no regression covered max-entry capping while subagent delivery cleanup was still pending.
Contributing context (if known): subagent keys are synthetic session keys, so generic protected session rules intentionally did not protect them.

Regression Test Plan (if applicable)

For bug fixes or regressions, name the smallest reliable test coverage that should catch this. Otherwise write N/A.

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts.
Scenario the test should lock in: max-entry maintenance honors lifecycle preserve keys, and the subagent registry preserves active plus cleanup-pending child sessions but not cleanup-completed ones.
Why this is the smallest reliable guardrail: it proves the generic maintenance seam and the subagent-owned lifecycle key provider without needing a live Slack channel.
Existing test that already covers this (if any): none.
If no new test is added, why not: N/A.

User-visible / Behavior Changes

Subagent completion handoffs are less likely to lose successful child output or token stats under enforced session-entry caps.

Diagram (if applicable)

Before:
[subagent finishes] -> [maxEntries capping removes child row] -> [announce reads no output]

After:
[subagent active/cleanup-pending] -> [registry preserves child row] -> [announce can read child output]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux local checkout
Runtime/container: Node 24, pnpm 11.1.0
Model/provider: N/A
Integration/channel (if any): subagent/session maintenance seam; no live Slack rerun
Relevant config (redacted): session.maintenance.mode = "enforce", low maxEntries in regression test

Steps

Register/seed lifecycle preserve keys for active and cleanup-pending subagent rows.
Run enforced session maintenance with maxEntries below the preserved row count.
Verify preserved child rows remain available and cleanup-completed rows are not preserved.
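As a rough sketch of the selection rule these steps verify, a registry could expose only the child keys whose runs are still active or cleanup-pending. The `SubagentRun` record and its `state` values are assumptions made for this example and are not taken from the OpenClaw source.

```typescript
// Hypothetical run record; the state names are assumptions for illustration only.
type SubagentRunState = "active" | "cleanup-pending" | "cleanup-completed";

interface SubagentRun {
  childSessionKey: string;
  state: SubagentRunState;
}

/** Returns the child session keys that maintenance should not cap yet. */
function listPreservedChildKeys(runs: SubagentRun[]): string[] {
  return runs
    .filter((run) => run.state !== "cleanup-completed")
    .map((run) => run.childSessionKey);
}
```

Once a run reaches the cleanup-completed state its key drops out of the list, so ordinary capping applies to it again.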

Expected

Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Actual

Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Evidence

Attach at least one:

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: real production-module session-store cap proof; focused regression tests; targeted formatting; production core typecheck.
Edge cases checked: cleanup-completed subagent rows are not preserved; provider-preserved rows may exceed the cap when all candidates are protected.
What you did not verify: live Slack/macOS reproduction.

Additional validation notes:

pnpm exec oxfmt --check --threads=1 src/config/sessions/store-maintenance-runtime.ts src/config/sessions/store.ts src/agents/subagent-registry.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
Real production-module terminal proof above passed.
pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed after rebasing onto latest upstream/main.
pnpm tsgo:core passed before the rebase.
pnpm lint:core is currently blocked by unrelated existing lint errors in src/gateway/sessions-patch.ts, src/plugins/registry.ts, and src/plugins/registry.runtime-config.test.ts.
pnpm tsgo:core:test is currently blocked by unrelated existing type errors in src/plugins/registry.runtime-config.test.ts.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: preserving lifecycle-critical subagent rows can temporarily exceed maxEntries.
- Mitigation: preservation ends after subagent cleanup completes, and only active or cleanup-pending child keys are contributed by the subagent registry.

Changed files

CHANGELOG.md (modified, +1/-0)
src/agents/subagent-registry.test.ts (modified, +47/-1)
src/agents/subagent-registry.ts (modified, +33/-0)
src/config/sessions/store-maintenance-runtime.ts (modified, +41/-0)
src/config/sessions/store.pruning.integration.test.ts (modified, +22/-0)
src/config/sessions/store.ts (modified, +7/-4)

PR #81498: fix: preserve pending subagent sessions during maintenance

Repository: openclaw/openclaw
Author: ai-hpc
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/81498

Description (problem / solution / changelog)

Summary

Problem: session.maintenance entry capping could evict a subagent child session while its completion result was still needed for final announce delivery.
Why it matters: completed subagents could lose their transcript/session row before frozenResultText or pending final delivery state was consumed, causing parent delivery to fall back to empty output.
What changed: added a session-maintenance preserve-key provider and registered active/pending subagent child sessions with both write-time and load-time maintenance.
What did NOT change (scope boundary): no channel-specific delivery behavior, Slack API behavior, or session cap defaults changed.

Linked Issue/PR

Closes #81492
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior or issue addressed: enforced session-store maintenance keeps an active/pending subagent child session while removing an older unprotected row.
Real environment tested: local OpenClaw checkout at /home/ubuntu/codes/ai-hpc/openclaw, built CLI OpenClaw 2026.5.12-beta.1 (0d8e8be), Linux, Node 22.
Exact steps or command run after this patch: node --import tsx --input-type=module -e <production maintenance probe using saveSessionStore/loadSessionStore and registerSessionMaintenancePreserveKeysProvider>
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): console output copied from the real local run:

[sessions/store] capped session entry count
OpenClaw real maintenance probe
storePath=/tmp/openclaw-real-maintenance-eJkQlX/sessions.json
keys=agent:main:subagent:pending-real-proof,recent-a
protectedChildPresent=true
oldRemovablePresent=false
entryCount=2

Observed result after fix: the protected subagent child key remained in the real session store, the old removable entry was removed, and the store was capped to two entries.
What was not tested: live Slack bot interaction; this bug is in backend session maintenance and subagent lifecycle coordination, independent of Slack transport.
Before evidence (optional but encouraged): the new regression cases model the previous eviction window where a pending subagent child key was not part of maintenance preservation.

Root Cause (if applicable)

Root cause: session maintenance only preserved the currently active session key and durable external conversation keys, so synthetic subagent child sessions remained removable during maxEntries capping even while the subagent registry still needed them.
Missing detection / guardrail: no coverage existed for capped maintenance with an active or pending-delivery subagent child session.
Contributing context (if known): subagent completion and announce delivery are asynchronous relative to session-store maintenance.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/config/sessions/store.pruning.test.ts, src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts
Scenario the test should lock in: pending runtime-protected subagent child sessions survive entry-count capping on save and explicit load maintenance.
Why this is the smallest reliable guardrail: the bug is in maintenance selection, not in a specific channel adapter.
Existing test that already covers this (if any): none found.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Pending subagent completion sessions are retained until their completion is announced or cleanup finishes, even if this temporarily reduces how aggressively session maintenance can cap entries.

Diagram (if applicable)

Before:
[subagent completes] -> [maintenance caps store] -> [child session evicted] -> [announce lacks result]

After:
[subagent active/pending] -> [registry exposes preserve key] -> [maintenance skips child session] -> [announce can use result]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux
Runtime/container: Node 22 local checkout
Model/provider: N/A
Integration/channel (if any): N/A; backend session maintenance path
Relevant config (redacted): session.maintenance.mode: "enforce", saturated maxEntries

Steps

Seed a real temporary session store above maxEntries.
Register a runtime maintenance preserve-key provider for a subagent child session key.
Run production saveSessionStore with enforce maintenance and reload with production loadSessionStore.

Expected

The pending subagent child session is preserved and removable older entries are capped first.

Actual

The pending subagent child session was preserved; the old removable entry was capped.
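The expected and actual behavior above can be illustrated with a small capping routine that removes the oldest entries first but never touches preserved keys. This is a sketch under stated assumptions: the `SessionEntry` shape, the `updatedAt` field, and `capEntries` are hypothetical and do not reflect the real session store layout.

```typescript
// Hypothetical store shape; `updatedAt` (epoch milliseconds) is an assumed field.
interface SessionEntry {
  updatedAt: number;
}
type SessionStore = Record<string, SessionEntry>;

/** Caps the store in place and returns the keys that were removed. */
function capEntries(store: SessionStore, maxEntries: number, preserve: Set<string>): string[] {
  const keys = Object.keys(store);
  const excess = keys.length - maxEntries;
  if (excess <= 0) return [];
  // Only unprotected entries are candidates, oldest first.
  const removable = keys
    .filter((key) => !preserve.has(key))
    .sort((a, b) => store[a].updatedAt - store[b].updatedAt);
  // If there are fewer candidates than the excess, the store stays over the cap.
  const removed = removable.slice(0, excess);
  for (const key of removed) delete store[key];
  return removed;
}
```

When every candidate is protected, the routine removes nothing and the store temporarily exceeds the cap, which matches the edge case both PRs describe.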

Evidence

Manual real-run output is included in the Real behavior proof section. Supplemental automated verification:

pnpm vitest run src/config/sessions/disk-budget.test.ts src/config/sessions/store.pruning.test.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts

Result: Test Files 5 passed (5); Tests 84 passed (84).

After the lint assertion update:

pnpm vitest run src/agents/subagent-registry.test.ts

Result: Test Files 2 passed (2); Tests 34 passed (34).

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: write-time capping, explicit load-time capping, disk-budget entry eviction, disk-aware subagent registry key selection for active and pending-delivery runs, and a real production-module console probe.
Edge cases checked: all removable candidates protected can temporarily exceed cap; completed cleanup runs are not preserved.
What you did not verify: live Slack bot interaction, because the fix is channel-independent backend lifecycle coordination.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: session stores may temporarily retain more entries when all removable entries are active/pending subagent sessions.
- Mitigation: preservation only applies until subagent completion delivery/cleanup finishes; normal capping still applies to removable entries.

Changed files

src/agents/subagent-registry-maintenance.ts (added, +48/-0)
src/agents/subagent-registry.test.ts (modified, +70/-0)
src/agents/subagent-registry.ts (modified, +4/-0)
src/config/sessions/cleanup-service.ts (modified, +5/-0)
src/config/sessions/disk-budget.test.ts (modified, +37/-0)
src/config/sessions/disk-budget.ts (modified, +4/-0)
src/config/sessions/store-load.ts (modified, +10/-2)
src/config/sessions/store-maintenance-preserve.ts (added, +43/-0)
src/config/sessions/store.pruning.integration.test.ts (modified, +33/-0)
src/config/sessions/store.pruning.test.ts (modified, +51/-0)
src/config/sessions/store.ts (modified, +4/-3)

PR #81505: fix(sessions): preserve pending subagent rows during maintenance

Repository: openclaw/openclaw
Author: CaptainTimon
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/81505

Description (problem / solution / changelog)

Summary

Problem: enforced session.maintenance.maxEntries could cap a just-finished subagent session row before the announce/result path read its output.
Why it matters: parent sessions could receive a successful completion handoff with (no output), session_id: unknown, and zero token stats even though the child produced a real result.
What changed: session maintenance now supports lifecycle-owned preserve-key providers, and the subagent registry registers active plus cleanup-pending child session keys until cleanup is complete.
CI cleanup: also carries small latest-main guard fixes needed for the required lint/type/contract shards: plugin registry config-scope test cleanup, Matrix test helper moved under test-support, and release journey fixture lint cleanup.
What did NOT change (scope boundary): no archive transcript fallback, delivery retry policy, or session disk-budget policy changes are included here.

Linked Issue/PR

Closes #81492
Supersedes #81496
Related #
This PR fixes a bug or regression

Real behavior proof (required for external PRs)

Behavior or issue addressed: subagent child session rows remain available to announce/result capture while active or cleanup-pending, even when entry-count maintenance is enforcing a saturated cap.
Real environment tested: local Linux checkout on Node 24 / pnpm 11.1.0.
Exact steps or command run after this patch: pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts.
Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output): real production-module terminal proof, no Vitest and no mocks:

$ tmpdir=$(mktemp -d); PROOF_DIR="$tmpdir" OPENCLAW_SESSION_CACHE_TTL_MS=0 pnpm exec tsx -e '<production saveSessionStore proof>'
[sessions/store] capped session entry count
{
  "keys": [
    "agent:main:subagent:pending-proof"
  ],
  "preservedPendingSubagent": true,
  "cappedFreshDm": true
}

Observed result after fix: with enforced maxEntries: 1, the lifecycle-preserved pending subagent row remained in the real session store while the unpreserved fresh DM row was capped.
What was not tested: live Slack/macOS reproduction from the original report was not rerun in this environment.
Before evidence (optional but encouraged): source-visible root cause matched the issue report: capping previously only preserved the active writer key, not pending subagent child keys.

Root Cause (if applicable)

Root cause: enforced session maintenance built its preserve set from only activeSessionKey, while subagent announce/result capture later depended on the child session row still existing.
Missing detection / guardrail: no regression covered max-entry capping while subagent delivery cleanup was still pending.
Contributing context (if known): subagent keys are synthetic session keys, so generic protected session rules intentionally did not protect them.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/config/sessions/store.pruning.integration.test.ts, src/agents/subagent-registry.test.ts.
Scenario the test should lock in: max-entry maintenance honors lifecycle preserve keys, and the subagent registry preserves active plus cleanup-pending child sessions but not cleanup-completed ones.
Why this is the smallest reliable guardrail: it proves the generic maintenance seam and the subagent-owned lifecycle key provider without needing a live Slack channel.
Existing test that already covers this (if any): none.
If no new test is added, why not: N/A.

User-visible / Behavior Changes

Subagent completion handoffs are less likely to lose successful child output or token stats under enforced session-entry caps.

Diagram (if applicable)

Before:
[subagent finishes] -> [maxEntries capping removes child row] -> [announce reads no output]

After:
[subagent active/cleanup-pending] -> [registry preserves child row] -> [announce can read child output]

Security Impact (required)

New permissions/capabilities? (Yes/No) No
Secrets/tokens handling changed? (Yes/No) No
New/changed network calls? (Yes/No) No
Command/tool execution surface changed? (Yes/No) No
Data access scope changed? (Yes/No) No
If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

OS: Linux local checkout
Runtime/container: Node 24, pnpm 11.1.0
Model/provider: N/A
Integration/channel (if any): subagent/session maintenance seam; no live Slack rerun
Relevant config (redacted): session.maintenance.mode = "enforce", low maxEntries in regression test

Steps

Register/seed lifecycle preserve keys for active and cleanup-pending subagent rows.
Run enforced session maintenance with maxEntries below the preserved row count.
Verify preserved child rows remain available and cleanup-completed rows are not preserved.

Expected

Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Actual

Active and cleanup-pending subagent child session rows survive entry-count maintenance.

Human Verification (required)

What you personally verified (not just CI), and how:

Verified scenarios: real production-module session-store cap proof; focused regression tests; targeted formatting; production core typecheck; architecture, lint, plugin contract, and affected Matrix test checks.
Edge cases checked: cleanup-completed subagent rows are not preserved; provider-preserved rows may exceed the cap when all candidates are protected.
What you did not verify: live Slack/macOS reproduction.

Additional validation notes:

Real production-module terminal proof above passed.
pnpm test src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
pnpm tsgo:core passed.
pnpm check:architecture passed.
pnpm check:test-types passed.
pnpm lint --threads=8 passed before the replacement PR branch; pnpm lint:core passed after the latest amend.
pnpm test src/plugins/registry.runtime-config.test.ts src/config/sessions/store.pruning.integration.test.ts src/agents/subagent-registry.test.ts passed.
pnpm test src/gateway/server.node-invoke-approval-bypass.test.ts src/gateway/server.sessions-send.test.ts passed after fixing the latest-main node-pairing test helper.
pnpm test:contracts:plugins passed.
pnpm test extensions/matrix/src/channel.setup.test.ts extensions/matrix/src/channel.directory.test.ts extensions/matrix/src/onboarding.test.ts extensions/matrix/src/onboarding.resolve.test.ts extensions/matrix/src/matrix/client.test.ts extensions/matrix/src/matrix/accounts.readiness.test.ts extensions/matrix/src/matrix/credentials.test.ts extensions/matrix/src/matrix/client/config.test.ts extensions/matrix/src/matrix/client/storage.test.ts extensions/matrix/src/matrix/monitor/handler.test.ts extensions/matrix/src/matrix/monitor/handler.group-history.test.ts extensions/matrix/src/matrix/monitor/handler.thread-root-media.test.ts extensions/matrix/src/matrix/monitor/handler.body-for-agent.test.ts extensions/matrix/src/matrix/monitor/handler.media-failure.test.ts passed.

Review Conversations

I replied to or resolved every bot review conversation I addressed in this PR.
I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

Backward compatible? (Yes/No) Yes
Config/env changes? (Yes/No) No
Migration needed? (Yes/No) No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: preserving lifecycle-critical subagent rows can temporarily exceed maxEntries.
- Mitigation: preservation ends after subagent cleanup completes, and only active or cleanup-pending child keys are contributed by the subagent registry.

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/matrix/src/channel.directory.test.ts (modified, +1/-1)
extensions/matrix/src/channel.setup.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/accounts.readiness.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/client.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/client/config.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/client/storage.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/credentials.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/monitor/handler.body-for-agent.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/monitor/handler.group-history.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/monitor/handler.media-failure.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/monitor/handler.test.ts (modified, +1/-1)
extensions/matrix/src/matrix/monitor/handler.thread-root-media.test.ts (modified, +1/-1)
extensions/matrix/src/onboarding.resolve.test.ts (modified, +1/-1)
extensions/matrix/src/onboarding.test.ts (modified, +1/-1)
extensions/matrix/src/test-support/test-runtime.ts (renamed, +2/-2)
scripts/e2e/lib/release-user-journey/clickclack-fixture.mjs (modified, +1/-1)
scripts/lib/config-boundary-guard.mjs (modified, +2/-0)
src/agents/subagent-registry.test.ts (modified, +47/-1)
src/agents/subagent-registry.ts (modified, +33/-0)
src/config/sessions/store-maintenance-runtime.ts (modified, +41/-0)
src/config/sessions/store.pruning.integration.test.ts (modified, +22/-0)
src/config/sessions/store.ts (modified, +7/-4)
src/gateway/server.node-invoke-approval-bypass.test.ts (modified, +20/-0)
src/gateway/sessions-patch.ts (modified, +1/-1)
src/plugins/registry.runtime-config.test.ts (modified, +9/-5)
src/plugins/registry.ts (modified, +1/-1)

Original issue text

Summary

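When session.maintenance.mode is "enforce" and the session store is over maxEntries, a just-completed subagent session can be evicted before the announce/result-freeze path reads the child output. This causes the parent to receive a “completed successfully” subagent event with: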

Result: (no output)
Stats: ... tokens 0 (in 0 / out 0)
session_id: unknown

even though the child actually ran, produced assistant text, and used nonzero tokens.

Environment

OpenClaw: 2026.5.7 (eeef486)
Host: macOS / Valhalla
Parent surface: Slack
Config:

{
  "session": {
    "dmScope": "per-channel-peer",
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 500
    }
  }
}

Raising maxEntries from 500 to 2000 stopped the observed failure.

Observed behavior

A Slack parent spawned a run-mode subagent with cleanup: "delete".

The subagent completed successfully, but the delivered completion event said:

Result:
(no output)

Stats:
runtime 1m21s • tokens 0 (in 0 / out 0)

session_id: unknown

The child transcript later showed the child did produce a normal final assistant reply with sources and nonzero model usage. The transcript only remained as an archived/deleted file:

.../sessions/<session-id>-topic-<topic>.jsonl.deleted.<timestamp>

Gateway logs showed this ordering:

12:49:57 agent.wait completed
12:49:58 [sessions/store] capped session entry count
12:50:41 subagent_delivery_target fired
12:50:42 announce ran
12:51:15 explicit sessions.delete ran

So the session-store cap ran after the child completed but before announce delivery/result capture.
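To check other incidents for the same ordering, a small script can test whether the cap fired inside the window between child completion and announce. This sketch assumes the simplified `HH:MM:SS message` lines quoted in this report and events from a single day; real gateway log lines will likely look different, and the function names are made up for the example.

```typescript
interface LogEvent {
  time: string; // "HH:MM:SS", compared as a string within a single day
  message: string;
}

/** Parses one simplified log line, or returns undefined if it does not match. */
function parseLine(line: string): LogEvent | undefined {
  const match = /^(\d{2}:\d{2}:\d{2})\s+(.*)$/.exec(line.trim());
  return match ? { time: match[1], message: match[2] } : undefined;
}

/** Returns the time of the first event whose message contains the needle. */
function timeOf(events: LogEvent[], needle: string): string | undefined {
  for (const event of events) {
    if (event.message.indexOf(needle) !== -1) return event.time;
  }
  return undefined;
}

/** True when the cap ran after the child completed but before announce ran. */
function capRanInDeliveryWindow(lines: string[]): boolean {
  const events: LogEvent[] = [];
  for (const line of lines) {
    const event = parseLine(line);
    if (event) events.push(event);
  }
  const completed = timeOf(events, "agent.wait completed");
  const capped = timeOf(events, "capped session entry count");
  const announced = timeOf(events, "announce ran");
  if (!completed || !capped || !announced) return false;
  return completed <= capped && capped < announced;
}
```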

Expected behavior

Session maintenance should not evict active or pending-delivery subagent sessions before their result has been durably frozen or announced.

If the session store is over maxEntries, eviction should be lifecycle-aware:

- protect active subagents
- protect completed-but-not-announced subagents
- protect sessions with pending final delivery
- only evict sessions that are not needed for runtime delivery/visibility

At minimum, a successful pending-delivery subagent should not be reported as delivered with (no output) and zero tokens merely because the session row was capped.

Actual behavior

With mode: "enforce" and a saturated maxEntries cap, session maintenance can remove the child session entry before announce/result capture. The announce code then cannot resolve the child session key, cannot read chat.history, and falls back to empty output/zero usage.
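
One way to avoid the silent fallback described above is to make result capture fail closed. The sketch below is illustrative only: DeliveryDecision, CaptureInput, and decideDelivery are hypothetical names, and only ANNOUNCE_SKIP, expectsCompletionMessage, and the "(no output)" placeholder come from this report.

```typescript
// Hypothetical decision type: not part of any OpenClaw API shown in this issue.
type DeliveryDecision =
  | { status: "deliver"; text: string }
  | { status: "skipped" }
  | { status: "capture_failed"; reason: string };

interface CaptureInput {
  outcome: "success" | "failure";
  expectsCompletionMessage: boolean;
  capturedText: string | null; // null when the child session could not be read
  announceSkip: boolean; // true only if the child explicitly returned ANNOUNCE_SKIP
}

function decideDelivery(input: CaptureInput): DeliveryDecision {
  // An explicit skip from the child is the only silent path.
  if (input.announceSkip) return { status: "skipped" };

  const text = input.capturedText?.trim() ?? "";
  if (text.length > 0) return { status: "deliver", text };

  // A successful child that owes a completion message but has no readable
  // output is treated as a capture failure, not as an empty success.
  if (input.outcome === "success" && input.expectsCompletionMessage) {
    return { status: "capture_failed", reason: "child output unavailable" };
  }

  // Outside the guarded case, this sketch falls back to the placeholder.
  return { status: "deliver", text: "(no output)" };
}
```

With this shape, a missing session row for a successful child produces capture_failed, which the caller could defer or retry instead of announcing an empty result.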

Why this looks like a bug

The docs say mode: "enforce" applies cleanup and maxEntries bounds sessions.json, so pruning itself is expected.

But subagent lifecycle docs imply completion delivery should prefer latest assistant text and push completion to the requester. That requires the child output to remain readable until delivery is complete.

The bug is not that old session entries are capped. The bug is that the cap is not aware of subagent lifecycle state.

Proposed fix

Implement lifecycle-aware eviction rather than raw count-based eviction for subagent sessions.

Recommended behavior:

1. Add pending subagent child session keys to the maintenance preserveKeys set.
   - Preserve while run is active.
   - Preserve while expectsCompletionMessage=true and final delivery is pending.
   - Preserve until frozenResultText is non-empty or a terminal failure payload is persisted.
2. Make result capture fail closed.
   - If child outcome is success and expectsCompletionMessage=true, do not mark delivery as successful with (no output) unless the child explicitly returned ANNOUNCE_SKIP.
   - If output capture fails because the session is missing, mark delivery deferred/capture_failed and retain evidence.
3. Add archive fallback.
   - If live chat.history(sessionKey) fails, search the archived .jsonl.deleted.* / .jsonl.reset.* transcript for the child session id/key and extract latest assistant text + usage.
   - Use this fallback for announce payload and token stats.
4. Improve session cap policy.
   - Current cap behavior appears to remove entries once the store exceeds maxEntries.
   - It should behave more like FIFO/LRU only among safe-to-remove sessions, never among active/pending-delivery sessions.
   - If all removable candidates are protected, exceed maxEntries temporarily rather than breaking delivery.
5. Add instrumentation.
   - Log child task length, child session key, session id, requested/resolved model.
   - Log result-freeze source and captured text length.
   - Log announce selected payload source and length.
   - Log cleanup/cap decisions with reason and whether a subagent was active/pending.
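
As a rough illustration of the archive fallback idea, the sketch below scans the contents of an archived transcript and recovers the latest assistant text plus summed usage. The record shape (role, text, usage.input, usage.output) is an assumption made for the example; the real transcript schema is not shown in this issue, and recoverFromArchive is a hypothetical helper.

```typescript
// Hypothetical transcript record shape: the real .jsonl schema is not shown in this issue.
interface TranscriptRecord {
  role?: string;
  text?: string;
  usage?: { input?: number; output?: number };
}

interface RecoveredResult {
  text: string;
  tokensIn: number;
  tokensOut: number;
}

// Takes the raw contents of an archived transcript (one JSON object per line)
// and returns the latest non-empty assistant text with summed assistant usage,
// or null when no assistant text is found.
function recoverFromArchive(jsonl: string): RecoveredResult | null {
  let latestText: string | null = null;
  let tokensIn = 0;
  let tokensOut = 0;

  for (const line of jsonl.split("\n")) {
    if (line.trim() === "") continue;

    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      continue; // tolerate a truncated or corrupt line
    }
    if (typeof parsed !== "object" || parsed === null) continue;

    const record = parsed as TranscriptRecord;
    if (record.role !== "assistant") continue;

    if (typeof record.text === "string" && record.text.trim() !== "") {
      latestText = record.text;
    }
    tokensIn += record.usage?.input ?? 0;
    tokensOut += record.usage?.output ?? 0;
  }

  return latestText === null ? null : { text: latestText, tokensIn, tokensOut };
}
```

A caller would read the archived file itself and pass its contents in. Keeping the parser free of file access makes it straightforward to test against small inline transcripts.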

Minimal repro

1. Configure:

{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 500
    }
  }
}

2. Create enough sessions to exceed the cap.
3. From Slack, spawn a run-mode subagent with:
   - cleanup: "delete"
   - expectsCompletionMessage: true
   - a task that produces a normal final assistant reply
4. Wait for completion.
5. Observe that when [sessions/store] capped session entry count fires between child completion and announce, the parent may receive a successful completion with (no output) and zero tokens.
6. Raise maxEntries to 2000 and repeat. The problem disappears because the child session remains in the live store long enough for announce.

Impact

This makes subagent delivery unreliable on busy agents with many Slack threads, cron sessions, heartbeat/system sessions, or frequent subagent runs. It is especially confusing because the runtime reports success while dropping the actual result and token accounting.

Workaround

Raise session.maintenance.maxEntries, for example:

{
  "session": {
    "maintenance": {
      "mode": "enforce",
      "pruneAfter": "30d",
      "maxEntries": 2000
    }
  }
}

This avoids the immediate cap pressure but does not fix the lifecycle bug.

