openclaw - sessions_yield: subagent completion announcement fails silently with no persistent fallback (results lost after 3 retries)

Summary

When a parent agent uses sessions_yield to suspend and wait for a subagent, and the subagent completes, the "announce completion back to parent" step can silently fail. After 3 rapid retries, the result is permanently lost — there is no persistent fallback.

Reproduction

  1. Parent agent receives a user message in a Feishu group chat
  2. Parent sends interim reply ("I am researching, please wait") and calls sessions_yield
  3. Subagent completes research (~85 seconds later)
  4. Announce mechanism attempts to re-invoke parent agent 3 times in rapid succession (~1-3s apart)
  5. All 3 attempts fail with: completion agent did not deliver through the message tool
  6. Registry hits maxAnnounceRetryCount: 3 → give up (retry-limit) → result permanently discarded

Log Evidence

04:44:29  feishu dispatch complete (queuedFinal=true)  ← interim message sent OK
04:51:20  [warn] Subagent completion direct announce failed: completion agent did not deliver through the message tool
04:51:21  [warn] Subagent completion direct announce failed (retry 2)
04:51:23  [warn] Subagent completion direct announce failed (retry 3)
04:51:23  [warn] Subagent announce give up (retry-limit) retries=3 endedAgo=85s
04:53:33  memory pressure: level=critical rss=3.72GB threshold=3.22GB
04:57:20  subagent suspended delivery discarded reason=expired

Root Cause

maxAnnounceRetryCount: 3 is hardcoded in resolveDeferredCleanupDecision(). The registry-level retries happen with no delay (each attempt runs the LLM and gets a quick response), so all 3 are exhausted within 3 seconds. There is no persistent outbox — once the in-memory retry limit is hit, the result is gone forever.

Expected Behavior (industry standard pattern)

Established approaches (LangGraph checkpointing, the Transactional Outbox pattern) handle this by:

  1. Persist result to disk before announcing — subagent writes completion to a durable store before attempting delivery
  2. Separate delivery worker — a background process polls the outbox and retries independently of the LLM announce flow
  3. Survive failures — if announce fails, result stays in outbox and is retried on next tick; result is never lost
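
The three steps above can be sketched as a small outbox with a sweep function. This is a minimal illustration, not OpenClaw code: Completion, DurableStore, InMemoryStore, Outbox, and the Deliver callback are all hypothetical names, and the in-memory store only stands in for a disk-backed one so the example stays self-contained.

```typescript
// Hypothetical shapes; none of these names come from the OpenClaw codebase.
interface Completion {
  id: string; // unique per subagent run
  parentSession: string;
  payload: string; // the subagent's final result
}

// Minimal durable-store contract. A real implementation would write to disk
// or a database; the in-memory version below is only a stand-in.
interface DurableStore {
  put(item: Completion): void;
  remove(id: string): void;
  list(): Completion[];
}

class InMemoryStore implements DurableStore {
  private items: Record<string, Completion> = {};
  put(item: Completion): void {
    this.items[item.id] = item;
  }
  remove(id: string): void {
    delete this.items[id];
  }
  list(): Completion[] {
    return Object.keys(this.items).map((id) => this.items[id]);
  }
}

// Returns true only when the parent actually received the result.
type Deliver = (item: Completion) => boolean;

class Outbox {
  private store: DurableStore;
  constructor(store: DurableStore) {
    this.store = store;
  }

  // Step 1: persist first, so a failed announce cannot lose the result.
  enqueue(item: Completion): void {
    this.store.put(item);
  }

  // Steps 2 and 3: a background worker calls this on every tick. Delivered
  // items are removed; failed items stay in the store for the next tick.
  sweep(deliver: Deliver): { delivered: number; pending: number } {
    let delivered = 0;
    for (const item of this.store.list()) {
      let ok = false;
      try {
        ok = deliver(item);
      } catch (_err) {
        ok = false; // a throwing delivery is a failure, not a loss
      }
      if (ok) {
        this.store.remove(item.id);
        delivered++;
      }
    }
    return { delivered, pending: this.store.list().length };
  }
}
```

Because sweep() never deletes an item unless delivery reports success, the retry budget of the announce flow no longer decides whether the result survives.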

Proposed Fix

Option A (robust): Write subagent completion payload to a persistent outbox (e.g., ~/.openclaw/pending_completions/) before attempting LLM announce. On success: remove from outbox. On failure: leave in outbox, retry on next sweep (e.g., every 60s). On startup: scan outbox for undelivered completions.
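
A rough sketch of the on-disk half of Option A, assuming one JSON file per completion. The directory is the example path from the proposal; PendingCompletion, persistCompletion, listPending, and markDelivered are hypothetical names, and runId is assumed to be safe to use as a file name.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Example location taken from the proposal above; not a confirmed OpenClaw path.
const DEFAULT_OUTBOX_DIR = path.join(os.homedir(), ".openclaw", "pending_completions");

// Hypothetical payload shape, not taken from the OpenClaw source.
interface PendingCompletion {
  runId: string;
  parentSession: string;
  result: string;
  createdAt: number; // epoch milliseconds
}

// Write to a temp file, then rename it into place. rename() within one
// directory is atomic on POSIX filesystems, so a crash cannot leave a
// half-written .json file for the sweeper to read.
function persistCompletion(dir: string, item: PendingCompletion): string {
  fs.mkdirSync(dir, { recursive: true });
  const finalPath = path.join(dir, `${item.runId}.json`);
  const tmpPath = `${finalPath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(item));
  fs.renameSync(tmpPath, finalPath);
  return finalPath;
}

// Shared by the periodic sweep and the startup scan: everything still in the
// directory is, by construction, undelivered.
function listPending(dir: string): PendingCompletion[] {
  if (!fs.existsSync(dir)) return [];
  return fs
    .readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .map((name) => JSON.parse(fs.readFileSync(path.join(dir, name), "utf8")) as PendingCompletion);
}

// Call only after the parent has confirmed delivery.
function markDelivered(dir: string, runId: string): void {
  fs.rmSync(path.join(dir, `${runId}.json`), { force: true });
}
```

A caller would run persistCompletion(DEFAULT_OUTBOX_DIR, item) before the LLM announce, markDelivered() on success, and listPending() both at startup and on each sweep interval.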

Option B (quick): Expose maxAnnounceRetryCount and announceRetryDelayMs as config options under agents.defaults.subagents so users can at least tune retry behavior.
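
As an illustration of Option B, the sketch below resolves the two proposed settings and derives a retry schedule from them. AnnounceRetryConfig, resolveRetryConfig, and retryDelaysMs are hypothetical names. The default of 3 mirrors the hardcoded limit described above, the 0 ms default delay is an assumption that models the current no-delay behaviour, and the doubling between retries (exponential backoff) is an addition of this sketch rather than part of the proposal.

```typescript
interface AnnounceRetryConfig {
  maxAnnounceRetryCount: number;
  announceRetryDelayMs: number;
}

// 3 matches the hardcoded limit from the issue; 0 ms is an assumed default.
const DEFAULT_RETRY: AnnounceRetryConfig = {
  maxAnnounceRetryCount: 3,
  announceRetryDelayMs: 0,
};

// Merge user-supplied values over the defaults, ignoring invalid input.
function resolveRetryConfig(user: Partial<AnnounceRetryConfig> = {}): AnnounceRetryConfig {
  const count = user.maxAnnounceRetryCount;
  const delay = user.announceRetryDelayMs;
  const countOk =
    typeof count === "number" && isFinite(count) && count >= 1 && Math.floor(count) === count;
  const delayOk = typeof delay === "number" && isFinite(delay) && delay >= 0;
  return {
    maxAnnounceRetryCount: countOk ? (count as number) : DEFAULT_RETRY.maxAnnounceRetryCount,
    announceRetryDelayMs: delayOk ? (delay as number) : DEFAULT_RETRY.announceRetryDelayMs,
  };
}

// Delay to wait before each retry, doubling every time.
function retryDelaysMs(cfg: AnnounceRetryConfig): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < cfg.maxAnnounceRetryCount; attempt++) {
    delays.push(cfg.announceRetryDelayMs * Math.pow(2, attempt));
  }
  return delays;
}
```

With maxAnnounceRetryCount set to 5 and announceRetryDelayMs set to 1000, the retries would spread over about 31 seconds instead of being exhausted within 3, which gives a busy parent session more time to recover.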

Environment

  • OpenClaw version: 2026.5.19
  • Channel: Feishu group chat
  • Agent model: claude-opus-4-7
  • Platform: macOS darwin 25.4.0
