openclaw - 💡(How to fix) Fix Sub-agent notification storm: randomUUID() defeats idempotency dedup + infinite retry loop [2 comments, 3 participants]

GitHub stats
openclaw/openclaw#52149 · Fetched 2026-04-08 01:15:04
Comments: 2 · Participants: 3 · Timeline: 2 · Reactions: 0
Timeline (top): commented ×2


Bug Description

Sub-agent completion notifications can enter an infinite re-delivery loop, consuming hundreds of millions of tokens. We experienced a ~$161 incident (436M cache read tokens over 6.7 hours) from a single sub-agent completion being re-announced ~4,450 times.

Root Cause

Two compounding issues in the sub-agent announce/cleanup flow:

1. sendAnnounce uses crypto.randomUUID() for the idempotency key

The gateway has idempotency-key-based deduplication (5-min TTL), but sendAnnounce generates a fresh UUID on every call, making every re-delivery look unique. The dedup mechanism exists but is actively defeated by its caller.

2. finalizeSubagentCleanup retries infinitely

When didAnnounce is false (announce failed because parent session was busy), cleanupHandled is reset to false unconditionally. This allows beginSubagentCleanup to re-trigger runSubagentAnnounceFlow on the next cycle. Combined with fresh UUIDs, this creates an infinite loop.
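
To illustrate the first issue, here is a minimal sketch of a TTL-based dedup check fed by both key strategies. DedupCache, stableKey, and countDelivered are hypothetical names made up for this example; they are not the gateway's actual implementation, which the issue does not show.

```javascript
const { createHash, randomUUID } = require('node:crypto');

// Hypothetical stand-in for the gateway's dedup: remembers each key for a TTL.
class DedupCache {
  constructor(ttlMs) {
    this.ttlMs = ttlMs;
    this.seen = new Map(); // key -> expiry timestamp in ms
  }

  // Returns true if the message should be delivered, false if it is a duplicate.
  accept(key, now = Date.now()) {
    const expiry = this.seen.get(key);
    if (expiry !== undefined && expiry > now) return false;
    this.seen.set(key, now + this.ttlMs);
    return true;
  }
}

// Stable key in the shape the issue suggests.
function stableKey(sessionKey, childSessionKey) {
  return createHash('sha256')
    .update(`announce:${sessionKey}:${childSessionKey}`)
    .digest('hex');
}

// Count how many of `attempts` re-deliveries get past a 5-minute dedup window.
function countDelivered(makeKey, attempts) {
  const cache = new DedupCache(5 * 60 * 1000);
  let delivered = 0;
  for (let i = 0; i < attempts; i++) {
    if (cache.accept(makeKey(), 1000 + i)) delivered++;
  }
  return delivered;
}

const withRandom = countDelivered(() => randomUUID(), 10);
const withStable = countDelivered(() => stableKey('parent', 'child'), 10);
console.log({ withRandom, withStable });
```

In this sketch every random key passes the check, while the stable key passes only once inside the window. Once the TTL expires a stable key would be accepted again, which is one reason the retry cap is still needed alongside it.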

Feedback Loop

Sub-agent completes -> runSubagentAnnounceFlow -> sendAnnounce (new UUID)
-> Parent session busy -> didAnnounce=false -> cleanupHandled=false
-> Next cycle: beginSubagentCleanup fires again -> repeat

Meanwhile: notification arrives -> model responds NO_REPLY -> session turn completes
-> gateway sees session available -> delivers next queued notification -> repeat
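
The first loop above can be reproduced with a small simulation. This is only a sketch: the entry shape and the simulate function are assumptions for illustration, and the parent session is modelled as permanently busy so that every announce fails.

```javascript
// Simulate cleanup cycles for one sub-agent whose parent session stays busy.
// With maxRetries = Infinity this models the reported behaviour; a finite
// value models the proposed cap.
function simulate({ cycles, maxRetries }) {
  const entry = { cleanupHandled: false, retryCount: 0, cleanupCompletedAt: null };
  let announces = 0;

  for (let cycle = 0; cycle < cycles; cycle++) {
    // Skip entries that are already claimed or in a terminal state.
    if (entry.cleanupHandled || entry.cleanupCompletedAt !== null) continue;

    entry.cleanupHandled = true; // beginSubagentCleanup claims the entry
    announces++;                 // runSubagentAnnounceFlow -> sendAnnounce
    const didAnnounce = false;   // parent session busy, announce fails

    if (!didAnnounce) {
      entry.retryCount++;
      if (entry.retryCount >= maxRetries) {
        entry.cleanupCompletedAt = cycle; // give up: terminal state
      } else {
        entry.cleanupHandled = false;     // reset, so the next cycle retries
      }
    }
  }
  return announces;
}

const uncapped = simulate({ cycles: 1000, maxRetries: Infinity });
const capped = simulate({ cycles: 1000, maxRetries: 3 });
console.log({ uncapped, capped });
```

Without a cap the announce count grows with the number of cycles; with a cap of 3 it stops at 3 regardless of how long the parent stays busy.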

Suggested Fix

  1. Stable idempotency key: Hash the string announce:${sessionKey}:${childSessionKey} instead of calling randomUUID()
  2. Retry cap: Limit finalizeSubagentCleanup retries (e.g., 3 attempts)
  3. Persist retry count: Use a persisted field (not underscore-prefixed) so a gateway restart doesn't reset the counter
  4. Terminal state on give-up: Set cleanupCompletedAt after max retries to prevent zombie entries
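
Item 3 implies that underscore-prefixed fields are not written to disk. The issue does not show how OpenClaw persists sub-agent records, so the sketch below simply assumes a serializer that drops any field whose name starts with an underscore; persist and restore are hypothetical helpers.

```javascript
// ASSUMPTION: the persistence layer skips fields whose names start with "_".
function persist(entry) {
  return JSON.stringify(entry, (key, value) =>
    key.startsWith('_') ? undefined : value
  );
}

function restore(serialized) {
  return JSON.parse(serialized);
}

// One counter kept in a transient field, one in a persisted field.
const before = { childSessionKey: 'child', _retryCount: 2, retryCount: 2 };
const afterRestart = restore(persist(before));

console.log(afterRestart._retryCount); // undefined: the transient counter is lost
console.log(afterRestart.retryCount);  // 2: the persisted counter survives
```

Under that assumption, a cap that relies on the transient counter would start again from zero after every restart, so the loop could continue indefinitely across restarts.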

Environment

  • OpenClaw version: latest npm (as of 2026-03-22)
  • Trigger: sessions_spawn from cron job (agentTurn, isolated session)
  • Affected files: reply-*.js in dist (sendAnnounce, finalizeSubagentCleanup)

extent analysis

Fix Plan

To address the infinite re-delivery loop, we will implement the following steps:

  • Stable idempotency key: Replace crypto.randomUUID() with a hashed string based on sessionKey and childSessionKey.
  • Retry cap: Introduce a retry limit in finalizeSubagentCleanup.
  • Persist retry count: Store the retry count in a persisted field.
  • Terminal state on give-up: Set cleanupCompletedAt after max retries.

Code Changes

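const crypto = require('node:crypto');

const MAX_RETRIES = 3;

// Calculate a stable idempotency key: the same parent/child pair always
// hashes to the same value, so the gateway's dedup can recognise re-deliveries.
function announceIdempotencyKey(sessionKey, childSessionKey) {
  return crypto.createHash('sha256')
    .update(`announce:${sessionKey}:${childSessionKey}`)
    .digest('hex');
}

// sendAnnounce function
function sendAnnounce(sessionKey, childSessionKey) {
  const idempotencyKey = announceIdempotencyKey(sessionKey, childSessionKey);
  // Use idempotencyKey for deduplication instead of crypto.randomUUID()
}

// finalizeSubagentCleanup function with retry cap.
// Assumption: `entry` is the persisted sub-agent record and `didAnnounce` is
// the result of the announce attempt. The counter lives on the entry rather
// than in a module-level variable, so it is tracked per sub-agent and can
// survive a gateway restart.
function finalizeSubagentCleanup(entry, didAnnounce) {
  if (didAnnounce) {
    // Reset retry count on success
    entry.retryCount = 0;
    return;
  }
  entry.retryCount = (entry.retryCount ?? 0) + 1;
  if (entry.retryCount >= MAX_RETRIES) {
    // Set terminal state on give-up
    entry.cleanupCompletedAt = new Date();
    return;
  }
  // Retry cleanup on the next cycle
  entry.cleanupHandled = false;
}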

Verification

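To verify the fix, trigger a sub-agent completion while the parent session is busy and confirm that announce attempts stop once the retry cap is reached, that the retry count increments on each failed attempt, and that cleanupCompletedAt is set when the cap is hit. Also confirm that repeated sendAnnounce calls for the same sessionKey and childSessionKey produce the same idempotency key, so the gateway deduplicates them within its 5-minute TTL.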

Extra Tips

  • Regularly review and update the retry cap value based on system performance and requirements.
  • Consider implementing exponential backoff for retries to prevent overwhelming the system.
  • Ensure that the persisted retry count field is properly updated and retrieved to maintain accuracy across gateway restarts.
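
As a sketch of the exponential backoff tip, the helper below doubles the delay on each attempt up to a ceiling. The function name, the 1-second base, and the 60-second ceiling are illustrative assumptions, not values taken from OpenClaw.

```javascript
// Delay before retry number `attempt` (1-based): baseMs, 2*baseMs, 4*baseMs, ...
// capped at maxMs. Base and ceiling values here are arbitrary examples.
function backoffDelayMs(attempt, baseMs = 1000, maxMs = 60000) {
  return Math.min(maxMs, baseMs * 2 ** (attempt - 1));
}

// Delays for a retry cap of 3 attempts.
const delays = [1, 2, 3].map((attempt) => backoffDelayMs(attempt));
console.log(delays);
```

Spacing retries this way gives a busy parent session time to become available before the cap is reached, instead of spending all attempts in quick succession.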
