openclaw bug: completionAnnouncedAt is set on queue accept, not after sendAnnounce() delivery (registry reports 'announced' ~26 s before parent transcript sees it)

Summary

completionAnnouncedAt is set on queue accept, not after sendAnnounce() has actually committed to the parent transcript. When a parent has messages.queue.mode = "collect" (with a non-zero debounceMs) and a subagent finishes while the parent is mid-turn, the registry / dashboards record "announced" within ~1 s of child end, while the parent transcript doesn't receive the announce for the full debounce window (~25–30 s in practice). Anything that reads completionAnnouncedAt as a delivery signal (status surfaces, "did the user see this?" checks, late-followup suppression) is reading enqueue time, not delivery time.

Repro

  1. Parent config has messages.queue.mode = "collect" and messages.queue.debounceMs ≥ 2500 (the default cluster config in our deployment).
  2. Parent is mid-turn (actively producing assistant output / calling tools).
  3. Spawn a mode: "run" subagent that returns a result the parent (or a downstream consumer) cares about; do not call sessions_yield() after spawn (it would end the parent run).
  4. Subagent finishes — say at T = 0 s.
  5. Observed: registry / dashboards record completionAnnouncedAt ≈ T + 1 s. The actual [Queued announce messages while agent was busy] header doesn't appear in the parent transcript until T + 25–30 s (drains after debounce).
  6. Expected: completionAnnouncedAt reflects the time the announce reached the transcript, not the time the queue accepted it.

The gap (completionAnnouncedAt written 985 ms after child end, transcript delivery at ~27 s) is reproducible on every busy-parent spawn. It was confirmed in the 6c48a482 / bu-profile-cleanup line of investigation today.

Why this matters

  • Status surfaces that gate "should I retry / follow up?" on completionAnnouncedAt fire before the parent has actually seen the result, producing false "delivered" reports and duplicate follow-ups.
  • Debugging "the agent never told me it was done" is harder than it should be — the registry shows "announced" while the user (or parent) saw nothing.
  • There's currently no field that records the failure mode: if a queued announce gets dropped on drain (cap, error, parent run ending), there's no lastAnnounceDeliveryError / lastAnnounceDropReason to read.

Root cause (likely)

In the announce-delivery / completion-queue modules: the timestamp write happens on the enqueue path (queue.accept(announce)), not on the sink commit path (after sendAnnounce() returns transcript evidence). Single field, single moment, but the wrong moment.
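
The suspected ordering problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: AnnounceQueue, RegistryEntry, StampPoint, and simulate are hypothetical names, sendAnnounce is modeled as a synchronous callback that returns true once the transcript commit is confirmed, and the clock is injected so the two stamp points can be compared deterministically. It does not reflect the actual openclaw module layout.

```typescript
// Hypothetical shapes for illustration; none of these names come from the openclaw source.
interface Announce {
  announceId: string;
  body: string;
}

interface RegistryEntry {
  completionAnnouncedAt?: number; // milliseconds since child end in this simulation
}

type StampPoint = "enqueue" | "delivery";

class AnnounceQueue {
  private pending: Announce[] = [];
  private registry: Map<string, RegistryEntry>;
  private sendAnnounce: (a: Announce) => boolean;
  private now: () => number;
  private stampOn: StampPoint;

  constructor(
    registry: Map<string, RegistryEntry>,
    sendAnnounce: (a: Announce) => boolean, // true = transcript commit confirmed (assumed)
    now: () => number,
    stampOn: StampPoint,
  ) {
    this.registry = registry;
    this.sendAnnounce = sendAnnounce;
    this.now = now;
    this.stampOn = stampOn;
  }

  accept(a: Announce): void {
    this.pending.push(a);
    // Suspected current behavior: the timestamp is written as soon as the queue accepts.
    if (this.stampOn === "enqueue") this.stamp(a);
  }

  // Called once the debounce window has elapsed.
  drain(): void {
    const batch = this.pending;
    this.pending = [];
    for (const a of batch) {
      const committed = this.sendAnnounce(a);
      // Proposed behavior: write the timestamp only after the sink confirms the commit.
      if (committed && this.stampOn === "delivery") this.stamp(a);
    }
  }

  private stamp(a: Announce): void {
    const entry = this.registry.get(a.announceId) ?? {};
    entry.completionAnnouncedAt = this.now();
    this.registry.set(a.announceId, entry);
  }
}

// Replays the timeline from the repro: accept at 985 ms, drain at ~27 s.
function simulate(stampOn: StampPoint): number | undefined {
  let t = 0;
  const registry = new Map<string, RegistryEntry>();
  const queue = new AnnounceQueue(registry, () => true, () => t, stampOn);
  t = 985;
  queue.accept({ announceId: "a1", body: "done" });
  t = 27_000;
  queue.drain();
  return registry.get("a1")?.completionAnnouncedAt;
}
```

With these assumptions, simulate("enqueue") yields 985 and simulate("delivery") yields 27000, which mirrors the gap observed between the registry and the parent transcript.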

Proposed fix

Split the timestamps so each phase has its own:

  • completionEnqueuedAt — set when the queue accepts the announce.
  • completionDeliveredAt — set after sendAnnounce() (or equivalent transcript-commit path) returns evidence the announce reached the parent transcript.
  • completionAnnouncedAt — keep the name for back-compat, but only set after delivery (i.e. alias of completionDeliveredAt). Existing readers that treated it as "delivered" silently become correct.

Plus two reliability fields:

  • lastAnnounceDeliveryError — last error string from a failed drain attempt.
  • lastAnnounceDropReason: one of "queue_cap" / "parent_run_ended" / "sink_unavailable" / "dedupe".

And dedupe queued items by announceId on enqueue so repeated drain attempts (or parallel completion paths) don't double-deliver.
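
A minimal sketch of how the split fields and enqueue-time dedupe could fit together follows. The field names and drop-reason strings are the ones proposed above; everything else (CompletionRecord, QueuedAnnounce, DedupingAnnounceQueue, the cap parameter, a synchronous sendAnnounce that throws on failure, and keeping failed items pending for a later drain) is an assumption made for illustration, not the existing openclaw API.

```typescript
type DropReason = "queue_cap" | "parent_run_ended" | "sink_unavailable" | "dedupe";

interface CompletionRecord {
  completionEnqueuedAt?: number;
  completionDeliveredAt?: number;
  completionAnnouncedAt?: number; // back-compat alias of completionDeliveredAt
  lastAnnounceDeliveryError?: string;
  lastAnnounceDropReason?: DropReason;
}

// Hypothetical queue item shape.
interface QueuedAnnounce {
  announceId: string;
  body: string;
}

class DedupingAnnounceQueue {
  readonly records = new Map<string, CompletionRecord>();
  private pending = new Map<string, QueuedAnnounce>();
  private now: () => number;
  private cap: number;

  constructor(now: () => number, cap: number) {
    this.now = now;
    this.cap = cap;
  }

  private record(id: string): CompletionRecord {
    let r = this.records.get(id);
    if (!r) {
      r = {};
      this.records.set(id, r);
    }
    return r;
  }

  // Returns false when the announce is dropped; the reason is recorded.
  enqueue(a: QueuedAnnounce): boolean {
    const r = this.record(a.announceId);
    if (this.pending.has(a.announceId) || r.completionDeliveredAt !== undefined) {
      r.lastAnnounceDropReason = "dedupe";
      return false;
    }
    if (this.pending.size >= this.cap) {
      r.lastAnnounceDropReason = "queue_cap";
      return false;
    }
    this.pending.set(a.announceId, a);
    r.completionEnqueuedAt = this.now();
    return true;
  }

  // sendAnnounce is assumed to throw when the transcript commit fails.
  drain(sendAnnounce: (a: QueuedAnnounce) => void): void {
    for (const [id, a] of Array.from(this.pending.entries())) {
      const r = this.record(id);
      try {
        sendAnnounce(a);
        r.completionDeliveredAt = this.now();
        r.completionAnnouncedAt = r.completionDeliveredAt;
        this.pending.delete(id);
      } catch (err) {
        // Design choice in this sketch: keep the item pending so a later drain can retry.
        r.lastAnnounceDeliveryError = err instanceof Error ? err.message : String(err);
      }
    }
  }
}
```

In this sketch a duplicate enqueue records "dedupe" on the same record as the original announce, so a record can show both a delivery time and a last drop reason. Whether duplicates should be tracked separately is left open.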

Workaround

From a busy-parent spawn, expect up to debounceMs + sink_latency (~30 s with our defaults) before the parent sees the announce. Two options for user-facing results:

  • Spawn the subagent and call sessions_yield() afterward when the parent has no more work to do this turn — direct delivery, no queue.
  • Have the subagent message the user-facing channel (e.g. Signal) directly when the result is user-facing, instead of relying on the announce queue.
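
Until the timestamps are split, a reader of completionAnnouncedAt could treat it as enqueue time and wait out a grace window before assuming delivery. The helper below is a hypothetical consumer-side guard, not an openclaw API. graceMs is a caller-chosen value (for example roughly 30 s, matching the delay described above), and passing the check is only a heuristic: it cannot detect an announce that was dropped on drain.

```typescript
// Hypothetical helper: treats completionAnnouncedAt as enqueue time and
// reports "likely delivered" only after graceMs has elapsed since then.
function isLikelyDelivered(
  completionAnnouncedAt: number | undefined,
  nowMs: number,
  graceMs: number,
): boolean {
  if (completionAnnouncedAt === undefined) return false; // nothing enqueued yet
  return nowMs - completionAnnouncedAt >= graceMs;
}
```

A status surface that gates follow-ups on this check trades a delayed "delivered" report for fewer false ones.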

Ruled out

Channel routing, sentinel/yield mechanism, queue caps, expectsCompletionMessage handling — all behaved correctly. The bug is strictly in when the delivered timestamp is written.

Companion / context

  • Same investigation surfaced openclaw/openclaw#82911 (message-tool routing drift) and openclaw/openclaw#82912 (tool-result middleware drops nested toolResult). All three together explain a class of user-visible "agent told the system it replied, but I never saw the reply" failures.

Environment

  • openclaw 2026.5.12
  • Linux 6.12.x, Node v24.15.0
  • Agent runtime: Codex app-server + Claude CLI Opus 4.7
  • Parent config: messages.queue.mode = "collect", messages.queue.debounceMs = 2500
