openclaw - 💡(How to fix) Fix Bug: completionAnnouncedAt is set on queue accept, not after sendAnnounce() delivery (registry reports 'announced' ~26s before parent transcript sees it)

StepCodex · 2026-05-17T05:02:28Z

completionAnnouncedAt is set on queue accept, not after sendAnnounce() has actually committed to the parent transcript. When a parent has messages.queue.mode = "collect" (with a non-zero debounceMs) and a subagent finishes while the parent is mid-turn, the registry / dashboards record "announced" within ~1 s of child end, while the parent transcript doesn't receive the announce for the full debounce window (~25–30 s in practice). Anything that reads completionAnnouncedAt as a delivery signal (status surfaces, "did the user see this?" checks, late-followup suppression) is reading enqueue time, not delivery time.

Repro

1. Parent config has messages.queue.mode = "collect" and messages.queue.debounceMs ≥ 2500 (the default cluster config in our deployment).
2. Parent is mid-turn (actively producing assistant output / calling tools).
3. Spawn a mode: "run" subagent that returns a result the parent (or a downstream consumer) cares about; do not call sessions_yield() after spawn (it would end the parent run).
4. Subagent finishes, say at T = 0 s.
5. Observed: registry / dashboards record completionAnnouncedAt ≈ T + 1 s. The actual [Queued announce messages while agent was busy] header doesn't appear in the parent transcript until T + 25–30 s (drains after debounce).
6. Expected: completionAnnouncedAt reflects the time the announce reached the transcript, not the time the queue accepted it.

The 985 ms vs ~27 s gap is reproducible on every busy-parent spawn — confirmed in the 6c48a482 / bu-profile-cleanup line of investigation today.

Why this matters

- Status surfaces that gate "should I retry / follow up?" on completionAnnouncedAt fire before the parent has actually seen the result, producing false "delivered" reports and duplicate follow-ups.
- Debugging "the agent never told me it was done" is harder than it should be: the registry shows "announced" while the user (or parent) saw nothing.
- There's currently no field that records the failure mode: if a queued announce gets dropped on drain (cap, error, parent run ending), there's no lastAnnounceDeliveryError / lastAnnounceDropReason to read.

Root cause (likely)

In the announce-delivery / completion-queue modules: the timestamp write happens on the enqueue path (queue.accept(announce)), not on the sink commit path (after sendAnnounce() returns transcript evidence). Single field, single moment, but the wrong moment.
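
To make the two write points concrete, here is a minimal, self-contained sketch. It is a toy model under stated assumptions, not openclaw's code: `AnnounceQueue`, `StampPoint`, `simulate` and the fake clock are hypothetical names, and `drain()` merely stands in for the path on which `sendAnnounce()` commits to the parent transcript. The timings (accept at about 1 s, drain at about 27 s) mirror the numbers observed in the repro.

```typescript
// Hypothetical model: these names illustrate the report and are not openclaw APIs.
type StampPoint = "accept" | "deliver";

interface Announce {
  announceId: string;
  text: string;
}

interface Registry {
  completionAnnouncedAt?: number;
}

class AnnounceQueue {
  private pending: Announce[] = [];
  private readonly stampOn: StampPoint;
  private readonly now: () => number;
  readonly registry: Registry = {};
  readonly transcript: string[] = [];

  constructor(stampOn: StampPoint, now: () => number) {
    this.stampOn = stampOn;
    this.now = now;
  }

  // Enqueue path: the reported behaviour stamps the registry here.
  accept(announce: Announce): void {
    this.pending.push(announce);
    if (this.stampOn === "accept") {
      this.registry.completionAnnouncedAt = this.now();
    }
  }

  // Sink commit path, run once the debounce window has elapsed.
  drain(): void {
    for (const announce of this.pending) {
      this.transcript.push(announce.text); // stand-in for sendAnnounce()
      if (this.stampOn === "deliver") {
        this.registry.completionAnnouncedAt = this.now();
      }
    }
    this.pending = [];
  }
}

function simulate(stampOn: StampPoint): { stampedAt: number | undefined; deliveredAt: number } {
  let clock = 0; // fake clock in milliseconds, so the run is deterministic
  const queue = new AnnounceQueue(stampOn, () => clock);
  clock = 1_000; // child ended at T = 0; the queue accepts the announce about 1 s later
  queue.accept({ announceId: "a1", text: "subagent result" });
  clock = 27_000; // parent stayed busy; the queue drains only now
  queue.drain();
  return { stampedAt: queue.registry.completionAnnouncedAt, deliveredAt: clock };
}
```

With `simulate("accept")` the stamp precedes the transcript write by 26 s in this model; with `simulate("deliver")` the two coincide.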

Proposed fix

Split the timestamps so each phase has its own:

- completionEnqueuedAt: set when the queue accepts the announce.
- completionDeliveredAt: set after sendAnnounce() (or equivalent transcript-commit path) returns evidence the announce reached the parent transcript.
- completionAnnouncedAt: keep the name for back-compat, but only set after delivery (i.e. alias of completionDeliveredAt). Existing readers that treated it as "delivered" silently become correct.

Plus two reliability fields:

- lastAnnounceDeliveryError: last error string from a failed drain attempt.
- lastAnnounceDropReason: "queue_cap" / "parent_run_ended" / "sink_unavailable" / "dedupe".

And dedupe queued items by announceId on enqueue so repeated drain attempts (or parallel completion paths) don't double-deliver.
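
One possible shape for the proposal, as a hedged sketch rather than a patch. Only the five field names and the four drop-reason strings come from this issue. The `DedupingAnnounceQueue` class, its `cap` parameter, the per-announce record map and the `sink` callback (a stand-in for `sendAnnounce()`, assumed to throw when the commit fails) are illustrative assumptions. The sketch also assumes a failed announce stays queued for the next drain attempt.

```typescript
// Hypothetical sketch; only the field names and drop reasons come from the issue.
type DropReason = "queue_cap" | "parent_run_ended" | "sink_unavailable" | "dedupe";

interface CompletionRecord {
  completionEnqueuedAt?: number;
  completionDeliveredAt?: number;
  completionAnnouncedAt?: number; // kept for back-compat; mirrors completionDeliveredAt
  lastAnnounceDeliveryError?: string;
  lastAnnounceDropReason?: DropReason;
}

interface QueuedAnnounce {
  announceId: string;
  text: string;
}

class DedupingAnnounceQueue {
  private readonly pending = new Map<string, QueuedAnnounce>();
  private readonly records = new Map<string, CompletionRecord>();
  private readonly now: () => number;
  private readonly cap: number;

  constructor(now: () => number, cap: number) {
    this.now = now;
    this.cap = cap;
  }

  // Returns the record for an announce, creating an empty one on first use.
  recordFor(announceId: string): CompletionRecord {
    let record = this.records.get(announceId);
    if (record === undefined) {
      record = {};
      this.records.set(announceId, record);
    }
    return record;
  }

  // Dedupe on enqueue: an announceId that was already accepted is rejected.
  enqueue(announce: QueuedAnnounce): boolean {
    const record = this.recordFor(announce.announceId);
    if (record.completionEnqueuedAt !== undefined) {
      record.lastAnnounceDropReason = "dedupe";
      return false;
    }
    if (this.pending.size >= this.cap) {
      record.lastAnnounceDropReason = "queue_cap";
      return false;
    }
    record.completionEnqueuedAt = this.now();
    this.pending.set(announce.announceId, announce);
    return true;
  }

  // Delivery timestamps are written only after the sink call returns without throwing.
  drain(sink: (announce: QueuedAnnounce) => void): number {
    let delivered = 0;
    for (const [announceId, announce] of Array.from(this.pending)) {
      const record = this.recordFor(announceId);
      try {
        sink(announce);
      } catch (err) {
        record.lastAnnounceDeliveryError = err instanceof Error ? err.message : String(err);
        continue; // stays queued for the next drain attempt
      }
      const deliveredAt = this.now();
      record.completionDeliveredAt = deliveredAt;
      record.completionAnnouncedAt = deliveredAt;
      this.pending.delete(announceId);
      delivered += 1;
    }
    return delivered;
  }
}
```

Keeping `completionAnnouncedAt` as a mirror of `completionDeliveredAt` means existing readers need no change, while new readers can compare `completionEnqueuedAt` with `completionDeliveredAt` to see how long an announce sat in the queue.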

Workaround

From a busy-parent spawn, expect up to debounceMs + sink_latency (~30 s with our defaults) before the parent sees the announce. Two options for user-facing results:

- Spawn the subagent and call sessions_yield() afterward when the parent has no more work to do this turn (direct delivery, no queue).
- Have the subagent message the user-facing channel (e.g. Signal) directly when the result is user-facing, instead of relying on the announce queue.
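
Until the fields are split, a reader that cannot use either option could treat `completionAnnouncedAt` as enqueue time and wait out the window described above before trusting it. The sketch below is an assumption, not part of the issue: `DeliveryWindow`, `earliestTrustedDelivery` and `isLikelyDelivered` are hypothetical names, and the check is only a time-based guess that cannot detect an announce dropped on drain.

```typescript
// Hypothetical reader-side guard; names are illustrative, not openclaw config keys.
interface DeliveryWindow {
  debounceMs: number; // queue debounce configured on the parent
  sinkLatencyMs: number; // assumed upper bound on transcript commit latency
}

// Time after which a queued announce is assumed to have reached the parent.
function earliestTrustedDelivery(completionAnnouncedAt: number, window: DeliveryWindow): number {
  return completionAnnouncedAt + window.debounceMs + window.sinkLatencyMs;
}

// Treats the timestamp as enqueue time: "delivered" only once the window has passed.
function isLikelyDelivered(
  completionAnnouncedAt: number | undefined,
  nowMs: number,
  window: DeliveryWindow,
): boolean {
  if (completionAnnouncedAt === undefined) {
    return false;
  }
  return nowMs >= earliestTrustedDelivery(completionAnnouncedAt, window);
}
```

The window values should come from what is actually observed in a given deployment rather than from the configured debounce alone.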

Ruled out

Channel routing, sentinel/yield mechanism, queue caps, expectsCompletionMessage handling — all behaved correctly. The bug is strictly in when the delivered timestamp is written.

Companion / context

Same investigation surfaced openclaw/openclaw#82911 (message-tool routing drift) and openclaw/openclaw#82912 (tool-result middleware drops nested toolResult). All three together explain a class of user-visible "agent told the system it replied, but I never saw the reply" failures.

Environment

- openclaw 2026.5.12
- Linux 6.12.x, Node v24.15.0
- Agent runtime: Codex app-server + Claude CLI Opus 4.7
- Parent config: messages.queue.mode = "collect", messages.queue.debounceMs = 2500
