openclaw - ✅ (Solved) Subagent completion announce retry-limit logs hide the underlying delivery error [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#84272 · Fetched 2026-05-20 03:41:56
Comments: 1 · Participants: 2 · Timeline events: 9 · Reactions: 1
Timeline (top): labeled ×6, closed ×1, commented ×1, cross-referenced ×1

In OpenClaw 2026.5.12, failed subagent completion announcements can end with only:

[warn] Subagent announce give up (retry-limit) run=<runId> child=<childSessionKey> requester=<requesterSessionKey> retries=3 endedAgo=<s>

The line does not include the real delivery failure. On systems where the gateway LaunchAgent discards stderr, the more useful per-attempt diagnostic can be lost, leaving no way to tell from gateway.log whether the failure was a gateway timeout, Slack/outbound configuration issue, routed dispatch failure, model failure, missing message-tool delivery, or no visible final reply.

Error Message

Subagent completion direct announce failed for run <historical-run-id>: Error: Outbound not configured for channel: slack
Subagent completion direct announce failed for run <historical-run-id>: routed-dispatch-did-not-queue-final

PR fix notes

PR #84281: Include delivery errors in subagent announce give-up logs

Description (problem / solution / changelog)

Summary

Fixes #84272.

  • Include the saved lastAnnounceDeliveryError in subagent announce retry-limit / expiry give-up warnings.
  • Normalize multiline delivery errors onto one gateway log line so gateway.log remains searchable.
  • Log direct completion announce delivery failures through the normal gateway log path instead of stderr-only output.

Root cause

Subagent announce delivery already formatted and persisted the concrete delivery failure on the run entry, but logAnnounceGiveUp() only emitted run, child, requester, retry count, and ended age. On macOS managed gateways where stderr can be discarded, the final gateway.log warning lost the useful delivery cause.
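The change described above can be sketched roughly as follows. This is an illustration only, not the code from the PR: the RunEntry interface, the toSingleLine and formatGiveUpLine helpers, and the "n/a" fallback for a missing endedAt are assumptions made for this example. Only the field names and the overall line shape come from the issue and the PR evidence.

```typescript
// Hypothetical subset of a run entry; only the fields used by this sketch.
interface RunEntry {
  runId: string;
  childSessionKey: string;
  requesterSessionKey: string;
  announceRetryCount?: number;
  endedAt?: number;
  lastAnnounceDeliveryError?: string;
}

// Collapse every run of whitespace (including newlines) into a single space
// so the warning stays on one searchable log line.
function toSingleLine(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Build the give-up warning and append deliveryError=... when an error was saved.
function formatGiveUpLine(entry: RunEntry, reason: string, now: number): string {
  const endedAgo =
    entry.endedAt === undefined
      ? "n/a" // assumption: the source does not show this case
      : `${Math.max(0, Math.round((now - entry.endedAt) / 1000))}s`;
  const parts = [
    `[warn] Subagent announce give up (${reason})`,
    `run=${entry.runId}`,
    `child=${entry.childSessionKey}`,
    `requester=${entry.requesterSessionKey}`,
    `retries=${entry.announceRetryCount ?? 0}`,
    `endedAgo=${endedAgo}`,
  ];
  const error = toSingleLine(entry.lastAnnounceDeliveryError ?? "");
  if (error.length > 0) {
    // JSON.stringify quotes the value so embedded spaces stay inside one field.
    parts.push(`deliveryError=${JSON.stringify(error)}`);
  }
  return parts.join(" ");
}

const line = formatGiveUpLine(
  {
    runId: "run-proof",
    childSessionKey: "agent:main:subagent:child",
    requesterSessionKey: "agent:main:main",
    announceRetryCount: 3,
    endedAt: 4000,
    lastAnnounceDeliveryError:
      "direct-primary: routed-dispatch-did-not-queue-final\nsteer-fallback: queue_message_failed",
  },
  "retry-limit",
  9000,
);
console.log(line);
```

Quoting the error value keeps a multi-word failure inside a single key=value field, which is the same shape shown in the PR evidence.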

Real behavior proof

Behavior or issue addressed: Subagent announce retry-limit give-up logs now include the underlying delivery error in the normal gateway log output.

Real environment tested: Local OpenClaw source checkout on macOS using the production logAnnounceGiveUp() implementation and defaultRuntime.log capture.

Exact steps or command run after this patch:

PATH=/Users/andy/.cache/codex-runtimes/codex-primary-runtime/dependencies/node/bin:$PATH \
node --import tsx --input-type=module --eval '
import { defaultRuntime } from "./src/runtime.ts";
import { logAnnounceGiveUp } from "./src/agents/subagent-registry-helpers.ts";
const originalLog = defaultRuntime.log;
const lines = [];
defaultRuntime.log = (line) => lines.push(String(line));
const originalNow = Date.now;
Date.now = () => 9000;
logAnnounceGiveUp({
  runId: "run-proof",
  childSessionKey: "agent:main:subagent:child",
  requesterSessionKey: "agent:main:main",
  requesterDisplayKey: "main",
  task: "finish",
  cleanup: "keep",
  createdAt: 1000,
  startedAt: 2000,
  endedAt: 4000,
  announceRetryCount: 3,
  lastAnnounceDeliveryError: "direct-primary: routed-dispatch-did-not-queue-final\nsteer-fallback: queue_message_failed"
}, "retry-limit");
Date.now = originalNow;
defaultRuntime.log = originalLog;
console.log(JSON.stringify({
  loggedLine: lines[0],
  hasDeliveryError: lines[0]?.includes("deliveryError="),
  multilineCollapsed: !lines[0]?.includes("\\n")
}, null, 2));'

Evidence after fix:

{
  "loggedLine": "[warn] Subagent announce give up (retry-limit) run=run-proof child=agent:main:subagent:child requester=agent:main:main retries=3 endedAgo=5s deliveryError=\"direct-primary: routed-dispatch-did-not-queue-final steer-fallback: queue_message_failed\"",
  "hasDeliveryError": true,
  "multilineCollapsed": true
}

Observed result after fix: The normal gateway log warning includes deliveryError=..., preserves the concrete delivery failure, and collapses multiline errors onto one log line.

What was not tested: No live macOS LaunchAgent was started. The proof exercises the production logging helper directly, and focused tests cover the direct completion announce failure path.

Validation

  • node scripts/run-vitest.mjs src/agents/subagent-registry-lifecycle.test.ts
  • node scripts/run-vitest.mjs src/agents/subagent-announce.test.ts src/agents/subagent-announce-dispatch.test.ts
  • git diff --check

Attribution

If maintainers squash or rework this PR, please preserve author attribution or include:

Co-authored-by: Andy Ye <[email protected]>

Changed files

  • src/agents/subagent-announce.test.ts (modified, +45/-3)
  • src/agents/subagent-announce.ts (modified, +2/-2)
  • src/agents/subagent-registry-helpers.test.ts (modified, +40/-1)
  • src/agents/subagent-registry-helpers.ts (modified, +9/-1)

Environment

  • OpenClaw: 2026.5.12 (f066dd2)
  • Gateway: macOS LaunchAgent
  • Gateway stdout: ~/.openclaw/logs/gateway.log
  • Gateway stderr: /dev/null

Observed

Live gateway.log had retry-limit warnings for several runs:

2026-05-19T12:32:33.842-04:00 [warn] Subagent announce give up (retry-limit) run=<run-id-a> ... retries=3 endedAgo=22s
2026-05-19T12:51:58.995-04:00 [warn] Subagent announce give up (retry-limit) run=<run-id-b> ... retries=3 endedAgo=22s
2026-05-19T13:03:38.699-04:00 [warn] Subagent announce give up (retry-limit) run=<run-id-c> ... retries=3 endedAgo=10s
2026-05-19T13:20:33.471-04:00 [warn] Subagent announce give up (retry-limit) run=<run-id> ... retries=3 endedAgo=10s

Each run had three preceding subagent_delivery_target fired events with expectsCompletionMessage=true, but the retry-limit warning had no delivery error.

Historical gateway.err.log shows that the per-attempt diagnostic, which is now discarded along with stderr, previously carried the useful cause, for example:

Subagent completion direct announce failed for run <historical-run-id>: Error: Outbound not configured for channel: slack
Subagent completion direct announce failed for run <historical-run-id>: routed-dispatch-did-not-queue-final

Other historical causes included gateway timeout after ..., model/fallback failures, and routed-dispatch failures.

Relevant Code Path

  • subagent-registry-32aElbRE.js: resumeSubagentRun() gives up after 3 retries and calls finalizeResumedAnnounceGiveUp({ reason: "retry-limit" }).
  • subagent-registry-32aElbRE.js: onDeliveryResult can format and persist entry.lastAnnounceDeliveryError.
  • subagent-announce-delivery-DzsdC5tX.js: completion messages use direct-primary delivery first, then queue fallback, and concrete failures are returned in delivery.error.
  • subagent-announce-Cdo94lsz.js: direct announce failures can be logged per attempt.
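The flow in the list above can be approximated with a small sketch. It assumes a simplified, synchronous model: AnnounceEntry, DeliveryResult, attemptAnnounce, and MAX_ANNOUNCE_RETRIES are hypothetical names for this example, and the real resumeSubagentRun() and onDeliveryResult code paths may be structured quite differently. The limit of 3 is taken from the retries=3 value in the observed logs.

```typescript
// Hypothetical result of one delivery attempt (direct path, then queue fallback).
type DeliveryResult = { delivered: boolean; error?: string };

// Hypothetical subset of the persisted run entry.
interface AnnounceEntry {
  runId: string;
  announceRetryCount: number;
  lastAnnounceDeliveryError?: string;
}

// Assumption: mirrors retries=3 seen in the observed give-up warnings.
const MAX_ANNOUNCE_RETRIES = 3;

type AttemptOutcome = "delivered" | "retry" | "give-up";

// Run one attempt, persist the concrete failure on the entry, and emit the
// give-up warning (with the saved error) once the retry limit is reached.
function attemptAnnounce(
  entry: AnnounceEntry,
  deliver: () => DeliveryResult,
  log: (line: string) => void,
): AttemptOutcome {
  const result = deliver();
  if (result.delivered) return "delivered";

  entry.announceRetryCount += 1;
  entry.lastAnnounceDeliveryError = result.error ?? "unknown delivery failure";

  if (entry.announceRetryCount >= MAX_ANNOUNCE_RETRIES) {
    log(
      `[warn] Subagent announce give up (retry-limit) run=${entry.runId} ` +
        `retries=${entry.announceRetryCount} ` +
        `deliveryError=${JSON.stringify(entry.lastAnnounceDeliveryError)}`,
    );
    return "give-up";
  }
  return "retry";
}

// Usage: a delivery that always fails reaches the give-up branch on the third attempt.
const logged: string[] = [];
const entry: AnnounceEntry = { runId: "run-demo", announceRetryCount: 0 };
const alwaysFails = (): DeliveryResult => ({
  delivered: false,
  error: "routed-dispatch-did-not-queue-final",
});

let outcome: AttemptOutcome = "retry";
while (outcome === "retry") {
  outcome = attemptAnnounce(entry, alwaysFails, (line) => logged.push(line));
}
console.log(outcome, logged);
```

The point of the sketch is that the error is saved on every failed attempt, so the final warning can report it even if per-attempt output never reaches a log file.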

Expected

Operators should be able to diagnose a failed subagent completion announcement from gateway.log alone, even when stderr is discarded.
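As one possible way to check this expectation, an operator-side helper could pull the fields out of give-up warnings in gateway.log. The sketch below is an assumption-laden illustration: GiveUpRecord, unquote, and parseGiveUpLine are hypothetical names, and the parser only understands the key=value shape shown in this report, with optional double-quoted values.

```typescript
interface GiveUpRecord {
  reason: string;
  fields: Record<string, string>;
}

// Strip JSON-style double quotes when present; otherwise return the raw token.
function unquote(raw: string): string {
  if (raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"')) {
    try {
      return String(JSON.parse(raw));
    } catch {
      return raw;
    }
  }
  return raw;
}

// Parse one log line; returns null when the line is not a give-up warning.
function parseGiveUpLine(line: string): GiveUpRecord | null {
  const head = /\[warn\] Subagent announce give up \(([^)]+)\)(.*)$/.exec(line);
  if (!head) return null;
  const reason = head[1] ?? "";
  const rest = head[2] ?? "";

  const fields: Record<string, string> = {};
  // A value is either a double-quoted string or a run of non-space characters.
  const pair = /(\w+)=("(?:[^"\\]|\\.)*"|\S+)/g;
  let match: RegExpExecArray | null;
  while ((match = pair.exec(rest)) !== null) {
    const key = match[1];
    const raw = match[2];
    if (key !== undefined && raw !== undefined) fields[key] = unquote(raw);
  }
  return { reason, fields };
}

// A line in the shape produced after the fix.
const fixed = parseGiveUpLine(
  '[warn] Subagent announce give up (retry-limit) run=run-proof ' +
    'child=agent:main:subagent:child requester=agent:main:main retries=3 endedAgo=5s ' +
    'deliveryError="direct-primary: routed-dispatch-did-not-queue-final steer-fallback: queue_message_failed"',
);

// A line in the shape observed before the fix: no deliveryError field.
const legacy = parseGiveUpLine(
  "2026-05-19T12:32:33.842-04:00 [warn] Subagent announce give up (retry-limit) run=<run-id-a> ... retries=3 endedAgo=22s",
);

console.log(fixed?.fields["deliveryError"]);
console.log(legacy?.fields["deliveryError"]);
```

A missing deliveryError field on a parsed record is a quick signal that the gateway producing the log predates the fix, or that no delivery error was saved for that run.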

Proposed Fix

  • Log per-attempt direct completion announce failures through the normal gateway log path.
  • Persist the formatted failure on the run entry via entry.lastAnnounceDeliveryError or equivalent.
  • Include the last known delivery error in Subagent announce give up (retry-limit) and expiry warnings.
  • Include enough phase context to identify whether the direct attempt, queue fallback, or agent-mediated final delivery failed.

Suggested warning shape:

[warn] Subagent completion direct announce failed run=<runId> child=<childSessionKey> requester=<requesterSessionKey> attempt=<n> path=direct error=<delivery.error>
[warn] Subagent announce give up (retry-limit) run=<runId> child=<childSessionKey> requester=<requesterSessionKey> retries=<n> endedAgo=<s> deliveryError=<lastAnnounceDeliveryError>

This is observability-only. It should not change retry policy, delivery ordering, Slack behavior, cleanup, or hook semantics.
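To make the per-attempt warning shape more concrete, here is a minimal sketch of a formatter for it. Everything in it is illustrative: AttemptFailure, describeError, and formatAttemptFailure are hypothetical names, the path labels other than "direct" are invented for the example, and quoting the error value is an assumption borrowed from the deliveryError field in the PR evidence rather than something the suggested shape specifies.

```typescript
// "direct" comes from the suggested shape; the other labels are hypothetical.
type AnnouncePath = "direct" | "queue-fallback" | "agent-final";

interface AttemptFailure {
  runId: string;
  childSessionKey: string;
  requesterSessionKey: string;
  attempt: number;
  path: AnnouncePath;
  error: unknown;
}

// Turn an Error or any other thrown value into one line of text.
function describeError(error: unknown): string {
  const text = error instanceof Error ? `${error.name}: ${error.message}` : String(error);
  return text.replace(/\s+/g, " ").trim();
}

// Format one failed attempt so it can go through the normal gateway log path.
function formatAttemptFailure(failure: AttemptFailure): string {
  return [
    `[warn] Subagent completion ${failure.path} announce failed`,
    `run=${failure.runId}`,
    `child=${failure.childSessionKey}`,
    `requester=${failure.requesterSessionKey}`,
    `attempt=${failure.attempt}`,
    `path=${failure.path}`,
    `error=${JSON.stringify(describeError(failure.error))}`,
  ].join(" ");
}

const attemptLine = formatAttemptFailure({
  runId: "run-demo",
  childSessionKey: "agent:main:subagent:child",
  requesterSessionKey: "agent:main:main",
  attempt: 2,
  path: "direct",
  error: new Error("Outbound not configured for channel: slack"),
});
console.log(attemptLine);
```

Because the function only builds a string, it fits the observability-only constraint: it can be called from an existing failure branch without touching retry policy or delivery ordering.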
