openclaw - 💡(How to fix) Fix [Bug]: bonjour plugin: unhandled rejection on probe cancellation crashes process in 2026.4.24 (worked in 2026.4.23) [1 comments, 2 participants]

openclaw 2026-04-26 19:55:21

GitHub stats

openclaw/openclaw#72346 • Fetched 2026-04-27 05:31:13

Author: gshockbu

Participants: clawsweeper[bot], gshockbu

Timeline (top): labeled ×2, closed ×1, commented ×1

Error Message

[openclaw] Unhandled promise rejection: CIAO PROBING CANCELLED

Root Cause

The process then exits (Node default: exit code 1 on unhandled rejection). ECS reports the openclaw container exitCode: 1 and stoppedReason: "Task failed container health checks" (the health check fails downstream because the gateway is no longer answering on :18789 after the process died).

Fix / Workaround

Confirmation that this is a known issue (or an intentional behavior change with a documented disable flag).
An openclaw.json flag to disable the bonjour plugin in environments where mDNS isn't useful. This would let operators avoid the crash loop without needing to patch the image. (Either shape would work: plugins.disabled: ["bonjour"] or plugins.bonjour.enabled: false.)
A fix that catches CIAO PROBING CANCELLED and treats it as a normal control-flow signal rather than an error.

Workaround in use

Rolled back from 2026.4.24-arm64 to 2026.4.23-arm64 fleet-wide. Stable since the rollback.

Code Example

[plugins] bonjour: restarting advertiser (service stuck in probing for 232658ms (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
[plugins] bonjour: advertised gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=unannounced
[openclaw] Unhandled promise rejection: CIAO PROBING CANCELLED
[openclaw] wrote stability bundle: /home/node/.openclaw/logs/stability/openclaw-stability-2026-04-26T18-47-42-467Z-35-unhandled_rejection.json

---

[bonjour] watchdog detected non-announced service; attempting re-advertise (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing)
[bonjour] restarting advertiser (service stuck in probing for 45143ms (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
[bonjour] watchdog detected non-announced service; attempting re-advertise (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing)

---

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

TL;DR

Upgrading from 2026.4.23-arm64 to 2026.4.24-arm64 introduced an unhandled promise rejection from the bonjour plugin's watchdog-triggered advertiser restart. The rejection takes the openclaw process down. On environments where mDNS cannot resolve (e.g. AWS ECS Fargate — no L2 broadcast/multicast), bonjour stays in probing indefinitely, the watchdog fires every ~3-4 minutes, and the resulting cancellation crashes the gateway every cycle.

2026.4.23-arm64 exhibits the same "stuck in probing" condition but recovers cleanly without crashing.

Environment

OpenClaw versions: 2026.4.24-arm64 (broken), 2026.4.23-arm64 (working)
Platform: AWS ECS Fargate, linux/arm64, Debian-based image as pulled from ghcr.io/openclaw/openclaw
Fargate task networking: awsvpc mode (each task has its own ENI in a VPC subnet — no multicast/broadcast support, so mDNS probes never receive responses)
Plugins enabled: acpx, bonjour, browser, device-pair, msteams, phone-control, talk-voice
Node: whatever the upstream image ships with (haven't pinned)

Reproduction

Deploy 2026.4.24-arm64 in any environment without mDNS reachability (Fargate is a clean repro; any container without L2 broadcast support should also reproduce).
Wait 4–10 minutes.
Process exits with Unhandled promise rejection: CIAO PROBING CANCELLED.
ECS task is replaced by the orchestrator. Cycle repeats indefinitely.

Reproduces 100% of the time across 13 independent tasks (a fleet of clients). Each task lasts 9-11 minutes before the bonjour watchdog cancellation kills it.

Expected behavior

2026.4.23-arm64's behavior: bonjour watchdog periodically logs stuck in probing for Xms and restarts the advertiser. Cancellation during restart is caught and handled. Process stays alive. mDNS doesn't work in this environment but that's acceptable — every other plugin and the gateway itself function normally.

Actual behavior in 2026.4.24

[plugins] bonjour: restarting advertiser (service stuck in probing for 232658ms (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
[plugins] bonjour: advertised gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=unannounced
[openclaw] Unhandled promise rejection: CIAO PROBING CANCELLED
[openclaw] wrote stability bundle: /home/node/.openclaw/logs/stability/openclaw-stability-2026-04-26T18-47-42-467Z-35-unhandled_rejection.json

Working behavior in 2026.4.23 (same task, same environment)

[bonjour] watchdog detected non-announced service; attempting re-advertise (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing)
[bonjour] restarting advertiser (service stuck in probing for 45143ms (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing))
[bonjour] watchdog detected non-announced service; attempting re-advertise (gateway fqdn=ip-X-X-X-X.ec2.internal (OpenClaw)._openclaw-gw._tcp.local. host=openclaw.local. port=18789 state=probing)

…then the process keeps running. No stability bundle written. No exit.

Diff signals between the two versions

Log prefix changed: [bonjour] (2026.4.23) → [plugins] bonjour: (2026.4.24). Suggests the plugin moved or was wrapped through a different loader path between releases.
Watchdog stuck-time threshold appears different: ~45s observed in 2026.4.23 vs ~210-265s observed in 2026.4.24 before the watchdog fires the cancellation. (Unsure if this is a config change or just observation noise.)
The substantive behavior change: 2026.4.24 surfaces the CIAO PROBING CANCELLED cancellation (from the ciao mDNS library) through to the Node default unhandled-rejection handler. 2026.4.23 either catches it or routes it through a handler that swallows it.
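One way to check whether the watchdog threshold really differs between releases is to pull the "stuck in probing for Xms" durations out of a larger log sample from each version. The sketch below assumes only the log fragment quoted in this issue; the function name and the abbreviated sample lines are illustrative, not part of OpenClaw.

```javascript
// Matches the "stuck in probing for <N>ms" fragment, which appears with both
// log prefixes ([bonjour] in 2026.4.23, [plugins] bonjour: in 2026.4.24).
const STUCK_RE = /stuck in probing for (\d+)ms/;

// Returns every stuck-in-probing duration (in milliseconds) found in the log text.
function stuckDurationsMs(logText) {
  const durations = [];
  for (const line of logText.split('\n')) {
    const match = STUCK_RE.exec(line);
    if (match) durations.push(Number(match[1]));
  }
  return durations;
}

// Sample lines abbreviated from the excerpts in this issue.
const sample = [
  '[plugins] bonjour: restarting advertiser (service stuck in probing for 232658ms (gateway ...))',
  '[bonjour] restarting advertiser (service stuck in probing for 45143ms (gateway ...))',
  '[openclaw] Unhandled promise rejection: CIAO PROBING CANCELLED',
].join('\n');

console.log(stuckDurationsMs(sample)); // [ 232658, 45143 ]
```

Running this over several tasks' logs per version would show whether ~45s versus ~210-265s is a consistent threshold change or observation noise.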

Hypothesis

Looks like a missing .catch() (or a removed try { await … } catch (err) { if (err.message === 'CIAO PROBING CANCELLED') { /* swallow */ } }) somewhere in the bonjour plugin's watchdog/restart code path between 2026.4.23 and 2026.4.24.

When ciao cancels an in-flight probe (during the watchdog-triggered advertiser restart), the cancellation rejects the probe promise with CIAO PROBING CANCELLED. In 2026.4.24, that rejection escapes to the top-level unhandled-rejection handler.
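The hypothesis can be illustrated with a small stand-alone sketch. Everything here is an assumption for illustration: startProbe is a stand-in for an in-flight probe and does not reflect ciao's real API, and restartAdvertiser is a hypothetical name for the plugin's restart path. The point is only that awaiting the cancelled promise inside a try/catch keeps the rejection from reaching the process-level handler.

```javascript
// Stand-in for an in-flight mDNS probe (hypothetical; not ciao's real API).
// The returned promise rejects with the same message seen in the crash log.
function startProbe() {
  let cancel;
  const done = new Promise((_resolve, reject) => {
    cancel = () => reject(new Error('CIAO PROBING CANCELLED'));
  });
  return { done, cancel };
}

// True only for the expected cancellation signal.
function isProbeCancellation(err) {
  return err instanceof Error && err.message === 'CIAO PROBING CANCELLED';
}

// Hypothetical restart path: cancels the probe and handles the resulting
// rejection instead of leaving the promise floating.
async function restartAdvertiser(probe) {
  probe.cancel();
  try {
    await probe.done;
    return 'announced';
  } catch (err) {
    if (isProbeCancellation(err)) return 'cancelled';
    throw err; // unrelated failures still surface
  }
}

restartAdvertiser(startProbe()).then((outcome) => console.log(outcome)); // logs "cancelled"
```

If the await (or an equivalent .catch()) were missing, the same rejection would have no handler and Node would treat it as unhandled, which matches the 2026.4.24 log.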

What would help

Confirmation that this is a known issue (or an intentional behavior change with a documented disable flag).
An openclaw.json flag to disable the bonjour plugin in environments where mDNS isn't useful. This would let operators avoid the crash loop without needing to patch the image. (Either shape would work: plugins.disabled: ["bonjour"] or plugins.bonjour.enabled: false.)
A fix that catches CIAO PROBING CANCELLED and treats it as a normal control-flow signal rather than an error.
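To make the requested flag concrete, here is a minimal sketch of how a loader could honor either proposed shape. Both shapes are the ones suggested above and are not confirmed OpenClaw options; isPluginEnabled and the config objects are hypothetical.

```javascript
// Decides whether a plugin should load, given a parsed openclaw.json object.
// Both config shapes are proposals from this issue, not confirmed options.
function isPluginEnabled(config, name) {
  const plugins = (config && config.plugins) || {};
  // Proposed shape 1: plugins.disabled: ["bonjour"]
  if (Array.isArray(plugins.disabled) && plugins.disabled.includes(name)) {
    return false;
  }
  // Proposed shape 2: plugins.bonjour.enabled: false
  const entry = plugins[name];
  if (entry && typeof entry === 'object' && entry.enabled === false) {
    return false;
  }
  return true; // enabled unless explicitly turned off
}

console.log(isPluginEnabled({ plugins: { disabled: ['bonjour'] } }, 'bonjour')); // false
console.log(isPluginEnabled({ plugins: { bonjour: { enabled: false } } }, 'bonjour')); // false
console.log(isPluginEnabled({ plugins: {} }, 'bonjour')); // true
```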

Stability bundles

The image writes a JSON stability bundle on every crash: /home/node/.openclaw/logs/stability/openclaw-stability-<timestamp>-N-unhandled_rejection.json where N is the rejection count.
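For triage across many tasks, the bundle file name alone carries the crash time, the counter, and the category. The sketch below parses that name; the timestamp layout is inferred from the single example in this issue (2026-04-26T18-47-42-467Z), so treat the pattern as an assumption, and parseBundleName is a hypothetical helper.

```javascript
// Pattern inferred from one example file name in this issue:
// openclaw-stability-2026-04-26T18-47-42-467Z-35-unhandled_rejection.json
const BUNDLE_RE =
  /^openclaw-stability-(\d{4}-\d{2}-\d{2})T(\d{2})-(\d{2})-(\d{2})-(\d{3})Z-(\d+)-([a-z_]+)\.json$/;

// Returns { timestamp, count, category } or null if the name does not match.
function parseBundleName(fileName) {
  const m = BUNDLE_RE.exec(fileName);
  if (!m) return null;
  const [, date, hh, mm, ss, ms, count, category] = m;
  return {
    timestamp: `${date}T${hh}:${mm}:${ss}.${ms}Z`, // restored to ISO 8601
    count: Number(count), // N in the file name
    category,
  };
}

console.log(
  parseBundleName('openclaw-stability-2026-04-26T18-47-42-467Z-35-unhandled_rejection.json')
);
```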

Happy to attach a sanitized bundle if useful — let me know if there's a preferred upload path. Bundle includes the rejection error, stack, and some runtime context.

Workaround in use

Rolled the pinned image back from 2026.4.24-arm64 to 2026.4.23-arm64 fleet-wide. Stable since the rollback. Will hold there until a fix or disable flag is available.

Logs (sanitized excerpts)

Three independent stopped tasks, all matching the same pattern:

Task	Lifetime	openclaw container exit code
16d7cee8…	10m 11s	1
64a96602…	10m 8s	1
6a339824…	10m 20s	1

stopCode: ServiceSchedulerInitiated, stoppedReason: "Task failed container health checks" (downstream of the openclaw process exiting).

The stability bundles all categorize as unhandled_rejection with the same CIAO PROBING CANCELLED payload.

Steps to reproduce

The container restarts repeatedly after the new image is deployed. Rolled back to the prior version to restore service.

Expected behavior

Stable operation after the container loads, with no openclaw exit code 1 (RC=1) failures.

Actual behavior

Restarts repeatedly.

OpenClaw version

2026.4.24-arm64

Operating system

Linux

Install method

Container image pulled and run via Docker on AWS ECS

Model

claude sonnet 4.6

Provider / routing chain

AWS

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

No response

Additional information

No response

extent analysis

TL;DR

The issue can be fixed by catching the CIAO PROBING CANCELLED rejection in the bonjour plugin's watchdog/restart code path.

Guidance

Review the bonjour plugin's code to identify where the CIAO PROBING CANCELLED rejection is not being caught.
Add a try-catch block to handle the rejection and prevent it from escaping to the top-level unhandled-rejection handler.
Consider adding a configuration flag to disable the bonjour plugin in environments where mDNS is not useful.
Verify that the fix works by deploying the updated image to the affected environment and monitoring for crashes.

Example

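try {
  // Await the call that can reject with CIAO PROBING CANCELLED inside the try;
  // a promise that is not awaited here would bypass this catch.
  await restartBonjourAdvertiser(); // hypothetical name for the plugin's restart call
} catch (err) {
  if (err instanceof Error && err.message === 'CIAO PROBING CANCELLED') {
    // expected cancellation during restart: swallow it and keep running
  } else {
    throw err;
  }
}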

Notes

The provided information suggests that the issue is specific to the 2026.4.24-arm64 version of OpenClaw and is caused by a change in the bonjour plugin's watchdog/restart code path. The fix should be applied to this specific version to prevent the unhandled rejection from crashing the process.

Recommendation

Apply a workaround by catching the CIAO PROBING CANCELLED rejection in the bonjour plugin's code, as this is a more targeted solution than rolling back to a previous version.
