openclaw - ✅(Solved) Fix Gateway becomes unstable from bonjour plugin errors, causing Control UI disconnects (1006) [1 pull request, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#72920, fetched 2026-04-28 06:30:16
Comments: 2
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, assigned ×1, closed ×1

The local gateway appears to restart/crash due to the bonjour plugin, which causes Control UI websocket disconnects (1006) and makes the UI appear unstable.

Root Cause

Possibly related to Control UI surfacing internal/system-level failures too aggressively, but this looks like a separate root cause from the async-completion message leak.

Fix Action

Workaround

Disable the bonjour plugin:

"plugins": {
  "entries": {
    "bonjour": {
      "enabled": false
    }
  }
}

This stopped the issue in my setup.

PR fix notes

PR #73029: fix(bonjour): suppress ciao crash when networkInterfaces() is denied

Description (problem / solution / changelog)

Fixes #72945

Summary

  • Problem: When the gateway runs inside a restricted sandbox that denies os.networkInterfaces() (NemoClaw, Docker-in-Docker, k3s with locked-down policy, etc.), @homebridge/ciao's NetworkManager init throws an unhandled SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1. The bonjour plugin already classifies and suppresses three other ciao-originated errors (cancellation, interface assertion, netmask assertion), but this failure mode wasn't covered, so Node turns the rejection into a process-level crash ~39ms after the ready log.
  • Why it matters: The gateway logs "ready (6 plugins, 2.0s)" and immediately disappears. Operators see "ready" then nothing — a confusing, unactionable failure mode in environments where discovery.mdns.mode=off is the right answer but isn't auto-detected. The mdns.mode knob is also undocumented (the reporter discovered it by grepping the dist bundle).
  • What changed: Extended classifyCiaoProcessError in extensions/bonjour/src/ciao.ts with a 4th classification kind, interface-enumeration-failure, that matches the libuv UV_INTERFACE_ADDRESSES syscall token in the error message. Updated extensions/bonjour/src/advertiser.ts:handleCiaoProcessError to log a single bonjour: disabling mDNS — networkInterfaces() unavailable warning and suppress the rejection without requesting recovery (recovery would just re-enter the same failing syscall). Added 2 regression tests in extensions/bonjour/src/ciao.test.ts.
  • What did NOT change (scope boundary): The discovery.mdns.mode config knob and its default ("minimal"). The mDNS feature itself in environments where os.networkInterfaces() works. The other 3 existing classifications. The recovery code path for interface/netmask assertions. The @homebridge/ciao dependency or any patch on it. Did not auto-detect sandboxes (option 2 in the issue body) — that's a separate UX choice and out of scope for the crash fix. Did not document mdns.mode (option 3) — that's a docs PR.
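
The classification step described above can be sketched as follows. This is a minimal illustration, not the PR's code: the function name classifyInterfaceEnumerationFailure, the Classification shape, and formatError are assumptions made for the example. Only the kind string, the UV_INTERFACE_ADDRESSES token, and the upper-casing convention come from the PR description.

```typescript
// Hypothetical sketch: names and shapes here are illustrative, not the plugin's real API.
interface Classification {
  kind: "interface-enumeration-failure";
  formatted: string;
}

// Matches the libuv syscall token as a whole word in an upper-cased message.
const INTERFACE_ENUMERATION_RE = /\bUV_INTERFACE_ADDRESSES\b/;

// Render an arbitrary rejection reason as "Name: message" for logging and matching.
function formatError(err: unknown): string {
  if (err instanceof Error) {
    return `${err.name}: ${err.message}`;
  }
  return String(err);
}

// Returns a classification when the message carries the syscall token, otherwise null
// so that unrelated rejections stay visible to the caller.
function classifyInterfaceEnumerationFailure(err: unknown): Classification | null {
  const formatted = formatError(err);
  if (INTERFACE_ENUMERATION_RE.test(formatted.toUpperCase())) {
    return { kind: "interface-enumeration-failure", formatted };
  }
  return null;
}
```

Returning null for anything that does not match keeps the change narrow: an error such as new Error("boom") is not classified and continues down whatever default path the caller uses.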

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72945
  • Related #72920 (Gateway becomes unstable from bonjour plugin errors — same plugin, different error class)
  • Related #72902 (Bonjour/mdns broadcaster crashes gateway on macOS — same plugin, different trigger)
  • This PR fixes a bug or regression

Root Cause

  • Root cause: @homebridge/ciao's NetworkManager.assumeNetworkInterfaceNames calls os.networkInterfaces() synchronously during init. In restricted sandboxes that syscall returns EPERM-equivalent (Unknown system error 1); Node surfaces it as a SystemError with message containing uv_interface_addresses. The bonjour plugin's existing classifier (extensions/bonjour/src/ciao.ts:classifyCiaoProcessError) only matched three string patterns — cancellation, IPV4 interface assertion, netmask assertion — so this SystemError fell through, and the unhandled rejection killed the process.
  • Missing detection / guardrail: extensions/bonjour/src/ciao.test.ts had a "keeps unrelated rejections visible" test but no positive coverage for sandbox-style failures. The node-linker=hoisted layout and --unhandled-rejections=warn flag don't catch SystemError-class rejections at the runtime level (verified by the reporter on Node v22.22.2), so suppression must happen in the plugin's own classifier.
  • Contributing context (if known): The architectural pattern of "classify ciao errors then decide whether to suppress" already exists in this file — the fix is a one-pattern extension rather than a new mechanism. Recovery is intentionally skipped for this kind because re-running ciao init would hit the same os.networkInterfaces() block and crash again; the right answer is to log once and let the plugin go quiet for the rest of the gateway's lifetime.
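
To make the failing call concrete, here is a small probe around os.networkInterfaces(), the real Node API named in the root cause. The probe itself (probeNetworkInterfaces, ProbeResult) is an assumption for illustration only; the PR does not add a probe and explicitly leaves sandbox auto-detection out of scope. The enumerator is injectable so the denied path can be exercised without a restricted sandbox.

```typescript
import * as os from "node:os";

// Hypothetical result type for this sketch.
type ProbeResult =
  | { ok: true; interfaceNames: string[] }
  | { ok: false; message: string };

// Calls the enumerator synchronously, as ciao's init is described as doing,
// and converts a thrown error into a value instead of letting it escape.
function probeNetworkInterfaces(
  enumerate: () => Record<string, unknown> = () => os.networkInterfaces(),
): ProbeResult {
  try {
    return { ok: true, interfaceNames: Object.keys(enumerate()) };
  } catch (err) {
    const message = err instanceof Error ? `${err.name}: ${err.message}` : String(err);
    return { ok: false, message };
  }
}
```

On a normal host, probeNetworkInterfaces() with no argument should return ok: true with the local interface names. In a sandbox that denies the syscall, the catch branch would carry the same SystemError text the reporter saw.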

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/bonjour/src/ciao.test.ts
  • Scenario the test should lock in:
    1. A SystemError whose message contains uv_interface_addresses returned Unknown system error 1 is classified as kind: "interface-enumeration-failure".
    2. The same error wrapped via new Error(..., { cause }) is also detected (the existing collectCiaoProcessErrorCandidates walks cause/reason/errors chains, but the regex match has to land on the inner error).
    3. ignoreCiaoUnhandledRejection returns true for both.
    4. Existing "keeps unrelated rejections visible" test remains green (regression guard against widening the regex).
  • Why this is the smallest reliable guardrail: classifyCiaoProcessError is a pure function with no I/O — unit tests are deterministic. The advertiser-side wiring is a one-line branch in handleCiaoProcessError that returns the same true/false contract; a separate test for the log-line text would lock in formatting that may not be load-bearing.
  • Existing test that already covers this (if any): None — none of the 9 existing tests cover SystemError-class rejections. The closest is "suppresses aggregate ciao assertion rejections" which uses AggregateError-wrapped AssertionErrors (a different shape from SystemError).
  • If no new test is added, why not: N/A — 2 new tests added (direct + wrapped via cause).
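
The scenarios in this plan can be sketched with a simplified stand-in for the chain walk. collectCandidates and shouldIgnoreRejection below are hypothetical helpers written for this example; they are loosely modeled on what the PR says collectCiaoProcessErrorCandidates and ignoreCiaoUnhandledRejection do, and they are not copies of either.

```typescript
// Hypothetical helper: breadth-first walk over cause / reason / errors[] links.
function collectCandidates(root: unknown): unknown[] {
  const seen = new Set<unknown>();
  const queue: unknown[] = [root];
  const out: unknown[] = [];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === null || current === undefined || seen.has(current)) continue;
    seen.add(current); // guards against cyclic cause chains
    out.push(current);
    if (typeof current === "object") {
      const node = current as { cause?: unknown; reason?: unknown; errors?: unknown };
      queue.push(node.cause, node.reason);
      if (Array.isArray(node.errors)) queue.push(...node.errors);
    }
  }
  return out;
}

function messageOf(value: unknown): string {
  return value instanceof Error ? value.message : String(value);
}

// True when any error in the chain carries the libuv syscall token.
function shouldIgnoreRejection(reason: unknown): boolean {
  return collectCandidates(reason).some((candidate) =>
    /\bUV_INTERFACE_ADDRESSES\b/.test(messageOf(candidate).toUpperCase()),
  );
}

const MESSAGE =
  "A system error occurred: uv_interface_addresses returned Unknown system error 1";
const direct = Object.assign(new Error(MESSAGE), { name: "SystemError" });
const wrapped = Object.assign(new Error("bonjour init failed"), { cause: direct });
const unrelated = new Error("boom");

console.log(
  shouldIgnoreRejection(direct),
  shouldIgnoreRejection(wrapped),
  shouldIgnoreRejection(unrelated),
); // prints: true true false
```

The wrapped case only passes because the walk reaches the inner error; matching the outer message alone would miss it, which is the point of scenario 2.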

User-visible / Behavior Changes

  • In restricted sandboxes where os.networkInterfaces() is denied:
    • Before: gateway logs [gateway] ready (6 plugins, 2.0s), then crashes ~39ms later with an unhandled SystemError.
    • After: gateway stays up. A single warning bonjour: disabling mDNS — networkInterfaces() unavailable in this environment: SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1 is logged. mDNS does not function (which is the same outcome users get today by setting discovery.mdns.mode=off manually).
  • In normal environments: no change. The new regex only matches the libuv syscall token, which doesn't appear in any healthy ciao error path.

Diagram

Bonjour init in a restricted sandbox

Before:
[bonjour] register process handlers
  -> ciao NetworkManager init
  -> os.networkInterfaces()
  -> SystemError: uv_interface_addresses returned Unknown system error 1
  -> classifyCiaoProcessError() returns null
  -> handleCiaoProcessError returns false (don't suppress)
  -> Node default: process exits on unhandled rejection
  -> CRASH: 39ms after "ready"

After:
[bonjour] register process handlers
  -> ciao NetworkManager init
  -> os.networkInterfaces()
  -> SystemError: uv_interface_addresses returned Unknown system error 1
  -> classifyCiaoProcessError() returns { kind: "interface-enumeration-failure", formatted }
  -> handleCiaoProcessError logs warn, returns true (suppress)
  -> Node leaves the process running
  -> Gateway stays up; mDNS is dormant
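
The "After" flow can be sketched as a handler that returns true to suppress and false to let the rejection surface. Everything here is an assumption for illustration: makeRejectionHandler, HandlerDeps, the warning wording, and the install wiring are not taken from the plugin. The sketch also assumes that every other classified kind requests recovery, whereas the PR only states this for the interface and netmask assertion kinds.

```typescript
// Hypothetical shapes for this sketch.
interface Classified {
  kind: string;
  formatted: string;
}

interface HandlerDeps {
  classify: (reason: unknown) => Classified | null;
  warn: (line: string) => void;
  requestRecovery?: (formatted: string) => void;
}

// Builds a handler that returns true when the rejection should be suppressed.
function makeRejectionHandler(deps: HandlerDeps): (reason: unknown) => boolean {
  let warnedEnumerationFailure = false;
  return (reason) => {
    const classified = deps.classify(reason);
    if (classified === null) {
      return false; // not a known ciao failure: do not suppress
    }
    if (classified.kind === "interface-enumeration-failure") {
      if (!warnedEnumerationFailure) {
        warnedEnumerationFailure = true;
        // Illustrative wording, not the plugin's exact log line.
        deps.warn(`bonjour: disabling mDNS, networkInterfaces() unavailable: ${classified.formatted}`);
      }
      return true; // suppress without recovery: re-init would hit the same denied syscall
    }
    // Assumption: other classified kinds ask for recovery before being suppressed.
    deps.requestRecovery?.(classified.formatted);
    return true;
  };
}

// Wiring sketch (not invoked here). Once a listener is attached, Node stops applying its
// default unhandled-rejection behaviour, so the listener re-throws what it does not suppress.
function install(handler: (reason: unknown) => boolean): void {
  process.on("unhandledRejection", (reason) => {
    if (!handler(reason)) {
      throw reason;
    }
  });
}
```

Keeping the decision in a function that returns a boolean makes the suppress/recover asymmetry testable without touching process-level listeners.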

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (this prevents a crash from a denied syscall; it doesn't add a new call)
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux 6.8.0-110-generic (Ubuntu) — and any sandbox/container that locks down os.networkInterfaces() (NemoClaw, Docker-in-Docker, k3s, etc.)
  • Runtime/container: Node 22.14+
  • Model/provider: N/A
  • Integration/channel (if any): bonjour/mDNS sidecar — channel-discovery, not channel I/O
  • Relevant config (redacted): default — no discovery.mdns.mode override needed before the fix to reproduce the crash; just run inside a restricted sandbox

Steps

  1. Start the gateway inside a sandbox that blocks os.networkInterfaces() (per the reporter, NemoClaw via openshell sandbox exec ...).
  2. Watch the gateway log.

Expected

  • Gateway logs [gateway] ready ... and stays up.
  • A single bonjour: disabling mDNS — networkInterfaces() unavailable ... warning appears within the first second of bonjour init.
  • Process does not exit.

Actual (before fix)

  • Gateway logs [gateway] ready (6 plugins, 2.0s), then ~39ms later: [openclaw] Unhandled promise rejection: SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1. Process exits.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)
$ pnpm test extensions/bonjour
 Test Files  3 passed (3)
      Tests  38 passed (38)   # 36 existing + 2 new (direct + wrapped-via-cause)

pnpm check:changed is green: conflict markers, changelog attributions, typecheck core/core-tests/extensions/extension-tests, lint, runtime import cycles, plus the various pairing/webhook guards all pass.

Human Verification (required)

  • Verified scenarios:
    • Targeted vitest run for extensions/bonjour (38/38 pass locally on Node 22).
    • Full pnpm check:changed gate (all lanes green).
    • Re-read extensions/bonjour/src/advertiser.ts:handleCiaoProcessError to confirm the new branch logs once, returns true, and does not call requestCiaoRecovery?.(...) (recovery would re-enter the same failing syscall).
    • Confirmed by reading extensions/bonjour/src/ciao.ts that collectCiaoProcessErrorCandidates already walks cause / reason / errors[] chains, so wrapping a SystemError inside new Error(..., { cause }) still classifies correctly — covered by the second new test.
  • Edge cases checked:
    • Direct SystemError rejection (most common shape from Node).
    • SystemError wrapped in a generic Error via { cause } (defensive against future ciao-side wrapping).
    • Unrelated rejections (new Error("boom")) still return false from ignoreCiaoUnhandledRejection — existing test still passes after the regex addition.
    • Mixed-case / lowercase form: the regex \bUV_INTERFACE_ADDRESSES\b is tested against the .toUpperCase()'d message, so the match is effectively case-insensitive, following the same convention as the other 3 patterns in the file.
  • What you did not verify:
    • Live reproduction inside a NemoClaw sandbox or a deliberately-restricted Docker container. I do not have access to a sandbox that denies os.networkInterfaces(). The unit test reconstructs the exact SystemError.name + .message shape Node produces (per the reporter's log), and the advertiser branch is straight-line; verifying that the existing classifier→suppression hookup works is what the existing 36 tests already cover.
    • The auto-detect-sandbox approach (option 2 in the issue body). That is a separate UX choice — defaulting discovery.mdns.mode=off based on env detection has policy implications beyond a crash fix and should be its own PR.
    • Documenting discovery.mdns.mode (option 3 in the issue body). That is a docs PR.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

(Both will be checked once review activity lands. Currently no bot review conversations on this PR.)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: a future ciao or libuv release uses uv_interface_addresses in a benign log line and we suppress something that should bubble.
    • Mitigation: the regex requires the exact libuv syscall token, delimited by word boundaries, in the message of an error that has already reached the classifier as a process-level rejection. Benign log output never takes this code path; only unhandledRejection / uncaughtException events reach the classifier.
  • Risk: suppressing the failure hides a deeper bug if os.networkInterfaces() ever fails on a normal host.
    • Mitigation: the bonjour: disabling mDNS … warning is logged at WARN level (not DEBUG), so operators see the message in default log configurations. It includes the formatted error so the underlying SystemError is preserved in the log line.
  • Risk: skipping requestCiaoRecovery?.(...) for this kind diverges from the interface/netmask assertion paths.
    • Mitigation: intentional — recovery for those two kinds re-runs ciao init against (presumably) a healed network. For interface enumeration, the failure is a sandbox policy that won't change; re-running would crash again. The branch is commented to make the asymmetry explicit.

Changed files

  • extensions/bonjour/src/advertiser.ts (modified, +7/-0)
  • extensions/bonjour/src/ciao.test.ts (modified, +21/-0)
  • extensions/bonjour/src/ciao.ts (modified, +9/-1)

Summary

The local gateway appears to restart/crash due to the bonjour plugin, which causes Control UI websocket disconnects (1006) and makes the UI appear unstable.

What happened

In Control UI, the interface disconnected with:

  • disconnected (1006): no reason

At the same time, gateway logs showed restart behavior and errors/warnings around the bonjour plugin, including:

  • bonjour: watchdog detected non-announced service; attempting re-advertise
  • Unhandled promise rejection: CIAO ANNOUNCEMENT CANCELLED

After disabling the bonjour plugin and restarting the gateway, the restarts/disconnects stopped.

Why this is a problem

  • Causes visible Control UI disconnects for local users
  • Makes the app look flaky even though the main failure is inside service discovery / advertisement
  • Can trigger repeated gateway restarts and interrupt active sessions

Expected behavior

If Bonjour/mDNS advertisement fails or is cancelled, the gateway should handle it gracefully without crashing/restarting or dropping the active Control UI connection.

Actual behavior

The gateway appears to restart or become unstable after bonjour advertisement problems, and Control UI gets disconnected with websocket code 1006.

Environment

  • OpenClaw app: 2026.4.24
  • Channel/surface: Control UI / webchat
  • Gateway mode: local
  • OS: macOS (local gateway)

Evidence observed

Symptoms in logs included:

  • repeated gateway restarts around the disconnect window
  • bonjour: watchdog detected non-announced service; attempting re-advertise
  • Unhandled promise rejection: CIAO ANNOUNCEMENT CANCELLED
  • after disabling plugins.entries.bonjour.enabled, gateway came back up stably without the repeated disconnect behavior

Repro idea

I do not have a minimal deterministic repro yet, but one likely path is:

  1. Run local gateway with bonjour enabled
  2. Let the service get into a failed/cancelled advertisement state
  3. Open/use Control UI during or after the re-advertise path
  4. Observe websocket disconnect(s) and possible gateway restart/instability

Workaround

Disable the bonjour plugin:

"plugins": {
  "entries": {
    "bonjour": {
      "enabled": false
    }
  }
}

This stopped the issue in my setup.

Related

Possibly related to Control UI surfacing internal/system-level failures too aggressively, but this looks like a separate root cause from the async-completion message leak.

extent analysis

TL;DR

Disable the bonjour plugin to prevent gateway restarts and Control UI disconnects.

Guidance

  • The bonjour plugin appears to be the root cause of the issue, as disabling it resolves the problem.
  • To verify, enable the bonjour plugin and attempt to reproduce the issue by letting the service get into a failed/cancelled advertisement state.
  • If the issue persists, try to isolate the specific conditions that trigger the bonjour plugin failure, such as network configuration or service discovery issues.
  • Consider implementing a more robust error handling mechanism for the bonjour plugin to prevent gateway restarts and Control UI disconnects.

Notes

The exact cause of the bonjour plugin failure is unclear, and further investigation may be needed to determine the root cause. Disabling the plugin may have unintended consequences, such as affecting service discovery or advertisement.

Recommendation

Apply workaround: Disable the bonjour plugin, as it appears to be the most effective solution to prevent gateway restarts and Control UI disconnects. This workaround may need to be revisited if the underlying issue with the bonjour plugin is resolved.
