openclaw - ✅(Solved) Fix Gateway crashes 39ms after 'ready' on os.networkInterfaces() failure in restricted sandboxes (NemoClaw, Docker-in-Docker, etc.) [1 pull requests, 2 comments, 2 participants]

GitHub stats
openclaw/openclaw#72945 (fetched 2026-04-28 06:29:47)
Comments: 2 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×2, assigned ×1, closed ×1

In plain English: start the gateway inside any restricted sandbox that doesn't allow os.networkInterfaces() (NemoClaw is one) and the gateway crashes 39ms after ready with an unhandled rejection from the @homebridge/ciao mDNS sidecar. The crash is easy to misread because the gateway has just logged "ready (6 plugins, 2.0s)": operators see "ready" and then watch the process disappear. Setting discovery.mdns.mode=off works around it, but the setting isn't documented anywhere; we found the config knob by grepping dist/audit-D8YFKksP.js.

Error Message

2026-04-27T06:38:11.821 [gateway] ready (6 plugins, 2.0s)
2026-04-27T06:38:11.860 [openclaw] Unhandled promise rejection: SystemError: A system error occurred:
    uv_interface_addresses returned Unknown system error 1
    at Object.networkInterfaces (node:os:218:16)
    at Function.assumeNetworkInterfaceNames (
        /usr/local/lib/node_modules/openclaw/node_modules/@homebridge/ciao/src/NetworkManager.ts:527:23)
    at NetworkManager.getCurrentNetworkInterfaces (...)

Root Cause

@homebridge/ciao (the mDNS sidecar OpenClaw uses for channel discovery, i.e. Bonjour-style multicast service announcement) calls os.networkInterfaces() at NetworkManager initialization. In a NemoClaw sandbox, the sandbox policy restricts that call, so it fails with an EPERM-equivalent error (Unknown system error 1). Node raises this as a SystemError, which surfaces as an unhandled promise rejection and, by default, kills the process.

NODE_OPTIONS="--unhandled-rejections=warn" does NOT prevent the crash — the SystemError-class rejection isn't caught by that flag in our test on Node v22.22.2.
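
The mechanism can be illustrated with a minimal sketch, assuming the enumeration runs somewhere inside an async call chain (the "Unhandled promise rejection" log line is consistent with that, but ciao's internals are not reproduced here). `deniedEnumerate` and `initNetworkManager` are hypothetical stand-ins, not ciao or OpenClaw functions.

```typescript
// Hypothetical stand-in for os.networkInterfaces() in a sandbox that denies it.
// Only the name and message shape are taken from the log above.
function deniedEnumerate(): never {
  const err = new Error(
    "A system error occurred: uv_interface_addresses returned Unknown system error 1",
  );
  err.name = "SystemError";
  throw err;
}

// Hypothetical stand-in for an async init step that enumerates interfaces.
// A synchronous throw inside an async function rejects the returned promise.
async function initNetworkManager(enumerate: () => unknown): Promise<void> {
  enumerate();
}

// With no rejection handler attached, this rejection would reach the
// process-level "unhandledRejection" event. Here it is captured so the
// sketch exits cleanly and the outcome can be inspected.
const initOutcome: Promise<string> = initNetworkManager(deniedEnumerate).then(
  () => "resolved",
  (err: unknown) => (err instanceof Error ? `rejected: ${err.name}` : "rejected"),
);

initOutcome.then((outcome) => console.log(outcome));
```

The point of the sketch is only that the failure never appears as a synchronous exception at the call site that started init, which is why a process-level handler is where it ends up.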

Fix Action

Workaround

Set discovery.mdns.mode = "off" in openclaw.json:

{
  "discovery": { "mdns": { "mode": "off" } }
}

After the workaround, the gateway stayed up for as long as we ran it: more than 5 minutes with zero crashes during validation, then a second run of over 90 minutes during the SRE-MAS scenario.

The mdns.mode knob isn't documented anywhere — found via grep cfg.discovery?.mdns?.mode /usr/local/lib/node_modules/openclaw/dist/audit-D8YFKksP.js. Default is "minimal"; valid values are off | minimal | full.
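
As a rough illustration of how the knob behaves, the sketch below resolves the mode from a parsed config object. Only the `discovery.mdns.mode` path, the default of "minimal", and the three valid values come from the issue; `GatewayConfig`, `resolveMdnsMode`, and the choice to throw on unknown values are assumptions made for the example.

```typescript
// The three values the issue reports as valid for discovery.mdns.mode.
const MDNS_MODES = ["off", "minimal", "full"] as const;
type MdnsMode = (typeof MDNS_MODES)[number];

// Hypothetical config shape; only the discovery.mdns.mode path is from the issue.
interface GatewayConfig {
  discovery?: { mdns?: { mode?: string } };
}

// Returns the configured mode, falling back to "minimal" (the reported default)
// when the key is absent. Throwing on unknown values is an assumption of this
// sketch; how the real gateway treats invalid values is not known.
function resolveMdnsMode(cfg: GatewayConfig): MdnsMode {
  const raw = cfg.discovery?.mdns?.mode;
  if (raw === undefined) return "minimal";
  const match = MDNS_MODES.find((mode) => mode === raw);
  if (match === undefined) {
    throw new Error(`invalid discovery.mdns.mode: ${raw}`);
  }
  return match;
}

const workaround: GatewayConfig = JSON.parse(
  '{ "discovery": { "mdns": { "mode": "off" } } }',
);
console.log(resolveMdnsMode(workaround)); // "off"
console.log(resolveMdnsMode({})); // "minimal"
```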

PR fix notes

PR #73029: fix(bonjour): suppress ciao crash when networkInterfaces() is denied

Description (problem / solution / changelog)

Fixes #72945

Summary

  • Problem: When the gateway runs inside a restricted sandbox that denies os.networkInterfaces() (NemoClaw, Docker-in-Docker, k3s with locked-down policy, etc.), @homebridge/ciao's NetworkManager init throws an unhandled SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1. The bonjour plugin already classifies and suppresses three other ciao-originated errors (cancellation, interface assertion, netmask assertion), but this failure mode wasn't covered, so Node turns the rejection into a process-level crash ~39ms after the ready log.
  • Why it matters: The gateway logs "ready (6 plugins, 2.0s)" and immediately disappears. Operators see "ready" then nothing — a confusing, unactionable failure mode in environments where discovery.mdns.mode=off is the right answer but isn't auto-detected. The mdns.mode knob is also undocumented (the reporter discovered it by grepping the dist bundle).
  • What changed: Extended classifyCiaoProcessError in extensions/bonjour/src/ciao.ts with a 4th classification kind, interface-enumeration-failure, that matches the libuv UV_INTERFACE_ADDRESSES syscall token in the error message. Updated extensions/bonjour/src/advertiser.ts:handleCiaoProcessError to log a single bonjour: disabling mDNS — networkInterfaces() unavailable warning and suppress the rejection without requesting recovery (recovery would just re-enter the same failing syscall). Added 2 regression tests in extensions/bonjour/src/ciao.test.ts.
  • What did NOT change (scope boundary): The discovery.mdns.mode config knob and its default ("minimal"). The mDNS feature itself in environments where os.networkInterfaces() works. The other 3 existing classifications. The recovery code path for interface/netmask assertions. The @homebridge/ciao dependency or any patch on it. Did not auto-detect sandboxes (option 2 in the issue body) — that's a separate UX choice and out of scope for the crash fix. Did not document mdns.mode (option 3) — that's a docs PR.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72945
  • Related #72920 (Gateway becomes unstable from bonjour plugin errors — same plugin, different error class)
  • Related #72902 (Bonjour/mdns broadcaster crashes gateway on macOS — same plugin, different trigger)
  • This PR fixes a bug or regression

Root Cause

  • Root cause: @homebridge/ciao's NetworkManager.assumeNetworkInterfaceNames calls os.networkInterfaces() synchronously during init. In restricted sandboxes that syscall returns EPERM-equivalent (Unknown system error 1); Node surfaces it as a SystemError with message containing uv_interface_addresses. The bonjour plugin's existing classifier (extensions/bonjour/src/ciao.ts:classifyCiaoProcessError) only matched three string patterns — cancellation, IPV4 interface assertion, netmask assertion — so this SystemError fell through, and the unhandled rejection killed the process.
  • Missing detection / guardrail: extensions/bonjour/src/ciao.test.ts had a "keeps unrelated rejections visible" test but no positive coverage for sandbox-style failures. The node-linker=hoisted layout and --unhandled-rejections=warn flag don't catch SystemError-class rejections at the runtime level (verified by the reporter on Node v22.22.2), so suppression must happen in the plugin's own classifier.
  • Contributing context (if known): The architectural pattern of "classify ciao errors then decide whether to suppress" already exists in this file — the fix is a one-pattern extension rather than a new mechanism. Recovery is intentionally skipped for this kind because re-running ciao init would hit the same os.networkInterfaces() block and crash again; the right answer is to log once and let the plugin go quiet for the rest of the gateway's lifetime.
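
The classification step described above can be sketched roughly as follows. This is a simplified stand-in, not the code in extensions/bonjour/src/ciao.ts: `classify` and `collectCandidates` are hypothetical names that mirror what the PR says about classifyCiaoProcessError and collectCiaoProcessErrorCandidates, and only the new kind is modelled.

```typescript
interface Classification {
  kind: "interface-enumeration-failure";
  formatted: string;
}

// Matches the libuv syscall token in an upper-cased message.
const INTERFACE_ENUMERATION_RE = /\bUV_INTERFACE_ADDRESSES\b/;

// Collects the value plus anything reachable through cause / reason / errors[].
function collectCandidates(root: unknown): unknown[] {
  const seen = new Set<unknown>();
  const queue: unknown[] = [root];
  while (queue.length > 0) {
    const current = queue.shift();
    if (current === null || typeof current !== "object" || seen.has(current)) continue;
    seen.add(current);
    const record = current as { cause?: unknown; reason?: unknown; errors?: unknown };
    queue.push(record.cause, record.reason);
    if (Array.isArray(record.errors)) queue.push(...record.errors);
  }
  return Array.from(seen);
}

// Returns a classification when any candidate carries the syscall token,
// or null so that unrelated rejections stay visible.
function classify(reason: unknown): Classification | null {
  for (const candidate of collectCandidates(reason)) {
    if (!(candidate instanceof Error)) continue;
    if (INTERFACE_ENUMERATION_RE.test(candidate.message.toUpperCase())) {
      return {
        kind: "interface-enumeration-failure",
        formatted: `${candidate.name}: ${candidate.message}`,
      };
    }
  }
  return null;
}

// Error shaped like the one in the reporter's log (name and message only).
const direct = Object.assign(
  new Error("A system error occurred: uv_interface_addresses returned Unknown system error 1"),
  { name: "SystemError" },
);
// Same shape that new Error(..., { cause }) would produce.
const wrapped = Object.assign(new Error("bonjour init failed"), { cause: direct });

console.log(classify(direct)?.kind);
console.log(classify(wrapped)?.kind);
console.log(classify(new Error("boom")));
```

Keeping the matcher a pure function over the error chain is what makes the direct and wrapped cases cheap to unit test.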

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: extensions/bonjour/src/ciao.test.ts
  • Scenario the test should lock in:
    1. A SystemError whose message contains uv_interface_addresses returned Unknown system error 1 is classified as kind: "interface-enumeration-failure".
    2. The same error wrapped via new Error(..., { cause }) is also detected (the existing collectCiaoProcessErrorCandidates walks cause/reason/errors chains, but the regex match has to land on the inner error).
    3. ignoreCiaoUnhandledRejection returns true for both.
    4. Existing "keeps unrelated rejections visible" test remains green (regression guard against widening the regex).
  • Why this is the smallest reliable guardrail: classifyCiaoProcessError is a pure function with no I/O — unit tests are deterministic. The advertiser-side wiring is a one-line branch in handleCiaoProcessError that returns the same true/false contract; a separate test for the log-line text would lock in formatting that may not be load-bearing.
  • Existing test that already covers this (if any): None — none of the 9 existing tests cover SystemError-class rejections. The closest is "suppresses aggregate ciao assertion rejections" which uses AggregateError-wrapped AssertionErrors (a different shape from SystemError).
  • If no new test is added, why not: N/A — 2 new tests added (direct + wrapped via cause).

User-visible / Behavior Changes

  • In restricted sandboxes where os.networkInterfaces() is denied:
    • Before: gateway logs [gateway] ready (6 plugins, 2.0s), then crashes ~39ms later with an unhandled SystemError.
    • After: gateway stays up. A single warning bonjour: disabling mDNS — networkInterfaces() unavailable in this environment: SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1 is logged. mDNS does not function (which is the same outcome users get today by setting discovery.mdns.mode=off manually).
  • In normal environments: no change. The new regex only matches the libuv syscall token, which doesn't appear in any healthy ciao error path.

Diagram

Bonjour init in a restricted sandbox

Before:
[bonjour] register process handlers
  -> ciao NetworkManager init
  -> os.networkInterfaces()
  -> SystemError: uv_interface_addresses returned Unknown system error 1
  -> classifyCiaoProcessError() returns null
  -> handleCiaoProcessError returns false (don't suppress)
  -> Node default: process exits on unhandled rejection
  -> CRASH: 39ms after "ready"

After:
[bonjour] register process handlers
  -> ciao NetworkManager init
  -> os.networkInterfaces()
  -> SystemError: uv_interface_addresses returned Unknown system error 1
  -> classifyCiaoProcessError() returns { kind: "interface-enumeration-failure", formatted }
  -> handleCiaoProcessError logs warn, returns true (suppress)
  -> Node leaves the process running
  -> Gateway stays up; mDNS is dormant
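
The "After" flow can be sketched as a small handler, assuming the classifier has already produced a result. `createCiaoErrorHandler`, the dependency names, and the kind strings other than "interface-enumeration-failure" are hypothetical; the warning text is abbreviated from the PR's wording, and the once-only flag is one possible way to get a single log line.

```typescript
// Kind names other than "interface-enumeration-failure" are hypothetical labels
// for the three classifications the PR says already exist.
type CiaoErrorKind =
  | "cancellation"
  | "interface-assertion"
  | "netmask-assertion"
  | "interface-enumeration-failure";

interface Classification {
  kind: CiaoErrorKind;
  formatted: string;
}

interface HandlerDeps {
  warn: (message: string) => void;
  requestRecovery?: (reason: string) => void;
}

// Returns a handler answering "should this rejection be suppressed?".
function createCiaoErrorHandler(deps: HandlerDeps) {
  let enumerationWarned = false;
  return (classification: Classification | null): boolean => {
    if (classification === null) return false; // unrelated: let it surface
    if (classification.kind === "interface-enumeration-failure") {
      // Sandbox policy will not change, so recovery would re-enter the same
      // failing call. Warn once and suppress without requesting recovery.
      if (!enumerationWarned) {
        enumerationWarned = true;
        deps.warn(
          `bonjour: disabling mDNS, networkInterfaces() unavailable: ${classification.formatted}`,
        );
      }
      return true;
    }
    // Assumption: only the two assertion kinds request recovery.
    if (classification.kind !== "cancellation") {
      deps.requestRecovery?.(classification.formatted);
    }
    return true;
  };
}

const warnings: string[] = [];
const recoveries: string[] = [];
const handle = createCiaoErrorHandler({
  warn: (message) => {
    warnings.push(message);
  },
  requestRecovery: (reason) => {
    recoveries.push(reason);
  },
});

const denied: Classification = {
  kind: "interface-enumeration-failure",
  formatted: "SystemError: uv_interface_addresses returned Unknown system error 1",
};
console.log(handle(denied), handle(denied), handle(null)); // true true false
console.log(warnings.length, recoveries.length); // 1 0
```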

Security Impact

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No (this prevents a crash from a denied syscall; it doesn't add a new call)
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Linux 6.8.0-110-generic (Ubuntu) — and any sandbox/container that locks down os.networkInterfaces() (NemoClaw, Docker-in-Docker, k3s, etc.)
  • Runtime/container: Node 22.14+
  • Model/provider: N/A
  • Integration/channel (if any): bonjour/mDNS sidecar — channel-discovery, not channel I/O
  • Relevant config (redacted): default — no discovery.mdns.mode override needed before the fix to reproduce the crash; just run inside a restricted sandbox

Steps

  1. Start the gateway inside a sandbox that blocks os.networkInterfaces() (per the reporter, NemoClaw via openshell sandbox exec ...).
  2. Watch the gateway log.

Expected

  • Gateway logs [gateway] ready ... and stays up.
  • A single bonjour: disabling mDNS — networkInterfaces() unavailable ... warning appears within the first second of bonjour init.
  • Process does not exit.

Actual (before fix)

  • Gateway logs [gateway] ready (6 plugins, 2.0s), then ~39ms later: [openclaw] Unhandled promise rejection: SystemError: A system error occurred: uv_interface_addresses returned Unknown system error 1. Process exits.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)
$ pnpm test extensions/bonjour
 Test Files  3 passed (3)
      Tests  38 passed (38)   # 36 existing + 2 new (direct + wrapped-via-cause)

pnpm check:changed is green: conflict markers, changelog attributions, typecheck core/core-tests/extensions/extension-tests, lint, runtime import cycles, plus the various pairing/webhook guards all pass.

Human Verification (required)

  • Verified scenarios:
    • Targeted vitest run for extensions/bonjour (38/38 pass locally on Node 22).
    • Full pnpm check:changed gate (all lanes green).
    • Re-read extensions/bonjour/src/advertiser.ts:handleCiaoProcessError to confirm the new branch logs once, returns true, and does not call requestCiaoRecovery?.(...) (recovery would re-enter the same failing syscall).
    • Confirmed by reading extensions/bonjour/src/ciao.ts that collectCiaoProcessErrorCandidates already walks cause / reason / errors[] chains, so wrapping a SystemError inside new Error(..., { cause }) still classifies correctly — covered by the second new test.
  • Edge cases checked:
    • Direct SystemError rejection (most common shape from Node).
    • SystemError wrapped in a generic Error via { cause } (defensive against future ciao-side wrapping).
    • Unrelated rejections (new Error("boom")) still return false from ignoreCiaoUnhandledRejection — existing test still passes after the regex addition.
    • Mixed-case / lowercase form: the classifier tests the regex \bUV_INTERFACE_ADDRESSES\b against the .toUpperCase()'d message, so the match is effectively case-insensitive; this follows the same convention as the other 3 patterns in the file.
  • What you did not verify:
    • Live reproduction inside a NemoClaw sandbox or a deliberately-restricted Docker container. I do not have access to a sandbox that denies os.networkInterfaces(). The unit test reconstructs the exact SystemError.name + .message shape Node produces (per the reporter's log), and the advertiser branch is straight-line; verifying that the existing classifier→suppression hookup works is what the existing 36 tests already cover.
    • The auto-detect-sandbox approach (option 2 in the issue body). That is a separate UX choice — defaulting discovery.mdns.mode=off based on env detection has policy implications beyond a crash fix and should be its own PR.
    • Documenting discovery.mdns.mode (option 3 in the issue body). That is a docs PR.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

(Both will be checked once review activity lands. Currently no bot review conversations on this PR.)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: a future ciao or libuv release uses uv_interface_addresses in a benign log line and we suppress something that should bubble.
    • Mitigation: the regex requires the exact libuv syscall token as a word boundary in an error message that has already been classified as a process-level rejection. Matching benign log output is not how this code path runs — only unhandledRejection / uncaughtException reach the classifier.
  • Risk: suppressing the failure hides a deeper bug if os.networkInterfaces() ever fails on a normal host.
    • Mitigation: the bonjour: disabling mDNS … warning is logged at WARN level (not DEBUG), so operators see the message in default log configurations. It includes the formatted error so the underlying SystemError is preserved in the log line.
  • Risk: skipping requestCiaoRecovery?.(...) for this kind diverges from the interface/netmask assertion paths.
    • Mitigation: intentional — recovery for those two kinds re-runs ciao init against (presumably) a healed network. For interface enumeration, the failure is a sandbox policy that won't change; re-running would crash again. The branch is commented to make the asymmetry explicit.

Changed files

  • extensions/bonjour/src/advertiser.ts (modified, +7/-0)
  • extensions/bonjour/src/ciao.test.ts (modified, +21/-0)
  • extensions/bonjour/src/ciao.ts (modified, +9/-1)


Repro

  1. Start the gateway inside a NemoClaw sandbox (k3s pod with policy that blocks os.networkInterfaces()):

    openshell sandbox exec -n nemoclaw-deepobs --timeout 0 -- openclaw gateway > /tmp/gw.log 2>&1 &
  2. Watch the log:

    2026-04-27T06:38:11.821 [gateway] ready (6 plugins, 2.0s)
    2026-04-27T06:38:11.860 [openclaw] Unhandled promise rejection: SystemError: A system error occurred:
        uv_interface_addresses returned Unknown system error 1
        at Object.networkInterfaces (node:os:218:16)
        at Function.assumeNetworkInterfaceNames (
            /usr/local/lib/node_modules/openclaw/node_modules/@homebridge/ciao/src/NetworkManager.ts:527:23)
        at NetworkManager.getCurrentNetworkInterfaces (...)
  3. Process exits ~39ms after the "ready" line.

Suggested fix

Two angles, both easy:

1. Defensive try/catch in ciao initialization (proper fix)

Either upstream in @homebridge/ciao or via OpenClaw's wrapping init code, catch os.networkInterfaces() failures and skip mDNS gracefully:

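import os from "node:os";

// Returns the interface map, or null when the sandbox denies enumeration.
function readNetworkInterfaces(logger) {
  try {
    return os.networkInterfaces();
  } catch (err) {
    logger.warn(`mDNS sidecar disabled: networkInterfaces() unavailable (${err.code ?? err.message})`);
    return null; // caller skips ciao init entirely
  }
}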

This is the right shape — restricted sandboxes are a legitimate runtime, not an exception case.

2. Auto-detect sandbox env at startup

OpenClaw could detect "I'm in a sandbox" (presence of /sandbox writable, presence of NEMOCLAW_* env vars, etc.) and default discovery.mdns.mode=off automatically. Less robust than a try/catch but eliminates the manual config step for NemoClaw users.
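
A minimal sketch of that heuristic, assuming the two signals named above are enough to decide. `looksLikeSandbox`, `defaultMdnsMode`, and the `NEMOCLAW_SANDBOX_ID` variable used in the usage lines are hypothetical. The writability check is injected; a caller would typically back it with fs.accessSync(path, fs.constants.W_OK) wrapped in a try/catch.

```typescript
type MdnsMode = "off" | "minimal" | "full";

interface SandboxProbe {
  env: Record<string, string | undefined>;
  isWritable: (path: string) => boolean;
}

// True when either signal from the issue is present: a NEMOCLAW_* environment
// variable, or a writable /sandbox directory.
function looksLikeSandbox(probe: SandboxProbe): boolean {
  const hasNemoClawEnv = Object.keys(probe.env).some((key) => key.startsWith("NEMOCLAW_"));
  return hasNemoClawEnv || probe.isWritable("/sandbox");
}

// Soft default: an explicit config value always wins over detection.
function defaultMdnsMode(explicit: MdnsMode | undefined, probe: SandboxProbe): MdnsMode {
  if (explicit !== undefined) return explicit;
  return looksLikeSandbox(probe) ? "off" : "minimal";
}

// NEMOCLAW_SANDBOX_ID is a made-up variable name for illustration.
const inNemoClaw: SandboxProbe = {
  env: { NEMOCLAW_SANDBOX_ID: "demo" },
  isWritable: () => false,
};
const plainHost: SandboxProbe = {
  env: { PATH: "/usr/bin" },
  isWritable: () => false,
};

console.log(defaultMdnsMode(undefined, inNemoClaw)); // "off"
console.log(defaultMdnsMode(undefined, plainHost)); // "minimal"
console.log(defaultMdnsMode("full", inNemoClaw)); // "full"
```

Letting the explicit value win keeps a false-positive detection from overriding an operator who deliberately enabled mDNS.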

3. (Minimal) Document the mdns.mode knob

If neither of the above lands quickly, even just adding discovery.mdns.mode to the public config schema documentation would help. Currently the config schema doesn't mention it; users have to dig through the compiled JS.

Alternatives considered

  • Catch the unhandled rejection process-wide: would prevent the crash, but mDNS would still be silently broken — better to skip it cleanly than to half-init.
  • Restrict OpenClaw's runtime to non-sandboxed environments: unworkable, NemoClaw is a supported deployment.

Test plan

# In a sandbox that blocks os.networkInterfaces():
openclaw gateway 2>&1 | tee gw.log &
sleep 30
ps aux | grep openclaw-gateway | grep -v grep
# Expected: gateway process still alive after 30s.
grep -c "Unhandled promise rejection" gw.log
# Expected: 0 (try/catch path) or 1 followed by clean continue (auto-detect path).

Risk / blast radius

  • Try/catch around ciao init: zero risk — preserves current behavior on systems where networkInterfaces() works.
  • Auto-detect path: small risk of false-positive sandbox detection (e.g., a non-sandbox env that happens to have /sandbox). Mitigated by making it a soft-default that explicit config overrides.

Open questions for maintainers

  1. Defensive try/catch upstream in @homebridge/ciao, or wrapped at OpenClaw's mDNS-init call site? (We have no signal on @homebridge/ciao's responsiveness as a maintained dep.)
  2. Is the broader mDNS-sidecar path actually useful in any sandboxed deployment? If not, the auto-detect-and-skip path is essentially free.
  3. Default discovery.mdns.mode=off for new sandbox-created gateways via openshell sandbox create? That'd be a NemoClaw-side change — happy to file separately if useful.

Tested-against

  • OpenClaw v2026.4.9
  • NemoClaw v0.0.26 / OpenShell 0.0.36 (sandbox provider)
  • Node v22.22.2

Severity

High for any sandboxed deployment of OpenClaw. The error message is not silent (you do see the unhandled rejection log), but the gateway-just-died-after-ready experience is confusing and the workaround knob is undocumented.

extent analysis

TL;DR

The most likely fix is to add a defensive try/catch block around the os.networkInterfaces() call in the @homebridge/ciao initialization code to handle the EPERM error that occurs in restricted sandboxes.

Guidance

  • Implement a try/catch block in the @homebridge/ciao initialization code to catch os.networkInterfaces() failures and skip mDNS initialization if it fails.
  • Alternatively, OpenClaw could detect if it's running in a sandbox environment and default discovery.mdns.mode to off to prevent the crash.
  • Document the discovery.mdns.mode configuration option to allow users to manually disable mDNS if needed.
  • Test the fix by running OpenClaw in a sandbox environment and verifying that it doesn't crash after 30 seconds.

Example

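import os from "node:os";

// Returns the interface map, or null when the sandbox denies enumeration.
function readNetworkInterfaces(logger) {
  try {
    return os.networkInterfaces();
  } catch (err) {
    logger.warn(`mDNS sidecar disabled: networkInterfaces() unavailable (${err.code ?? err.message})`);
    return null; // caller skips ciao init entirely
  }
}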

Notes

The fix assumes that the @homebridge/ciao library is actively maintained and will accept patches. If not, the try/catch block could be implemented in OpenClaw's wrapping init code instead.

Recommendation

Apply the workaround by setting discovery.mdns.mode = "off" in openclaw.json until a more permanent fix can be implemented, as it is a simple and effective solution to prevent the crash.
