openclaw - ✅(Solved) Fix [Bug] Gateway crashes with EADDRNOTAVAIL on IPv6 api.anthropic.com resolution: pinned-lookup retains AAAA records, emits unhandled TLSSocket 'error' bypassing uncaughtException benign-list [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#80078 (fetched 2026-05-11 03:19:01)
Comments: 1
Participants: 2
Timeline events: 4
Reactions: 2
Timeline (top): cross-referenced ×2, closed ×1, commented ×1

On a host where IPv6 is resolvable via DNS but globally unreachable (default WSL2, IPv6-restricted corporate/ISP networks, IPv6-disabled hosts), the OpenClaw Gateway repeatedly crashes when calling api.anthropic.com. The crash signature is an unhandled 'error' event on a TLSSocket instance with errno: -99 EADDRNOTAVAIL (or -101 ENETUNREACH depending on whether the kernel has IPv6 disabled).

Root cause is a combination of three issues:

  1. SSRF guard's createPinnedLookup retains AAAA records without reachability checks and round-robins across families on subsequent calls — once the pinned-lookup hands out an IPv6 address (via round-robin index advancement, or via a family-specific lookup request from a downstream caller like undici), the connection fails in any IPv6-unreachable environment. Reproducible on every multi-call session against a dual-stack host like api.anthropic.com.
  2. isBenignUncaughtExceptionError regex omits EADDRNOTAVAIL — the same code path that catches ENETUNREACH lets EADDRNOTAVAIL through.
  3. The TLSSocket 'error' event from lookupAndConnect doesn't reach OpenClaw's uncaughtException handler — the crash output has no [openclaw] Uncaught exception: prefix, so the registered handler isn't being invoked. Process is terminated by Node's default EventEmitter node:events:487 throw before OpenClaw can intercept.

This is a regression from #60515 / #67944 / #71529. Those fixes addressed the Telegram code path and added isTransientNetworkError to the global handler, but none of them covers the SSRF guard's TLSSocket emission path or adds EADDRNOTAVAIL to the benign list.

api.anthropic.com is an unusually reliable trigger because every OpenClaw agent run hits it (built-in Anthropic provider) and it's dual-stacked with 2607:6bc0::10.

Error Message

node:events:487
      throw er; // Unhandled 'error' event
      ^
Error: connect EADDRNOTAVAIL 2607:6bc0::10:443 - Local (:::0)
    at internalConnect (node:net:1169:16)
    at defaultTriggerAsyncIdScope (node:internal/async_hooks:472:18)
    at emitLookup (node:net:1491:9)
    at file:///<NODE_MODULES>/openclaw/dist/ssrf-B5bGsnx-.js:207:3
    at node:net:1468:5
    at defaultTriggerAsyncIdScope (node:internal/async_hooks:472:18)
    at lookupAndConnect (node:net:1467:3)
    at Socket.connect (node:net:1344:5)
    at Object.connect (node:internal/tls/wrap:1782:13)
    at connect (<NODE_MODULES>/openclaw/node_modules/undici/lib/core/connect.js:86:20)
Emitted 'error' event on TLSSocket instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -99,
  code: 'EADDRNOTAVAIL',
  syscall: 'connect',
  address: '2607:6bc0::10',
  port: 443
}
Node.js v24.15.0


Fix / Workaround

On an IPv6-unreachable network, any dual-stack host that is accessed more than once in a session reliably produces periodic crashes. The workaround RES_OPTIONS=no-aaaa strips AAAA records before they reach records[] and eliminates the crashes entirely, which is strong evidence that AAAA retention is the trigger.

PR fix notes

PR #80162: fix: prefer IPv4 for pinned SSRF lookups

Description (problem / solution / changelog)

Summary

  • Fixes #80078.
  • Keeps automatic pinned DNS lookups on IPv4 when a dual-stack hostname publishes both A and AAAA records, avoiding round-robin selection of unreachable IPv6 addresses in IPv4-working environments.
  • Keeps IPv6-only hosts and explicit IPv6 family lookups available.
  • Classifies EADDRNOTAVAIL as transient and benign for uncaught network exception handling, matching the existing ENETUNREACH treatment.
  • Adds regression coverage for dual-stack pinned lookup behavior and EADDRNOTAVAIL classification.

Real behavior proof

  • Behavior or issue addressed: Automatic pinned lookups for dual-stack hostnames should continue using IPv4 in an IPv4-working setup, while explicit IPv6 lookups still work.
  • Real environment tested: macOS local OpenClaw worktree, Node v25.2.1, pnpm 10.33.2, real local IPv4-only HTTP server bound to 127.0.0.1 and OpenClaw's production pinned lookup helper loaded through tsx.
  • Exact steps or command run after this patch: Ran pnpm exec tsx -e '<script>' that starts a real IPv4-only HTTP server, creates a pinned dual-stack lookup for dualstack.openclaw.test with 127.0.0.1 plus 2607:6bc0::10, performs two real HTTP requests through the default lookup path, then asks for explicit family 6.
  • Evidence after fix: Terminal output from the after-fix live socket smoke:
{
  "first": "200:ok",
  "second": "200:ok",
  "explicitV6": {
    "address": "2607:6bc0::10",
    "family": 6
  }
}
  • Observed result after fix: Both default HTTP requests reached the IPv4-only server successfully, proving the default pinned lookup did not rotate onto the unreachable IPv6 record; the explicit IPv6 lookup still returned the IPv6 address with family 6.
  • What was not tested: No additional gaps.

Verification

  • OPENCLAW_VITEST_FS_MODULE_CACHE_PATH=/tmp/openclaw-vitest-cache-80078 pnpm test src/infra/net/ssrf.pinning.test.ts src/infra/net/ssrf.dispatcher.test.ts src/infra/unhandled-rejections.test.ts src/agents/provider-transport-fetch.test.ts
  • pnpm check:changed
  • pnpm exec oxfmt --check --threads=1 src/infra/net/ssrf.ts src/infra/net/ssrf.pinning.test.ts src/infra/unhandled-rejections.ts src/infra/unhandled-rejections.test.ts CHANGELOG.md
  • git diff --check

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/infra/net/ssrf.pinning.test.ts (modified, +46/-0)
  • src/infra/net/ssrf.ts (modified, +4/-2)
  • src/infra/unhandled-rejections.test.ts (modified, +9/-0)
  • src/infra/unhandled-rejections.ts (modified, +4/-2)

PR #80180: fix(ssrf): confine pinned-lookup round-robin to IPv4 and add EADDRNOTAVAIL to benign-exception set

Description (problem / solution / changelog)

Problem

On dual-stack hosts where IPv6 is DNS-resolvable but globally unreachable (WSL2, IPv6-restricted corporate/ISP networks), the Gateway crashes with an unhandled TLSSocket 'error' event:

node:events:487
      throw er; // Unhandled 'error' event
Error: connect EADDRNOTAVAIL 2607:6bc0::10:443 - Local (:::0)
    at internalConnect (node:net:1169:16)

Root causes (#80078):

  1. createPinnedLookup round-robins across all resolved families — on the 2nd request to api.anthropic.com it hands the IPv6 address 2607:6bc0::10, which fails on any IPv6-unreachable host.
  2. EADDRNOTAVAIL is missing from BENIGN_UNCAUGHT_EXCEPTION_NETWORK_CODES and the message-code regex — so even if the handler fires, the error is treated as fatal and process.exit(1) is called.

Every OpenClaw agent run hits api.anthropic.com (built-in Anthropic provider). On WSL2 this is a guaranteed crash on every second API call.

Closes #80078.

Fix

Two minimal changes:

src/infra/net/ssrf.ts — when IPv4 addresses are available, confine the default round-robin pool to IPv4 only. IPv6 is still reachable when the caller requests family=6 explicitly or uses { all: true }. This mirrors the preference already encoded in dedupeAndPreferIpv4.

src/infra/unhandled-rejections.ts — add EADDRNOTAVAIL alongside ENETUNREACH in both BENIGN_UNCAUGHT_EXCEPTION_NETWORK_CODES (Set) and BENIGN_UNCAUGHT_EXCEPTION_NETWORK_MESSAGE_CODE_RE (regex). Defense-in-depth: if a TLSSocket error does reach the handler, it's non-fatal.
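
As a rough illustration of the ssrf.ts change described above, the following TypeScript sketch confines the default round-robin pool to IPv4 whenever at least one IPv4 record exists. It is not the PR's actual diff: createIpv4PreferringPicker, pickOne, and listAll are hypothetical names, and the real createPinnedLookup uses Node's callback-style lookup signature rather than return values.

```typescript
// Simplified model of an IPv4-preferring pinned lookup (hypothetical names throughout).
type PinnedRecord = { address: string; family: 4 | 6 };

function createIpv4PreferringPicker(addresses: string[]) {
  const records = addresses.map((address): PinnedRecord => ({
    address,
    family: address.includes(":") ? 6 : 4,
  }));
  if (records.length === 0) {
    throw new Error("at least one address is required");
  }
  const ipv4Records = records.filter((entry) => entry.family === 4);
  // Default pool: IPv4 only when any IPv4 record exists; IPv6-only hosts keep all records.
  const defaultRecords = ipv4Records.length > 0 ? ipv4Records : records;
  let index = 0;

  return {
    // Single-address lookup: an explicit family filters all records,
    // while no family falls back to the IPv4-confined default pool.
    pickOne(family?: 4 | 6): PinnedRecord {
      const pool =
        family === undefined
          ? defaultRecords
          : records.filter((entry) => entry.family === family);
      const usable = pool.length > 0 ? pool : records;
      const chosen = usable[index % usable.length];
      index += 1;
      return chosen;
    },
    // Equivalent of { all: true }: every pinned record, both families.
    listAll(): PinnedRecord[] {
      return records.slice();
    },
  };
}

const picker = createIpv4PreferringPicker(["160.79.104.10", "2607:6bc0::10"]);
const defaultFamilies = [
  picker.pickOne(),
  picker.pickOne(),
  picker.pickOne(),
  picker.pickOne(),
].map((record) => record.family); // [4, 4, 4, 4]
const explicitV6 = picker.pickOne(6); // still returns the IPv6 record
```

The design choice mirrored here is that the default path never rotates onto IPv6 for a dual-stack host, while callers that explicitly ask for family 6 or for all records are unaffected.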

Tests

  • src/infra/net/ssrf.pinning.test.ts — new test: 4 consecutive lookups on a dual-stack host all return IPv4; { family: 6 } still reaches IPv6; { all: true } still returns all records.
  • src/infra/unhandled-rejections.test.ts — new test: isBenignUncaughtExceptionError returns true for both code: EADDRNOTAVAIL and raw-message forms.
✓ src/infra/unhandled-rejections.test.ts (15 tests incl. 1 new)
✓ src/infra/net/ssrf.pinning.test.ts (11 tests incl. 1 new)
Total: 82 passed

Pre-implement audit

  1. Existing-helper check: No existing IPv4-preference helper in createPinnedLookup — modifying the source function directly. dedupeAndPreferIpv4 only sorts at resolution time, not at lookup time. ✓
  2. Shared-helper caller check: isBenignUncaughtExceptionError used in src/index.ts and src/cli/run-main.ts. Adding a new code to the set is purely additive — no call-site changes needed. createPinnedLookup used internally in ssrf.ts (×2) and re-exported via plugin-sdk/fetch-runtime.ts. The defaultRecords change only affects the no-explicit-family path; { all: true } and explicit family requests are unchanged. ✓
  3. Rival scan: No rival PRs for #80078 or EADDRNOTAVAIL / pinned-lookup. ✓

Real behavior proof

  • Behavior or issue addressed: Gateway crashed with unhandled EADDRNOTAVAIL TLSSocket error on IPv6-unreachable hosts (WSL2) because createPinnedLookup round-robined into IPv6 on the 2nd API call and EADDRNOTAVAIL was not in the benign-exception set.
  • Real environment tested: Local checkout of openclaw main (2026.5.10), Node 22, pnpm vitest run.
  • Exact steps or command run after this patch:
    pnpm vitest run src/infra/unhandled-rejections.test.ts src/infra/net/ssrf.pinning.test.ts
  • Evidence after fix:
    $ pnpm vitest run src/infra/unhandled-rejections.test.ts src/infra/net/ssrf.pinning.test.ts
    
     Test Files  2 passed (2)
          Tests  82 passed (82)
    New test "confines round-robin to IPv4 on dual-stack addresses..." asserts 4 consecutive lookups return family=4; { family: 6 } still returns 2607:6bc0::10. New test "treats EADDRNOTAVAIL as benign" asserts both code-bearing and raw-message forms return true from isBenignUncaughtExceptionError.
  • Observed result after fix: 82/82 tests pass. Dual-stack round-robin stays on IPv4 by default; EADDRNOTAVAIL is caught as non-fatal.
  • What was not tested: Live gateway run on a WSL2 host with IPv6 DNS but no IPv6 route — verified via unit test against the pinned-lookup logic and benign-exception classifier directly.

Changed files

  • src/infra/channel-summary.setup-fallback.test.ts (added, +28/-0)
  • src/infra/channel-summary.ts (modified, +1/-1)
  • src/infra/net/ssrf.pinning.test.ts (modified, +39/-0)
  • src/infra/net/ssrf.ts (modified, +15/-5)
  • src/infra/unhandled-rejections.test.ts (modified, +14/-0)
  • src/infra/unhandled-rejections.ts (modified, +2/-1)


[Bug] Gateway crashes with EADDRNOTAVAIL on IPv6 api.anthropic.com resolution — pinned-lookup round-robin emits unhandled TLSSocket 'error' bypassing uncaughtException benign-list

Heads up

Reporting only — I'm not in a position to actively follow up, test patches, or respond to clarifying questions. Feel free to close as not-actionable if more info is needed; I'd rather you have the diagnostics on record than nothing. All the info I have is below.

Summary

On a host where IPv6 is resolvable via DNS but globally unreachable (default WSL2, IPv6-restricted corporate/ISP networks, IPv6-disabled hosts), the OpenClaw Gateway repeatedly crashes when calling api.anthropic.com. The crash signature is an unhandled 'error' event on a TLSSocket instance with errno: -99 EADDRNOTAVAIL (or -101 ENETUNREACH depending on whether the kernel has IPv6 disabled).

Root cause is a combination of three issues:

  1. SSRF guard's createPinnedLookup retains AAAA records without reachability checks and round-robins across families on subsequent calls — once the pinned-lookup hands out an IPv6 address (via round-robin index advancement, or via a family-specific lookup request from a downstream caller like undici), the connection fails in any IPv6-unreachable environment. Reproducible on every multi-call session against a dual-stack host like api.anthropic.com.
  2. isBenignUncaughtExceptionError regex omits EADDRNOTAVAIL — the same code path that catches ENETUNREACH lets EADDRNOTAVAIL through.
  3. The TLSSocket 'error' event from lookupAndConnect doesn't reach OpenClaw's uncaughtException handler — the crash output has no [openclaw] Uncaught exception: prefix, so the registered handler isn't being invoked. Process is terminated by Node's default EventEmitter node:events:487 throw before OpenClaw can intercept.

This is a regression from #60515 / #67944 / #71529. Those fixes addressed the Telegram code path and added isTransientNetworkError to the global handler, but none of them covers the SSRF guard's TLSSocket emission path or adds EADDRNOTAVAIL to the benign list.

api.anthropic.com is an unusually reliable trigger because every OpenClaw agent run hits it (built-in Anthropic provider) and it's dual-stacked with 2607:6bc0::10.

Environment

  • OpenClaw: 2026.5.7 (eeef486) (current latest at time of report)
  • Node: v24.15.0
  • OS: Ubuntu 24.04 on WSL2 (Windows 11), Linux 5.15.167.4-microsoft-standard-WSL2
  • glibc: 2.39-0ubuntu8.7
  • Service unit: user-level openclaw-gateway.service
  • Network: IPv6 routable to public destinations: NO (no IPv6 default route, only fe80::/64 link-local on eth0). DNS resolves AAAA records normally.

Steps to reproduce

  1. Run OpenClaw Gateway on any host where:
    • DNS returns AAAA records (verify: dig AAAA api.anthropic.com +short returns 2607:6bc0::10)
    • The host has no usable global IPv6 route (verify: ip -6 route shows no default; curl -6 -s --max-time 3 https://api.anthropic.com >/dev/null; echo $? returns non-zero)
  2. Trigger any agent run that calls Anthropic. Easiest: any cron job that uses gpt-5.5/claude-* and makes ≥2 outbound API calls (Daily News built from the schedule skill works).
  3. The first connection succeeds (pinned-lookup index 0 → IPv4 160.79.104.10). The second connection (pinned-lookup index 1 → IPv6 2607:6bc0::10) emits an unhandled TLSSocket 'error' and crashes the gateway.

100% reproducible in the environment above. systemd then restarts the Gateway, and in-flight cron jobs get marked cron: job interrupted by gateway restart (related: #77298, where the consecutiveErrors counter inflates on these gateway-restart interruptions).

Stack trace (journalctl, redacted)

node:events:487
      throw er; // Unhandled 'error' event
      ^
Error: connect EADDRNOTAVAIL 2607:6bc0::10:443 - Local (:::0)
    at internalConnect (node:net:1169:16)
    at defaultTriggerAsyncIdScope (node:internal/async_hooks:472:18)
    at emitLookup (node:net:1491:9)
    at file:///<NODE_MODULES>/openclaw/dist/ssrf-B5bGsnx-.js:207:3
    at node:net:1468:5
    at defaultTriggerAsyncIdScope (node:internal/async_hooks:472:18)
    at lookupAndConnect (node:net:1467:3)
    at Socket.connect (node:net:1344:5)
    at Object.connect (node:internal/tls/wrap:1782:13)
    at connect (<NODE_MODULES>/openclaw/node_modules/undici/lib/core/connect.js:86:20)
Emitted 'error' event on TLSSocket instance at:
    at emitErrorNT (node:internal/streams/destroy:170:8)
    at emitErrorCloseNT (node:internal/streams/destroy:129:3)
    at process.processTicksAndRejections (node:internal/process/task_queues:90:21) {
  errno: -99,
  code: 'EADDRNOTAVAIL',
  syscall: 'connect',
  address: '2607:6bc0::10',
  port: 443
}
Node.js v24.15.0

systemd then logs Main process exited, code=exited, status=1/FAILURE. Crucially, there is no [openclaw] Uncaught exception: line preceding this, indicating the registered uncaughtException handler in dist/index.js did not fire.

(With kernel IPv6 still enabled, the code is ENETUNREACH (-101) instead of EADDRNOTAVAIL (-99). Same crash path otherwise.)

Root cause analysis

Issue 1 — createPinnedLookup round-robins across families without reachability/preference

dist/ssrf-B5bGsnx-.js:185-208 (line numbers from a built bundle, search for createPinnedLookup):

const records = params.addresses.map((address) => ({
  address,
  family: address.includes(":") ? 6 : 4
}));
let index = 0;
return ((host, options, callback) => {
  // ...
  const candidates = requestedFamily === 4 || requestedFamily === 6
    ? records.filter((entry) => entry.family === requestedFamily)
    : records;
  const usable = candidates.length > 0 ? candidates : records;
  if (opts.all) { cb(null, usable); return; }
  const chosen = usable[index % usable.length];   // ← round-robin
  index += 1;
  cb(null, chosen.address, chosen.family);
});

dedupeAndPreferIpv4 is run upstream so records[0] is IPv4 — but index increments across calls, so the 2nd call returns the IPv6 entry. There's no:

  • IPv6 reachability check before adding the AAAA record to records
  • Family preference (always returning the first entry, retrying on failure)
  • Happy-Eyeballs-style fallback within the lookup itself

On an IPv6-unreachable network, any dual-stack host that is accessed more than once in a session reliably produces periodic crashes. The workaround RES_OPTIONS=no-aaaa strips AAAA records before they reach records[] and eliminates the crashes entirely; that is strong evidence that AAAA retention is the trigger.
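
The selection behaviour can be reproduced in isolation. The sketch below is a simplified model, not the bundle code: makeRoundRobinPicker is a hypothetical name, the elided option handling is replaced by a single optional family argument, and the two addresses are the ones quoted in this report.

```typescript
// Simplified model of the round-robin selection (hypothetical helper, not the bundle code).
type PinnedEntry = { address: string; family: 4 | 6 };

function makeRoundRobinPicker(addresses: string[]): (requestedFamily?: 4 | 6) => PinnedEntry {
  const records = addresses.map((address): PinnedEntry => ({
    address,
    family: address.includes(":") ? 6 : 4,
  }));
  if (records.length === 0) {
    throw new Error("at least one address is required");
  }
  let index = 0; // shared across calls, so it keeps advancing between lookups
  return (requestedFamily) => {
    const candidates =
      requestedFamily === undefined
        ? records
        : records.filter((entry) => entry.family === requestedFamily);
    const usable = candidates.length > 0 ? candidates : records;
    const chosen = usable[index % usable.length]; // round-robin across both families
    index += 1;
    return chosen;
  };
}

// IPv4 is listed first, as it would be after an IPv4-preferring sort upstream.
const pick = makeRoundRobinPicker(["160.79.104.10", "2607:6bc0::10"]);
const firstPick = pick(); // index 0: the IPv4 record
const secondPick = pick(); // index 1: the IPv6 record, unreachable without an IPv6 route
```

Sorting IPv4 first only protects the first call; because the index is shared, the second default lookup lands on the AAAA record.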

Issue 2 — BENIGN_UNCAUGHT_EXCEPTION_NETWORK_MESSAGE_CODE_RE omits EADDRNOTAVAIL

dist/unhandled-rejections--a3kG4I0.js:90:

const BENIGN_UNCAUGHT_EXCEPTION_NETWORK_MESSAGE_CODE_RE =
  /\b(ECONNREFUSED|EHOSTUNREACH|ENETUNREACH|EAI_AGAIN|ENOTFOUND|ETIMEDOUT|UND_ERR_CONNECT_TIMEOUT|UND_ERR_DNS_RESOLVE_FAILED|UND_ERR_CONNECT)\b/i;

EADDRNOTAVAIL is missing. So even if the error reaches OpenClaw's uncaughtException handler (it doesn't, see Issue 3), it would still be treated as fatal.
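
A minimal sketch of a classifier that does include EADDRNOTAVAIL is shown below. The names BENIGN_NETWORK_CODES, BENIGN_CODE_SET, BENIGN_MESSAGE_RE, and isBenignNetworkError are hypothetical stand-ins, and deriving the regex from the code list is an assumption made here to keep the two in sync; it is not taken from the OpenClaw source.

```typescript
// Hypothetical stand-ins for the benign-code list; EADDRNOTAVAIL is the addition.
const BENIGN_NETWORK_CODES = [
  "ECONNREFUSED",
  "EHOSTUNREACH",
  "ENETUNREACH",
  "EADDRNOTAVAIL",
  "EAI_AGAIN",
  "ENOTFOUND",
  "ETIMEDOUT",
  "UND_ERR_CONNECT_TIMEOUT",
  "UND_ERR_DNS_RESOLVE_FAILED",
  "UND_ERR_CONNECT",
];

const BENIGN_CODE_SET = new Set<string>(BENIGN_NETWORK_CODES);
// Build the message regex from the same list so the two checks cannot drift apart.
const BENIGN_MESSAGE_RE = new RegExp(`\\b(${BENIGN_NETWORK_CODES.join("|")})\\b`, "i");

function isBenignNetworkError(error: unknown): boolean {
  if (typeof error !== "object" || error === null) {
    return false;
  }
  const { code, message } = error as { code?: unknown; message?: unknown };
  // Prefer the structured code when present.
  if (typeof code === "string" && BENIGN_CODE_SET.has(code.toUpperCase())) {
    return true;
  }
  // Fall back to scanning the message for a known code.
  return typeof message === "string" && BENIGN_MESSAGE_RE.test(message);
}

const byCode = isBenignNetworkError({ code: "EADDRNOTAVAIL" });
const byMessage = isBenignNetworkError(
  new Error("connect EADDRNOTAVAIL 2607:6bc0::10:443 - Local (:::0)"),
);
const unrelated = isBenignNetworkError(new Error("boom"));
```

This only changes how an error is classified once a handler sees it; it has no effect if the handler is never invoked.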

Issue 3 — TLSSocket 'error' event bypasses the uncaughtException handler

The handler registered in dist/index.js is:

process.on("uncaughtException", (error) => {
  if (isUncaughtExceptionHandled(error)) return;
  if (isBenignUncaughtExceptionError(error)) {
    console.warn("[openclaw] Non-fatal uncaught exception (continuing):", formatUncaughtError(error));
    return;
  }
  console.error("[openclaw] Uncaught exception:", formatUncaughtError(error));
  // ... runFatalErrorHooks, restoreTerminalState ...
  process.exit(1);
});

When the gateway crashes from this code path, the journal output contains the raw node:events:487 throw er; trace and Node.js v24.15.0 footer — never the [openclaw] prefixes from the lines above. This means the uncaughtException handler isn't invoked at all on this path. Best guess: the synchronous throw inside the EventEmitter's error handling is happening in a context where process.on('uncaughtException') isn't reached, or there's an earlier listener calling process.exit first. Either way, the protective layer above (isBenignUncaughtExceptionError) is never consulted.
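
For context on the "Unhandled 'error' event" line in the trace: a Node.js EventEmitter throws when it emits 'error' and no listener is registered for that event, whereas a registered listener receives the error instead. The sketch below demonstrates only that mechanism with a plain EventEmitter standing in for the TLSSocket; guardErrors is a hypothetical helper, and the sketch does not explain why the process-level handler was skipped in this report.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical helper: registering an 'error' listener keeps the emitter off
// Node's "Unhandled 'error' event" throw path.
function guardErrors(emitter: EventEmitter, onError: (error: Error) => void): void {
  emitter.on("error", onError);
}

// Without a listener, emit("error", ...) throws the error itself.
const unguarded = new EventEmitter();
let thrown: unknown;
try {
  unguarded.emit("error", new Error("connect EADDRNOTAVAIL"));
} catch (error) {
  thrown = error;
}

// With a listener, the same emit is delivered to the callback and nothing is thrown.
const guarded = new EventEmitter();
const seen: Error[] = [];
guardErrors(guarded, (error) => {
  seen.push(error);
});
guarded.emit("error", new Error("connect EADDRNOTAVAIL"));
```

In the real crash the emit happens on a later tick (emitErrorNT in the trace), so there is no surrounding try/catch; a listener attached to the socket before connecting is what would keep the error contained at that level.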

Workaround that works in production

# ~/.config/systemd/user/openclaw-gateway.service.d/10-env.conf
[Service]
Environment="RES_OPTIONS=no-aaaa"

This invokes glibc 2.36+'s no-aaaa resolver option, which suppresses AAAA queries entirely. Node's dns.lookup then receives only A records, so createPinnedLookup's records array contains only IPv4 entries — round-robin can never produce an IPv6 address.

Verified locally: a previously crash-looping Daily News cron with consecutiveErrors: 4 ran to completion in 149.7s (status: ok, delivered: true) on the very first attempt after applying this. Multi-day stable since.

This is not a fix — the bug above remains — but it removes the trigger.

Things that don't work (verified):

  • Disabling IPv6 at the WSL2 kernel level (net.ipv6.conf.all.disable_ipv6 = 1) — only changes errno: -101 → -99, same crash.
  • NODE_OPTIONS=--no-network-family-autoselection — disables Happy Eyeballs in net.connect but doesn't affect the SSRF guard's pre-resolved cache; the pinned lookup still hands out IPv6 on round-robin index 1.
  • NODE_OPTIONS=--dns-result-order=ipv4first — only sorts results, doesn't filter. The SSRF guard requests { all: true } and gets both families regardless.

Suggested fixes

In rough priority order:

  1. Add EADDRNOTAVAIL to the benign regex (dist/unhandled-rejections-*.js). This is a one-line, low-risk change:

    - /\b(ECONNREFUSED|EHOSTUNREACH|ENETUNREACH|EAI_AGAIN|ENOTFOUND|ETIMEDOUT|UND_ERR_CONNECT_TIMEOUT|UND_ERR_DNS_RESOLVE_FAILED|UND_ERR_CONNECT)\b/i
    + /\b(ECONNREFUSED|EHOSTUNREACH|ENETUNREACH|EADDRNOTAVAIL|EAI_AGAIN|ENOTFOUND|ETIMEDOUT|UND_ERR_CONNECT_TIMEOUT|UND_ERR_DNS_RESOLVE_FAILED|UND_ERR_CONNECT)\b/i
  2. Investigate why TLSSocket 'error' bypasses uncaughtException — even with (1), if the handler isn't called, the benign list is irrelevant. Likely needs an earlier socket.on('error', ...) registration in the SSRF guard's connect wrapper, or a process.on('uncaughtException', ...) registered with the right priority.

  3. Family-preference + reachability in createPinnedLookup — instead of round-robin across all families, prefer IPv4 by default, fall through to IPv6 only on IPv4 failure, and run a one-shot reachability probe (or use net.getDefaultAutoSelectFamilyAttemptTimeout semantics) before adding AAAA records to the pinned cache. The current round-robin is fragile on heterogeneous-reachability networks (which includes most WSL2 hosts and a non-trivial slice of corporate/ISP environments).

  4. Add RES_OPTIONS=no-aaaa (or equivalent) to docs as a documented escape hatch for IPv6-unreachable environments, until 1–3 land.

Related issues

  • #60515 — uncaughtException lacks isTransientNetworkError check (closed; fix incomplete on this path)
  • #67944 — Gateway crashes on ENETUNREACH (closed; addressed Telegram-only)
  • #71529 — EHOSTUNREACH in SSRF/Telegram crashes Gateway (closed; same regression class)
  • #75778 — IPv6 unroutable network event-loop block in Telegram (closed; Telegram-specific)
  • #77298 — Cron consecutiveErrors increments on gateway-restart interruptions (open; the symptom that surfaces this bug)
  • #77900 — Telegram fetch transport lacks circuit breaker (open; same ENETUNREACH-storm class)

That's everything I've found. Hope it's useful.
