openclaw - ✅(Solved) Fix [Bug]: Gateway hangs indefinitely on macOS sleep/wake - no timeout on startup network request [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#63966 (fetched 2026-04-10 03:41:29)
Comments: 0
Participants: 1
Timeline events: 2 (cross-referenced ×1, referenced ×1)
Reactions: 0

Error Message

No error is logged: the gateway hangs silently after the startup banner and credential sync entries. The only message associated with this issue comes from the proposed fix, which would log the following on timeout or connection failure before exiting with code 1:
console.error('[gateway] startup network check failed:', err.message);

Root Cause

The launchd plist uses RunAtLoad=true and KeepAlive=true. On wake/boot, launchd starts the gateway immediately, before the network stack is ready (Wi-Fi reassociation → DHCP → DNS all take time).

The gateway's startup sequence includes a remote authentication/validation HTTP call. When this call is made during the post-wake network window:

  1. The TCP connection enters SYN_SENT state
  2. The request hangs indefinitely — no timeout is set at the application layer
  3. The process does not crash, so launchd's KeepAlive never triggers a restart
  4. Result: the gateway is permanently stuck in the initialization phase

# Process is alive but port never opens:
$ lsof -i :18789
(no output)

$ ps aux | grep openclaw
peterson  3008  openclaw   ← alive, 0% CPU, just waiting

Fix Action

PR fix notes

PR #63981: fix(gateway): add startup timeout to prevent indefinite hang on macOS sleep/wake

Description (problem / solution / changelog)

Summary

Fixes #63966.

When launchd starts the gateway with RunAtLoad=true before the network stack is ready (e.g. after macOS sleep/wake), a network call inside params.start() can hang indefinitely. The process stays alive but never binds port 18789, so launchd's KeepAlive has nothing to restart.

  • Wrap params.start() in a Promise.race against a 60 s deadline (STARTUP_TIMEOUT_MS)
  • On timeout: throw an error → propagates through the existing isFirstStart path → exit(1) → launchd/systemd supervisor restarts the process once the network is actually available
  • On clean start: clearTimeout ensures the deadline is cancelled immediately, zero overhead on the happy path

The timeout constant sits alongside the other timing constants at the top of run-loop.ts and is easy to adjust.
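
The Promise.race approach described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: STARTUP_TIMEOUT_MS and the 60 s value come from the PR description, while the helper name startWithDeadline and the error text are made up for the example.

```javascript
// Value taken from the PR description; the helper below is hypothetical.
const STARTUP_TIMEOUT_MS = 60_000;

// Runs start() but rejects if it has not settled within timeoutMs.
async function startWithDeadline(start, timeoutMs = STARTUP_TIMEOUT_MS) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`gateway startup timed out after ${timeoutMs} ms`)),
      timeoutMs,
    );
  });
  try {
    // Whichever promise settles first decides the outcome.
    return await Promise.race([start(), deadline]);
  } finally {
    // Always cancel the deadline so it cannot fire after a clean start.
    clearTimeout(timer);
  }
}
```

Note that Promise.race does not cancel the hung call itself; it only stops waiting for it. That is acceptable here because, per the PR, the timeout error leads to exit(1) on first start and the supervisor launches a fresh process.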

Test plan

  • New vitest case: exits non-zero when startup times out on first start (no network) — uses vi.useFakeTimers() to advance 60 001 ms and asserts runGatewayLoop rejects with the timeout message
  • All existing runGatewayLoop tests still pass
  • pnpm build && pnpm check && pnpm test green

Notes

  • This is a fail-fast fix only; it does not change which specific network call hangs. launchd's KeepAlive (already in the plist) handles the restart.
  • Does not affect in-process restarts (SIGUSR1): isFirstStart is false on subsequent iterations so the error is caught, logged, and the process stays alive per existing behaviour.
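
The first-start versus in-process-restart distinction in the notes above might look something like this. The PR text does not show the real logic in run-loop.ts, so the function name runIteration, its return values and the log wording are all assumptions made for illustration.

```javascript
// Hypothetical shape of one loop iteration; not the actual run-loop.ts code.
async function runIteration(start, { isFirstStart, log = console.error }) {
  try {
    await start();
    return 'started';
  } catch (err) {
    // First start: let the error propagate so the caller can exit non-zero
    // and the supervisor (launchd/systemd) restarts the process.
    if (isFirstStart) throw err;
    // In-process restart (e.g. SIGUSR1): log and keep the process alive.
    log(`[gateway] restart failed, keeping process alive: ${err.message}`);
    return 'kept-alive';
  }
}
```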

🤖 AI-assisted (Claude Sonnet 4.6)

Changed files

  • src/cli/gateway-cli/run-loop.test.ts (modified, +101/-0)
  • src/cli/gateway-cli/run-loop.ts (modified, +44/-1)

Original issue

Environment

  • OpenClaw version: 2026.4.9
  • OS: macOS (darwin 25.5.0)
  • Install method: Homebrew (/opt/homebrew)
  • Service manager: launchd (ai.openclaw.gateway LaunchAgent)

Bug Description

After macOS wakes from sleep (or on boot), the gateway process starts but never opens port 18789. The process appears alive (ps shows it running), but the HTTP server is never started and the port never listens.

Key observation: the gateway log only contains 2 entries — the startup banner and credential sync — then stops. The normal startup sequence ([gateway] starting HTTP server..., [canvas] host mounted at http://127.0.0.1:18789/) never appears.

Expected Behaviour

The gateway should either:

  • Fail fast with a non-zero exit code if the startup network request times out (allowing launchd's KeepAlive to restart it, subject to launchd's respawn throttling rather than exponential backoff, until connectivity is established), or
  • Retry the authentication call internally with a timeout+backoff loop before proceeding
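
For the second option, a timeout-plus-backoff loop could be sketched as below. The helper name retryWithBackoff and all default numbers are assumptions, not anything taken from OpenClaw; the per-attempt timeout only works if the wrapped call honours the AbortSignal it is given, as fetch does.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: retry an async call, giving each attempt its own
// timeout and doubling the wait between attempts.
async function retryWithBackoff(
  call,
  { attempts = 5, timeoutMs = 8000, baseDelayMs = 1000 } = {},
) {
  let lastError;
  for (let attempt = 0; attempt < attempts; attempt += 1) {
    const controller = new AbortController();
    const timer = setTimeout(() => controller.abort(), timeoutMs);
    try {
      // The call receives the signal and must abort when it fires.
      return await call(controller.signal);
    } catch (err) {
      lastError = err;
    } finally {
      clearTimeout(timer);
    }
    // Wait 1x, 2x, 4x, ... the base delay, but not after the final attempt.
    if (attempt < attempts - 1) await sleep(baseDelayMs * 2 ** attempt);
  }
  throw lastError;
}
```

With the placeholder AUTH_URL from the suggested fix, usage would be along the lines of `await retryWithBackoff((signal) => fetch(AUTH_URL, { signal }))`, followed by exit(1) if the final attempt also fails.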

Suggested Fix (for openclaw team)

Apply a timeout (5–10 seconds) to any network calls made during gateway initialization. On timeout or connection failure, log the error and exit(1). launchd will then restart the process automatically, and by the next attempt the network is typically ready.

// Example: apply to the auth/validation fetch at startup.
// AUTH_URL is a placeholder for the real endpoint.
const controller = new AbortController();
// Abort the request if it has not completed within 8 seconds.
const timer = setTimeout(() => controller.abort(), 8000);
try {
  await fetch(AUTH_URL, { signal: controller.signal });
} catch (err) {
  // Reached on abort (timeout) as well as on connection failure.
  console.error('[gateway] startup network check failed:', err.message);
  process.exit(1);  // let launchd KeepAlive restart us
} finally {
  clearTimeout(timer);
}

Workaround (user-side)

Created a wrapper script (~/.local/bin/openclaw-gateway-start) that polls for connectivity before exec-ing the gateway, and pointed the launchd plist's ProgramArguments at it. This prevents the hang but is a band-aid — the real fix belongs in the application.
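
The wrapper script itself is not included in the issue. As a rough idea of the polling step, here is a sketch in Node rather than shell; the probe host, the attempt count, the interval and every function name are assumptions, and a successful DNS lookup is only a proxy for real connectivity.

```javascript
// Hypothetical probe target; any hostname the gateway needs would do.
const PROBE_HOST = 'example.com';

const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Default probe: resolves if the hostname can be looked up, rejects otherwise.
async function dnsProbe() {
  const { lookup } = await import('node:dns/promises');
  await lookup(PROBE_HOST);
}

// Poll until the probe succeeds; returns the number of attempts it took.
async function waitForConnectivity({ probe = dnsProbe, attempts = 30, intervalMs = 2000 } = {}) {
  for (let i = 1; i <= attempts; i += 1) {
    try {
      await probe();
      return i;
    } catch {
      if (i < attempts) await pause(intervalMs);
    }
  }
  throw new Error(`no connectivity after ${attempts} attempts`);
}

// Wait for the network, then run the command passed on the command line
// and mirror its exit code so launchd still sees failures.
async function main(argv = process.argv.slice(2)) {
  const [command, ...args] = argv;
  await waitForConnectivity();
  const { spawn } = await import('node:child_process');
  const child = spawn(command, args, { stdio: 'inherit' });
  child.on('exit', (code) => process.exit(code ?? 1));
}
```

Calling `main()` at the end of the file would turn this into a usable wrapper, with the real gateway command supplied as arguments in the plist's ProgramArguments. Unlike a shell `exec`, spawn leaves the wrapper running as the parent process, which is why the exit code is forwarded.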

Additional Notes

  • Reviewed by both Claude (Sonnet 4.6) and Gemini (gemini-3.1-pro-preview) independently; both arrived at the same root cause conclusion.
  • NetworkState=true in the plist is not a sufficient fix — on modern macOS it only checks that a network interface is active, not that IP/DNS/internet are actually reachable.

Extended analysis

TL;DR

Apply a timeout to network calls during gateway initialization to prevent indefinite hanging and allow launchd to restart the process.

Guidance

  • Locate the remote authentication/validation HTTP call in the gateway's startup sequence and give it a timeout of 5–10 seconds.
  • Implement a retry mechanism with exponential backoff to handle temporary network connectivity issues.
  • Verify that the KeepAlive feature in launchd is properly configured to restart the gateway process in case of a non-zero exit code.
  • Consider implementing a connectivity check before starting the gateway process to ensure that the network is ready.

Notes

The wrapper-script workaround is a stopgap. The durable fix is for OpenClaw itself to handle missing connectivity at startup.

Recommendation

Add a timeout to the network calls made during gateway initialization and exit non-zero when it fires, so that launchd restarts the process once connectivity has returned. PR #63981 takes this approach with a 60 s deadline around params.start().
