openclaw - 💡(How to fix) Fix Gateway startup/status can self-DoS after restart and reports ready before channels are online [2 comments, 2 participants]

openclaw/openclaw#72929 · Fetched 2026-04-28 06:30:09 · Timeline: commented ×2, closed ×1

After updating OpenClaw from 2026.4.24 to 2026.4.25 and restarting the gateway, Discord appeared offline. The gateway was not crashing. It was repeatedly restarted during its post-core startup window. The process logs ready after roughly 5 to 8 seconds, but Discord becomes usable only after sidecars/channels complete, observed at roughly 60 to 105 seconds depending on the cycle.

Repeated openclaw gateway restart, openclaw gateway status, or openclaw doctor calls during this window can make the system look like a crash loop and may restart or extend the startup window. Status output is contradictory during this state.

Error Message

Runtime: running (pid ..., state active)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
connect ECONNREFUSED 127.0.0.1:18789
Gateway port 18789 is not listening (service appears running).
Last gateway error: ... [ws] closed before connect ... code=1006

Root Cause

This output confuses operators because it mixes four signals: a running runtime state, a gateway port that is not listening, a stale warm-up hint that promises only a few seconds, and expected startup websocket noise.

Code Example

[gateway] loading configuration…
[gateway] resolving authentication…
[gateway] starting...
[secrets] [SECRETS_GATEWAY_AUTH_SURFACE] gateway.auth.token is inactive. gateway.auth.token: *** token env var is configured.
[gateway] starting HTTP server...
[health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
[gateway] ready (9 plugins: ...; ~5-8s)
[gateway] starting channels and sidecars...
[plugins] embedded acpx runtime backend registered
[browser/server] Browser control listening
[heartbeat] started
[discord] [default] starting provider
[discord] channels resolved
[discord] users resolved
[discord] logged in to discord as ...

---

Runtime: running (pid ..., state active)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
connect ECONNREFUSED 127.0.0.1:18789
Gateway port 18789 is not listening (service appears running).
Last gateway error: ... [ws] closed before connect ... code=1006

---

Gateway is still initializing: phase=channels-starting, elapsed=47s, Discord not online yet.
Refusing to restart because this would reset startup. Wait ~73s or pass --force.

---

[gateway] ready (...)

---

[gateway] core ready (...); starting sidecars/channels next

---

[gateway] ready for traffic (channels: discord=online; elapsed=60.6s)

---

[gateway/startup] phase=acpx-register duration=44.1s
[gateway/startup] phase=browser-server duration=0.2s
[gateway/startup] phase=discord-login duration=2.3s

---

[gateway/ws] expected-during-startup closed before connect ...

---

Warm-up: gateway core may be ready in a few seconds, but channels such as Discord can take 1 to 2 minutes. Avoid restart/status loops during this window.

---

Startup phase: channels-starting, elapsed=45s, Discord not online yet, expected window up to 120s.

OpenClaw 2026.4.25 gateway startup/status hardening ticket

Date: 2026-04-27
Version: OpenClaw 2026.4.25, commit aa36ee6
Host: macOS LaunchAgent gateway, loopback bind, port 18789

Summary

After updating OpenClaw from 2026.4.24 to 2026.4.25 and restarting the gateway, Discord appeared offline. The gateway was not crashing. It was repeatedly restarted during its post-core startup window. The process logs ready after roughly 5 to 8 seconds, but Discord becomes usable only after sidecars/channels complete, observed at roughly 60 to 105 seconds depending on the cycle.

Repeated openclaw gateway restart, openclaw gateway status, or openclaw doctor calls during this window can make the system look like a crash loop and may restart or extend the startup window. Status output is contradictory during this state.

Evidence

Relevant logs:

  • ~/.openclaw/logs/gateway.log
  • /tmp/openclaw/openclaw-2026-04-27.log
  • ~/.openclaw/logs/stability/

No new stability bundles were created on 2026-04-27. Newest stability bundles are from 2026-04-25, which supports the conclusion that this was not a fresh crash loop.

Observed startup sequence:

[gateway] loading configuration…
[gateway] resolving authentication…
[gateway] starting...
[secrets] [SECRETS_GATEWAY_AUTH_SURFACE] gateway.auth.token is inactive. gateway.auth.token: *** token env var is configured.
[gateway] starting HTTP server...
[health-monitor] started (interval: 300s, startup-grace: 60s, channel-connect-grace: 120s)
[gateway] ready (9 plugins: ...; ~5-8s)
[gateway] starting channels and sidecars...
[plugins] embedded acpx runtime backend registered
[browser/server] Browser control listening
[heartbeat] started
[discord] [default] starting provider
[discord] channels resolved
[discord] users resolved
[discord] logged in to discord as ...

Representative final successful cycle:

  • 11:41:22 EDT: gateway starting
  • 11:41:28 EDT: ready logged
  • 11:42:13 EDT: embedded acpx runtime backend registered
  • 11:42:20 EDT: Discord provider starting
  • 11:42:22 EDT: logged in to Discord

That is roughly 60 seconds from process start to Discord online, and roughly 54 seconds after the misleading ready log.

Representative interrupted cycles:

  • 11:24:41 start, 11:24:46 ready, then new startup begins at 11:25:18 before channel completion.
  • 11:32:11 start, 11:32:19 ready, then new startup begins at 11:32:34 before channel completion.
  • 11:36:25 start, 11:36:32 ready, then new startup begins at 11:37:01 before channel completion.
  • 11:40:46 start, 11:40:51 ready, then new startup begins at 11:41:21 before channel completion.
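
The timestamps above can be checked with a few lines of arithmetic. This is only a sketch for reading the evidence, not part of OpenClaw; the helper names are made up for illustration.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end (same day, HH:MM:SS)."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# (start, ready logged, next startup begins), copied from the interrupted cycles.
INTERRUPTED = [
    ("11:24:41", "11:24:46", "11:25:18"),
    ("11:32:11", "11:32:19", "11:32:34"),
    ("11:36:25", "11:36:32", "11:37:01"),
    ("11:40:46", "11:40:51", "11:41:21"),
]


def summarize(cycles):
    """Return (seconds until ready, seconds until the next startup) per cycle."""
    return [(seconds_between(s, r), seconds_between(s, n)) for s, r, n in cycles]


if __name__ == "__main__":
    for to_ready, to_restart in summarize(INTERRUPTED):
        print(f"ready after {to_ready}s, next startup after {to_restart}s")
```

Every interrupted cycle was restarted 23 to 37 seconds after it began, while the one successful cycle needed about 60 seconds to reach Discord login, so none of the interrupted cycles had enough time to finish.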

Status output during startup reported:

Runtime: running (pid ..., state active)
Warm-up: launch agents can take a few seconds. Try again shortly.
Connectivity probe: failed
connect ECONNREFUSED 127.0.0.1:18789
Gateway port 18789 is not listening (service appears running).
Last gateway error: ... [ws] closed before connect ... code=1006

This output confuses operators because it mixes four signals: a running runtime state, a gateway port that is not listening, a stale warm-up hint that promises only a few seconds, and expected startup websocket noise.

Issue 1: Restart-during-startup self-DoS

Finding

Confirmed. The gateway can be kept in a startup loop by repeated restarts/status/doctor activity during the post-core startup window.

Likely root cause

openclaw gateway restart lacks a startup-state guard. It treats any request as an immediate restart even while the prior restart has not finished channel initialization. If status or doctor triggers LaunchAgent repair/reload during startup, they can also contribute. Read-only commands must never bounce the service.

Proposed code changes

  1. Add a persisted startup state artifact written by the gateway, for example:

    • ~/.openclaw/run/gateway-startup-state.json
    • or an equivalent runtime state endpoint.
  2. Track phases:

    • process-starting
    • config
    • auth
    • http-listening
    • core-ready
    • sidecars-starting
    • acpx-ready
    • channels-starting
    • discord-starting
    • discord-online
    • ready-for-traffic
  3. In openclaw gateway restart, refuse or debounce if:

    • current phase is not ready-for-traffic, and
    • process age or startup elapsed is less than a configurable guard window, default 120 seconds.

Suggested operator output:

Gateway is still initializing: phase=channels-starting, elapsed=47s, Discord not online yet.
Refusing to restart because this would reset startup. Wait ~73s or pass --force.
  4. Add --force to override.

  5. Audit openclaw gateway status and openclaw doctor so read-only invocations never call service install, unload, reload, kickstart, restart, or repair unless explicitly passed a repair flag.
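
A minimal sketch of the guard decision, assuming the state file proposed above exists and holds a phase name and a start timestamp. The JSON field names (phase, started_at) and the function names are assumptions for illustration; nothing here reflects OpenClaw's actual code.

```python
import json
from pathlib import Path

# Path taken from the proposal; the file's field names are assumptions.
STATE_PATH = Path.home() / ".openclaw" / "run" / "gateway-startup-state.json"
GUARD_WINDOW_S = 120  # proposed default guard window


def restart_decision(state, now, force=False, guard_window_s=GUARD_WINDOW_S):
    """Return (allowed, message) for a restart request.

    state is a dict such as {"phase": "channels-starting", "started_at": 1000.0},
    or None when no state file is available.
    """
    if force or state is None:
        return True, "restarting"
    elapsed = now - state["started_at"]
    # Allow the restart once startup has finished or the guard window expired.
    if state["phase"] == "ready-for-traffic" or elapsed >= guard_window_s:
        return True, "restarting"
    remaining = guard_window_s - elapsed
    return False, (
        f"Gateway is still initializing: phase={state['phase']}, "
        f"elapsed={elapsed:.0f}s. Refusing to restart because this would "
        f"reset startup. Wait ~{remaining:.0f}s or pass --force."
    )


def load_state(path=STATE_PATH):
    """Read the state file; a missing or unreadable file disables the guard."""
    try:
        return json.loads(path.read_text())
    except (OSError, ValueError):
        return None
```

A restart command could call restart_decision(load_state(), time.time(), force=...) and exit without touching the service when allowed is False. Treating a missing state file as "no guard" is a design choice that keeps older gateways restartable; a stricter implementation might fall back to process age instead.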

Issue 2: Post-ready startup time

Finding

Confirmed. ready is logged when core HTTP/plugin setup is complete, not when channels are online. On this host, Discord can lag ready by roughly 54 seconds.

Likely root cause

Sidecar/channel startup happens after the current ready log. ACPX registration appears to be the longest step: in later cycles it completes roughly 44 to 46 seconds after the "starting channels and sidecars..." log line. Discord itself starts quickly after ACPX, then logs in within a few seconds in the final successful cycle.

Proposed code changes

  1. Rename the current log line from:
[gateway] ready (...)

to:

[gateway] core ready (...); starting sidecars/channels next
  2. Add a later operator-facing line only after required channels are online:
[gateway] ready for traffic (channels: discord=online; elapsed=60.6s)
  3. Instrument duration for each sidecar/channel start:
[gateway/startup] phase=acpx-register duration=44.1s
[gateway/startup] phase=browser-server duration=0.2s
[gateway/startup] phase=discord-login duration=2.3s
  4. Review channel/sidecar startup ordering. If Discord does not require ACPX, start Discord in parallel with ACPX and browser sidecar. If it does require ACPX, status should say so clearly.
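
One way to produce the per-phase duration lines is a small timing wrapper around each startup step. The sketch below is illustrative only; timed_phase and its log and clock parameters are hypothetical names, not existing OpenClaw APIs.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_phase(name, log=print, clock=time.monotonic):
    """Log how long a startup phase took, even if the phase raises."""
    start = clock()
    try:
        yield
    finally:
        log(f"[gateway/startup] phase={name} duration={clock() - start:.1f}s")


if __name__ == "__main__":
    # Stand-in for a real startup step such as registering a sidecar.
    with timed_phase("browser-server"):
        time.sleep(0.2)
```

Using a monotonic clock keeps the durations correct if the wall clock is adjusted during startup, and logging in the finally block means a failed phase still reports how long it ran.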

Issue 3: OPENCLAW_GATEWAY_TOKEN warning persists

Finding

Confirmed. The LaunchAgent plist embeds OPENCLAW_GATEWAY_TOKEN and OPENCLAW_SERVICE_MANAGED_ENV_KEYS includes OPENCLAW_GATEWAY_TOKEN. Logs say gateway.auth.token is inactive while gateway token env var is configured.

Likely root cause

The LaunchAgent installer still writes or preserves the env token even when gateway.auth.token is SecretRef-managed. Doctor then correctly detects the embedded token but recommends openclaw gateway install --force, which does not actually remove it in this configuration.

Proposed code changes

  1. In the LaunchAgent install/generation path, if gateway.auth.token resolves to an active SecretRef-managed value, omit OPENCLAW_GATEWAY_TOKEN from:

    • EnvironmentVariables
    • OPENCLAW_SERVICE_MANAGED_ENV_KEYS
  2. Add explicit cleanup behavior for openclaw gateway install --force:

    • read existing plist
    • remove obsolete managed env keys not present in the newly generated desired environment
    • write desired plist atomically
    • unload/reload only when install command is intentionally state-changing
  3. Update doctor:

    • if token is embedded and SecretRef-managed auth is active, recommend the fix that actually removes it, or report the exact file path and env key.
    • if the env token is intentionally retained as fallback but inactive, downgrade from warning to info and stop recommending a no-op fix.
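
The cleanup steps for install --force could look roughly like the sketch below. It assumes OPENCLAW_SERVICE_MANAGED_ENV_KEYS is a comma-separated list of variable names, which the ticket does not confirm, and the function names are hypothetical. It uses only Python's standard plistlib and does not unload or reload the service.

```python
import os
import plistlib
import tempfile

MANAGED_KEYS_VAR = "OPENCLAW_SERVICE_MANAGED_ENV_KEYS"


def prune_managed_env(plist, desired_env):
    """Return a copy of plist with obsolete managed env keys removed.

    Keys that were previously managed but are absent from desired_env are
    dropped; unmanaged keys are left alone. Assumes the managed-key list is
    comma-separated (an assumption, not confirmed by the ticket).
    """
    env = dict(plist.get("EnvironmentVariables", {}))
    old_managed = [k for k in env.get(MANAGED_KEYS_VAR, "").split(",") if k]
    for key in old_managed:
        if key not in desired_env:
            env.pop(key, None)
    env.update(desired_env)
    env[MANAGED_KEYS_VAR] = ",".join(sorted(desired_env))
    updated = dict(plist)
    updated["EnvironmentVariables"] = env
    return updated


def write_plist_atomically(path, plist):
    """Write to a temp file in the same directory, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            plistlib.dump(plist, fh)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

With a SecretRef-managed token, the installer would simply leave OPENCLAW_GATEWAY_TOKEN out of desired_env, and the stale entry would disappear from both EnvironmentVariables and the managed-key list on the next forced install.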

Issue 4: Log noise during restart/startup

Finding

Confirmed. Expected restart/startup events are surfaced as warnings/errors:

  • [agents/subagent-registry] subagent wait interrupted; scheduling recovery
  • [ws] closed before connect ... code=1006
  • handshake timeout
  • subagent announce timeout after 120 seconds

Proposed code changes

  1. Mark planned restart/startup windows with a runtime flag or timestamp.

  2. During startup grace and channel-connect grace, downgrade expected local loopback failures to info:

[gateway/ws] expected-during-startup closed before connect ...
  3. Keep WARN/ERROR for:

    • failures after grace expiration
    • non-loopback clients
    • repeated failures after ready-for-traffic
  4. Status/doctor should suppress Last gateway error if the latest error is expected-during-startup and a newer healthy event exists.
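
The downgrade rule can be expressed as a small classifier. This is a sketch under assumptions: the function names are hypothetical, the 120-second default mirrors the channel-connect-grace value in the health-monitor log line, and any failure after ready-for-traffic is kept at warn rather than only repeated ones.

```python
import ipaddress

CHANNEL_CONNECT_GRACE_S = 120  # channel-connect-grace from the health-monitor log


def is_loopback(addr):
    """True for loopback IP literals such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(addr).is_loopback
    except ValueError:
        return False


def ws_close_level(elapsed_s, client_addr, ready_for_traffic,
                   grace_s=CHANNEL_CONNECT_GRACE_S):
    """Pick a log level for a '[ws] closed before connect' event."""
    expected = (
        not ready_for_traffic
        and elapsed_s < grace_s
        and is_loopback(client_addr)
    )
    return "info" if expected else "warn"
```

For example, ws_close_level(10, "127.0.0.1", False) yields "info", while the same event from a non-loopback address or after the grace window yields "warn".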

Issue 5: Documentation and operator messaging

Finding

Confirmed. The current warm-up hint says LaunchAgents can take a few seconds. On this Mac, the realistic Discord-ready window is 60 to 120 seconds.

Proposed changes

  1. Update status hint:
Warm-up: gateway core may be ready in a few seconds, but channels such as Discord can take 1 to 2 minutes. Avoid restart/status loops during this window.
  2. Prefer instrumented status over static hint:
Startup phase: channels-starting, elapsed=45s, Discord not online yet, expected window up to 120s.
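
An instrumented status line like the one above could be rendered from the current phase and elapsed time. The function below is a hypothetical sketch; its name, parameters, and the wording for the ready-for-traffic case are assumptions.

```python
def startup_status_line(phase, elapsed_s, pending_channels, window_s=120):
    """Render a one-line startup status for operators."""
    if phase == "ready-for-traffic":
        return f"Startup phase: ready-for-traffic, elapsed={elapsed_s:.0f}s."
    parts = [f"Startup phase: {phase}", f"elapsed={elapsed_s:.0f}s"]
    # Name each channel that is still pending so the operator knows what to wait for.
    parts.extend(f"{name} not online yet" for name in pending_channels)
    parts.append(f"expected window up to {window_s}s")
    return ", ".join(parts) + "."


if __name__ == "__main__":
    print(startup_status_line("channels-starting", 45, ["Discord"]))
```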

Acceptance criteria

  • Running openclaw gateway restart twice within 60 seconds either no-ops or warns/refuses unless --force is passed.
  • openclaw gateway status during startup reports phase and elapsed time instead of contradictory running/not-running output.
  • openclaw doctor does not restart or repair anything unless explicitly asked.
  • openclaw doctor either clears the embedded token warning after the recommended fix or recommends a fix that actually removes the token.
  • Discord is reachable within a documented, predictable 60 to 120 second window after update/restart without operator intervention.
  • Expected local loopback websocket/subagent failures during startup are not presented as real failures.

Immediate local operator guidance

Until this is fixed upstream:

  1. After openclaw update or openclaw gateway restart, wait 2 full minutes before running more diagnostics.
  2. Avoid repeated openclaw gateway restart, openclaw doctor, or openclaw gateway status during that window.
  3. Treat ready (...) as core-ready only, not Discord-ready.
  4. Watch for [discord] logged in to discord as ... before assuming the bot is online.
  5. Rotate any API keys accidentally pasted into chat/log transcripts.

Security note

A pasted terminal output contained live API tokens from the LaunchAgent environment. Those should be rotated. Do not include raw tokens in the public issue.

extent analysis

TL;DR

To address the OpenClaw gateway startup and status issues, implement a startup-state guard in openclaw gateway restart, track phases, and refuse or debounce restarts during the startup window.

Guidance

  1. Implement startup-state tracking: Introduce a persisted startup state artifact, such as ~/.openclaw/run/gateway-startup-state.json, to track phases like process-starting, core-ready, and discord-online.
  2. Debounce restarts: Modify openclaw gateway restart to refuse or debounce restarts if the current phase is not ready-for-traffic and the process age or startup elapsed is less than a configurable guard window (default 120 seconds).
  3. Update status output: Change the ready log line to indicate that sidecars and channels are starting next, and add a later operator-facing line when required channels are online.
  4. Instrument sidecar/channel startup: Log the duration for each sidecar/channel start to better understand startup times.
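
On the gateway side, the startup-state tracking in step 1 could be a small writer called at each phase transition. This is a sketch, assuming a JSON file with phase, started_at, and updated_at fields; those field names and the function name are illustrative, and only the phase names come from the ticket.

```python
import json
import os
import tempfile
import time

# Phase names as listed in the ticket.
PHASES = [
    "process-starting", "config", "auth", "http-listening", "core-ready",
    "sidecars-starting", "acpx-ready", "channels-starting",
    "discord-starting", "discord-online", "ready-for-traffic",
]


def write_startup_state(path, phase, started_at, now=None):
    """Persist the current startup phase atomically and return the record."""
    if phase not in PHASES:
        raise ValueError(f"unknown startup phase: {phase}")
    state = {
        "phase": phase,
        "started_at": started_at,
        "updated_at": time.time() if now is None else now,
    }
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    # Write to a temp file first so readers never see a half-written file.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(state, fh)
    os.replace(tmp, path)
    return state
```

Writing through a temporary file and renaming means a status or restart command reading the file mid-update sees either the old phase or the new one, never a truncated record.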

Example

[gateway] core ready (...); starting sidecars/channels next
[gateway] ready for traffic (channels: discord=online; elapsed=60.6s)

Notes

  • The proposed changes aim to address the issues with the OpenClaw gateway startup and status, but may require further refinement based on the specific implementation details.
  • It is essential to test these changes thoroughly to ensure they resolve the reported issues without introducing new problems.

Recommendation

Implement the startup-state guard and debounce restarts during the startup window. This should keep repeated restarts and status checks from holding the gateway in a startup loop.
