openclaw: Gateway exits with code 0 when critical channel exhausts restarts, preventing systemd auto-recovery

GitHub stats
openclaw/openclaw#55410 · Fetched 2026-04-08 01:39:50
Comments: 0 · Participants: 1 · Timeline events: 2 (closed ×1, locked ×1) · Reactions: 0

Bug Description

When a critical channel (e.g. WhatsApp Web) exhausts all internal restart attempts and the final health-monitor-triggered restart also fails (e.g. ETIMEDOUT), the gateway exits with exit code 0 (success) instead of a non-zero failure code.

Process managers (systemd, Docker, PM2, Kubernetes) then treat the shutdown as a successful completion rather than a failure, so any restart policy that reacts only to failures does not restart the process.

Relationship to #55330

This is a separate but related bug to #55330:

           | #55330                                                | This issue
Mode       | Process stays alive but loops infinitely (can't stop) | Process exits cleanly but signals wrong outcome
Symptom    | High CPU, 499 loop every ~60s                         | Gateway stays down indefinitely
Fix        | Reset lastInboundAt on reconnect                      | Exit with non-zero code on channel failure

They are inverse failure modes of the same restart handling subsystem.

Steps to Reproduce

  1. Start gateway with WhatsApp Web connected
  2. Wait for a quiet period where no WA messages arrive for 30+ minutes
  3. Watchdog fires 499 disconnect repeatedly until retry limit is reached (10/12 in observed case)
  4. Health-monitor triggers one final restart attempt which fails (ETIMEDOUT)
  5. Gateway exits — with code 0
  6. systemd does not restart the service
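
The steps above amount to a bounded restart budget that eventually runs out. A minimal sketch of that idea follows, assuming a hypothetical `runWithRestartBudget` helper and `tryConnect` callback; neither name comes from the OpenClaw codebase, and the real gateway's retry logic may be structured differently.

```javascript
// Hypothetical model of a restart budget; names are not from the OpenClaw codebase.
// tryConnect(attempt) returns true when the channel comes back, false otherwise.
function runWithRestartBudget(tryConnect, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (tryConnect(attempt)) {
      return { outcome: "recovered", attempts: attempt };
    }
  }
  // Every attempt failed: the caller must treat this as a failure, not a clean stop.
  return { outcome: "exhausted", attempts: maxAttempts };
}

// A connection that never comes back, similar to the observed ETIMEDOUT case.
const result = runWithRestartBudget(() => false, 10);
console.log(result.outcome); // prints: exhausted
```

The point of returning an explicit "exhausted" outcome is that the shutdown path can distinguish it from an operator-requested stop.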

Log Evidence

20:16:46 [whatsapp] No messages received in 41m - restarting connection
20:16:46 [whatsapp] Web connection closed (status 499). Retry 9/12 in 30s
20:16:49 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: disconnected)
20:16:50 [whatsapp] [default] channel exited: {error:{data:{code:ETIMEDOUT}...}}
20:16:50 [whatsapp] [default] auto-restart attempt 1/10 in 5s
20:16:55 [whatsapp] Listening for personal WhatsApp inbound messages.
20:17:30 [gateway] signal SIGTERM received        ← clean shutdown, exit 0
20:17:31 systemd: Stopped openclaw-gateway.service
20:17:31 systemd: Consumed 47.974s CPU time, 590.8M memory peak
# --- 2-hour gap with no auto-restart ---
22:07:45 systemd: Started openclaw-gateway.service   ← manual restart

Root Cause

The systemd unit ships with:

SuccessExitStatus=0 143

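Exit code 0 is explicitly listed as a success exit status (systemd treats 0 as success in any case; SuccessExitStatus only adds further statuses such as 143). When the gateway exits with code 0 after a channel failure, systemd records a clean exit, as if the process had completed its job, and no restart followed in the observed case. Two caveats apply: under Restart=on-failure a clean exit is never restarted, whereas systemd documents Restart=always as restarting after clean exits too, except when the unit was stopped through systemd itself. The SIGTERM and "Stopped" lines in the log are consistent with such a stop, so the exit code may not be the only factor here.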

The gateway should exit with a non-zero code (e.g. 1) when a critical channel fails and cannot recover, to correctly signal failure to the process manager.

Expected Behavior

Gateway exits with code 1 (or any non-zero code) when:

  • A critical channel exhausts all restart attempts, OR
  • The health-monitor cannot recover a channel after its own restart
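
One way to express the expected behavior is a single mapping from shutdown reason to exit code. The sketch below is illustrative only: the `ShutdownReason` values, `exitCodeFor`, and `shutdown` are assumed names, not the gateway's actual API. It relies only on the standard Node `process.exitCode` property.

```javascript
// Hypothetical shutdown reasons; the real gateway may use different names.
const ShutdownReason = Object.freeze({
  SIGNAL: "signal", // operator or process manager asked for a stop
  CHANNEL_EXHAUSTED: "channel-exhausted", // restart attempts used up
  HEALTH_RECOVERY_FAILED: "health-recovery-failed", // health-monitor restart failed
});

function exitCodeFor(reason) {
  switch (reason) {
    case ShutdownReason.SIGNAL:
      return 0; // a requested stop is a clean exit
    case ShutdownReason.CHANNEL_EXHAUSTED:
    case ShutdownReason.HEALTH_RECOVERY_FAILED:
      return 1; // unrecoverable channel failure
    default:
      return 1; // unknown reasons are treated as failures
  }
}

function shutdown(reason) {
  // Setting process.exitCode (instead of calling process.exit) lets pending
  // I/O such as log writes finish before the process ends.
  process.exitCode = exitCodeFor(reason);
}
```

Defaulting unknown reasons to 1 keeps a new failure path from silently exiting as a success.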

Actual Behavior

Gateway exits with code 0, causing:

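  • systemd: no restart observed after the clean exit (Restart=on-failure never restarts success exits; Restart=always is documented to restart them unless the unit was stopped via systemd)
  • Docker: container marked as exited cleanly; the on-failure restart policy does not restart it
  • Any process manager that restarts only on failure exit codes: same silent failure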
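
To reason about which policies miss a clean exit, the following sketch models systemd's documented Restart= semantics for exit codes only. It is a deliberate simplification: `systemdWouldRestart` is a hypothetical helper, and it ignores signals, timeouts, RestartPreventExitStatus, and start rate limiting.

```javascript
// Simplified model of documented systemd restart semantics (exit codes only).
// This is an illustration, not systemd's implementation.
function systemdWouldRestart({ restart, exitCode, successExitStatus = [], stoppedViaSystemd = false }) {
  // A stop requested through systemd (e.g. `systemctl stop`) is never restarted.
  if (stoppedViaSystemd) return false;

  // Exit code 0 is always clean; SuccessExitStatus= only adds more clean codes.
  const clean = exitCode === 0 || successExitStatus.includes(exitCode);

  switch (restart) {
    case "always":
      return true;
    case "on-failure":
      return !clean;
    case "on-success":
      return clean;
    case "no":
      return false;
    default:
      throw new Error(`unsupported Restart= value in this sketch: ${restart}`);
  }
}

console.log(systemdWouldRestart({ restart: "on-failure", exitCode: 0 })); // prints: false
console.log(systemdWouldRestart({ restart: "on-failure", exitCode: 1 })); // prints: true
```

Under this model, a non-zero exit code is what makes failure-based policies react, which is why the exit code matters regardless of the process manager in use.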

Impact

Severity: High. Any production OpenClaw deployment with WhatsApp Web will silently lose the channel and not recover. The clean shutdown logs make this difficult to diagnose — there is no crash, no error, just a successful exit.

Workaround

Remove 0 from SuccessExitStatus in the systemd unit:

# Before (shipped default)
SuccessExitStatus=0 143

# After (workaround)
SuccessExitStatus=143

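Then run systemctl --user daemon-reload. The intent is to make systemd restart on any exit except a clean SIGTERM (143). Because systemd counts exit code 0 as success regardless of this list, confirm the effective settings with systemctl --user show openclaw-gateway.service -p Restart -p SuccessExitStatus and test that a restart actually occurs.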

Environment

  • OpenClaw version: 2026.3.23-2
  • OS: Linux 6.17.0-1009-oracle (arm64)
  • Node: v22.22.1
  • Channel: WhatsApp Web (Baileys)
  • Process manager: systemd (user unit)

extent analysis

Fix Plan

To fix the issue, we need to modify the gateway to exit with a non-zero code when a critical channel fails and cannot recover. Here are the steps:

  • Modify the gateway's shutdown logic to exit with a non-zero code (e.g., 1) when a critical channel exhausts all restart attempts or the health-monitor cannot recover a channel after its own restart.
  • As an interim workaround, update the systemd unit to remove 0 from SuccessExitStatus. Whether a non-zero exit is restarted is governed by Restart= (on-failure or always), so confirm that setting as well.

Example code snippet to exit with a non-zero code:

process.exit(1); // Exit with code 1 on critical channel failure

Update the systemd unit:

SuccessExitStatus=143

Then, run systemctl --user daemon-reload to apply the changes.

Verification

To verify that the fix worked:

  • Reproduce the issue by simulating a critical channel failure.
  • Check the systemd logs to ensure that the process is restarted after exiting with a non-zero code.
  • Verify that the gateway recovers and resumes normal operation after the restart.
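
Before testing under systemd, the exit code itself can be checked in isolation. The sketch below uses Node's standard `child_process.spawnSync` with a throwaway child process standing in for the gateway; `exitStatusOf` is a hypothetical helper, and the inline scripts are placeholders for a real failure simulation.

```javascript
const { spawnSync } = require("node:child_process");

// Runs a throwaway Node child (a stand-in for the gateway) and returns its
// exit status. The status is null if the child was killed by a signal.
function exitStatusOf(script) {
  const result = spawnSync(process.execPath, ["-e", script]);
  return result.status;
}

const failing = exitStatusOf("process.exit(1)"); // simulated channel failure
const clean = exitStatusOf("process.exit(0)"); // simulated clean shutdown
console.log({ failing, clean }); // prints: { failing: 1, clean: 0 }
```

A check like this can run in CI, so a regression back to exit code 0 is caught before it reaches a deployed unit.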

Extra Tips

  • Ensure that the SuccessExitStatus configuration is updated correctly in the systemd unit.
  • Test the fix thoroughly to ensure that it works as expected in different scenarios.
  • Consider adding logging or monitoring to detect and alert on critical channel failures to facilitate quicker diagnosis and resolution.
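
For the logging tip, one option is to emit a single structured line just before the failing exit so that an alerting rule can match on it. The event shape below is an assumption made for illustration: `formatChannelFailure` and its field names are hypothetical and do not reflect OpenClaw's actual log format.

```javascript
// Hypothetical event shape; field names are illustrative only.
function formatChannelFailure({ channel, account, attempts, maxAttempts, lastError }) {
  return JSON.stringify({
    level: "error",
    event: "critical-channel-failure",
    channel,
    account,
    attempts,
    maxAttempts,
    lastError: lastError ?? null,
  });
}

// Values taken from the log excerpt in this issue.
console.error(
  formatChannelFailure({
    channel: "whatsapp",
    account: "default",
    attempts: 10,
    maxAttempts: 10,
    lastError: "ETIMEDOUT",
  })
);
```

A one-line JSON record is easy to match in journald or a log shipper, which addresses the diagnosis problem described under Impact.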
