openclaw - 💡(How to fix) Fix Gateway exits with code 0 when critical channel exhausts restarts, preventing systemd auto-recovery [1 participant]

openclaw · 2026-03-26 23:06:00


GitHub stats

openclaw/openclaw#55410 · Fetched 2026-04-08 01:39:50


Author

adithyan-ak

Participants

adithyan-ak

Timeline (top)

closed ×1, locked ×1

Error Message

20:16:46 [whatsapp] No messages received in 41m - restarting connection
20:16:46 [whatsapp] Web connection closed (status 499). Retry 9/12 in 30s
20:16:49 [health-monitor] [whatsapp:default] health-monitor: restarting (reason: disconnected)
20:16:50 [whatsapp] [default] channel exited: {error:{data:{code:ETIMEDOUT}...}}
20:16:50 [whatsapp] [default] auto-restart attempt 1/10 in 5s
20:16:55 [whatsapp] Listening for personal WhatsApp inbound messages.
20:17:30 [gateway] signal SIGTERM received        ← clean shutdown, exit 0
20:17:31 systemd: Stopped openclaw-gateway.service
20:17:31 systemd: Consumed 47.974s CPU time, 590.8M memory peak
--- 2-hour gap with no auto-restart ---
22:07:45 systemd: Started openclaw-gateway.service   ← manual restart

Root Cause

The systemd unit ships with:

SuccessExitStatus=0 143

Exit code 0 is explicitly listed as a success exit status. When the gateway exits with code 0 after a channel failure, systemd interprets this as the process completing its job successfully and does not trigger Restart=always.

The gateway should exit with a non-zero code (e.g. 1) when a critical channel fails and cannot recover, to correctly signal failure to the process manager.
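A minimal sketch of that idea in Node.js, assuming the gateway funnels every shutdown through one place that knows why it is stopping. The reason strings and exitCodeFor are hypothetical names, not OpenClaw code:

```javascript
// Hypothetical sketch: map why the gateway is stopping to a process exit code.
// None of these names come from OpenClaw.
const EXIT_OK = 0; // a requested shutdown, e.g. SIGTERM from `systemctl stop`
const EXIT_FAILURE = 1; // a critical channel could not be recovered

/**
 * @param {string} reason why the gateway is shutting down
 * @returns {number} the exit code to report to the process manager
 */
function exitCodeFor(reason) {
  switch (reason) {
    case "shutdown-requested":
      return EXIT_OK;
    case "critical-channel-unrecoverable":
      return EXIT_FAILURE;
    default:
      // Unknown reasons count as failures, so a supervisor errs on the side
      // of restarting rather than leaving the gateway down.
      return EXIT_FAILURE;
  }
}

// In the gateway, the shutdown path would then do something like:
//   process.exitCode = exitCodeFor("critical-channel-unrecoverable");
```

Defaulting unrecognized reasons to 1 is a deliberate choice: a spurious restart is usually cheaper than a gateway that stays down silently.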

Fix Action

Workaround

Remove 0 from SuccessExitStatus in the systemd unit:

# Before (shipped default)
SuccessExitStatus=0 143

# After (workaround)
SuccessExitStatus=143

Then run systemctl --user daemon-reload. This forces systemd to restart on any exit except clean SIGTERM.
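If an upgrade may overwrite the shipped unit file, the same workaround can live in a drop-in override instead. A sketch, assuming the user unit is named openclaw-gateway.service as in the logs: systemctl --user edit openclaw-gateway.service opens an override.conf under ~/.config/systemd/user/openclaw-gateway.service.d/ and reloads the daemon on save, and systemctl --user cat openclaw-gateway.service shows the merged result. The empty assignment clears the list inherited from the shipped unit before the new value is set. One caveat worth checking against systemd.service(5): it treats exit status 0 as successful regardless of SuccessExitStatus= (which only adds statuses), describes Restart=always as restarting whether or not the process exited cleanly, and does not auto-restart a unit stopped with systemctl stop. The installed unit's Restart= value is therefore worth confirming alongside this change.

```ini
# ~/.config/systemd/user/openclaw-gateway.service.d/override.conf
[Service]
# An empty assignment resets the SuccessExitStatus list from the shipped unit.
SuccessExitStatus=
SuccessExitStatus=143
```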


Bug Description

When a critical channel (e.g. WhatsApp Web) exhausts all internal restart attempts and the final health-monitor-triggered restart also fails (e.g. ETIMEDOUT), the gateway exits with exit code 0 (success) instead of a non-zero failure code.

This causes process managers (systemd, Docker, PM2, Kubernetes) to treat the shutdown as a successful completion rather than a failure — so they do not auto-restart the process.

Relationship to #55330

This is a separate but related bug to #55330:

	#55330	This issue
Mode	Process stays alive but loops infinitely (can't stop)	Process exits cleanly but signals wrong outcome
Symptom	High CPU, 499 loop every ~60s	Gateway stays down indefinitely
Fix	Reset `lastInboundAt` on reconnect	Exit with non-zero code on channel failure

They are inverse failure modes of the same restart handling subsystem.
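The #55330 fix in the table can be pictured with a small idle watchdog. This is a hypothetical sketch: only the lastInboundAt name comes from the issue; the class, the injectable millisecond clock, and the 30-minute limit (taken from the reproduction steps) are assumptions.

```javascript
// Hypothetical idle watchdog illustrating the #55330 fix.
// Only the name lastInboundAt comes from the issue; the rest is assumed.
class InboundWatchdog {
  /**
   * @param {number} idleLimitMs how long without inbound messages before restarting
   * @param {number} now current time in ms (injectable for testing)
   */
  constructor(idleLimitMs, now = Date.now()) {
    this.idleLimitMs = idleLimitMs;
    this.lastInboundAt = now;
  }

  onInboundMessage(now = Date.now()) {
    this.lastInboundAt = now;
  }

  onReconnect(now = Date.now()) {
    // The fix: a fresh connection starts a fresh idle window. Without this
    // reset, the new connection inherits the stale timestamp, is judged idle
    // at the next check, and gets restarted again in a loop.
    this.lastInboundAt = now;
  }

  shouldRestart(now = Date.now()) {
    return now - this.lastInboundAt >= this.idleLimitMs;
  }
}
```

This issue sits on the other side of the same loop: once restarts stop succeeding, the process has to give up with a failure code rather than a success code.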

Steps to Reproduce

Start the gateway with WhatsApp Web connected
Wait for a quiet period in which no WhatsApp messages arrive for 30+ minutes
The watchdog fires a 499 disconnect repeatedly until the retry limit is reached (10/12 in the observed case)
The health-monitor triggers one final restart attempt, which fails (ETIMEDOUT)
The gateway exits with code 0
systemd does not restart the service


Expected Behavior

Gateway exits with code 1 (or any non-zero code) when:

A critical channel exhausts all restart attempts, OR
The health-monitor cannot recover a channel after its own restart
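The two conditions above can be folded into one decision function. A sketch, assuming a per-channel attempt counter; RestartBudget, onChannelStopped, and the critical flag are hypothetical, and the default limit of 10 only mirrors the "auto-restart attempt 1/10" log line:

```javascript
// Hypothetical restart bookkeeping for one channel (not OpenClaw's API).
class RestartBudget {
  constructor(maxAttempts = 10) {
    this.maxAttempts = maxAttempts;
    this.attempts = 0;
  }

  /** Record a failed start; returns "retry" or "give-up". */
  recordFailure() {
    this.attempts += 1;
    return this.attempts < this.maxAttempts ? "retry" : "give-up";
  }

  /** A successful (re)connect clears the failure count. */
  recordSuccess() {
    this.attempts = 0;
  }
}

/**
 * Decide the gateway's fate once a channel stops.
 * Returns null to keep running, or an exit code to terminate with.
 */
function onChannelStopped({ critical, budget, healthMonitorGaveUp }) {
  if (!critical) return null; // non-critical channel: degrade, keep the gateway up
  if (healthMonitorGaveUp) return 1; // the health-monitor's own restart failed
  return budget.recordFailure() === "give-up" ? 1 : null;
}
```

Returning an exit code instead of calling process.exit directly keeps the decision testable and leaves the actual shutdown to one place.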

Actual Behavior

Gateway exits with code 0, causing:

systemd: no restart triggered (Restart=always is a no-op for success exits)
Docker: container marked as exited cleanly, no restart
Any process manager respecting exit codes: same silent failure

Impact

Severity: High. Any production OpenClaw deployment with WhatsApp Web will silently lose the channel and not recover. The clean shutdown logs make this difficult to diagnose — there is no crash, no error, just a successful exit.


Environment

OpenClaw version: 2026.3.23-2
OS: Linux 6.17.0-1009-oracle (arm64)
Node: v22.22.1
Channel: WhatsApp Web (Baileys)
Process manager: systemd (user unit)

extent analysis

Fix Plan

To fix the issue, we need to modify the gateway to exit with a non-zero code when a critical channel fails and cannot recover. Here are the steps:

Modify the gateway's shutdown logic to exit with a non-zero code (e.g., 1) when a critical channel exhausts all restart attempts or the health-monitor cannot recover a channel after its own restart.
Update the systemd unit to remove 0 from SuccessExitStatus to ensure that systemd restarts the process on any non-zero exit code.

Example code snippet to exit with a non-zero code:

process.exit(1); // Exit with code 1 on critical channel failure
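Node's process.exit() ends the process immediately, even with asynchronous work such as buffered writes to stdout still pending; the Node.js documentation recommends setting process.exitCode and letting the process exit naturally. A hedged sketch of a failure path built that way, where failGateway, the cleanup list, and the 10-second fallback are hypothetical:

```javascript
// Hypothetical failure path: record a non-zero exit code, run cleanup steps,
// and only force-exit if something keeps the event loop alive.
function failGateway(reason, cleanups, { forceAfterMs = 10_000, proc = process } = {}) {
  console.error(`[gateway] critical failure, shutting down: ${reason}`);
  proc.exitCode = 1; // non-zero so systemd, Docker, or PM2 see a failure

  for (const cleanup of cleanups) {
    try {
      cleanup();
    } catch (err) {
      // One failing cleanup step must not prevent the others from running.
      console.error("[gateway] cleanup step failed:", err);
    }
  }

  // Safety net: if a socket or timer is still open, exit anyway.
  // unref() ensures this timer alone does not keep the process alive.
  const timer = setTimeout(() => proc.exit(1), forceAfterMs);
  timer.unref();
  return timer;
}
```

The injectable proc parameter exists only so the function can be exercised without terminating the caller.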

Update the systemd unit:

SuccessExitStatus=143

Then, run systemctl --user daemon-reload to apply the changes.

Verification

To verify that the fix worked:

Reproduce the issue by simulating a critical channel failure.
Check the systemd logs to ensure that the process is restarted after exiting with a non-zero code.
Verify that the gateway recovers and resumes normal operation after the restart.
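The exit-code check in the second step can be rehearsed without systemd by running a stand-in for the failure path as a child process. A minimal sketch: describeExit is a hypothetical helper, and the inline scripts only simulate the buggy and fixed exits. On the real unit, systemctl --user show openclaw-gateway.service -p ExecMainStatus -p NRestarts should report the last exit status and the restart counter.

```javascript
// Hypothetical verification harness: run a command and report how it ended.
const { spawnSync } = require("node:child_process");

function describeExit(command, args) {
  const result = spawnSync(command, args, { stdio: "ignore" });
  if (result.error) throw result.error; // e.g. the command was not found
  return {
    status: result.status, // numeric exit code, or null if killed by a signal
    signal: result.signal, // e.g. "SIGTERM", or null
    nonZeroExit: result.status !== null && result.status !== 0,
  };
}

// Simulate the buggy and the fixed behaviour with inline Node scripts.
const buggy = describeExit(process.execPath, ["-e", "process.exit(0)"]);
const fixed = describeExit(process.execPath, ["-e", "process.exitCode = 1"]);
console.log({ buggy, fixed });
```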

Extra Tips

Ensure that the SuccessExitStatus configuration is updated correctly in the systemd unit.
Test the fix thoroughly to ensure that it works as expected in different scenarios.
Consider adding logging or monitoring to detect and alert on critical channel failures to facilitate quicker diagnosis and resolution.
