openclaw - [Bug]: Gateway loops with SIGTERM every ~90s after upgrade 2026.4.23→2026.5.18 (WSL2). Inbound message received, but CLI watchdog kills mid-response

StepCodex · 2026-05-20T14:49:56Z

Root Cause

After upgrading from `2026.4.23` to `2026.5.18` on WSL2 Ubuntu 24.04, openclaw-gateway enters a perpetual restart loop: it is terminated by SIGTERM every ~30-90 seconds. When a Telegram inbound message arrives, the gateway:

1. Receives the message correctly (`[telegram] Inbound message ... (direct, 15 chars)`)
2. Triggers `cli exec` to claude-cli
3. Receives SIGTERM ~1.5 min later, before claude-cli emits its first token
4. Never sends a response to the user

Result: the bot has been functionally unresponsive for 16+ days even though it accepts and acknowledges incoming messages.

Fix Action

Fix / Workaround

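The only workaround found so far is partial: add `alsoAllow` at the `tools` root level. The schema accepts it there, but it is unclear whether that is the intended location. It silenced the `tools policy` warning and extended uptime to ~90s, but did not stop the loop.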

Environment

- OS: Ubuntu 24.04 on WSL2 (Windows 11 Pro)
- Node: v24.15.0
- OpenClaw: 2026.5.18 (50a2481), upgraded from 2026.4.23
- Channel: Telegram (single bot, no other consumers)
- CLI backend: claude-cli (Claude Code 2.1.119) via custom wrapper
- Agent profile: `messaging`

Reproduction

1. Install openclaw 2026.4.23, configure Telegram channel + claude-cli backend.
2. Use it normally for some weeks (bot responds OK).
3. Upgrade to 2026.5.18 (`npm i -g openclaw@latest`).
4. Restart `openclaw-gateway` via systemd.
5. Observe gateway log: ready → telegram provider starts → SIGTERM at ~30-90s. Loops.
6. Send DM to bot → `[telegram] Inbound message` appears, `[agent/cli-backend] cli exec` fires, but SIGTERM kills gateway before claude-cli output. No outbound `sendMessage` logged. User receives nothing.

Evidence

Diagnostics bundle: attached (`openclaw-diagnostics-2026-05-20T14-48-45-951Z-1942.zip`, 10.7 KiB, 8 files, payload-free).

Representative log excerpt (one full cycle where a real user message was received and processed):

```
2026-05-20T13:10:34.347Z [gateway] loading configuration…
2026-05-20T13:10:38.937Z [gateway] ready
2026-05-20T13:10:39.052Z [telegram] [default] starting provider (@Festinnbot)
2026-05-20T13:10:40.258Z [telegram] [diag] isolated polling ingress started spool=...
2026-05-20T13:10:40.287Z [telegram] Inbound message telegram:<userId> -> @Festinnbot (direct, 15 chars)
2026-05-20T13:10:40.368Z tools policy: profile "messaging" (agent "main") has configured tool sections (tools.exec / tools.fs) that no longer implicitly widen the profile. Add alsoAllow: ["exec", "process", "read", "write", "edit"] explicitly if these tools should be available. See #47487.
2026-05-20T13:10:52.166Z [agent/cli-backend] cli session reset: provider=claude-cli reason=system-prompt
2026-05-20T13:10:52.360Z [agent/cli-backend] cli exec: provider=claude-cli model=sonnet promptChars=4320 trigger=user useResume=false session=none resumeSession=none reuse=invalidated:system-prompt historyPrompt=present
2026-05-20T13:11:39.900Z [health:debug] channel { ... }
2026-05-20T13:11:40.580Z [health:debug] probe.bot { channel: 'telegram', accountId: 'default', username: 'Festinnbot' }
2026-05-20T13:12:18.002Z [gateway] signal SIGTERM received
2026-05-20T13:12:18.016Z [gateway] received SIGTERM; shutting down
2026-05-20T13:12:18.037Z [shutdown] started: gateway stopping
2026-05-20T13:12:18.186Z [shutdown] completed cleanly in 147ms
```

Pattern across 20+ cycles in last 2h: SIGTERM lands ~85-90s after `[telegram] [default] starting provider`, regardless of whether a message was received. Always preceded by `[health:debug] probe.bot` returning OK with the correct `@Festinnbot` username.
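To check the "~85-90s after provider start" pattern across many cycles rather than by eye, the interval can be computed directly from the gateway log. The following is a minimal sketch, assuming the log lines keep the exact timestamp and message format shown in the excerpt; the function names are illustrative and not part of openclaw.

```python
import re
from datetime import datetime

# Matches the two events of interest, using the line format from the excerpt.
EVENT_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z "
    r"(\[telegram\] \[default\] starting provider|\[gateway\] signal SIGTERM received)"
)


def parse_ts(text):
    """Parse a timestamp such as 2026-05-20T13:10:39.052 (trailing Z already stripped)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f")


def cycle_durations(log_text):
    """Return the seconds from each provider start to the next SIGTERM."""
    durations = []
    started = None
    for match in EVENT_RE.finditer(log_text):
        when = parse_ts(match.group(1))
        if match.group(2).startswith("[telegram]"):
            started = when  # a new cycle begins
        elif started is not None:
            durations.append((when - started).total_seconds())
            started = None  # wait for the next provider start
    return durations


if __name__ == "__main__":
    import sys

    # Usage: python cycle_durations.py /path/to/gateway.log
    with open(sys.argv[1], encoding="utf-8") as fh:
        for seconds in cycle_durations(fh.read()):
            print(f"{seconds:.3f}")
```

Applied to the single cycle in the excerpt (provider start at 13:10:39.052, SIGTERM at 13:12:18.002), this yields 98.95 seconds.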

`journalctl --user` output is empty in this WSL2 install (no user systemd journal). The system journal was also queried and contains no entries identifying what initiated the gateway service shutdown.

What I tried (did not resolve)

| Change | Effect |
| --- | --- |
| `OPENCLAW_DISABLE_BONJOUR=1` | Lengthened the loop interval from ~10s to ~30-60s. mDNS not viable in WSL2. Partial. |
| `OPENCLAW_NO_RESPAWN=1` | Gateway dies once and stays dead. Confirms parent process supervises the child. |
| `gateway.channelHealthCheckMinutes: 0`, `channelMaxRestartsPerHour: 1000`, `channelStaleEventThresholdMinutes: 1440` | No effect, loop persists. |
| Add `tools.alsoAllow: ["exec","process","read","write","edit"]` | Silenced the "tools policy" warning. Extended uptime from ~30-60s to ~90s. Did not stop loop. |
| Stop service for 90s+ to release Telegram long-poll lease, then restart | Loop returns within ~90s. `curl getUpdates` from outside the gateway returns `{ok:true,result:[]}`, so no other consumer is holding the token. |
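The `getUpdates` check from the last row can be scripted so it is repeatable. This is a rough sketch using only the public Telegram Bot API endpoint `https://api.telegram.org/bot<token>/getUpdates`; the helper names are made up for illustration, and treating HTTP 409 as "another consumer is polling" is an assumption based on how the Bot API usually reports concurrent `getUpdates` callers. Run it only while the gateway is stopped, since the probe itself counts as a consumer.

```python
import json
import urllib.error
import urllib.request


def get_updates_url(token):
    """Build the Bot API getUpdates URL for a bot token."""
    return f"https://api.telegram.org/bot{token}/getUpdates"


def classify_get_updates(status, body):
    """Map an HTTP status and decoded JSON body to 'free', 'conflict' or 'error'."""
    if status == 200 and body.get("ok") is True:
        return "free"  # nobody else is holding the long poll
    if status == 409:
        return "conflict"  # assumed: another getUpdates consumer is active
    return "error"


def check_token(token, timeout=10):
    """Issue one short getUpdates call and classify the outcome."""
    url = get_updates_url(token) + "?timeout=0&limit=1"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return classify_get_updates(resp.status, json.load(resp))
    except urllib.error.HTTPError as err:
        return classify_get_updates(err.code, json.loads(err.read() or b"{}"))
```

A result of `"free"` corresponds to the `{ok:true,result:[]}` response reported above.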

The `tools policy` warning correlation is suspicious. The new v2026.5.x breaking change requires explicit `alsoAllow`, but the proper structural form (e.g., `tools.profile` as object with `{name, alsoAllow}`) is rejected by the config schema:

```
tools.profile: Invalid input (allowed: "minimal", "coding", "messaging", "full")
```

The workaround was to add `alsoAllow` at the `tools` root level. The schema accepts it there, but it is unclear whether that is the intended location.
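For reference, the shape of that workaround can be shown as a small transformation on the config. This is only a sketch: it assumes the relevant part of `openclaw.json` is a `tools` object with a string `profile`, as the schema error implies, and the helper name is hypothetical. Whether `tools.alsoAllow` is the intended location remains unconfirmed.

```python
import copy
import json

# Tool names copied from the "tools policy" warning in the log.
ALSO_ALLOW = ["exec", "process", "read", "write", "edit"]


def with_root_also_allow(config, tools=ALSO_ALLOW):
    """Return a copy of config with alsoAllow set at the tools root level."""
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    tools_section = updated.setdefault("tools", {})
    tools_section["alsoAllow"] = list(tools)
    return updated


if __name__ == "__main__":
    before = {"tools": {"profile": "messaging"}}
    print(json.dumps(with_root_also_allow(before), indent=2))
```

The printed fragment keeps `profile` as the plain string `"messaging"` and adds `alsoAllow` next to it, which is the form the schema accepted.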

Hypothesis (best guess)

The CLI watchdog (`noOutputTimeoutMs`) fires when claude-cli takes more than ~85s to emit its first token. In stream-json mode with thinking=off, claude-cli should normally emit within seconds for short prompts, but with the wrapper script and the `--add-dir` traversal of large repos, first-token latency can spike. This suggests one of two possibilities:

- Either the `noOutputTimeoutMs` default is too aggressive for stream-json mode with large workspaces, or
- Something else (gateway supervisor, parent process) is sending SIGTERM independently of the CLI watchdog.
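To make the first possibility concrete, here is a minimal sketch of how a generic no-output watchdog behaves. It is not openclaw's implementation (that code path is not visible in the report); the function name and its `no_output_timeout_s` parameter are invented for illustration. The child is terminated if it stays silent for the whole timeout, and any line of output resets the timer.

```python
import subprocess
import sys
import threading


def run_with_no_output_timeout(cmd, no_output_timeout_s):
    """Run cmd and terminate it after no_output_timeout_s seconds of silence.

    Returns (lines, timed_out).
    """
    timed_out = threading.Event()
    lines = []
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True) as proc:
        timer = None

        def on_silence():
            timed_out.set()
            proc.terminate()  # SIGTERM to the child on POSIX

        def arm():
            nonlocal timer
            if timer is not None:
                timer.cancel()
            timer = threading.Timer(no_output_timeout_s, on_silence)
            timer.daemon = True
            timer.start()

        arm()
        for line in proc.stdout:  # blocks until the next line or EOF
            lines.append(line.rstrip("\n"))
            arm()  # any output resets the silence timer
        timer.cancel()
    return lines, timed_out.is_set()


if __name__ == "__main__":
    # A child that never prints is stopped after 2 seconds of silence.
    silent = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_no_output_timeout(silent, 2.0))
```

Note that a watchdog of this shape signals the child process, not its own parent, so on its own it would not account for a SIGTERM delivered to the gateway.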

The exact code path emitting SIGTERM is opaque in the minified `dist/`. From source map names: `onSigterm` handler at `run-loop.ts:461` only logs and processes the signal — it does not originate it. The originator remains unidentified.
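One way to narrow down the originator without reading the minified bundle is to ask the kernel who sent the signal. The sketch below is a standalone Linux probe, not an openclaw feature: it blocks SIGTERM and waits for it with Python's `signal.sigwaitinfo`, whose result carries the sender's PID and UID. Running it in place of the gateway (for example, temporarily as the command of the same systemd unit) is a suggestion only, and it will see nothing if the SIGTERM is triggered by gateway-specific behavior.

```python
import os
import signal


def describe_pid(pid):
    """Best-effort command line for pid, read from Linux /proc."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as fh:
            raw = fh.read()
    except OSError:
        return "<unknown>"  # sender already exited or /proc is unavailable
    return raw.replace(b"\0", b" ").decode(errors="replace").strip() or "<unknown>"


def wait_for_sigterm(timeout_s=None):
    """Wait for SIGTERM; return (sender_pid, sender_uid), or None on timeout."""
    # The signal must be blocked in this thread so it stays pending for the wait call.
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})
    if timeout_s is None:
        info = signal.sigwaitinfo({signal.SIGTERM})
    else:
        info = signal.sigtimedwait({signal.SIGTERM}, timeout_s)
    if info is None:
        return None
    return info.si_pid, info.si_uid


if __name__ == "__main__":
    print(f"probe pid={os.getpid()} waiting for SIGTERM", flush=True)
    pid, uid = wait_for_sigterm()
    print(f"SIGTERM from pid={pid} uid={uid} cmd={describe_pid(pid)}", flush=True)
```

If the reported PID belongs to the systemd manager, the unit configuration is the place to look; if it belongs to an openclaw parent process, that points at the supervisor mentioned under `OPENCLAW_NO_RESPAWN=1`.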

Requested

- Documented env var or openclaw.json key to extend `noOutputTimeoutMs` (tried `agents.defaults.cliBackends.claude-cli.noOutputTimeoutMs`, schema rejects).
- Confirmation of intended location for `tools.alsoAllow`.
- Investigation of supervisor that sends SIGTERM independent of the user's `Restart=on-failure` systemd unit (parent process supervises the child gateway).

This is reproducible 100% on this install. Happy to attach more bundles or run specific diagnostics.