openclaw - ✅(Solved) Fix [Bug]: Gateway becomes unresponsive ~60s after startup on 2026.3.31 [1 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#58651 (fetched 2026-04-08 01:59:43)
Comments: 2 · Participants: 3 · Timeline events: 7 · Reactions: 0
Timeline (top): commented ×2, labeled ×2, closed ×1, cross-referenced ×1

After upgrading from 2026.3.28 to 2026.3.31, the gateway starts successfully (binds port, loads plugins, connects WhatsApp) but becomes completely unresponsive within 30-60 seconds. The process cannot be stopped gracefully (SIGTERM times out after 30s, requires SIGKILL). Consistent across multiple restart attempts.

Rolled back to 2026.3.28 — immediately stable, no issues.

Error Message

  • No ERROR or FATAL entries in logs — the log just stops

Root Cause

Suspected Root Cause

Fix Action

Fixed

PR fix notes

PR #58670: fix(tasks): prevent synchronous task registry sweep from blocking event loop

Description (problem / solution / changelog)

Summary

  • Problem: After upgrading to 2026.3.31, the gateway becomes completely unresponsive within 30-60 seconds of startup. The process hangs and requires SIGKILL to terminate. This is caused by the task registry maintenance sweep in src/tasks/task-registry.maintenance.ts and src/tasks/task-registry.store.sqlite.ts.
  • Root Cause: The task registry maintenance sweep (sweepTaskRegistry) runs every 60 seconds on the main thread. It performs synchronous SQLite I/O using node:sqlite DatabaseSync with PRAGMA synchronous = FULL. Furthermore, after every single row upsert/delete, it redundantly calls ensureTaskRegistryPermissions() which performs 5-7 synchronous filesystem syscalls (mkdirSync, chmodSync, existsSync). When combined with a large number of tasks or other plugins (like LCM) doing sync I/O, this blocks the Node.js event loop entirely.
  • Fix:
    1. Changed PRAGMA synchronous = FULL to NORMAL in task-registry.store.sqlite.ts. Since the task registry is a reconstructable cache, FULL (which forces an fsync on every commit) is overkill and NORMAL is the recommended safe default for WAL mode.
    2. Removed redundant ensureTaskRegistryPermissions() calls from individual row operations (they are still correctly enforced at database open and batch transaction boundaries).
    3. Refactored sweepTaskRegistry and runTaskRegistryMaintenance to be async, and introduced yieldToEventLoop() (via setImmediate) every 25 tasks to prevent the sweep from monopolizing the event loop.
    4. Deferred the initial sweep at startup using setTimeout to avoid blocking the critical startup window.
  • What changed:
    • src/tasks/task-registry.store.sqlite.ts: Changed PRAGMA sync setting and removed redundant permission checks.
    • src/tasks/task-registry.maintenance.ts: Made sweep functions async, added yielding, and deferred initial sweep.
    • src/commands/tasks.ts: Added await to runTaskRegistryMaintenance call.
    • src/tasks/task-registry.test.ts: Updated tests to await the now-async maintenance functions.
  • What did NOT change (scope boundary): The actual logic for determining which tasks are lost or pruned remains unchanged. The SQLite schema and data structures are untouched. No changes were made to the plugin loader or other subsystems.
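
The yielding described in fix item 3 can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the `yieldToEventLoop` name, the use of `setImmediate`, and the batch size of 25 come from the description above, while `sweepInBatches` and its signature are hypothetical.

```typescript
// Hypothetical sketch: sweepInBatches is an illustrative name, not OpenClaw's API.

/** Resolve on a later turn of the event loop so pending I/O, timers and signals can run. */
function yieldToEventLoop(): Promise<void> {
  return new Promise<void>((resolve) => setImmediate(resolve));
}

const YIELD_EVERY = 25; // batch size quoted in the PR description

/**
 * Handle items one by one, giving control back to the event loop after every
 * YIELD_EVERY items. Returns how many times the sweep yielded.
 */
async function sweepInBatches<T>(
  items: readonly T[],
  handle: (item: T) => void,
): Promise<number> {
  let yields = 0;
  for (let i = 0; i < items.length; i++) {
    handle(items[i]);
    if ((i + 1) % YIELD_EVERY === 0) {
      await yieldToEventLoop();
      yields++;
    }
  }
  return yields;
}
```

Each `handle` call is still synchronous, so a single slow row can still stall the loop. Yielding only bounds how long a run of rows can hold it.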

Reproduction

  1. Start the gateway on version 2026.3.31 with a populated task registry and the LCM plugin enabled.
  2. Wait for approximately 60 seconds.
  3. Observe that the gateway becomes unreachable on localhost and cannot handle SIGTERM.

Risk / Mitigation

  • Risk: Making the sweep asynchronous could theoretically allow overlapping sweeps if a sweep takes longer than 60 seconds.
  • Mitigation: Added a sweepInProgress boolean flag in startTaskRegistryMaintenance to skip the interval tick if the previous async sweep is still running, preventing any overlap. Tests were updated to ensure the async behavior is correctly awaited.
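
A guard of this kind might look like the sketch below. Only the `sweepInProgress` flag name is taken from the mitigation note; `makeGuardedTick` is a hypothetical helper, and the real `startTaskRegistryMaintenance` may be structured differently.

```typescript
// Hypothetical sketch of an overlap guard around an async sweep.

/** Wrap an async sweep so that a tick is skipped while a previous sweep is still running. */
function makeGuardedTick(sweep: () => Promise<void>): () => Promise<boolean> {
  let sweepInProgress = false;
  return async function tick(): Promise<boolean> {
    if (sweepInProgress) return false; // previous sweep still running: skip this tick
    sweepInProgress = true;
    try {
      await sweep();
      return true;
    } finally {
      sweepInProgress = false; // always release the flag, even if the sweep throws
    }
  };
}
```

A caller would typically schedule it with something like `setInterval(() => { void tick(); }, 60_000)`, so a sweep that runs longer than the interval causes skipped ticks rather than concurrent sweeps.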

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway
  • Tasks

Linked Issue/PR

Fixes #58651

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/commands/tasks.ts (modified, +3/-1)
  • src/tasks/task-registry.maintenance.ts (modified, +58/-11)
  • src/tasks/task-registry.store.sqlite.ts (modified, +1/-5)
  • src/tasks/task-registry.test.ts (modified, +39/-3)

Code Example

17:41:11 [gateway]  [lcm] Plugin loaded  ← Initial load (normal)
17:41:34 [plugins]  [lcm] Plugin loaded  ← Reload 23s later (NOT normal)
17:42:28 [plugins]  [lcm] Plugin loaded  ← Reload 54s later
17:43:26 [plugins]  [lcm] Plugin loaded  ← Reload 58s later
17:44:08 [plugins]  [lcm] Plugin loaded  ← Reload 42s later

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Environment

  • OpenClaw: 2026.3.31 (upgraded from 2026.3.28)
  • Node.js: v25.8.2
  • OS: Ubuntu 24.04, Linux 6.8.0-106-generic (x64)
  • RAM: 32GB (31Gi usable), 15Gi swap
  • LCM plugin: lossless-claw 0.5.2
  • Channels: WhatsApp only
  • Other plugins: browser, memory (qmd)
  • Service manager: systemd user unit

Steps to reproduce

Attempt 1 (gateway running during install):

  1. Running 2026.3.28 stable (including LCM lossless-claw 0.5.2)
  2. npm install -g [email protected] --min-release-age=0
  3. Gateway auto-restarts via systemd
  4. Gateway starts, logs show normal startup sequence
  5. Within 30-60 seconds, gateway becomes unresponsive
  6. openclaw gateway status shows port timed out
  7. SIGTERM times out (30s), systemd sends SIGKILL

Attempt 2 (clean shutdown before install):

  1. systemctl --user stop openclaw-gateway (verified no processes running)
  2. npm install -g [email protected] --min-release-age=0
  3. openclaw config validate (passed)
  4. systemctl --user start openclaw-gateway
  5. Same behavior — starts fine, unresponsive within 60 seconds
  6. Full system reboot attempted — same result after reboot
  7. 4+ restart cycles, all identical behavior

Rollback:

  1. npm install -g [email protected] --min-release-age=0
  2. Gateway immediately stable on 3.28

Expected behavior

After the upgrade, OpenClaw should start and stay online, rather than the gateway showing as online for about a minute and then becoming unreachable. I don't know whether this is related to LCM (lossless-claw 0.5.2), but that is the plugin I see failing.

Actual behavior

Key Evidence: Abnormal Plugin Reload Cycle

The detailed log (/tmp/openclaw/openclaw-2026-03-31.log) shows the LCM plugin being re-initialized by the plugins subsystem every ~55-60 seconds after the gateway has already completed startup:

17:41:11 [gateway]  [lcm] Plugin loaded  ← Initial load (normal)
17:41:34 [plugins]  [lcm] Plugin loaded  ← Reload 23s later (NOT normal)
17:42:28 [plugins]  [lcm] Plugin loaded  ← Reload 54s later
17:43:26 [plugins]  [lcm] Plugin loaded  ← Reload 58s later
17:44:08 [plugins]  [lcm] Plugin loaded  ← Reload 42s later

This does NOT happen on 2026.3.28. On 3.28, we see only the initial [gateway] load — no subsequent [plugins] reloads.

Each reload cycle re-initializes the LCM plugin (which opens a SQLite database synchronously). Combined with the new SQLite-backed task registry introduced in 3.31 (~/.openclaw/tasks/runs.sqlite, using node:sqlite DatabaseSync), these periodic synchronous database operations appear to block the Node.js event loop.
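
The reload intervals annotated in the log excerpt can be checked with a few lines of arithmetic. The sketch below only re-derives the gaps from the timestamps quoted above; the helper names are illustrative.

```typescript
// Timestamps copied from the "[lcm] Plugin loaded" lines in the log excerpt.
const loadTimes = ["17:41:11", "17:41:34", "17:42:28", "17:43:26", "17:44:08"];

/** Convert an HH:MM:SS string to seconds since midnight. */
function toSeconds(hhmmss: string): number {
  const [h, m, s] = hhmmss.split(":").map(Number);
  return h * 3600 + m * 60 + s;
}

/** Seconds elapsed between each consecutive pair of timestamps. */
function gaps(times: readonly string[]): number[] {
  return times.slice(1).map((t, i) => toSeconds(t) - toSeconds(times[i]));
}

const reloadGaps = gaps(loadTimes);
console.log(reloadGaps); // [ 23, 54, 58, 42 ]
```

The gaps match the annotations in the log: one short interval after the initial load, then intervals of roughly 40 to 60 seconds.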

Observed Behavior

Startup completes successfully every time:

  • Port 18789 bound ✅
  • LCM plugin loaded ✅
  • WhatsApp provider started and listening ✅
  • Tailscale serve enabled ✅
  • Cron started (8 jobs) ✅
  • Browser control listening ✅

Then within 30-60 seconds:

  • Gateway becomes unreachable on localhost
  • openclaw status reports "Gateway is unreachable"
  • No ERROR or FATAL entries in logs — the log just stops
  • Process cannot handle SIGTERM (hangs for 30s until SIGKILL)

Resource metrics reported by systemd on each SIGKILL:

  • CPU time: 50-57 seconds consumed
  • Memory peak: 2.4-2.5GB
  • Swap peak: 766-782MB

No OOM kills in kernel logs. The 2.5GB peak on a 32GB system is well within limits, but the 780MB swap usage is unusual and may indicate memory pressure during the synchronous SQLite operations.
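
Because the log simply stops, one way to test the blocked-event-loop hypothesis is to sample event loop delay directly. The sketch below uses Node's `monitorEventLoopDelay` from `node:perf_hooks`. The `maxLoopDelayMs` and `busyWait` helpers are hypothetical, and `busyWait` merely stands in for any long synchronous operation; it is not how OpenClaw measures anything.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

/** Run `work` while sampling event loop delay; resolve with the worst delay seen, in ms. */
async function maxLoopDelayMs(work: () => void): Promise<number> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // let the sampler start ticking
  work(); // synchronous work under test
  await sleep(50); // let the delayed sample be recorded
  histogram.disable();
  return histogram.max / 1e6; // histogram values are in nanoseconds
}

/** Stand-in for a long synchronous operation: spins the CPU for `ms` milliseconds. */
function busyWait(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // intentionally blocks the event loop
  }
}
```

Calling `maxLoopDelayMs(() => busyWait(300))` should report a delay on the order of the blocking time, whereas an idle loop typically stays at or below a few milliseconds.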

OpenClaw version

2026.3.31

Operating system

Ubuntu 24.04

Install method

npm global

Model

anthropic/opus-4-6

Provider / routing chain

openclaw -> anthropic -> opus-4-6

Additional provider/model setup details

No response

Impact and severity

  • Severity: High
  • Impact: OpenClaw is unavailable
  • Workaround: the only known workaround is to revert to 2026.3.28

Additional information

Other Observations

  • runs.sqlite was successfully created on first 3.31 startup (49KB, correct schema, 0 rows, WAL mode). The task store migration itself completed fine.
  • WhatsApp creds.json gets corrupted on each SIGKILL (restored from backup on next start) — this is a symptom of the forced kills, not a cause.
  • The systemd unit Description and OPENCLAW_SERVICE_VERSION still show v2026.3.22 after upgrade (cosmetic — openclaw --version correctly reports 3.31). The postinstall hook doesn't update the systemd unit file.
  • Config changes from 3.28→3.31 were minimal: lastTouchedVersion timestamp, "browser" added to plugins.allow, browser plugin auto-enabled.

Suspected Root Cause

3.31 introduced a plugin reload/reconciliation cycle that runs approximately every 60 seconds. This cycle re-initializes all plugins (including LCM, which does synchronous SQLite I/O via node:sqlite DatabaseSync). The new task registry also uses synchronous SQLite. The combination of periodic synchronous database operations appears to block the event loop, making the gateway unresponsive.

extent analysis

TL;DR

The reporter suspected the periodic plugin reload cycle introduced in OpenClaw 2026.3.31, but the linked fix (PR #58670) targets the task registry maintenance sweep: synchronous SQLite I/O and per-row permission checks running on the main thread every 60 seconds blocked the Node.js event loop.

Guidance

  • Investigate the plugin reload cycle and its interaction with the LCM plugin and SQLite-backed task registry to understand the root cause of the issue.
  • Consider disabling or modifying the plugin reload cycle to prevent synchronous database operations from blocking the event loop.
  • Review the node:sqlite DatabaseSync usage in the LCM plugin and task registry to see if asynchronous alternatives can be used to prevent event loop blocking.
  • Test the gateway with the LCM plugin disabled or removed to see if the issue persists, which can help determine if the problem is specific to the LCM plugin or a more general issue with the plugin reload cycle.

Example

No specific code example is provided, as the issue is related to the interaction between OpenClaw and its plugins, and requires further investigation and modification of the underlying code.

Notes

The issue was reported on OpenClaw 2026.3.31; 2026.3.28 is unaffected. The reporter attributed the hang to the plugin reload cycle, while PR #58670 attributes it to synchronous task registry maintenance, possibly compounded by other plugins (such as LCM) doing synchronous I/O.

Recommendation

Install a release that includes PR #58670 once one is available. Until then, roll back to 2026.3.28, which the reporter confirmed is stable. Disabling the LCM plugin may help isolate the problem, but the report does not confirm it as a workaround.
