openclaw: Slack socket starved by blocking SQLite VACUUM on bloated main.sqlite; interrupted vacuums leak 53 GB of orphaned .tmp files




Summary

On a long-running gateway, synchronous SQLite work against ~/.openclaw/memory/main.sqlite blocks the Node event loop for 8–55 seconds at a time. This starves the Slack Socket Mode ping/pong, so Slack disconnects and frequently never reconnects: the agent goes silent on Slack while Telegram keeps working. A secondary bug makes the problem self-perpetuating: interrupted VACUUM/backup operations leave orphaned main.sqlite.tmp-<uuid> copies that are never cleaned up (53 GB across 1,570 files accumulated here).

Environment

  • OpenClaw 2026.5.7 (eeef486)
  • Windows 10 Pro 19045, Node v24.13.0
  • Gateway runs as a Windows Scheduled Task
  • Channels: Slack (Socket Mode) + Telegram

Symptoms

  • The log shows "channels/slack: slack socket disconnected (disconnect); reconnecting in 2s (attempt 1/12)", followed by no "slack socket mode connected" line for many minutes. Slack stays dead until a manual openclaw gateway restart.
  • In the worst case it logs "channel stop exceeded 5000ms after abort" and is then permanently wedged (only a gateway restart recovers it).
  • Telegram is unaffected throughout, so the agent looks half-alive.
  • Coincident diagnostic liveness warnings:
    • liveness warning: reasons=event_loop_delay,event_loop_utilization eventLoopDelayP99Ms=9940.5 eventLoopDelayMaxMs=9940.5
    • repeated values in the 8,000–10,000 ms range, recurring every few minutes under normal load and during embedded agent runs ([trace:embedded-run] ... totalMs=17085).

Root cause (evidence gathered)

main.sqlite had grown to 664 MB (689 MB on a prior occasion). Table sizes via dbstat:

table                        size
chunks                       287 MB
embedding_cache              250 MB (8,775 rows)
chunks_vec_vector_chunks00   96 MB
chunks_fts_*                 ~25 MB

A full VACUUM of this DB took 55 seconds, and the event loop was blocked for the entire time, which is itself enough to trigger the Slack disconnects described above. After DELETE FROM embedding_cache, the same VACUUM completed in ~8 s and reclaimed 251 MB (664 MB → 412 MB), and the event-loop stalls and Slack drops stopped.
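
The measurement above can be repeated with a short script. The following is a minimal sketch in Python, not the gateway's own code: the helper name clear_table_and_vacuum is hypothetical, and it assumes the script runs against a copy of main.sqlite (or with the gateway stopped), since it deletes every row of the named table.

```python
import os
import sqlite3
import time


def clear_table_and_vacuum(db_path: str, table: str = "embedding_cache") -> dict:
    """Empty one table, run a full VACUUM, and report timing and file sizes."""
    size_before = os.path.getsize(db_path)
    # isolation_level=None selects autocommit mode; VACUUM fails inside an
    # open transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        conn.execute(f"DELETE FROM {quoted}")
        started = time.perf_counter()
        conn.execute("VACUUM")  # blocks the calling thread until it finishes
        vacuum_seconds = time.perf_counter() - started
    finally:
        conn.close()
    size_after = os.path.getsize(db_path)
    return {
        "vacuum_seconds": vacuum_seconds,
        "bytes_before": size_before,
        "bytes_after": size_after,
        "bytes_reclaimed": size_before - size_after,
    }
```

Calling clear_table_and_vacuum on a copy of the database returns the VACUUM duration and the reclaimed bytes, which can be compared with the ~8 s and 251 MB reported here.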

Secondary bug: orphaned temp files (the bigger surprise)

VACUUM/backup writes a temporary copy alongside the DB. When the process is killed or restarted mid-operation (e.g. by the gateway restart that users/watchdogs perform to recover the stuck Slack socket), the temp file is abandoned and never cleaned up. Found in ~/.openclaw/memory/:

  • 1,570 main.sqlite.tmp-<uuid> files totaling 53 GB (many 308 MB each), oldest dating back ~2 weeks.

This is a vicious cycle: large DB → long blocking VACUUM → event-loop stall → Slack drops → operator restarts gateway → VACUUM interrupted → another ~300 MB orphan → disk fills, FS scans slow, repeat.
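
A startup sweep for the orphaned copies could look roughly like the sketch below. It is written in Python for illustration only; the function name, the one-hour age threshold, and the dry-run default are assumptions, not OpenClaw behavior. The age check is there so that a temp file belonging to an operation still in progress is not deleted.

```python
import time
from pathlib import Path


def sweep_orphaned_temp_files(
    memory_dir,
    db_name: str = "main.sqlite",
    min_age_seconds: float = 3600.0,
    dry_run: bool = True,
) -> list:
    """Find stale `<db_name>.tmp-*` files; delete them unless dry_run is set."""
    cutoff = time.time() - min_age_seconds
    stale = []
    # The pattern matches only the temp copies, not the database itself or
    # its -wal / -journal side files.
    for path in Path(memory_dir).glob(f"{db_name}.tmp-*"):
        if not path.is_file():
            continue
        if path.stat().st_mtime <= cutoff:
            stale.append(path)
            if not dry_run:
                path.unlink()
    return sorted(stale)
```

Running it once with dry_run=True lists what would be removed; a second call with dry_run=False performs the deletion.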

Suggested fixes

  1. Don't run SQLite maintenance/queries on the main thread. Move memory search, session-delta sync, and VACUUM to a worker thread or a separate process so socket heartbeats keep firing.
  2. Clean up orphaned main.sqlite.tmp-* on startup (and on vacuum failure). A simple sweep of stale temp copies would have prevented 53 GB of waste.
  3. Bound embedding_cache growth (LRU/TTL or cap), or use PRAGMA incremental_vacuum / auto_vacuum=INCREMENTAL instead of a monolithic blocking VACUUM.
  4. Decouple agent runs from the Slack socket so a heavy embedded-run (observed 17 s) cannot starve Socket Mode pongs.
  5. Consider auto-reconnect hardening for the channel stop exceeded 5000ms after abort state, which currently never self-recovers.
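
Fix 3 can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the updated_at column on embedding_cache is hypothetical (the real schema is not shown in this report), the three helper names are invented, and the connection is assumed to be in autocommit mode. The SQL itself is standard SQLite and would be the same from Node.

```python
import sqlite3


def enable_incremental_vacuum(conn: sqlite3.Connection) -> None:
    """Switch an existing database to auto_vacuum=INCREMENTAL (one-time cost)."""
    mode = conn.execute("PRAGMA auto_vacuum").fetchone()[0]
    if mode != 2:  # 0 = NONE, 1 = FULL, 2 = INCREMENTAL
        conn.execute("PRAGMA auto_vacuum = INCREMENTAL")
        # On a non-empty database the new mode only takes effect after one
        # full VACUUM, so this call still blocks once.
        conn.execute("VACUUM")


def trim_cache(conn: sqlite3.Connection, max_rows: int) -> int:
    """Keep the max_rows most recently used rows; return how many were deleted.

    Assumes a hypothetical `updated_at` column that records last use.
    """
    cur = conn.execute(
        "DELETE FROM embedding_cache WHERE rowid IN ("
        " SELECT rowid FROM embedding_cache"
        " ORDER BY updated_at DESC LIMIT -1 OFFSET ?)",
        (max_rows,),
    )
    return cur.rowcount


def reclaim_in_steps(conn: sqlite3.Connection, pages_per_step: int = 256) -> int:
    """Free at most pages_per_step pages; return pages left on the freelist."""
    # The pragma yields one row per freed page, so the cursor has to be
    # drained for the whole step to run.
    conn.execute(f"PRAGMA incremental_vacuum({int(pages_per_step)})").fetchall()
    return conn.execute("PRAGMA freelist_count").fetchone()[0]
```

The design intent is that each reclaim_in_steps call does a small, bounded amount of work, so a scheduler can spread the calls out until the returned count reaches zero instead of paying for one monolithic VACUUM.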

Happy to provide full logs or dbstat dumps if useful.
