openclaw - 💡(How to fix) Fix Slack socket starved by blocking SQLite VACUUM on bloated main.sqlite; interrupted vacuums leak 53GB of orphaned .tmp files

openclaw · 2026-05-18 17:23:42

Summary

On a long-running gateway, synchronous SQLite work against ~/.openclaw/memory/main.sqlite blocks the Node event loop for 8–55 seconds at a time. This starves the Slack Socket Mode ping/pong, so Slack disconnects and frequently never reconnects: the agent goes silent on Slack while Telegram keeps working. A secondary bug makes the problem self-perpetuating: interrupted VACUUM/backup operations leave orphaned main.sqlite.tmp-<uuid> copies that are never cleaned up (1,570 files totaling 53 GB accumulated on this machine).

Environment

OpenClaw 2026.5.7 (eeef486)
Windows 10 Pro 19045, Node v24.13.0
Gateway runs as a Windows Scheduled Task
Channels: Slack (Socket Mode) + Telegram

Symptoms

channels/slack logs `slack socket disconnected (disconnect); reconnecting in 2s (attempt 1/12)`, followed by no `slack socket mode connected` message for many minutes. Slack stays dead until a manual `openclaw gateway restart`.
In the worst case it logs `channel stop exceeded 5000ms after abort` and is then permanently wedged; only a gateway restart recovers it.
Telegram is unaffected throughout, so the agent looks half-alive.
Coincident diagnostic liveness warnings:
- `liveness warning: reasons=event_loop_delay,event_loop_utilization eventLoopDelayP99Ms=9940.5 eventLoopDelayMaxMs=9940.5`
- Repeated values in the 8,000–10,000 ms range, recurring every few minutes under normal load and during embedded agent runs (`[trace:embedded-run] ... totalMs=17085`).
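The stall pattern in these warnings can be reproduced in isolation with Node's built-in `perf_hooks.monitorEventLoopDelay`. This is a minimal sketch. It assumes a busy-wait (`blockFor`, a hypothetical helper) is an adequate stand-in for a synchronous SQLite call. It is not how OpenClaw's liveness diagnostic is implemented.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Hypothetical stand-in for a synchronous SQLite call such as a full VACUUM:
// it spins on the main thread, so no timer or socket callback can run until it returns.
function blockFor(ms: number): void {
  const end = Date.now() + ms;
  while (Date.now() < end) {
    // busy-wait
  }
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Samples event-loop delay around a single blocking call and reports it in milliseconds.
async function measureStall(blockMs: number): Promise<{ p99Ms: number; maxMs: number }> {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  await sleep(50); // idle baseline
  blockFor(blockMs); // the sampler's timer cannot fire during this call
  await sleep(50); // let the sampler record the late tick
  histogram.disable();
  // The histogram stores nanoseconds.
  return { p99Ms: histogram.percentile(99) / 1e6, maxMs: histogram.max / 1e6 };
}
```

Calling `measureStall(300)` should report a maximum delay of roughly 300 ms or more. A 55-second VACUUM produces the same signature at a much larger scale.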

Root cause (evidence gathered)

main.sqlite had grown to 664 MB (689 MB on a prior occasion). Table sizes via dbstat:

| Table | Size |
| --- | --- |
| `chunks` | 287 MB |
| `embedding_cache` | 250 MB (8,775 rows) |
| `chunks_vec_vector_chunks00` | 96 MB |
| `chunks_fts_*` | ~25 MB |
A full VACUUM of this DB took 55 seconds, and the event loop was blocked for the entire run, which by itself triggers the very Slack disconnects described above. After DELETE FROM embedding_cache, the same VACUUM completed in ~8 s and reclaimed 251 MB (664 MB → 412 MB), and the event-loop stalls and Slack drops stopped.
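One way to avoid a monolithic VACUUM, also listed under Suggested fixes, is incremental vacuum. Once the database is in `auto_vacuum = INCREMENTAL` mode, `PRAGMA incremental_vacuum(N)` releases up to N free pages per call. Switching an existing file into that mode requires one last full `VACUUM`.

This is a minimal sketch under these assumptions:

- The handle exposes `exec` and `prepare(sql).get()`.
- `incrementalVacuum` and `pagesPerStep` are hypothetical names.
- The default of 256 pages per step is arbitrary.

```typescript
// Assumed minimal shape of a synchronous SQLite handle (node:sqlite's DatabaseSync and
// better-sqlite3 both expose exec(sql) and prepare(sql).get()).
interface SqliteLike {
  exec(sql: string): void;
  prepare(sql: string): { get(...params: unknown[]): unknown };
}

// Resolves in the check phase, after pending I/O callbacks have had a chance to run.
const yieldToEventLoop = () => new Promise<void>((resolve) => setImmediate(resolve));

// Releases free pages in slices of at most `pagesPerStep`, yielding between slices so no
// single synchronous call holds the event loop for long. Returns the number of slices run.
async function incrementalVacuum(db: SqliteLike, pagesPerStep = 256): Promise<number> {
  const { auto_vacuum } = db.prepare("PRAGMA auto_vacuum").get() as { auto_vacuum: number };
  if (auto_vacuum !== 2) throw new Error("database is not in auto_vacuum=INCREMENTAL mode");

  let steps = 0;
  let previous = Infinity;
  for (;;) {
    const { freelist_count } = db.prepare("PRAGMA freelist_count").get() as {
      freelist_count: number;
    };
    // Stop when nothing is left to free, or if the last slice made no progress.
    if (freelist_count === 0 || freelist_count >= previous) return steps;
    previous = freelist_count;
    // exec() runs the pragma to completion, freeing up to pagesPerStep pages.
    db.exec(`PRAGMA incremental_vacuum(${pagesPerStep})`);
    steps++;
    await yieldToEventLoop();
  }
}
```

Yielding with `setImmediate` between slices lets pending I/O, including Socket Mode pings on the same loop, run between steps. Incremental vacuum only returns free pages to the OS. Unlike `VACUUM`, it does not defragment or repack the remaining pages, and the `DELETE` that frees those pages is itself a synchronous call.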

Secondary bug: orphaned temp files (the bigger surprise)

VACUUM/backup writes a temporary copy alongside the DB. When the process is killed or restarted mid-operation (e.g. by the gateway restart that users/watchdogs perform to recover the stuck Slack socket), the temp file is abandoned and never cleaned up. Found in ~/.openclaw/memory/:

1,570 main.sqlite.tmp-<uuid> files totaling 53 GB (many 308 MB each), oldest dating back ~2 weeks.

This is a vicious cycle: large DB → long blocking VACUUM → event-loop stall → Slack drops → operator restarts gateway → VACUUM interrupted → another ~300 MB orphan → disk fills, FS scans slow, repeat.
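Breaking the cycle at the restart step only needs a startup sweep of stale temp copies. This is a minimal sketch under these assumptions:

- The orphans always match `main.sqlite.tmp-<uuid>` in the memory directory.
- Nothing legitimately keeps one around longer than `minAgeMs`.
- `sweepOrphanTempFiles` and the one-hour default are hypothetical choices.

```typescript
import { readdirSync, statSync, unlinkSync } from "node:fs";
import { join } from "node:path";

// Orphan naming as reported: main.sqlite.tmp-<uuid>. Assumes the suffix is a hex/hyphen UUID.
const ORPHAN_PATTERN = /^main\.sqlite\.tmp-[0-9a-f-]+$/i;

// Deletes stale temp copies in `dir` and returns the number of bytes reclaimed.
// Files younger than `minAgeMs` are skipped in case a VACUUM/backup is still writing them.
function sweepOrphanTempFiles(dir: string, minAgeMs = 60 * 60 * 1000, now = Date.now()): number {
  let reclaimed = 0;
  for (const name of readdirSync(dir)) {
    if (!ORPHAN_PATTERN.test(name)) continue;
    const path = join(dir, name);
    try {
      const st = statSync(path);
      if (!st.isFile() || now - st.mtimeMs < minAgeMs) continue;
      unlinkSync(path);
      reclaimed += st.size;
    } catch {
      // Removed in the meantime, or still held open by another process
      // (deleting an open file usually fails on Windows); retry on the next sweep.
    }
  }
  return reclaimed;
}
```

Run it once at gateway startup, before any VACUUM or backup begins. For example, call `sweepOrphanTempFiles(join(homedir(), ".openclaw", "memory"))` with `homedir()` from `node:os`. The same call fits in a VACUUM failure path.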

Suggested fixes

Don't run SQLite maintenance/queries on the main thread. Move memory search, session-delta sync, and VACUUM to a worker thread or a separate process so socket heartbeats keep firing.
Clean up orphaned main.sqlite.tmp-* on startup (and on vacuum failure). A simple sweep of stale temp copies would have prevented 53 GB of waste.
Bound embedding_cache growth (LRU/TTL or cap), or use PRAGMA auto_vacuum=INCREMENTAL with periodic PRAGMA incremental_vacuum instead of a monolithic blocking VACUUM (an existing DB needs one final full VACUUM to switch to INCREMENTAL).
Decouple agent runs from the Slack socket so a heavy embedded-run (observed 17 s) cannot starve Socket Mode pongs.
Consider auto-reconnect hardening for the `channel stop exceeded 5000ms after abort` state, which currently never self-recovers.
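The first suggestion can be sketched with Node's `worker_threads`. The blocking job runs in a Worker while a main-thread interval, standing in for Socket Mode pings, keeps ticking. This is a minimal sketch.

- `runMaintenanceOffThread`, `heartbeatsDuring` and the busy-wait worker body are hypothetical.
- In a real gateway, the worker would open its own SQLite connection and run the VACUUM there.

```typescript
import { Worker } from "node:worker_threads";

// Worker body kept as a string so the sketch is a single file. A real worker module would
// open its own connection and run e.g. VACUUM; here a busy-wait simulates that call.
const WORKER_SOURCE = `
  const { parentPort, workerData } = require("node:worker_threads");
  const end = Date.now() + workerData.blockMs;
  while (Date.now() < end) {} // simulated blocking VACUUM, off the main thread
  parentPort.postMessage("done");
`;

function runMaintenanceOffThread(blockMs: number): Promise<void> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(WORKER_SOURCE, { eval: true, workerData: { blockMs } });
    worker.once("message", () => resolve());
    worker.once("error", reject);
    worker.once("exit", (code) => {
      if (code !== 0) reject(new Error(`maintenance worker exited with code ${code}`));
    });
  });
}

// Counts main-thread "heartbeats" (stand-in for Socket Mode pings) while maintenance runs.
async function heartbeatsDuring(blockMs: number, intervalMs = 20): Promise<number> {
  let beats = 0;
  const timer = setInterval(() => beats++, intervalMs);
  try {
    await runMaintenanceOffThread(blockMs);
  } finally {
    clearInterval(timer);
  }
  return beats;
}
```

Moving only the VACUUM is not enough on its own. SQLite's busy handler waits in the calling thread, so a main-thread query that has to wait on the worker's write lock would still block the event loop. That is why the suggestion also moves memory search and session-delta sync off the main thread.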

Happy to provide full logs or dbstat dumps if useful.
