hermes: state.db WAL file grows unbounded (PASSIVE checkpoint never truncates)

Fix Action

Fixed

Problem

state.db-wal grows without bound during normal operation. Recurring daily audits (2026-05-08 through 2026-05-11) flagged state.db bloat, requiring manual PRAGMA wal_checkpoint(TRUNCATE) runs each time.
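The manual workaround described above can be scripted. This is a minimal sketch, not Hermes code: `truncate_wal` is a hypothetical helper name and the database path is supplied by the caller. It uses only Python's standard `sqlite3` module and returns the three-column row that `PRAGMA wal_checkpoint` produces.

```python
import sqlite3
from pathlib import Path


def truncate_wal(db_path: Path) -> tuple:
    """Run a TRUNCATE checkpoint and return SQLite's (busy, log, checkpointed) row.

    Hypothetical helper for illustration; not part of hermes_state.py.
    """
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        # busy == 1 means another connection blocked the checkpoint,
        # in which case the WAL file is left at its current size.
        row = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
        return tuple(row)
    finally:
        conn.close()
```

A caller would pass the path to `state.db` (for example `Path.home() / ".hermes" / "state.db"`, matching the location given under Reproduction) and check whether the first element of the result is 0.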

Root Cause

_try_wal_checkpoint() (hermes_state.py, line 438) and close() (line 457) both use PRAGMA wal_checkpoint(PASSIVE). SQLite's PASSIVE mode copies committed WAL frames back into the main database file but never truncates the WAL file on disk. The file stays at its high-water mark until a TRUNCATE checkpoint succeeds.

The only place TRUNCATE runs today is inside vacuum() (line 2789), which is only triggered by maybe_auto_prune_and_vacuum() when sessions older than 90 days are actually pruned. With retention_days: 90 and an active install, sessions may never age out — so TRUNCATE never fires and the WAL grows indefinitely.
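The difference between the two checkpoint modes can be observed on a throwaway database. The sketch below is an illustration under stated assumptions: `wal_sizes`, the `msgs` table, and the row count are all made up for the demo, and it runs against a temporary directory rather than a real state.db.

```python
import sqlite3
import tempfile
from pathlib import Path


def wal_sizes(mode: str) -> tuple:
    """Write rows in WAL mode, checkpoint with `mode`, return (before, after) WAL sizes."""
    if mode not in {"PASSIVE", "FULL", "RESTART", "TRUNCATE"}:
        raise ValueError(f"unknown checkpoint mode: {mode}")
    with tempfile.TemporaryDirectory() as tmp:
        db = Path(tmp) / "demo.db"
        wal = Path(f"{db}-wal")
        conn = sqlite3.connect(db)
        try:
            conn.execute("PRAGMA journal_mode=WAL").fetchone()
            conn.execute("CREATE TABLE msgs (body TEXT)")
            conn.executemany("INSERT INTO msgs VALUES (?)", [("x" * 1024,)] * 500)
            conn.commit()
            before = wal.stat().st_size
            conn.execute(f"PRAGMA wal_checkpoint({mode})").fetchone()
            after = wal.stat().st_size
            return before, after
        finally:
            # Close before the temporary directory is removed.
            conn.close()


if __name__ == "__main__":
    print("PASSIVE :", wal_sizes("PASSIVE"))
    print("TRUNCATE:", wal_sizes("TRUNCATE"))
```

With PASSIVE the two sizes should match, while with TRUNCATE the second value should be 0, provided no other connection is holding the checkpoint back.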

Suggested Fix

Two minimal, non-breaking changes:

1. close(): change PASSIVE → TRUNCATE. By the time close() runs, this connection has no readers of its own left to block the checkpoint. If another process blocks it, TRUNCATE reports busy=1 in its result row and leaves the WAL as it is; the existing try/except still covers any error that is raised. Every clean shutdown that is not blocked then shrinks the WAL to 0 bytes.

2. _try_wal_checkpoint(): add size-based TRUNCATE. Check the state.db-wal file size before the checkpoint call. Use TRUNCATE when the file is above a threshold (e.g. 10 MB) and PASSIVE otherwise. Path is already imported at the top of the file. TRUNCATE fails gracefully if readers block it, so no change to error handling is needed.

Both changes leave the try/except wrappers intact and remain best-effort. The weekly_maintenance.py script (VACUUM + TRUNCATE) acts as the backstop for anything the connection-level checkpoints miss.
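The size-based selection in change 2 could look roughly like the sketch below. It is not the actual patch: `choose_checkpoint_mode`, `try_wal_checkpoint`, and `WAL_TRUNCATE_THRESHOLD` are hypothetical names, the real `_try_wal_checkpoint()` signature is not shown in the issue, and 10 MB is only the example threshold mentioned above.

```python
import sqlite3
from pathlib import Path

# Example threshold from the issue text; an assumption, not a tuned value.
WAL_TRUNCATE_THRESHOLD = 10 * 1024 * 1024


def choose_checkpoint_mode(db_path: Path, threshold: int = WAL_TRUNCATE_THRESHOLD) -> str:
    """Return 'TRUNCATE' when the WAL file exceeds `threshold` bytes, else 'PASSIVE'."""
    wal_path = Path(f"{db_path}-wal")
    try:
        size = wal_path.stat().st_size
    except OSError:
        size = 0  # no WAL file yet, or it cannot be read
    return "TRUNCATE" if size > threshold else "PASSIVE"


def try_wal_checkpoint(conn: sqlite3.Connection, db_path: Path,
                       threshold: int = WAL_TRUNCATE_THRESHOLD):
    """Best-effort checkpoint; returns (mode, result_row) and never raises sqlite3 errors."""
    mode = choose_checkpoint_mode(db_path, threshold)
    try:
        row = conn.execute(f"PRAGMA wal_checkpoint({mode})").fetchone()
    except sqlite3.Error:
        return mode, None  # stay best-effort, like the existing try/except
    return mode, row
```

Keeping the mode decision in its own function makes the threshold easy to test without a live database. One caveat worth weighing: unlike PASSIVE, a TRUNCATE checkpoint may wait on the connection's busy timeout when other readers or writers are active.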

Reproduction

Run a long-lived gateway or CLI session with many writes (50+ messages). Observe ~/.hermes/state.db-wal grow across restarts without ever shrinking.

Environment

  • Hermes Agent, SQLite WAL mode
  • Multi-process setup: gateway + CLI sharing one state.db
  • sessions.auto_prune: true, retention_days: 90 — no sessions old enough to prune → VACUUM never fires → WAL never gets TRUNCATE
