hermes: Fix apply_wal_with_fallback (DELETE fallback uncaught, crashes on APFS external SSDs)


Bug Description

apply_wal_with_fallback() in hermes_state.py fails completely when ~/.hermes is on an APFS external SSD. Both WAL and DELETE journal modes throw "disk I/O error". The DELETE fallback is uncaught, so the exception propagates up and crashes every caller that depends on a SQLite connection — kanban dispatcher, SessionDB init, API server, holographic memory store, etc.

Environment

  • macOS 26.5
  • APFS external SSD (Thunderbolt / USB-C)
  • ~/.hermes lives on the external volume
  • SQLite 3.x (system default)

Root Cause

The fix from #22032 added apply_wal_with_fallback() with _WAL_INCOMPAT_MARKERS including "disk i/o error". A WAL failure whose lowercased message contains one of these markers correctly triggers the DELETE fallback on line 160. However, when DELETE also fails with a disk I/O error (as seen on APFS external SSDs), nothing catches that second exception, so it propagates out of the function unhandled:

except sqlite3.OperationalError as exc:
    msg = str(exc).lower()
    if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
        raise
    _log_wal_fallback_once(db_label, exc)
    conn.execute("PRAGMA journal_mode=DELETE")   # <-- UNCAUGHT
    return "delete"

Impact on callers:

  • SessionDB.__init__ (hermes_state.py:354): caught by its own except, sets _last_init_error, re-raises → session DB stays None, features like /resume, /title, /history silently break
  • kanban_db.connect() (kanban_db.py:1050): caught by its own except, closes connection, re-raises → kanban dispatcher crashes every 60s when the dashboard is open
  • api_server.py (line 349): same pattern → response store unavailable
  • plugins/memory/holographic/store.py (line 134): same pattern → holographic memory store fails
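
The failure path can be reproduced without an external SSD. The following is a minimal sketch, not the project's code: `FailingJournalConnection`, `apply_wal_with_fallback_current`, and `connect_like_caller` are hypothetical names, the marker tuple is a stand-in that contains only the one marker the issue confirms, and the once-only logging helper is left out. It simulates a volume where every attempt to set journal_mode raises "disk I/O error", then runs the pre-fix logic through a caller that closes the connection and re-raises.

```python
import sqlite3

# Assumption: stand-in for the real tuple in hermes_state.py; only
# "disk i/o error" is confirmed by the issue text.
_WAL_INCOMPAT_MARKERS = ("disk i/o error",)


class FailingJournalConnection(sqlite3.Connection):
    """Hypothetical test double: every attempt to set journal_mode fails."""

    def execute(self, sql, *args):
        if "journal_mode=" in sql.lower().replace(" ", ""):
            raise sqlite3.OperationalError("disk I/O error")
        return super().execute(sql, *args)


def apply_wal_with_fallback_current(conn, *, db_label="state.db"):
    """Mirrors the pre-fix logic quoted in the issue (logging helper omitted)."""
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        msg = str(exc).lower()
        if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
            raise
        # The second failure is raised here and nothing catches it.
        conn.execute("PRAGMA journal_mode=DELETE")
        return "delete"


def connect_like_caller():
    """Caller pattern from the list above: close the connection, then re-raise."""
    conn = sqlite3.connect(":memory:", factory=FailingJournalConnection)
    try:
        apply_wal_with_fallback_current(conn)
    except Exception:
        conn.close()
        raise
    return conn


if __name__ == "__main__":
    try:
        connect_like_caller()
    except sqlite3.OperationalError as err:
        print(f"caller failed: {err}")
```

Passing a `sqlite3.Connection` subclass through the `factory` argument is used here because methods on a plain connection instance cannot be replaced directly.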

Workaround

Manually set journal_mode=DELETE and run VACUUM on each affected database:

sqlite3 ~/.hermes/state.db "PRAGMA journal_mode=DELETE; VACUUM;"
sqlite3 ~/.hermes/kanban/default.db "PRAGMA journal_mode=DELETE; VACUUM;"

This persists DELETE mode in the DB header, so subsequent connections start with DELETE and never trigger the WAL fallback path.
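
Where the sqlite3 command-line shell is not installed, the same two statements can be issued from Python. This is a sketch under the assumption that the database files sit at the paths shown in the commands above; `force_delete_journal` is a hypothetical helper name, not part of hermes.

```python
import sqlite3


def force_delete_journal(db_path):
    """Switch one database to DELETE journaling, VACUUM it, return the mode."""
    # isolation_level=None puts the connection in autocommit mode, which
    # guarantees VACUUM is not issued inside an open transaction.
    conn = sqlite3.connect(str(db_path), isolation_level=None)
    try:
        conn.execute("PRAGMA journal_mode=DELETE")
        conn.execute("VACUUM")
        # Querying the pragma without a value reports the mode now in effect.
        return conn.execute("PRAGMA journal_mode").fetchone()[0]
    finally:
        conn.close()
```

For example, `force_delete_journal(Path.home() / ".hermes" / "state.db")` (with `from pathlib import Path`) targets the first database from the shell commands. Checking the returned string shows whether the switch took effect, since on an affected volume the pragma itself may still raise.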

Proposed Fix

Wrap the DELETE fallback in a try/except. If both WAL and DELETE fail, log a warning and continue with the connection's default journal mode.

def apply_wal_with_fallback(
    conn: sqlite3.Connection,
    *,
    db_label: str = "state.db",
) -> str:
    # Preferred path: WAL.
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        # Only fall back for known WAL-incompatibility errors; re-raise the rest.
        msg = str(exc).lower()
        if not any(marker in msg for marker in _WAL_INCOMPAT_MARKERS):
            raise
        _log_wal_fallback_once(db_label, exc)
        # New: guard the DELETE fallback as well.
        try:
            conn.execute("PRAGMA journal_mode=DELETE")
            return "delete"
        except sqlite3.OperationalError as delete_exc:
            # Neither pragma took effect; keep the connection alive on
            # whatever journal mode it already has.
            logger.warning(
                "%s: both WAL and DELETE journal_mode failed "
                "(WAL: %s, DELETE: %s). "
                "Continuing with default journal mode.",
                db_label, exc, delete_exc,
            )
            # Nominal return value: DELETE was not actually applied here.
            return "delete"

Tests to update

test_captures_cause_on_failed_init in tests/test_hermes_state_wal_fallback.py currently expects SessionDB() to raise when both pragmas fail. With the fix, SessionDB would succeed (both errors caught internally). Update the test to verify:

  1. SessionDB() succeeds despite both journal_mode pragmas failing
  2. The connection is usable for reads/writes
  3. A warning is logged (new test or extend the existing one)
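
The three checks above could look roughly like the sketch below. It is written against a local copy of the proposed function rather than the real SessionDB, uses the standard-library unittest module so it is self-contained, and simulates the failing volume with a hypothetical `FailingJournalConnection` subclass. The logger name and the one-element marker tuple are assumptions, and the once-only fallback log is omitted.

```python
import logging
import sqlite3
import unittest

logger = logging.getLogger("hermes_state_sketch")  # hypothetical logger name
_WAL_INCOMPAT_MARKERS = ("disk i/o error",)  # stand-in for the real tuple


class FailingJournalConnection(sqlite3.Connection):
    """Hypothetical double: setting journal_mode always raises disk I/O error."""

    def execute(self, sql, *args):
        if "journal_mode=" in sql.lower().replace(" ", ""):
            raise sqlite3.OperationalError("disk I/O error")
        return super().execute(sql, *args)


def apply_wal_with_fallback(conn, *, db_label="state.db"):
    """Local copy of the proposed fix (once-only fallback log omitted)."""
    try:
        conn.execute("PRAGMA journal_mode=WAL")
        return "wal"
    except sqlite3.OperationalError as exc:
        if not any(m in str(exc).lower() for m in _WAL_INCOMPAT_MARKERS):
            raise
        try:
            conn.execute("PRAGMA journal_mode=DELETE")
            return "delete"
        except sqlite3.OperationalError as delete_exc:
            logger.warning(
                "%s: both WAL and DELETE journal_mode failed "
                "(WAL: %s, DELETE: %s). "
                "Continuing with default journal mode.",
                db_label, exc, delete_exc,
            )
            return "delete"


class BothPragmasFailTest(unittest.TestCase):
    def test_connection_survives_and_warns(self):
        conn = sqlite3.connect(":memory:", factory=FailingJournalConnection)
        self.addCleanup(conn.close)
        # Checks 1 and 3: no exception escapes and one warning is emitted.
        with self.assertLogs(logger, level="WARNING") as captured:
            apply_wal_with_fallback(conn, db_label="state.db")
        self.assertEqual(len(captured.records), 1)
        self.assertIn("both WAL and DELETE", captured.output[0])
        # Check 2: the connection still handles ordinary writes and reads.
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        self.assertEqual(conn.execute("SELECT x FROM t").fetchone(), (1,))
```

Saved as a module, this runs with `python -m unittest`. In the real test file the same assertions would presumably go through SessionDB() instead of calling the function directly.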

All callers (would benefit from the fix without any changes)

  • hermes_state.py:354 — SessionDB.__init__
  • hermes_cli/kanban_db.py:1050 — kanban_db.connect()
  • gateway/platforms/api_server.py:349 — ResponseStore init
  • plugins/memory/holographic/store.py:134 — MemoryStore init
