hermes: SQLite 'locking protocol' error on NFS silently breaks /resume, /title, /history, /branch, and kanban

Summary

When ~/.hermes is on a network filesystem (NFS, SMB/CIFS, some FUSE mounts, WSL1), SQLite's PRAGMA journal_mode=WAL fails with sqlite3.OperationalError: locking protocol. Every component that opens state.db or kanban.db swallows this error silently, and the user is left with:

  • /resume, /title, /history, /branch all respond "Session database not available." with no explanation
  • hermes update snapshot warning: "SQLite safe copy failed for ~/.hermes/state.db: locking protocol"
  • Kanban dispatcher tick crashing every 60s with the same error
  • TUI session store unavailable warnings
  • (Downstream) the known duplicate column name: consecutive_failures kanban migration race (#21708 / #21374) firing continuously because the migration is retried on every tick

The user has no way to know why any of this is happening. Hermes does not check for WAL compatibility and does not attempt a fallback.

Evidence

Real user debug report. Their stat -f ~/.hermes output and mount line:

File: "/home/mormio/.hermes"
Type: nfs
ID: 0  Namelen: 255
172.26.224.200:d2dfac12/home on /home type nfs
  (rw, relatime, vers=3, rsize=1048576, wsize=1048576, namelen=255,
   hard, forcerdirplus, proto=tcp, nconnect=4, timeo=600, retrans=2,
   sec=sys, mountaddr=172.26.224.200, mountvers=3, mountport=20048,
   mountproto=udp, local_lock=none, addr=172.26.224.200)

This is NFSv3 over TCP with local_lock=none, i.e. a network filesystem, which SQLite's upstream documentation describes as incompatible with WAL:

SQLite databases in WAL mode do not work over a network filesystem.

The resulting log entries in the same user's session:

2026-05-08 13:41:11  WARNING hermes_cli.backup: SQLite safe copy failed for ~/.hermes/state.db: locking protocol
2026-05-08 13:45:05  ERROR gateway.run: kanban dispatcher: tick failed on board default
    File "hermes_cli/kanban_db.py", line 878, in connect
      conn.execute("PRAGMA journal_mode=WAL")
  sqlite3.OperationalError: locking protocol
2026-05-08 13:46:46  WARNING tui_gateway.server: TUI session store unavailable — continuing without state.db features: locking protocol
2026-05-08 13:46:59  WARNING cli: Failed to initialize SessionDB — session will NOT be indexed for search: locking protocol
2026-05-08 13:47:08  WARNING tui_gateway.server: TUI session store unavailable — continuing without state.db features: locking protocol

The kanban dispatcher retried this failed migration continuously until the user restarted the gateway.

Root cause

Two files hit PRAGMA journal_mode=WAL unconditionally with no fallback:

  • hermes_state.py:201: SessionDB.__init__ sets journal_mode=WAL. On failure the caller (SessionDB() in cli.py:2379, gateway/run.py:1194, tui_gateway/server.py) catches the exception and sets _session_db = None, but never tries a different journal mode.
  • hermes_cli/kanban_db.py:920: connect() sets journal_mode=WAL. On failure the exception bubbles to the kanban dispatcher tick, which is retried every 60s forever.

The failure is silent downstream:

  • Gateway logs at DEBUG (gateway/run.py:1196): logger.debug("SQLite session store not available: %s", e) — invisible in errors.log.
  • CLI logs at WARNING (correct) — visible but still generic.
  • /resume error message hard-codes "Session database not available." with no cause. Nine such sites across cli.py and gateway/run.py:
    • cli.py:5368, 5479, 6755, 6770
    • gateway/run.py:10186, 10224, 10438, 10482, 10569
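Why a DEBUG-level message never reaches an error log can be shown with the standard logging module. This is a minimal sketch, assuming errors.log is fed by a handler filtering at WARNING (the issue only states that the message is invisible there); the logger name and the in-memory stream are stand-ins for illustration.

```python
import io
import logging

# Stand-in for errors.log: an in-memory stream behind a WARNING-level handler.
# Assumption: the real errors.log handler filters at WARNING or above.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setLevel(logging.WARNING)

logger = logging.getLogger("gateway.run.demo")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

err = "locking protocol"
logger.debug("SQLite session store not available: %s", err)    # dropped by the handler
logger.warning("SQLite session store not available: %s", err)  # recorded

captured = stream.getvalue()
print(captured, end="")
```

Only the WARNING call ends up in the captured output, which is the behaviour the proposed log-level bump relies on.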

Who this affects

  • Users with ~/.hermes on NFS (shared university clusters, enterprise Linux, cloud dev VMs mounting team home dirs)
  • Users with ~/.hermes on SMB/CIFS, some FUSE mounts, or WSL1
  • Anyone whose state.db / kanban.db ends up in a container bind-mount where locking semantics differ

The failure mode presents to the user as "/resume just doesn't work" with no actionable diagnostic. Support burden: every affected user has to share logs with a maintainer to figure out what's broken.
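Until Hermes reports the cause itself, an affected user could check a directory by hand. The following is a rough sketch of such a probe, not part of Hermes: the function name wal_supported is hypothetical, and it simply tries to put a scratch database into WAL mode in the given directory.

```python
import os
import sqlite3
import tempfile


def wal_supported(directory: str) -> bool:
    """Return True if a scratch SQLite database in `directory` can use WAL mode."""
    fd, path = tempfile.mkstemp(suffix=".db", dir=directory)
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        try:
            mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
            # A small write makes sure the WAL machinery is really exercised.
            conn.execute("CREATE TABLE probe (x INTEGER)")
            conn.commit()
            return str(mode).lower() == "wal"
        except sqlite3.OperationalError:
            # For example "locking protocol" on a network filesystem.
            return False
        finally:
            conn.close()
    finally:
        # Remove the scratch database and any WAL side files left behind.
        for suffix in ("", "-wal", "-shm"):
            try:
                os.remove(path + suffix)
            except FileNotFoundError:
                pass
```

Calling wal_supported(os.path.expanduser("~/.hermes")) on an existing directory returns False when the WAL pragma or the follow-up write fails there, which would match the symptoms described above.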

Proposed fix

Three changes, all in one PR:

  1. Fall back to journal_mode=DELETE on WAL failure. DELETE (the rollback journal) is SQLite's default journal mode and predates WAL; it does not need the shared-memory index that WAL relies on, so it generally works on NFS as long as the mount's file locking works. Concurrency drops (readers are blocked while a write is in progress) but the feature works. Apply the fallback in both hermes_state.py and hermes_cli/kanban_db.py. Log a single WARNING on fallback explaining why.

  2. Surface the cause in /resume and related error messages. Capture the underlying OperationalError on the failing init and include it in the user-facing string. Instead of "Session database not available.", show "Session database not available: locking protocol (state.db may be on a network filesystem — see <docs>).".

  3. Bump gateway/run.py:1196 log level from DEBUG to WARNING so the failure appears in errors.log, matching the CLI path which already does this correctly.

Deliberately out of scope for the PR

  • NFS autodetection at startup via statvfs / /proc/mounts. Fragile across Linux/macOS/WSL/Docker overlay FS. The try/except fallback approach is OS-agnostic and more robust.
  • hermes doctor integration. Separate concern, separate PR.
  • The duplicate column name: consecutive_failures kanban migration race (#21708 / #21374). It has a separate root cause and is only triggered repeatedly by this bug (WAL failure → migration retried forever). Fixing the WAL issue stops the cascade without fixing the migration itself.

Acceptance criteria

  • SessionDB() succeeds on NFS via DELETE-mode fallback, with a single WARNING logged once per process.
  • kanban_db.connect() succeeds on NFS via the same fallback.
  • /resume on a system where SessionDB genuinely cannot open returns a message containing the underlying cause.
  • New tests cover:
    • WAL pragma raising OperationalError("locking protocol") → DELETE fallback fires, DB is usable.
    • /resume error string includes the captured cause when _session_db is None.
  • No regression in existing SessionDB / kanban tests.
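The /resume criterion amounts to keeping the init error around instead of discarding it. A minimal sketch of that idea follows; SessionStore, open_session_store, and unavailable_message are hypothetical names chosen for illustration, and the wording of the hint is an assumption based on the message proposed above.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SessionStore:
    """Holds either an opened session DB or the reason it could not be opened."""
    db: Optional[object] = None
    error: Optional[str] = None


def open_session_store(factory: Callable[[], object]) -> SessionStore:
    """Call `factory` (e.g. the SessionDB constructor) and keep any failure cause."""
    try:
        return SessionStore(db=factory())
    except Exception as exc:
        return SessionStore(error=str(exc))


def unavailable_message(store: SessionStore) -> str:
    """Build the user-facing string shown when the session DB is missing."""
    base = "Session database not available"
    if store.error is None:
        return base + "."
    hint = ""
    if "locking protocol" in store.error:
        hint = " (state.db may be on a network filesystem)"
    return f"{base}: {store.error}{hint}."


def _failing_factory() -> object:
    # Simulates the WAL pragma failing on a network filesystem.
    raise sqlite3.OperationalError("locking protocol")


message = unavailable_message(open_session_store(_failing_factory))
print(message)
```

A test for the second bullet could then assert that the captured cause appears in the string returned to the user.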
