hermes - How to fix: Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

Draft upstream issue — Kanban dispatcher persistent connection / WAL FD pressure

Title: Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

Summary

The gateway embedded Kanban dispatcher currently opens and closes Kanban SQLite connections on every dispatcher tick. The dispatch path opens one connection per board, and the health telemetry path opens another connection per board on the same tick. In a long-running gateway process with a short dispatch interval, this creates repeated SQLite WAL/SHM connection churn and file descriptor pressure.

We observed this locally in a long-running Mission Control gateway process: lsof showed multiple open handles for kanban.db, kanban.db-wal, and kanban.db-shm. A prior local patch (DEC-2026-05-23-024) mitigated the close path by using a _WalSafeConnection that runs PRAGMA wal_checkpoint(TRUNCATE) before close, but that does not remove the underlying dispatcher churn pattern.

Affected area

  • gateway/run.py
  • Embedded _kanban_dispatcher_watcher()
  • Kanban DB dispatch path and dispatcher health probe

Current behavior

Per tick:

  1. _tick_once_for_board() opens _kb.connect(board=slug), calls _kb.dispatch_once(...), then closes the connection.
  2. _ready_nonempty() opens another _kb.connect(board=slug) for health telemetry, checks spawnable ready/review tasks, then closes the connection.
  3. The watcher uses asyncio.to_thread(...), so work may run on arbitrary default executor threads across ticks.

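This keeps blocking DB work off the event loop, but it prevents persistent SQLite connection reuse. By default, a Python sqlite3 connection is thread-affine (it may only be used on the thread that created it), and asyncio.to_thread(...) does not guarantee the same worker thread from one tick to the next.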
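The thread-affinity constraint can be reproduced in isolation. The sketch below is not taken from gateway/run.py; it uses only the standard library, and the function name is hypothetical. It opens a connection on one thread and touches it from another, which is what would happen if a cached connection were reused across arbitrary default executor threads.

```python
import sqlite3
import threading


def use_connection_from_other_thread() -> str:
    """Open a connection on the calling thread, then use it from a worker thread."""
    # check_same_thread defaults to True, so the connection is bound to this thread.
    conn = sqlite3.connect(":memory:")
    outcome = []

    def worker() -> None:
        try:
            conn.execute("SELECT 1")
            outcome.append("ok")
        except sqlite3.ProgrammingError as exc:
            # sqlite3 refuses cross-thread use of a default connection.
            outcome.append(type(exc).__name__)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    conn.close()
    return outcome[0]


result = use_connection_from_other_thread()
print(result)  # prints "ProgrammingError"
```

Passing check_same_thread=False would silence the error, but it moves the burden of serializing access onto the caller, which is one reason to prefer pinning the work to a single thread.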

Expected behavior

The embedded dispatcher should avoid per-tick SQLite WAL connection churn while keeping DB work off the event loop and preserving sqlite thread affinity.

Proposed fix

Use a dedicated single-thread ThreadPoolExecutor for dispatcher DB work and maintain a per-board persistent SQLite connection cache inside the dispatcher watcher:

  • one executor thread named kanban-dispatcher,
  • one cached connection per active board,
  • dispatch and ready/review health probes share the cached board connection,
  • fingerprint changes close and reopen the cached connection,
  • corrupt-board handling closes and discards the cached connection, then suppresses retries until the DB fingerprint changes,
  • watcher shutdown/cancellation closes all cached connections on the dispatcher executor thread.
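The core of the proposal can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the actual patch: BoardConnectionCache, run_ticks, the tasks table, and the "default" board slug are all hypothetical, and plain sqlite3.connect stands in for _kb.connect(board=slug). Fingerprint invalidation and corrupt-board suppression are omitted; discard() marks where they would hook in.

```python
import asyncio
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


class BoardConnectionCache:
    """Per-board connection cache. All methods must run on the executor thread."""

    def __init__(self, path_for_board):
        self._path_for_board = path_for_board  # hypothetical: slug -> DB path
        self._conns = {}
        self.opens = 0  # number of real opens, kept only for illustration

    def get(self, slug):
        conn = self._conns.get(slug)
        if conn is None:
            conn = sqlite3.connect(self._path_for_board(slug))
            conn.execute("PRAGMA journal_mode=WAL")
            self._conns[slug] = conn
            self.opens += 1
        return conn

    def discard(self, slug):
        # Hook for fingerprint changes or corrupt-board handling.
        conn = self._conns.pop(slug, None)
        if conn is not None:
            conn.close()

    def close_all(self):
        for slug in list(self._conns):
            self.discard(slug)


async def run_ticks(db_path, ticks=3):
    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="kanban-dispatcher")
    cache = BoardConnectionCache(lambda slug: db_path)
    loop = asyncio.get_running_loop()
    thread_names = set()

    def tick(slug):
        thread_names.add(threading.current_thread().name)
        conn = cache.get(slug)  # shared by the dispatch and health-probe stand-ins
        conn.execute("CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, status TEXT)")
        conn.commit()
        row = conn.execute("SELECT COUNT(*) FROM tasks WHERE status = 'ready'").fetchone()
        return row[0]

    try:
        for _ in range(ticks):
            await loop.run_in_executor(executor, tick, "default")
    finally:
        # Close on the same thread that opened the connections.
        await loop.run_in_executor(executor, cache.close_all)
        executor.shutdown(wait=True)
    return cache.opens, thread_names


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        return asyncio.run(run_ticks(os.path.join(tmp, "kanban.db")))


opens, thread_names = demo()
print(opens, len(thread_names))
```

With max_workers=1, every tick and the final close_all run on the same worker thread, so the default thread-affine connection stays valid for the life of the watcher and the board DB is opened once rather than once per tick.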

This is upstreamable because it is a minimal runtime change and does not add deployment-specific assumptions.

Local validation

Focused tests added locally:

  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_uses_dedicated_single_thread_executor
  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reuses_board_connection_across_ticks
  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_health_probe_uses_cached_connection
  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_closes_cached_connection_on_shutdown
  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reopens_cached_connection_when_fingerprint_changes
  • tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_corrupt_board_closes_and_suppresses_until_fingerprint_changes

Command:

venv/bin/python -m pytest tests/gateway/test_kanban_dispatcher.py tests/hermes_cli/test_kanban_db.py -q

Result:

178 passed in 4.59s

Related local evidence

Local deviation DEC-2026-05-23-024 previously addressed the close-path symptom with _WalSafeConnection.close() running PRAGMA wal_checkpoint(TRUNCATE) before super().close(). This issue is the underlying dispatcher lifecycle problem: repeated per-tick open/close cycles. The persistent dispatcher connection refactor reduces dependence on the close-path mitigation but does not replace the need for safe close behavior in the public kanban_db.connect() API.
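For readers unfamiliar with the close-path mitigation, the following is a rough sketch of what a connection subclass of this kind could look like. The real _WalSafeConnection is not shown in this issue, so the class body, the error handling, and the demo table are assumptions; only the checkpoint-before-close ordering comes from the description above.

```python
import os
import sqlite3
import tempfile


class WalSafeConnection(sqlite3.Connection):
    """Illustrative stand-in: checkpoint and truncate the WAL before closing."""

    def close(self) -> None:
        try:
            # Fold WAL contents back into the main DB file and truncate the WAL.
            self.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        except sqlite3.Error:
            # Best effort (assumption): the connection may already be closed or busy.
            pass
        super().close()


def demo() -> int:
    """Return the size of the -wal file after close (0 if it no longer exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kanban.db")
        conn = sqlite3.connect(path, factory=WalSafeConnection)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY)")
        conn.execute("INSERT INTO tasks DEFAULT VALUES")
        conn.commit()
        conn.close()
        wal_path = path + "-wal"
        return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0


wal_size = demo()
print(wal_size)  # prints 0
```

A wrapper like this only tidies up each individual close; it does nothing about how often connections are opened, which is why the per-tick churn still needs the dispatcher-side change.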
