hermes - 💡(How to fix) Fix Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

hermes 2026-05-25 00:09:51


The gateway embedded Kanban dispatcher currently opens and closes Kanban SQLite connections on every dispatcher tick. The dispatch path opens one connection per board, and the health telemetry path opens another connection per board on the same tick. In a long-running gateway process with a short dispatch interval, this creates repeated SQLite WAL/SHM connection churn and file descriptor pressure.

We observed this locally in a long-running Mission Control gateway process: lsof showed multiple open handles for kanban.db, kanban.db-wal, and kanban.db-shm. A prior local patch (DEC-2026-05-23-024) mitigated the close path by using a _WalSafeConnection that runs PRAGMA wal_checkpoint(TRUNCATE) before close, but that does not remove the underlying dispatcher churn pattern.
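The lsof observation above can be approximated from inside a Python process. The following is a minimal sketch, assuming a Linux host where /proc/self/fd is available; the helper name open_handles_for and the temporary kanban.db file are illustrative and not part of the gateway code.

```python
import os
import sqlite3
import tempfile


def open_handles_for(db_path: str) -> list[str]:
    """Return this process's open FD targets that match db_path or its -wal/-shm files.

    Linux-only: relies on /proc/self/fd and returns [] where it is absent.
    """
    fd_dir = "/proc/self/fd"
    if not os.path.isdir(fd_dir):
        return []
    base = os.path.realpath(db_path)
    targets = {base, base + "-wal", base + "-shm"}
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # FD was closed between listdir() and readlink()
        if target in targets:
            found.append(target)
    return sorted(found)


def demo() -> tuple[list[str], list[str]]:
    """Open a WAL-mode database, list its handles, then list again after close."""
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "kanban.db")
        conn = sqlite3.connect(db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.commit()
        while_open = open_handles_for(db_path)
        conn.close()
        after_close = open_handles_for(db_path)
    return while_open, after_close
```

Sampling open_handles_for() between dispatcher ticks would show whether handles accumulate over time or return to a steady count.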

Root Cause

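The watcher runs its DB work through asyncio.to_thread(...), so each tick may land on an arbitrary default executor thread. That keeps the event loop unblocked, but default sqlite3 connections are thread-affine, so a connection cannot be kept open and reused across ticks. The dispatcher therefore opens and closes connections on every tick.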


Draft upstream issue — Kanban dispatcher persistent connection / WAL FD pressure

Title: Gateway embedded Kanban dispatcher opens SQLite WAL connections every tick, causing FD/WAL pressure

Summary

Affected area

gateway/run.py
Embedded _kanban_dispatcher_watcher()
Kanban DB dispatch path and dispatcher health probe

Current behavior

Per tick:

_tick_once_for_board() opens _kb.connect(board=slug), calls _kb.dispatch_once(...), then closes the connection.
_ready_nonempty() opens another _kb.connect(board=slug) for health telemetry, checks spawnable ready/review tasks, then closes the connection.
The watcher uses asyncio.to_thread(...), so work may run on arbitrary default executor threads across ticks.

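Using asyncio.to_thread(...) keeps blocking DB work off the event loop, but it works against persistent connection reuse: Python's sqlite3 connections are thread-affine by default (check_same_thread=True), so a connection opened on one executor thread cannot be reused on another.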
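The thread-affinity constraint can be reproduced with the standard library alone. This is a small sketch, independent of the gateway code; the helper name use_from_other_thread is hypothetical.

```python
import sqlite3
import threading


def use_from_other_thread(conn: sqlite3.Connection) -> str:
    """Try to query `conn` from a new thread and report what happened."""
    outcome = []

    def worker() -> None:
        try:
            conn.execute("SELECT 1").fetchone()
            outcome.append("ok")
        except sqlite3.ProgrammingError:
            # Raised because check_same_thread defaults to True.
            outcome.append("ProgrammingError")

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return outcome[0]


conn = sqlite3.connect(":memory:")  # created on the main thread
result = use_from_other_thread(conn)
conn.close()
```

Because asyncio.to_thread(...) gives no guarantee about which executor thread runs a call, a connection cached across ticks could hit this error on any tick.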

Expected behavior

The embedded dispatcher should avoid per-tick SQLite WAL connection churn while keeping DB work off the event loop and preserving sqlite thread affinity.

Proposed fix

Use a dedicated single-thread ThreadPoolExecutor for dispatcher DB work and maintain a per-board persistent SQLite connection cache inside the dispatcher watcher:

one executor thread named kanban-dispatcher,
one cached connection per active board,
dispatch and ready/review health probes share the cached board connection,
fingerprint changes close and reopen the cached connection,
corrupt-board handling closes/discards cached connection and suppresses retry until DB fingerprint changes,
watcher shutdown/cancellation closes all cached connections on the dispatcher executor thread.
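The shape of this proposal can be sketched as follows. This is not the actual patch: the BoardConnections class, the fingerprint() definition (device and inode here), the tasks table, and the "default" board slug are all assumptions made for illustration, and corrupt-board handling is omitted.

```python
import asyncio
import os
import sqlite3
import tempfile
import threading
from concurrent.futures import ThreadPoolExecutor


def fingerprint(path: str) -> tuple[int, int]:
    """Hypothetical fingerprint: changes when the DB file is replaced."""
    st = os.stat(path)
    return (st.st_dev, st.st_ino)


class BoardConnections:
    """Per-board connection cache. Every method must run on the dispatcher thread."""

    def __init__(self) -> None:
        self._conns: dict[str, tuple[tuple[int, int], sqlite3.Connection]] = {}

    def get(self, slug: str, path: str) -> sqlite3.Connection:
        fp = fingerprint(path)
        cached = self._conns.get(slug)
        if cached is not None and cached[0] == fp:
            return cached[1]  # reuse across ticks
        if cached is not None:
            cached[1].close()  # fingerprint changed: drop the stale connection
        conn = sqlite3.connect(path)
        self._conns[slug] = (fp, conn)
        return conn

    def close_all(self) -> None:
        for _, conn in self._conns.values():
            conn.close()
        self._conns.clear()


def tick(conns: BoardConnections, slug: str, path: str) -> tuple[int, str, int]:
    """One tick for one board: dispatch and health probe would share this connection."""
    conn = conns.get(slug, path)
    ready = conn.execute(
        "SELECT COUNT(*) FROM tasks WHERE status = 'ready'"
    ).fetchone()[0]
    return id(conn), threading.current_thread().name, ready


async def run_ticks(path: str, n: int = 3) -> list[tuple[int, str, int]]:
    loop = asyncio.get_running_loop()
    # A single worker pins all DB work to one thread, preserving sqlite affinity.
    executor = ThreadPoolExecutor(max_workers=1, thread_name_prefix="kanban-dispatcher")
    conns = BoardConnections()
    results = []
    try:
        for _ in range(n):
            results.append(await loop.run_in_executor(executor, tick, conns, "default", path))
        return results
    finally:
        # Close on the thread that opened the connections, then stop it.
        await loop.run_in_executor(executor, conns.close_all)
        executor.shutdown(wait=True)


def demo() -> list[tuple[int, str, int]]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kanban.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY, status TEXT)")
        setup.execute("INSERT INTO tasks (status) VALUES ('ready')")
        setup.commit()
        setup.close()
        return asyncio.run(run_ticks(path))
```

Calling demo() returns one tuple per tick. The connection id and thread name should be identical across ticks, which is the reuse property the focused tests are described as checking.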

This is upstreamable because it is a minimal runtime change and does not add deployment-specific assumptions.

Local validation

Focused tests added locally:

tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_uses_dedicated_single_thread_executor
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reuses_board_connection_across_ticks
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_health_probe_uses_cached_connection
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_closes_cached_connection_on_shutdown
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_reopens_cached_connection_when_fingerprint_changes
tests/gateway/test_kanban_dispatcher.py::test_kanban_dispatcher_corrupt_board_closes_and_suppresses_until_fingerprint_changes

Command:

venv/bin/python -m pytest tests/gateway/test_kanban_dispatcher.py tests/hermes_cli/test_kanban_db.py -q

Result:

178 passed in 4.59s

Related local evidence

Local deviation DEC-2026-05-23-024 previously addressed the close-path symptom with _WalSafeConnection.close() running PRAGMA wal_checkpoint(TRUNCATE) before super().close(). This issue is the underlying dispatcher lifecycle problem: repeated per-tick open/close cycles. The persistent dispatcher connection refactor reduces dependence on the close-path mitigation but does not replace the need for safe close behavior in the public kanban_db.connect() API.
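For readers unfamiliar with the close-path mitigation, a checkpoint-before-close connection can be sketched as below. This is an assumption-laden illustration, not the project's _WalSafeConnection: the class name WalSafeConnection, the error handling, and the demo database are invented here.

```python
import os
import sqlite3
import tempfile


class WalSafeConnection(sqlite3.Connection):
    """Connection that truncates the WAL before closing (illustrative only)."""

    def close(self) -> None:
        try:
            self.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
        except sqlite3.Error:
            # Already closed, or the checkpoint was blocked by another
            # reader/writer: fall through and close anyway.
            pass
        super().close()


def demo() -> int:
    """Write in WAL mode, close, and return the WAL size left on disk (0 if absent)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kanban.db")
        conn = sqlite3.connect(path, factory=WalSafeConnection)
        conn.execute("PRAGMA journal_mode=WAL")
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
        conn.commit()
        conn.close()
        wal = path + "-wal"
        return os.path.getsize(wal) if os.path.exists(wal) else 0
```

The subclass is passed through the factory argument of sqlite3.connect(), so callers keep using the normal connection API. It limits WAL growth at close time but does nothing about how often connections are opened, which is the point this issue raises.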
