hermes - 💡(How to fix) Fix kanban_notify_subs storm at gateway boot when last_event_id=0

Summary

On a gateway boot following extended downtime (in our case ~22 min during a version upgrade), 100+ "✔ Kanban t_XXX done" Telegram notifications fired in a single burst. The flood was triggered by 27 kanban_notify_subs rows that had last_event_id = 0 — the _kanban_notifier_watcher walked all task_events newer than 0 for each sub and delivered every backlogged terminal event in one go.

Environment

  • Hermes version: v2026.5.16 (v0.14.0 "Foundation"), running on macOS 25.4.0, Python 3.11.2
  • Branch: clean v2026.5.16 + local fork patches on top
  • Local kanban.db: 28 subs total, 27 at last_event_id = 0
  • Subs created over the past ~12 days (some 2026-05-09, some 2026-05-20)
  • All 27 stale subs target the same (platform='telegram', chat_id=<user_id>)

Steps to reproduce

  1. Have multiple kanban_notify_subs rows with last_event_id = 0 (this can happen if subs are created via a code path that doesn't snap the cursor to current MAX(rowid) from task_events)
  2. Have those tasks reach terminal states (completed/blocked/etc.) — populating task_events
  3. Restart the gateway (or have it down for any extended period)
  4. On boot, _kanban_notifier_watcher fires within a few seconds and delivers ALL backlogged terminal events for ALL stale subs in a single burst
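
The steps above can be simulated against an in-memory SQLite database. This is only a sketch: the table layouts are simplified guesses that keep the columns named in this report, the kind column and the pending_events helper are hypothetical, and the query approximates what the watcher sees rather than reproducing its actual code.

```python
import sqlite3

# Hypothetical, simplified schema: not the real kanban.db layout.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_events (
    id INTEGER PRIMARY KEY,          -- alias for rowid
    task_id TEXT NOT NULL,
    kind TEXT NOT NULL               -- assumed column name
);
CREATE TABLE kanban_notify_subs (
    task_id TEXT NOT NULL,
    platform TEXT NOT NULL,
    chat_id TEXT NOT NULL,
    last_event_id INTEGER NOT NULL DEFAULT 0
);
""")

# Steps 1-2: three subs left at the default cursor, each task with two events.
for n in range(3):
    task_id = f"t_{n}"
    conn.execute(
        "INSERT INTO kanban_notify_subs (task_id, platform, chat_id) "
        "VALUES (?, 'telegram', 'u1')",
        (task_id,),
    )
    conn.executemany(
        "INSERT INTO task_events (task_id, kind) VALUES (?, ?)",
        [(task_id, "completed"), (task_id, "blocked")],
    )


def pending_events(conn):
    """Count events whose rowid lies beyond the cursor of a matching sub."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM kanban_notify_subs AS s
        JOIN task_events AS e
          ON e.task_id = s.task_id AND e.rowid > s.last_event_id
    """).fetchone()[0]


# Steps 3-4: on the next tick, every event of every stale sub is still pending.
backlog = pending_events(conn)
print(backlog)
```

With every cursor at 0, the pending count equals the full event history of the subscribed tasks, which is the burst described in step 4.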

Two distinct bugs that compound

Bug 1: last_event_id defaults to 0 at sub creation

The kanban_notify_subs schema declares last_event_id INTEGER NOT NULL DEFAULT 0. If a sub is created on an already-active task with N events in task_events, those N events are replayed on the next notifier tick. At sub creation, the cursor should instead snap to the current MAX(rowid) of task_events for sub.task_id, so the sub starts "caught up."
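
A minimal sketch of a creation path that starts the cursor caught up, assuming a simplified schema. The function name add_notify_sub_caught_up, the kind column, and the "created" event kind are hypothetical and do not reflect the real Hermes API.

```python
import sqlite3


def add_notify_sub_caught_up(conn, task_id, platform, chat_id):
    """Insert a sub whose cursor starts at the task's newest event (0 if none)."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            """
            INSERT INTO kanban_notify_subs
                (task_id, platform, chat_id, last_event_id)
            VALUES (?, ?, ?, COALESCE(
                (SELECT MAX(rowid) FROM task_events WHERE task_id = ?), 0))
            """,
            (task_id, platform, chat_id, task_id),
        )


# Hypothetical, simplified schema for the demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE task_events (
    id INTEGER PRIMARY KEY,
    task_id TEXT NOT NULL,
    kind TEXT NOT NULL
);
CREATE TABLE kanban_notify_subs (
    task_id TEXT NOT NULL,
    platform TEXT NOT NULL,
    chat_id TEXT NOT NULL,
    last_event_id INTEGER NOT NULL DEFAULT 0
);
""")
conn.executemany(
    "INSERT INTO task_events (task_id, kind) VALUES (?, ?)",
    [("t_1", "created"), ("t_1", "completed"), ("t_2", "created")],
)

add_notify_sub_caught_up(conn, "t_1", "telegram", "u1")  # task with history
add_notify_sub_caught_up(conn, "t_9", "telegram", "u1")  # task with no events

cursors = dict(conn.execute("SELECT task_id, last_event_id FROM kanban_notify_subs"))
print(cursors)
```

Computing the cursor inside the INSERT keeps the read and the write in one statement, so no event can slip in between looking up MAX(rowid) and storing it.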

Bug 2: _kanban_notifier_watcher ignores sub.event_kinds on the message-send path

The watcher passes kinds=TERMINAL_KINDS, where TERMINAL_KINDS = ("completed", "blocked", "gave_up", "crashed", "timed_out"), to claim_unseen_events_for_sub and ignores the per-sub event_kinds filter column. As a result, a sub configured as "only notify on blocked/gave_up/crashed/timed_out" still receives "completed" notifications. In our case, all 27 subs were configured with event_kinds = "blocked,gave_up,crashed,timed_out" (failure-only) but received ✔ done messages.

We have not determined whether the cursor-advance path respects event_kinds: after delivery, the cursor stayed at 0 for some subs and advanced for others, and the investigation is incomplete.
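
The filtering that Bug 2 calls for could look roughly like the sketch below. The helper name effective_kinds is hypothetical, and treating an empty or NULL event_kinds value as "no filter" is an assumption, not confirmed behavior.

```python
TERMINAL_KINDS = ("completed", "blocked", "gave_up", "crashed", "timed_out")


def effective_kinds(sub_event_kinds):
    """Return the terminal kinds a sub opted into, in TERMINAL_KINDS order.

    Assumption: an empty or missing event_kinds value means "no filter".
    """
    if not sub_event_kinds:
        return TERMINAL_KINDS
    # event_kinds is stored as a comma-separated string, as shown in this report.
    wanted = {k.strip() for k in sub_event_kinds.split(",") if k.strip()}
    return tuple(k for k in TERMINAL_KINDS if k in wanted)


failure_only = effective_kinds("blocked,gave_up,crashed,timed_out")
print(failure_only)
```

The result would then be passed as kinds to claim_unseen_events_for_sub in place of the full TERMINAL_KINDS tuple, so a failure-only sub never claims "completed" events.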

Suggested fixes

  1. (Bug 1) In the sub-creation path (kanban_db.add_notify_sub or equivalent), snap last_event_id to MAX(rowid) FROM task_events WHERE task_id = ? at creation time. Subs start caught up — they only deliver events that occur AFTER subscription.
  2. (Bug 2) In _kanban_notifier_watcher, intersect TERMINAL_KINDS with sub.event_kinds before passing to claim_unseen_events_for_sub. If the sub explicitly wants "blocked,gave_up,crashed,timed_out", don't deliver "completed" to it.
  3. (Defense in depth) Add a per-tick rate limit so even if the queue is stale, the watcher doesn't fire 100+ messages in one burst — batch + summarize, or cap at N/tick with a "X more queued" footer.
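
The per-tick cap in item 3 might be sketched as follows. The function render_tick and the default cap of 10 are illustrative choices, not existing Hermes settings, and how deferred events interact with the sub cursor is left open here.

```python
def render_tick(pending, cap=10):
    """Split pending messages into (to_send, deferred) for one watcher tick.

    At most `cap` messages are sent; if any are held back, one footer line
    reporting the deferred count is appended to the outgoing batch.
    """
    if cap < 0:
        raise ValueError("cap must be non-negative")
    to_send = list(pending[:cap])
    deferred = list(pending[cap:])
    if deferred:
        to_send.append(f"{len(deferred)} more queued")
    return to_send, deferred


backlog = [f"✔ Kanban t_{i} done" for i in range(103)]
to_send, deferred = render_tick(backlog, cap=10)
print(len(to_send), to_send[-1], len(deferred))
```

Deferred messages stay queued for later ticks, so a stale backlog drains gradually instead of arriving as one burst.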

Workaround applied locally

-- Snap every stale cursor to the newest event of its task (0 if the task has no events).
UPDATE kanban_notify_subs
SET last_event_id = COALESCE(
  (SELECT MAX(rowid) FROM task_events WHERE task_events.task_id = kanban_notify_subs.task_id), 0
)
WHERE last_event_id = 0;

Plus, since the subs were for 8-day-old test-fire tasks no longer of interest:

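-- Removes every subscription, not only the stale ones.
DELETE FROM kanban_notify_subs;

With both statements applied, these subs can no longer replay their backlog on a future boot.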

Related upstream activity (none of these address this directly)

  • PR #28748 (OPEN) — board-level kanban notification subscriptions
  • PR #27615 (CLOSED) — deliver commented events; wildcard cross-board subs
  • PR #23120 (CLOSED) — push notification when transitioning to blocked
