hermes - Fix kanban dispatcher: 'duplicate column name: consecutive_failures' on first tick after gateway restart

Summary

The kanban dispatcher fails with sqlite3.OperationalError: duplicate column name: consecutive_failures on the first tick after every gateway restart, on a kanban DB that was already migrated by a prior 0.12.x → 0.13 release. Subsequent ticks succeed, so the only visible effect is one error per restart in errors.log.

Version

Hermes Agent v0.13.0 (2026.5.7)
Python: 3.11.15 (macOS 15, Apple Silicon)
OpenAI SDK: 2.32.0

Local main is at origin/main + 3 unrelated local patches (none touch kanban). The DB was created and last-migrated under 0.12.x.

Symptom

~/.hermes/logs/errors.log after gateway restart:

2026-05-08 14:21:53,349 ERROR gateway.run: kanban dispatcher: tick failed on board default
Traceback (most recent call last):
  File "/Users/leon/.hermes/hermes-agent/gateway/run.py", line 3931, in _tick_once_for_board
    conn = _kb.connect(board=slug)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/leon/.hermes/hermes-agent/hermes_cli/kanban_db.py", line 928, in connect
    _migrate_add_optional_columns(conn)
  File "/Users/leon/.hermes/hermes-agent/hermes_cli/kanban_db.py", line 996, in _migrate_add_optional_columns
    conn.execute(
sqlite3.OperationalError: duplicate column name: consecutive_failures

Only one ERROR is logged per gateway restart; subsequent dispatcher ticks (every 60s) succeed silently. Gateway core, Telegram, Weixin, and cron are all healthy.

Database state

~/.hermes/kanban.db has 0 tasks. The schema already includes consecutive_failures, last_failure_error, and max_retries (added during a prior 0.12.x migration), plus the legacy spawn_failures and last_spawn_error columns:

$ sqlite3 ~/.hermes/kanban.db "PRAGMA table_info(tasks);" | tail -10
17|spawn_failures|INTEGER|1|0|0
18|worker_pid|INTEGER|0||0
19|last_spawn_error|TEXT|0||0
...
25|skills|TEXT|0||0
26|consecutive_failures|INTEGER|1|0|0
27|last_failure_error|TEXT|0||0
28|max_retries|INTEGER|0||0

Note that all the columns the migration wants to add are already present (cids 26-28).
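
The same check can be scripted. The sketch below is a minimal helper, not part of Hermes: missing_columns and EXPECTED are hypothetical names, and the column list is simply the three names from the PRAGMA output above. It uses only the standard sqlite3 module and is demonstrated against a toy in-memory table.

```python
import sqlite3

# Columns the migration is reported to add (taken from the PRAGMA output above).
EXPECTED = ("consecutive_failures", "last_failure_error", "max_retries")


def missing_columns(conn, table="tasks", expected=EXPECTED):
    """Return the expected columns that are not present in `table`."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    present = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    return [name for name in expected if name not in present]


# Toy table that has only one of the three columns.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE tasks ("
    "id INTEGER PRIMARY KEY, "
    "consecutive_failures INTEGER NOT NULL DEFAULT 0)"
)
print(missing_columns(demo))  # ['last_failure_error', 'max_retries']
demo.close()
```

Pointed at the real file (for example sqlite3.connect(os.path.expanduser("~/.hermes/kanban.db"))), an empty list would match the state described here, where nothing is left for the migration to add.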

Reproduction (does NOT reproduce in isolation)

A direct reproduction from a fresh Python process succeeds — the migration's column-existence guard (if "consecutive_failures" not in cols) correctly skips the ALTER TABLE:

import sys, os, sqlite3
sys.path.insert(0, '/Users/leon/.hermes/hermes-agent')
os.chdir('/Users/leon/.hermes/hermes-agent')
from hermes_cli.kanban_db import connect

c = connect(board='default')   # succeeds, no error
c.close()

But when invoked from the gateway dispatcher's _tick_once_for_board (worker thread via asyncio.to_thread), the same call fails. There appears to be a context-dependent difference in what PRAGMA table_info(tasks) returns at the moment _migrate_add_optional_columns queries it.
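
To move the isolated reproduction closer to the dispatcher's context, the open call can be driven from worker threads with asyncio.to_thread. What follows is a harness sketch only: open_once, open_concurrently, and plain_open are hypothetical names, and plain_open uses bare sqlite3 rather than Hermes's connect, so as written it exercises the threading path and not the migration.

```python
import asyncio
import sqlite3


def plain_open(db_path):
    """Stand-in for hermes_cli.kanban_db.connect: opens the DB, runs no migration."""
    return sqlite3.connect(db_path)


def open_once(open_fn):
    """Open and close one connection; return the OperationalError raised, or None."""
    try:
        conn = open_fn()
    except sqlite3.OperationalError as exc:
        return exc
    conn.close()
    return None


async def open_concurrently(open_fn, workers=4):
    """Call open_once from `workers` threads at the same time via asyncio.to_thread."""
    tasks = [asyncio.to_thread(open_once, open_fn) for _ in range(workers)]
    return await asyncio.gather(*tasks)


results = asyncio.run(open_concurrently(lambda: plain_open(":memory:")))
print(results)  # [None, None, None, None]
```

Swapping the lambda for one that calls connect(board='default') from the snippet above would run the real migration path under the same threading conditions. Whether that is enough to trigger the error has not been verified here.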

Speculation on cause

Two possibilities I can think of:

  1. Concurrent connections during gateway startup: the dispatcher tick races with another path that also opens the kanban DB (e.g., gateway notifier, board init). One connection sees mid-migration state.

  2. Connection-local schema cache: under WAL mode + synchronous=NORMAL, schema visibility across connections may have fence ordering subtleties on first concurrent open.
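
The first possibility can be illustrated deterministically outside Hermes. The sketch below assumes a migration shaped like the guard quoted earlier (snapshot the column names, then ALTER TABLE if the name is absent) and interleaves two plain sqlite3 connections by hand. The function names and the toy schema are hypothetical, and it shows only that a stale snapshot produces this exact error, not that this is what happens in the gateway.

```python
import os
import sqlite3
import tempfile

ADD_COLUMN = (
    "ALTER TABLE tasks ADD COLUMN consecutive_failures INTEGER NOT NULL DEFAULT 0"
)


def columns(conn, table="tasks"):
    """Snapshot the column names of `table` as seen by this connection."""
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def stale_snapshot_demo():
    """Return the error message hit by a connection that acts on a stale snapshot."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "kanban.db")
        a = sqlite3.connect(path)
        a.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY)")
        a.commit()
        b = sqlite3.connect(path)
        try:
            # Both connections take their snapshot before either one migrates.
            cols_a = columns(a)
            cols_b = columns(b)
            # Connection A passes its guard and adds the column.
            if "consecutive_failures" not in cols_a:
                a.execute(ADD_COLUMN)
                a.commit()
            # Connection B's guard still passes because its snapshot is stale.
            if "consecutive_failures" not in cols_b:
                try:
                    b.execute(ADD_COLUMN)
                except sqlite3.OperationalError as exc:
                    return str(exc)
            return None
        finally:
            a.close()
            b.close()


print(stale_snapshot_demo())
```

The second ALTER TABLE fails even though the guard ran, because the guard consulted a snapshot taken before the other connection changed the schema.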

The dispatcher path at gateway/run.py:3931 does:

conn = _kb.connect(board=slug)            # ← line 3931, the failing call
try:
    _kb.init_db(board=slug)               # opens another conn that re-runs init
except Exception:
    pass

init_db() discards the path from _INITIALIZED_PATHS and re-opens, forcing the migration to re-run on a second connection. So per dispatcher tick, the migration is invoked twice on two different connections.

Suggested fix

Either:

  • Idempotency wrap: catch sqlite3.OperationalError whose message contains "duplicate column name" around each ALTER TABLE in _migrate_add_optional_columns and ignore it. The end state is what we want.
  • Re-query: refresh cols from PRAGMA table_info(tasks) immediately before each guard check. The existing comment notes that this is intentionally not done, but its assumption that "no step depends on a column added by a previous step in the same call" doesn't protect against another connection mutating the schema between snapshot and check.

I lean toward the idempotency-wrap fix as the simplest robust solution.
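
A minimal sketch of the idempotency wrap, assuming the migration issues one ALTER TABLE ... ADD COLUMN per optional column. The helper name add_column_idempotent and its signature are hypothetical and are not taken from kanban_db.py. Only the "duplicate column name" case is swallowed; any other OperationalError is re-raised.

```python
import sqlite3


def add_column_idempotent(conn, table, column, decl):
    """Add `column` to `table`; treat 'column already exists' as success.

    Returns True if this call added the column, False if it was already there.
    `table`, `column`, and `decl` are interpolated into SQL, so they must be
    trusted constants, never user input.
    """
    try:
        conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {decl}")
        return True
    except sqlite3.OperationalError as exc:
        # SQLite reports an existing column as "duplicate column name: <name>".
        if "duplicate column name" in str(exc).lower():
            return False
        raise  # anything else (missing table, locked DB, ...) is a real failure


demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE tasks (id INTEGER PRIMARY KEY)")
print(add_column_idempotent(
    demo, "tasks", "consecutive_failures", "INTEGER NOT NULL DEFAULT 0"))  # True
print(add_column_idempotent(
    demo, "tasks", "consecutive_failures", "INTEGER NOT NULL DEFAULT 0"))  # False
demo.close()
```

With a wrapper of this shape, the existing column-existence guard can stay as a fast path, and a lost race ends in the same schema instead of a logged traceback.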

Workaround for affected users

None needed if you don't actively use kanban: the error fires once per restart and affects nothing else. If kanban is in use, the second tick (60s later) succeeds and the dispatcher continues normally.

Relevant recent commits

  • 24d48ffb8 feat(kanban): add specify — auxiliary LLM fleshes out triage tasks (#21435)
  • ac51c4c1a feat(kanban): per-task max_retries override (#21330) — added max_retries column
  • a2ff19305 chore: follow-up cleanup for Kanban migration fix

The migration handler in hermes_cli/kanban_db.py:_migrate_add_optional_columns is the relevant code path.
