hermes - 💡(How to fix) Fix Kanban SQLite database corruption under rapid task creation

Bug Report: Kanban SQLite database corruption under rapid task creation

Summary

The kanban SQLite database (~/.hermes/kanban.db) becomes corrupted, with SQLite reporting "database disk image is malformed", after ~9-10 tasks are created in rapid succession via the kanban_create tool API. This has happened 3 times in 2 days under a normal orchestrator workflow.

Environment

  • Hermes Agent version: v0.14.0 (2026.5.16)
  • Python: 3.11.15
  • OS: Ubuntu 22.04 (Linux 6.8.0-117-generic)
  • SQLite: bundled with Python 3.11

Steps to Reproduce

  1. Start gateway: hermes gateway start
  2. Create tasks via kanban_create tool API in a loop (or rapid succession)
  3. After ~9-10 tasks, the next kanban_create call fails with:
    {"error": "kanban_create: database disk image is malformed"}
  4. All subsequent kanban operations fail with the same error
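
The loop in step 2 can be approximated outside Hermes with a standalone stress harness. The following is a minimal sketch under stated assumptions, not the Hermes code path: it does not call kanban_create, and the tasks table, the create_task helper, and the run_stress function are hypothetical names chosen for illustration. It keeps one long-lived connection open (mimicking the dispatcher) while several threads write through their own short-lived connections, then asks SQLite whether the file is still consistent.

```python
import sqlite3
import threading


def create_task(db_path: str, title: str) -> None:
    """Hypothetical stand-in for kanban_create: one connection per call."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
    finally:
        conn.close()


def run_stress(db_path: str, writers: int = 4, per_writer: int = 25) -> tuple[int, str]:
    """Write from several threads and return (row count, integrity_check result)."""
    # Long-lived connection, mimicking the dispatcher holding the DB open.
    dispatcher = sqlite3.connect(db_path, timeout=5.0)
    dispatcher.execute("PRAGMA journal_mode=WAL")
    dispatcher.execute(
        "CREATE TABLE IF NOT EXISTS tasks (id INTEGER PRIMARY KEY, title TEXT)"
    )
    dispatcher.commit()

    def worker(n: int) -> None:
        for i in range(per_writer):
            create_task(db_path, f"task-{n}-{i}")

    threads = [threading.Thread(target=worker, args=(n,)) for n in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    count = dispatcher.execute("SELECT COUNT(*) FROM tasks").fetchone()[0]
    status = dispatcher.execute("PRAGMA integrity_check").fetchone()[0]
    dispatcher.close()
    return count, status
```

Calling run_stress on a scratch path and getting the full row count plus "ok" would suggest that plain SQLite under this access pattern is not the culprit, which would point the investigation at how Hermes manages its connections rather than at SQLite itself.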

Observed Behavior

  • kanban.db-wal file becomes 0 bytes after corruption
  • The database file itself appears intact in size but is unreadable by SQLite
  • Previously created tasks are lost
  • Gateway continues running but cannot dispatch tasks
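
To confirm that the file is actually damaged (rather than locked or missing), SQLite's own PRAGMA integrity_check can be run against it. A minimal sketch follows; check_db is a hypothetical helper name, and nothing here is part of Hermes. It opens the file read-only so that a wrong path does not silently create a new empty database.

```python
import sqlite3
from pathlib import Path


def check_db(db_path: str) -> list[str]:
    """Return integrity_check messages; ["ok"] means no damage was found."""
    # mode=ro: never create the file, never write to it.
    uri = Path(db_path).expanduser().resolve().as_uri() + "?mode=ro"
    try:
        conn = sqlite3.connect(uri, uri=True)
    except sqlite3.Error as exc:
        return [f"open failed: {exc}"]
    try:
        rows = conn.execute("PRAGMA integrity_check").fetchall()
        return [row[0] for row in rows]
    except sqlite3.DatabaseError as exc:
        # A badly damaged file can fail before the check yields any rows.
        return [f"check failed: {exc}"]
    finally:
        conn.close()
```

For example, check_db("~/.hermes/kanban.db") could be run after stopping the gateway. Running it on a copy of the file is the more cautious option, since it leaves the original untouched for later analysis.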

Expected Behavior

  • Creating 10+ tasks sequentially should not corrupt the database
  • WAL mode should handle concurrent access safely
  • If corruption occurs, it should be recoverable without full re-initialization
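
One way the third expectation could be approached is with periodic snapshots, so that a damaged file can be replaced by a recent good copy instead of an empty one. The sketch below uses Python's sqlite3.Connection.backup (available since Python 3.7); the snapshot helper and the idea of calling it on a schedule are assumptions for illustration, not existing Hermes behavior.

```python
import sqlite3


def snapshot(src_path: str, dest_path: str) -> None:
    """Copy a live SQLite database to dest_path using the online backup API."""
    src = sqlite3.connect(src_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Produces a consistent copy, including committed content still in the WAL.
        src.backup(dest)
        # Refuse to keep a snapshot that SQLite itself considers damaged.
        status = dest.execute("PRAGMA integrity_check").fetchone()[0]
        if status != "ok":
            raise RuntimeError(f"snapshot failed integrity_check: {status}")
    finally:
        dest.close()
        src.close()
```

Unlike a plain cp of kanban.db, the backup API does not depend on the -wal file being copied alongside the main file, which makes it safer to run while the gateway is up.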

Recovery Steps (currently required)

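hermes gateway stop                       # stop the gateway before touching the files
ts=$(date +%Y%m%d_%H%M%S)                 # one timestamp for every file from this incident
cp ~/.hermes/kanban.db ~/.hermes/kanban.db.backup.$ts
rm -f ~/.hermes/kanban.db-shm ~/.hermes/kanban.db-wal    # removes the WAL and shared-memory files
mv ~/.hermes/kanban.db ~/.hermes/kanban.db.corrupted.$ts
hermes kanban init                        # full re-initialization; previous tasks are lost
hermes gateway start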

Additional Context

  • The issue occurs when using the tool API (kanban_create), not CLI commands
  • We added 1-second delays between kanban_create calls as a workaround, but this is not a fix
  • The dispatcher holds an open DB connection; concurrent writes from tool API calls may race with WAL checkpointing
  • Previous corruption incidents: 2025-05-23 (twice), 2025-05-24 (once)
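
Since it is not known how Hermes opens its kanban connections, the following is only a sketch of a conservative setup that the 1-second delays may be compensating for: each connection gets a busy timeout, and each write takes the write lock up front with BEGIN IMMEDIATE so that a competing writer waits instead of failing midway. The open_connection and write_task helpers and the tasks table are hypothetical.

```python
import sqlite3


def open_connection(db_path: str) -> sqlite3.Connection:
    """Open a connection in WAL mode with explicit transaction control."""
    # isolation_level=None: sqlite3 issues no implicit BEGIN; we manage transactions.
    # timeout=5.0: wait up to 5 seconds for a lock held by another connection.
    conn = sqlite3.connect(db_path, timeout=5.0, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    return conn


def write_task(conn: sqlite3.Connection, title: str) -> None:
    """Insert one row inside a transaction that holds the write lock from the start."""
    conn.execute("BEGIN IMMEDIATE")
    try:
        conn.execute("INSERT INTO tasks (title) VALUES (?)", (title,))
        conn.execute("COMMIT")
    except Exception:
        # Some errors make SQLite roll back on its own; only roll back if still open.
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
```

Lock contention alone should produce "database is locked" errors rather than a malformed file, so if this setup changes the outcome, that would hint at a connection being shared across threads or processes somewhere in the tool API path.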

Suggested Investigation

  1. Check if the kanban DB connection uses proper transaction isolation
  2. Verify WAL checkpoint behavior under rapid writes
  3. Consider adding an application-level write queue or mutex for kanban operations
  4. Add automatic WAL recovery on startup if -wal or -shm files are stale
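
Item 3 could look roughly like the sketch below: every write is handed to a single worker thread that owns the only write connection, so tool API calls can no longer overlap each other's writes. WriteQueue and its methods are hypothetical names, the tasks table in the usage note is made up, and this is not a description of how Hermes is structured today.

```python
import queue
import sqlite3
import threading


class WriteQueue:
    """Funnel all writes through one thread that owns the only write connection."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._jobs: queue.Queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        # The connection is created and used on this thread only.
        conn = sqlite3.connect(self._db_path)
        conn.execute("PRAGMA journal_mode=WAL")
        while True:
            job = self._jobs.get()
            if job is None:  # shutdown sentinel
                break
            sql, params, done = job
            try:
                with conn:  # one transaction per job
                    conn.execute(sql, params)
            except sqlite3.Error as exc:
                done["error"] = exc
            finally:
                done["event"].set()
        conn.close()

    def submit(self, sql: str, params: tuple = ()) -> None:
        """Block until the write is committed; re-raise any SQLite error."""
        done = {"event": threading.Event(), "error": None}
        self._jobs.put((sql, params, done))
        done["event"].wait()
        if done["error"] is not None:
            raise done["error"]

    def close(self) -> None:
        """Drain pending jobs, then stop the worker thread."""
        self._jobs.put(None)
        self._thread.join()
```

A caller would use it as wq = WriteQueue(path) followed by wq.submit("INSERT INTO tasks (title) VALUES (?)", ("demo",)). The trade-off is that writes are fully serialized within one process; it does nothing for writers in other processes, which would still need SQLite's own locking.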

Attachments

  • Will attach kanban.db and kanban.db-wal from next corruption incident if helpful
