hermes - 💡(How to fix) Fix RFC: Pluggable SessionDB Provider — PostgreSQL, MySQL, and Beyond

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Error Message

class SessionDBProvider(ABC): """Abstract base for pluggable session storage backends."""

@property
@abstractmethod
def name(self) -> str: ...

@abstractmethod
def is_available(self) -> bool: ...

# ── Lifecycle ──
@abstractmethod
def initialize(self, **kwargs) -> None: ...
@abstractmethod
def shutdown(self) -> None: ...

# ── Session CRUD ──
@abstractmethod
def create_session(
    self, session_id: str, source: str, *,
    user_id: str = "", model: str = "",
    model_config: str = "", system_prompt: str = "",
    parent_session_id: str = "",
) -> str: ...

@abstractmethod
def get_session(self, session_id: str) -> dict | None: ...
@abstractmethod
def end_session(self, session_id: str, end_reason: str) -> None: ...
@abstractmethod
def reopen_session(self, session_id: str) -> None: ...
@abstractmethod
def ensure_session(self, session_id: str, source: str, model: str = "", **kwargs) -> None: ...

@abstractmethod
def update_system_prompt(self, session_id: str, system_prompt: str) -> None: ...
@abstractmethod
def update_token_counts(self, session_id: str, **counts) -> None: ...

# ── Titles ──
@abstractmethod
def set_session_title(self, session_id: str, title: str) -> None: ...
@abstractmethod
def get_session_title(self, session_id: str) -> str | None: ...
@abstractmethod
def get_session_by_title(self, title: str) -> dict | None: ...
@abstractmethod
def resolve_session_by_title(self, title: str) -> str | None: ...

# ── Messages ──
@abstractmethod
def append_message(self, session_id: str, role: str, content: str, **kwargs) -> int: ...
@abstractmethod
def replace_messages(self, session_id: str, messages: list) -> None: ...
@abstractmethod
def get_messages(self, session_id: str) -> list: ...
@abstractmethod
def get_messages_as_conversation(
    self, session_id: str, include_ancestors: bool = True,
) -> list: ...

# ── Search ──
@abstractmethod
def search_messages(
    self, query: str, *,
    source_filter: str = "", exclude_sources: list = None,
    role_filter: str = "", limit: int = 20, offset: int = 0,
) -> list: ...

@abstractmethod
def search_sessions(
    self, source: str = "", limit: int = 50, offset: int = 0,
) -> list: ...

@abstractmethod
def session_count(self, source: str = "") -> int: ...
@abstractmethod
def message_count(self, session_id: str = "") -> int: ...

# ── Session Listing ──
@abstractmethod
def list_sessions_rich(
    self, source: str = "", exclude_sources: list = None,
    limit: int = 50, offset: int = 0,
    include_children: bool = False,
    project_compression_tips: bool = True,
    order_by_last_active: bool = False,
) -> list: ...

# ── Meta ──
@abstractmethod
def get_meta(self, key: str) -> str | None: ...
@abstractmethod
def set_meta(self, key: str, value: str) -> None: ...

# ── Handoff ──
@abstractmethod
def request_handoff(self, session_id: str, platform: str) -> None: ...
@abstractmethod
def get_handoff_state(self, session_id: str) -> dict | None: ...
@abstractmethod
def list_pending_handoffs(self) -> list: ...
@abstractmethod
def claim_handoff(self, session_id: str) -> bool: ...
@abstractmethod
def complete_handoff(self, session_id: str) -> None: ...
@abstractmethod
def fail_handoff(self, session_id: str, error: str) -> None: ...

# ── Maintenance ──
@abstractmethod
def export_session(self, session_id: str) -> dict: ...
@abstractmethod
def export_all(self, source: str = "") -> list: ...
@abstractmethod
def delete_session(self, session_id: str) -> None: ...
@abstractmethod
def clear_messages(self, session_id: str) -> None: ...

# ── Telegram topic mode (optional) ──
def apply_telegram_topic_migration(self) -> None: ...  # default no-op
def enable_telegram_topic_mode(self) -> None: ...
def disable_telegram_topic_mode(self) -> None: ...
def is_telegram_topic_mode_enabled(self) -> bool: ...
def get_telegram_topic_binding(self) -> str | None: ...
def bind_telegram_topic(self) -> None: ...
def is_telegram_session_linked_to_topic(self) -> bool: ...
def list_unlinked_telegram_sessions_for_user(self) -> list: ...

Fix Action

Fix / Workaround

2.3 Contention Mitigations (Already Applied)

The current code has sophisticated mitigations from PR #3249 (@teknium1, merged):

Code Example

1. git pull          ← changes Python source code
2. pip install -e .   rebuilds the package
3. SIGTERM           ← kills the running process
4. restart           ← new process opens state.db

---

sessions     — session metadata (40+ columns)
messages    — message history (15+ columns)
state_meta  — key/value store
messages_fts        — FTS5 full-text search
messages_fts_trigram — FTS5 trigram (CJK support)

---

CRUD:        create_session, end_session, get_session, delete_session, ...
Messages:    append_message, replace_messages, get_messages, get_messages_as_conversation, ...
Search:      search_messages, search_sessions, session_count, ...
FTS:         messages_fts, messages_fts_trigram (SQLite FTS5-specific)
Meta:        get_meta, set_meta
Telegram:    telegram_topic_mode, bind_telegram_topic, ...
Handoffs:    request_handoff, claim_handoff, complete_handoff, ...
Maintenance: vacuum, prune_sessions, maybe_auto_prune_and_vacuum, ...

---

plugins/
├── memory/              ← existing pattern (MemoryProvider)
└── sessiondb/NEW (SessionDBProvider)
    ├── sqlite/           ← bundled default (current behavior)
    │   ├── __init__.py
    │   ├── plugin.yaml
    │   └── sessiondb.py
    ├── postgresql/       ← primary target
    │   ├── __init__.py
    │   ├── plugin.yaml
    │   └── sessiondb.py
    └── mysql/            ← secondary target
        ├── __init__.py
        ├── plugin.yaml
        └── sessiondb.py

---

class SessionDBProvider(ABC):
    """Abstract base for pluggable session storage backends."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    # ── Lifecycle ──
    @abstractmethod
    def initialize(self, **kwargs) -> None: ...
    @abstractmethod
    def shutdown(self) -> None: ...

    # ── Session CRUD ──
    @abstractmethod
    def create_session(
        self, session_id: str, source: str, *,
        user_id: str = "", model: str = "",
        model_config: str = "", system_prompt: str = "",
        parent_session_id: str = "",
    ) -> str: ...

    @abstractmethod
    def get_session(self, session_id: str) -> dict | None: ...
    @abstractmethod
    def end_session(self, session_id: str, end_reason: str) -> None: ...
    @abstractmethod
    def reopen_session(self, session_id: str) -> None: ...
    @abstractmethod
    def ensure_session(self, session_id: str, source: str, model: str = "", **kwargs) -> None: ...

    @abstractmethod
    def update_system_prompt(self, session_id: str, system_prompt: str) -> None: ...
    @abstractmethod
    def update_token_counts(self, session_id: str, **counts) -> None: ...

    # ── Titles ──
    @abstractmethod
    def set_session_title(self, session_id: str, title: str) -> None: ...
    @abstractmethod
    def get_session_title(self, session_id: str) -> str | None: ...
    @abstractmethod
    def get_session_by_title(self, title: str) -> dict | None: ...
    @abstractmethod
    def resolve_session_by_title(self, title: str) -> str | None: ...

    # ── Messages ──
    @abstractmethod
    def append_message(self, session_id: str, role: str, content: str, **kwargs) -> int: ...
    @abstractmethod
    def replace_messages(self, session_id: str, messages: list) -> None: ...
    @abstractmethod
    def get_messages(self, session_id: str) -> list: ...
    @abstractmethod
    def get_messages_as_conversation(
        self, session_id: str, include_ancestors: bool = True,
    ) -> list: ...

    # ── Search ──
    @abstractmethod
    def search_messages(
        self, query: str, *,
        source_filter: str = "", exclude_sources: list = None,
        role_filter: str = "", limit: int = 20, offset: int = 0,
    ) -> list: ...

    @abstractmethod
    def search_sessions(
        self, source: str = "", limit: int = 50, offset: int = 0,
    ) -> list: ...

    @abstractmethod
    def session_count(self, source: str = "") -> int: ...
    @abstractmethod
    def message_count(self, session_id: str = "") -> int: ...

    # ── Session Listing ──
    @abstractmethod
    def list_sessions_rich(
        self, source: str = "", exclude_sources: list = None,
        limit: int = 50, offset: int = 0,
        include_children: bool = False,
        project_compression_tips: bool = True,
        order_by_last_active: bool = False,
    ) -> list: ...

    # ── Meta ──
    @abstractmethod
    def get_meta(self, key: str) -> str | None: ...
    @abstractmethod
    def set_meta(self, key: str, value: str) -> None: ...

    # ── Handoff ──
    @abstractmethod
    def request_handoff(self, session_id: str, platform: str) -> None: ...
    @abstractmethod
    def get_handoff_state(self, session_id: str) -> dict | None: ...
    @abstractmethod
    def list_pending_handoffs(self) -> list: ...
    @abstractmethod
    def claim_handoff(self, session_id: str) -> bool: ...
    @abstractmethod
    def complete_handoff(self, session_id: str) -> None: ...
    @abstractmethod
    def fail_handoff(self, session_id: str, error: str) -> None: ...

    # ── Maintenance ──
    @abstractmethod
    def export_session(self, session_id: str) -> dict: ...
    @abstractmethod
    def export_all(self, source: str = "") -> list: ...
    @abstractmethod
    def delete_session(self, session_id: str) -> None: ...
    @abstractmethod
    def clear_messages(self, session_id: str) -> None: ...

    # ── Telegram topic mode (optional) ──
    def apply_telegram_topic_migration(self) -> None: ...  # default no-op
    def enable_telegram_topic_mode(self) -> None: ...
    def disable_telegram_topic_mode(self) -> None: ...
    def is_telegram_topic_mode_enabled(self) -> bool: ...
    def get_telegram_topic_binding(self) -> str | None: ...
    def bind_telegram_topic(self) -> None: ...
    def is_telegram_session_linked_to_topic(self) -> bool: ...
    def list_unlinked_telegram_sessions_for_user(self) -> list: ...

---

sessiondb:
  provider: postgresql
  postgresql:
    dsn: postgresql://user:pass@localhost:5432/hermes
    pool_size: 10
    max_overflow: 5

---

hermes sessiondb migrate  --from sqlite://~/.hermes/state.db --to postgresql://...

---

-- schemas/000_hermes.sql
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    user_id TEXT,
    model TEXT,
    model_config TEXT,
    system_prompt TEXT,
    parent_session_id TEXT REFERENCES sessions(id),
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ended_at TIMESTAMPTZ,
    end_reason TEXT,
    message_count INTEGER DEFAULT 0,
    tool_call_count INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_tokens INTEGER DEFAULT 0,
    cache_write_tokens INTEGER DEFAULT 0,
    reasoning_tokens INTEGER DEFAULT 0,
    billing_provider TEXT,
    billing_base_url TEXT,
    billing_mode TEXT,
    estimated_cost_usd REAL,
    actual_cost_usd REAL,
    cost_status TEXT,
    cost_source TEXT,
    pricing_version TEXT,
    title TEXT,
    api_call_count INTEGER DEFAULT 0,
    handoff_state TEXT,
    handoff_platform TEXT,
    handoff_error TEXT
);

CREATE TABLE messages (
    id BIGSERIAL PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    role TEXT NOT NULL,
    content TEXT,
    tool_call_id TEXT,
    tool_calls TEXT,
    tool_name TEXT,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    token_count INTEGER,
    finish_reason TEXT,
    reasoning TEXT,
    reasoning_content TEXT,
    reasoning_details TEXT,
    codex_reasoning_items TEXT,
    codex_message_items TEXT
);

-- Full-text search with pg_trgm
CREATE INDEX idx_messages_content_trgm ON messages USING GIN (content gin_trgm_ops);
CREATE INDEX idx_sessions_source ON sessions(source);
CREATE INDEX idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX idx_sessions_handoff ON sessions(handoff_state) WHERE handoff_state IS NOT NULL;

-- Telegram topic tables (only created on explicit opt-in)
CREATE TABLE IF NOT EXISTS telegram_topic (
    user_id TEXT NOT NULL,
    topic_id TEXT NOT NULL,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    is_topic_mode BOOLEAN NOT NULL DEFAULT false,
    PRIMARY KEY (user_id, session_id)
);

---

# Search with trigram similarity
query = """
    SELECT m.*, s.source, s.title
    FROM messages m
    JOIN sessions s ON m.session_id = s.id
    WHERE m.content %(ilike)s %(param)s
    ORDER BY m.timestamp DESC
    LIMIT %(limit)s OFFSET %(offset)s
"""
# For short CJK queries: ILIKE with % wildcards
# For longer queries: similarity() ranking

---

# config.yaml
sessiondb:
  provider: sqlite  # sqlite | postgresql | mysql
  sqlite:
    db_path: ~/.hermes/state.db
  postgresql:
    dsn: postgresql://user:pass@localhost:5432/hermes
    pool_size: 10
    max_overflow: 5
    statement_timeout: 30000  # ms
  mysql:
    host: 127.0.0.1
    port: 3306
    database: hermes
    user: hermes
    password: "{{ env.MYSQL_PASSWORD }}"
    pool_size: 10

---

class PostgreSQLSessionDBProvider(SessionDBProvider):
    def initialize(self, **kwargs):
        config = kwargs.get("config", {})
        self._pool = asyncpg.create_pool(
            dsn=config["dsn"],
            min_size=config.get("pool_size", 5),
            max_size=config.get("max_size", 10),
        )
RAW_BUFFERClick to expand / collapse

RFC: Pluggable SessionDB Provider — PostgreSQL, MySQL, and Beyond

Tracking Issue: (to be created)


1. Problem Statement

1.1 The Hot-Update Death Spiral

When running Hermes and simultaneously updating it (git pull/hermes update), the shared state.db SQLite file inevitably suffers:

  1. Process A (old version) has an open SQLite connection with WAL mode, mid-write
  2. Process B (new version) starts, tries to connect → database is locked
  3. WAL checkpoint is interrupted mid-flight by SIGTERM
  4. Result: state.db is corrupted, session_search permanently broken until manual recovery

This is not a theoretical edge case — it's the normal development workflow for anyone running Hermes from source.

1.2 SQLite's Fundamental Limits in Multi-Process Hermes

Hermes runs multiple processes sharing one state.db:

ProcessWritesReads
CLI session✅ Every turn✅ session_search
Gateway✅ Every turn✅ session_search
Cron scheduler✅ Job results✅ Job config
TUI✅ Session list✅ Session list
API server✅ Session creation✅ Query

SQLite WAL mode allows concurrent reads, but writes are serialized at the OS level. Even with the current jitter-retry + BEGIN IMMEDIATE design, contention causes:

  • #3139 (closed, fixed): session_search permanently disabled on lock timeout
  • #5563 (open, P1): state.db corruption during normal use
  • #860 (closed): Duplicate message accumulation
  • #14210 (open): Too many open file descriptors on macOS
  • #22032 (open): NFS locking protocol silently breaks /resume

1.3 The Hard Update Problem (Undocumented)

When hermes update runs:

1. git pull          ← changes Python source code
2. pip install -e .  ← rebuilds the package
3. SIGTERM           ← kills the running process
4. restart           ← new process opens state.db

Between steps 3 and 4:

  • The old process may be flushing messages to SQLite
  • The WAL file may have un-checkpointed frames
  • The new process starts with possibly different schema expectations
  • 43MB state.db is at risk of corruption on every update

Related issues:

  • #15733 (closed): hermes update wiped state.db entirely (git clone instead of pull)
  • #6702 (open): hermes update kills cron workers mid-write with no opt-out

2. Current Architecture

2.1 SessionDB Class (hermes_state.py)

Single monolithic class: SessionDB in hermes_state.py (~2,966 lines, ~127KB).

Schema (single SQLite file):

sessions     — session metadata (40+ columns)
messages    — message history (15+ columns)
state_meta  — key/value store
messages_fts        — FTS5 full-text search
messages_fts_trigram — FTS5 trigram (CJK support)

Public interface (~60 methods):

CRUD:        create_session, end_session, get_session, delete_session, ...
Messages:    append_message, replace_messages, get_messages, get_messages_as_conversation, ...
Search:      search_messages, search_sessions, session_count, ...
FTS:         messages_fts, messages_fts_trigram (SQLite FTS5-specific)
Meta:        get_meta, set_meta
Telegram:    telegram_topic_mode, bind_telegram_topic, ...
Handoffs:    request_handoff, claim_handoff, complete_handoff, ...
Maintenance: vacuum, prune_sessions, maybe_auto_prune_and_vacuum, ...

2.2 Usage Across Codebase

SessionDB is lazy-imported and instantiated in 6+ call sites:

FilePattern
cli.py (×6)from hermes_state import SessionDB; self._session_db = SessionDB()
gateway/session.pyself._db = SessionDB()
gateway/mirror.pydb = SessionDB()
gateway/platforms/api_server.pyself._session_db = SessionDB()
cron/scheduler.py_session_db = SessionDB()

Each call site creates a new independent connection to the same SQLite file — this is the source of cross-process write contention.

2.3 Contention Mitigations (Already Applied)

The current code has sophisticated mitigations from PR #3249 (@teknium1, merged):

  • WAL mode with passive checkpoint every 50 writes
  • Jitter retry: 15 attempts, 20-150ms random backoff per attempt
  • BEGIN IMMEDIATE to surface lock contention at transaction start
  • INSERT OR IGNORE in create_session for idempotency
  • ensure_session() lazy helper for flush-time recovery
  • Never nullify _session_db on transient failures

These are band-aids, not a cure. A real RDBMS solves the problem at the architecture level.


3. Proposed Solution

3.1 Pluggable SessionDB Provider (Following the MemoryProvider Pattern)

Following the existing MemoryProvider ABC pattern (agent/memory_provider.py, used by Holographic, Honcho, Hindsight, Mem0, etc.), introduce SessionDBProvider:

plugins/
├── memory/              ← existing pattern (MemoryProvider)
└── sessiondb/           ← NEW (SessionDBProvider)
    ├── sqlite/           ← bundled default (current behavior)
    │   ├── __init__.py
    │   ├── plugin.yaml
    │   └── sessiondb.py
    ├── postgresql/       ← primary target
    │   ├── __init__.py
    │   ├── plugin.yaml
    │   └── sessiondb.py
    └── mysql/            ← secondary target
        ├── __init__.py
        ├── plugin.yaml
        └── sessiondb.py

3.2 SessionDBProvider ABC

class SessionDBProvider(ABC):
    """Abstract base for pluggable session storage backends."""

    @property
    @abstractmethod
    def name(self) -> str: ...

    @abstractmethod
    def is_available(self) -> bool: ...

    # ── Lifecycle ──
    @abstractmethod
    def initialize(self, **kwargs) -> None: ...
    @abstractmethod
    def shutdown(self) -> None: ...

    # ── Session CRUD ──
    @abstractmethod
    def create_session(
        self, session_id: str, source: str, *,
        user_id: str = "", model: str = "",
        model_config: str = "", system_prompt: str = "",
        parent_session_id: str = "",
    ) -> str: ...

    @abstractmethod
    def get_session(self, session_id: str) -> dict | None: ...
    @abstractmethod
    def end_session(self, session_id: str, end_reason: str) -> None: ...
    @abstractmethod
    def reopen_session(self, session_id: str) -> None: ...
    @abstractmethod
    def ensure_session(self, session_id: str, source: str, model: str = "", **kwargs) -> None: ...

    @abstractmethod
    def update_system_prompt(self, session_id: str, system_prompt: str) -> None: ...
    @abstractmethod
    def update_token_counts(self, session_id: str, **counts) -> None: ...

    # ── Titles ──
    @abstractmethod
    def set_session_title(self, session_id: str, title: str) -> None: ...
    @abstractmethod
    def get_session_title(self, session_id: str) -> str | None: ...
    @abstractmethod
    def get_session_by_title(self, title: str) -> dict | None: ...
    @abstractmethod
    def resolve_session_by_title(self, title: str) -> str | None: ...

    # ── Messages ──
    @abstractmethod
    def append_message(self, session_id: str, role: str, content: str, **kwargs) -> int: ...
    @abstractmethod
    def replace_messages(self, session_id: str, messages: list) -> None: ...
    @abstractmethod
    def get_messages(self, session_id: str) -> list: ...
    @abstractmethod
    def get_messages_as_conversation(
        self, session_id: str, include_ancestors: bool = True,
    ) -> list: ...

    # ── Search ──
    @abstractmethod
    def search_messages(
        self, query: str, *,
        source_filter: str = "", exclude_sources: list = None,
        role_filter: str = "", limit: int = 20, offset: int = 0,
    ) -> list: ...

    @abstractmethod
    def search_sessions(
        self, source: str = "", limit: int = 50, offset: int = 0,
    ) -> list: ...

    @abstractmethod
    def session_count(self, source: str = "") -> int: ...
    @abstractmethod
    def message_count(self, session_id: str = "") -> int: ...

    # ── Session Listing ──
    @abstractmethod
    def list_sessions_rich(
        self, source: str = "", exclude_sources: list = None,
        limit: int = 50, offset: int = 0,
        include_children: bool = False,
        project_compression_tips: bool = True,
        order_by_last_active: bool = False,
    ) -> list: ...

    # ── Meta ──
    @abstractmethod
    def get_meta(self, key: str) -> str | None: ...
    @abstractmethod
    def set_meta(self, key: str, value: str) -> None: ...

    # ── Handoff ──
    @abstractmethod
    def request_handoff(self, session_id: str, platform: str) -> None: ...
    @abstractmethod
    def get_handoff_state(self, session_id: str) -> dict | None: ...
    @abstractmethod
    def list_pending_handoffs(self) -> list: ...
    @abstractmethod
    def claim_handoff(self, session_id: str) -> bool: ...
    @abstractmethod
    def complete_handoff(self, session_id: str) -> None: ...
    @abstractmethod
    def fail_handoff(self, session_id: str, error: str) -> None: ...

    # ── Maintenance ──
    @abstractmethod
    def export_session(self, session_id: str) -> dict: ...
    @abstractmethod
    def export_all(self, source: str = "") -> list: ...
    @abstractmethod
    def delete_session(self, session_id: str) -> None: ...
    @abstractmethod
    def clear_messages(self, session_id: str) -> None: ...

    # ── Telegram topic mode (optional) ──
    def apply_telegram_topic_migration(self) -> None: ...  # default no-op
    def enable_telegram_topic_mode(self) -> None: ...
    def disable_telegram_topic_mode(self) -> None: ...
    def is_telegram_topic_mode_enabled(self) -> bool: ...
    def get_telegram_topic_binding(self) -> str | None: ...
    def bind_telegram_topic(self) -> None: ...
    def is_telegram_session_linked_to_topic(self) -> bool: ...
    def list_unlinked_telegram_sessions_for_user(self) -> list: ...

3.3 Key Design Decisions

Decision 1: Synchronous API (for now)

The current SessionDB API is synchronous. To minimize refactoring blast radius, the first iteration of SessionDBProvider keeps a synchronous interface. PostgreSQL/MySQL async drivers (asyncpg, aiomysql) can be wrapped synchronously or introduced in v2.

Decision 2: Search Adapter Strategy (the hardest part)

SQLite FTS5 is the most SQLite-specific feature. Mapping to PostgreSQL full-text search:

SQLite FTS5PostgreSQL equivalent
MATCH 'foo bar'to_tsvector('english', content) @@ plainto_tsquery('english', 'foo bar')
MATCH '"exact phrase"'phraseto_tsquery('english', 'exact phrase')
Trigram tokenizer (CJK)pg_trgm extension + ILIKE / similarity()
FTS triggersApplication-level index update (or tsvector generated column)

Fallback: Providers that cannot implement FTS5-equivalent search may use ILIKE/LIKE as a degraded mode.

Decision 3: Connection Pooling

Unlike SQLite (single file, one writer), PostgreSQL/MySQL use connection pools. The provider should accept an existing pool or DSN:

sessiondb:
  provider: postgresql
  postgresql:
    dsn: postgresql://user:pass@localhost:5432/hermes
    pool_size: 10
    max_overflow: 5

Decision 4: Schema Migration

Each provider manages its own schema. Initial migration is DDL + data import from SQLite:

hermes sessiondb migrate  --from sqlite://~/.hermes/state.db --to postgresql://...

This is a one-shot CLI command, not auto-migration on startup.


4. PostgreSQL Provider Design

The primary target. PostgreSQL solves every SQLite pain point:

Pain pointPostgreSQL solution
Single-writer lockREAD COMMITTED + connection pool = N concurrent writers
WAL corruption on SIGTERMFull ACID, crash recovery, synchronous_commit tunable
FTS5 CJK searchpg_trgm extension — SELECT ... ILIKE '%中文%'
43MB file fragilityDatabase cluster — crash-safe by design
Hot-update deathNew process = new connection, no file-level conflict

Schema (PostgreSQL)

-- schemas/000_hermes.sql
CREATE TABLE sessions (
    id TEXT PRIMARY KEY,
    source TEXT NOT NULL,
    user_id TEXT,
    model TEXT,
    model_config TEXT,
    system_prompt TEXT,
    parent_session_id TEXT REFERENCES sessions(id),
    started_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    ended_at TIMESTAMPTZ,
    end_reason TEXT,
    message_count INTEGER DEFAULT 0,
    tool_call_count INTEGER DEFAULT 0,
    input_tokens INTEGER DEFAULT 0,
    output_tokens INTEGER DEFAULT 0,
    cache_read_tokens INTEGER DEFAULT 0,
    cache_write_tokens INTEGER DEFAULT 0,
    reasoning_tokens INTEGER DEFAULT 0,
    billing_provider TEXT,
    billing_base_url TEXT,
    billing_mode TEXT,
    estimated_cost_usd REAL,
    actual_cost_usd REAL,
    cost_status TEXT,
    cost_source TEXT,
    pricing_version TEXT,
    title TEXT,
    api_call_count INTEGER DEFAULT 0,
    handoff_state TEXT,
    handoff_platform TEXT,
    handoff_error TEXT
);

CREATE TABLE messages (
    id BIGSERIAL PRIMARY KEY,
    session_id TEXT NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    role TEXT NOT NULL,
    content TEXT,
    tool_call_id TEXT,
    tool_calls TEXT,
    tool_name TEXT,
    timestamp TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    token_count INTEGER,
    finish_reason TEXT,
    reasoning TEXT,
    reasoning_content TEXT,
    reasoning_details TEXT,
    codex_reasoning_items TEXT,
    codex_message_items TEXT
);

-- Full-text search with pg_trgm
CREATE INDEX idx_messages_content_trgm ON messages USING GIN (content gin_trgm_ops);
CREATE INDEX idx_sessions_source ON sessions(source);
CREATE INDEX idx_sessions_parent ON sessions(parent_session_id);
CREATE INDEX idx_sessions_handoff ON sessions(handoff_state) WHERE handoff_state IS NOT NULL;

-- Telegram topic tables (only created on explicit opt-in)
CREATE TABLE IF NOT EXISTS telegram_topic (
    user_id TEXT NOT NULL,
    topic_id TEXT NOT NULL,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    is_topic_mode BOOLEAN NOT NULL DEFAULT false,
    PRIMARY KEY (user_id, session_id)
);

CJK Search Strategy

PostgreSQL pg_trgm handles CJK natively without special tokenization:

# Search with trigram similarity
query = """
    SELECT m.*, s.source, s.title
    FROM messages m
    JOIN sessions s ON m.session_id = s.id
    WHERE m.content %(ilike)s %(param)s
    ORDER BY m.timestamp DESC
    LIMIT %(limit)s OFFSET %(offset)s
"""
# For short CJK queries: ILIKE with % wildcards
# For longer queries: similarity() ranking

5. Migration Strategy

Phase 1: ABC + SQLite Provider (Minimal Change)

  • Extract SessionDBProvider ABC from existing SessionDB
  • Rename existing SessionDBSQLiteSessionDBProvider
  • Create factory function to load provider
  • Backward compatible: zero config change, same behavior

Phase 2: PostgreSQL Provider

  • Implement PostgreSQLSessionDBProvider
  • Support sessiondb.provider: postgresql in config
  • Add hermes sessiondb migrate command

Phase 3: MySQL Provider (Nice-to-Have)

  • Implement MySQLSessionDBProvider
  • Same interface, different connection details

Phase 4: Async API (Future)

  • Optional async interface for gateway/cron
  • async def append_message(...) etc.

6. Configuration

# config.yaml
sessiondb:
  provider: sqlite  # sqlite | postgresql | mysql
  sqlite:
    db_path: ~/.hermes/state.db
  postgresql:
    dsn: postgresql://user:pass@localhost:5432/hermes
    pool_size: 10
    max_overflow: 5
    statement_timeout: 30000  # ms
  mysql:
    host: 127.0.0.1
    port: 3306
    database: hermes
    user: hermes
    password: "{{ env.MYSQL_PASSWORD }}"
    pool_size: 10

7. Implementation Plan

StepScopeEffortRisk
1. Extract ABC + factory from SessionDBhermes_state.pyMediumLow
2. Rename SessionDBSQLiteSessionDBProviderSame file, add adapter classSmallLow
3. Update all 6+ call sites to use factorycli.py, gateway/, cron/MediumMedium
4. Add sessiondb.provider config keyhermes_cli/config.pySmallLow
5. Implement PostgreSQL providerplugins/sessiondb/postgresql/LargeMedium
6. Add hermes sessiondb migrate CLIhermes_cli/commands.pyMediumLow
7. Implement MySQL providerplugins/sessiondb/mysql/MediumLow
8. Documentation + migration guidewebsite/docs/MediumLow

Estimated Total: ~2-3 weeks (part-time)


8. Benefits

Before (SQLite)After (PostgreSQL)
❌ Hot-update = corruption risk✅ Hot-update = new connection, no conflict
❌ 1 writer blocks all readers✅ N concurrent writers (connection pool)
❌ FTS5 CJK search (custom trigram)pg_trgm native
❌ 43MB file = single point of failure✅ Database cluster = crash recovery
❌ No user auth/access control✅ PostgreSQL roles + SSL
❌ No replication✅ Streaming replication, WAL archiving
❌ Manual backup = copy filepg_dump, pgBackRest, Barman
❌ Schema migration = fragile ALTER TABLE✅ Versioned migrations (Alembic)

9. Design Decisions (Final)

These decisions are locked for v1 of the implementation:

Decision 1: Dual Search — pg_trgm + tsvector

PostgreSQL provider uses both search strategies:

  • pg_trgm GIN index — primary search backend. Handles CJK natively via trigram similarity (ILIKE, % wildcards), fuzzy matching, and substring search. Covers the full FTS5 trigram tokenizer use case.
  • tsvector GIN index — secondary search for English. to_tsvector('english', content) @@ plainto_tsquery('english', query) for exact phrase matching and linguistic stemming.

The provider picks the best index based on query content: if the query contains CJK codepoints, use pg_trgm; otherwise use tsvector. This matches the current _contains_cjk() heuristic in hermes_state.py.

Decision 2: One-Shot CLI Migration

hermes sessiondb migrate — explicit, safe, auditable. Never auto-migrate on startup (too much risk of accidental data loss). The command:

  1. Reads all data from the source state.db
  2. Ensures the target PostgreSQL schema exists (CREATE TABLE IF NOT EXISTS)
  3. Batch-inserts sessions → messages → FTS data
  4. Runs integrity verification (count rows, spot-check FTS queries)
  5. Prints a summary: "Migrated 142 sessions, 15,863 messages, 44 FTS entries in 2.3s"

Decision 3: Synchronous API (v1), Async Later

SessionDBProvider exposes a synchronous interface. All 6+ current call sites (cli.py, gateway/session.py, etc.) are synchronous — forcing async on v1 would require rewriting the entire call chain with asyncio.run() wrappers, adding complexity with zero immediate benefit.

Async support (async def append_message(...)) is a v2 concern, tracked separately.

Decision 4: Provider-Owned Connection Pool

The PostgreSQL provider owns its connection pool via DSN. Simple, self-contained, no external lifecycle management:

class PostgreSQLSessionDBProvider(SessionDBProvider):
    def initialize(self, **kwargs):
        config = kwargs.get("config", {})
        self._pool = asyncpg.create_pool(
            dsn=config["dsn"],
            min_size=config.get("pool_size", 5),
            max_size=config.get("max_size", 10),
        )

Callers who want custom pool management can pass a pre-created pool via initialize(pool=...) — the provider checks for this first, falls back to DSN-based pool creation.

Decision 5: Kanban.db is Phase 2

This RFC covers SessionDB only (state.db). Kanban migration is tracked as a follow-up issue. Rationale:

  • SessionDB has the highest contention and data-loss impact (43MB, every-turn writes)
  • Kanban is read-mostly with fewer concurrent writers
  • Splitting the scope keeps the first PR reviewable (~2-3K lines vs ~5K+)

10. Related Issues

  • #3139 (closed): session_search permanently disabled by concurrent CLI + gateway writes
  • #5563 (open, P1): state.db corruption + session fragmentation (69% token waste)
  • #860 (closed): Duplicate message accumulation in SQLite
  • #14210 (open): macOS Too many open files (state.db)
  • #22032 (open): NFS locking protocol breaks /resume
  • #15733 (closed): hermes update wiped state.db
  • #6702 (open): hermes update kills cron workers mid-write
  • PR #3249 (merged): Current SQLite contention fixes (band-aids)
  • PR #4623 (merged): Pluggable MemoryProvider interface (pattern to follow)

Drafted by the community. All hands on deck. 🏴‍☠️

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING