hermes - AIAgent._ensure_db_session() drops user_id on session-row creation; bg/cron/delegate sessions land orphan in state.db [4 pull requests]

Fix Action

Fixed

Summary

AIAgent._ensure_db_session() (run_agent.py:2427-2447) calls SessionDB.create_session(...) with a hardcoded user_id=None, regardless of what user_id was passed to the AIAgent constructor. This means every agent that creates its own session row (without going through the gateway's SessionStore.get_or_create_session()) lands as an "orphan" in state.db.sessions:

  • /background (gateway/run.py:_run_background_task)
  • /branch (gateway/run.py:10940 — also passes no user_id)
  • Sub-agent delegations (delegate_task / sub-agent loops)
  • Cron-driven agent runs (cron/scheduler)
  • oneshot invocations

# run_agent.py:2432-2440
self._session_db.create_session(
    session_id=self.session_id,
    source=self.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
    model=self.model,
    model_config=self._session_init_model_config,
    system_prompt=self._cached_system_prompt,
    user_id=None,                       # <-- hardcoded; ignores self.user_id
    parent_session_id=self._parent_session_id,
)

On a moderately active install (mine, 4 days old): 62 orphan sessions, holding 514 orphan messages, out of 161 sessions total. Roughly 38% of the sessions table is orphan dark matter today.

Reproduction

Any of the above paths:

# In any chat, via gateway slash:
/background what is 2+2?

Inspect ~/.hermes/state.db:

SELECT id, user_id, parent_session_id,
       (SELECT COUNT(*) FROM messages WHERE session_id = sessions.id) AS msgs
FROM sessions
WHERE id LIKE 'bg_%'
ORDER BY started_at DESC LIMIT 5;

Resulting rows have user_id = NULL, parent_session_id = NULL. Same shape for cron_* and the timestamp-prefixed sub-agent rows.

Impact

For CLI / Slack / Telegram / Discord / WhatsApp: invisible cost — those surfaces don't browse sessions directly. Just storage growth.

For sidekick (and any other UI that does list sessions): today this is accidentally LOAD-BEARING. The sidekick drawer's recursive CTE filters WHERE user_id IS NOT NULL, so orphan sessions don't surface. If this bug is "fixed" by propagating self.user_id upstream, every /background in sidekick will start landing with user_id = current-chat-id (in sidekick's data model, user_id IS the chat_id). The drawer's CTE would then roll bg sessions UNDER the user's current chat, inflating message_count and polluting transcript items with the bg agent's internal tool/assistant scratch rows.
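The effect on a listing UI can be illustrated with a toy database. The sketch below does not reproduce sidekick's actual drawer query, which is not shown in this issue. DRAWER_QUERY is a hypothetical stand-in that selects sessions by non-NULL user_id and follows parent_session_id recursively. The schema, the chat id chat-42, and the helper names are assumptions made for the illustration.

```python
import sqlite3

# Hypothetical stand-in for a drawer query: all sessions owned by a chat,
# plus their descendants, then a message count over that set.
DRAWER_QUERY = """
WITH RECURSIVE chat_sessions(id) AS (
    SELECT id FROM sessions
    WHERE user_id IS NOT NULL AND user_id = :chat_id
    UNION
    SELECT s.id FROM sessions AS s
    JOIN chat_sessions AS c ON s.parent_session_id = c.id
)
SELECT COUNT(*) FROM messages
WHERE session_id IN (SELECT id FROM chat_sessions)
"""


def build_db(bg_user_id):
    """One main chat session (2 messages) and one bg session (3 messages)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE sessions (id TEXT PRIMARY KEY, user_id TEXT, parent_session_id TEXT);
        CREATE TABLE messages (id INTEGER PRIMARY KEY, session_id TEXT);
        """
    )
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)", ("main_1", "chat-42", None))
    conn.execute("INSERT INTO sessions VALUES (?, ?, ?)", ("bg_1", bg_user_id, None))
    conn.executemany(
        "INSERT INTO messages (session_id) VALUES (?)",
        [("main_1",)] * 2 + [("bg_1",)] * 3,
    )
    return conn


def drawer_message_count(conn, chat_id):
    return conn.execute(DRAWER_QUERY, {"chat_id": chat_id}).fetchone()[0]


if __name__ == "__main__":
    for label, bg_user_id in [
        ("today (user_id NULL)", None),
        ("plain self.user_id", "chat-42"),
        ("namespaced user_id", "bg:chat-42"),
    ]:
        print(label, drawer_message_count(build_db(bg_user_id), "chat-42"))
```

In this toy setup the chat's count stays at 2 while the bg row is NULL or namespaced, and rises to 5 once the bg row carries the plain chat id, which is the inflation described above.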

Proposed fix

Two options, in increasing order of correctness:

Option A (minimal): plumb the constructor's user_id through

# run_agent.py:_ensure_db_session
self._session_db.create_session(
    ...
    user_id=self.user_id,           # was: None
    parent_session_id=self._parent_session_id,
)

But this BREAKS sidekick and any other UI listing sessions, since (for sidekick) the bg/cron/delegate session would now share user_id with the user's main chat.

Option B (recommended): namespace synthetic-session user_ids

Detect "fire-and-forget" agent creation paths and tag the row with a synthetic-prefixed user_id so listing UIs can distinguish:

# run_agent.py: helper method on AIAgent (needs `from typing import Optional`)
def _derive_orphan_user_id(self) -> Optional[str]:
    # Synthetic session_ids that the gateway didn't pre-register through
    # SessionStore.get_or_create_session() should still carry a user_id
    # so listing UIs can filter / group, but the user_id should be
    # NAMESPACED so it doesn't collide with the originating chat.
    if self.user_id is None:
        return None
    sid = self.session_id or ""
    if sid.startswith(("bg_", "cron_", "delegate_")):
        prefix = sid.split("_", 1)[0]
        return f"{prefix}:{self.user_id}"
    return self.user_id

# run_agent.py:_ensure_db_session
self._session_db.create_session(
    ...
    user_id=self._derive_orphan_user_id(),
    ...
)
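The derivation rule is easier to check in isolation. Below is a standalone sketch of the same logic as a pure function, so it can be run without an AIAgent instance. The name derive_orphan_user_id, the SYNTHETIC_PREFIXES constant, and the sample ids are illustrative only.

```python
from typing import Optional

# Prefixes taken from the Option B snippet above.
SYNTHETIC_PREFIXES = ("bg_", "cron_", "delegate_")


def derive_orphan_user_id(
    session_id: Optional[str], user_id: Optional[str]
) -> Optional[str]:
    """Namespace user_id for synthetic sessions; pass it through otherwise."""
    if user_id is None:
        return None
    sid = session_id or ""
    if sid.startswith(SYNTHETIC_PREFIXES):
        # "bg_abc123" -> "bg"; only the first underscore splits.
        prefix = sid.split("_", 1)[0]
        return f"{prefix}:{user_id}"
    return user_id


if __name__ == "__main__":
    samples = [
        ("bg_abc123", "chat-42"),
        ("cron_nightly_1", "chat-42"),
        ("20240101_120000_ab12", "chat-42"),  # made-up timestamp-style id
        ("bg_abc123", None),
    ]
    for sid, uid in samples:
        print(sid, uid, derive_orphan_user_id(sid, uid))
```

One thing the sketch makes visible: the Reproduction section describes sub-agent rows as timestamp-prefixed, so an id of that shape would not match the delegate_ check and would keep the plain user_id. Whether sub-agent ids need their own detection rule is left open here.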

That way:

  • Listing UIs that previously filtered orphans (WHERE user_id IS NOT NULL) now see them and can group by the namespace prefix.
  • Sidekick can split its drawer query into "user chats" (no prefix) vs "background tasks" (prefix matches).
  • Existing CLI / chat-platform code paths are unchanged — they reach _ensure_db_session via paths where the gateway has already populated the row, so INSERT OR IGNORE is a no-op.
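On the consuming side, a listing UI could split rows along the namespace prefix roughly as follows. This is a sketch under the assumption that namespaced ids have the form prefix:chat_id with the three prefixes from Option B. The helper names split_user_id and partition_sessions are hypothetical and do not come from sidekick or hermes.

```python
from typing import Iterable, List, Optional, Tuple

# Only these prefixes count as namespaces, so a chat id that happens to
# contain ":" (for example "telegram:123") is not misread.
KNOWN_NAMESPACES = ("bg", "cron", "delegate")


def split_user_id(user_id: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (namespace, chat_id); namespace is None for ordinary chats."""
    if user_id is None:
        return (None, None)
    namespace, sep, rest = user_id.partition(":")
    if sep and namespace in KNOWN_NAMESPACES:
        return (namespace, rest)
    return (None, user_id)


def partition_sessions(
    rows: Iterable[Tuple[str, Optional[str]]]
) -> Tuple[List[tuple], List[tuple]]:
    """Split (session_id, user_id) rows into user chats and background tasks."""
    user_chats, background = [], []
    for session_id, user_id in rows:
        if user_id is None:
            continue  # legacy orphan rows stay hidden, as they are today
        namespace, chat_id = split_user_id(user_id)
        if namespace is None:
            user_chats.append((session_id, chat_id))
        else:
            background.append((session_id, namespace, chat_id))
    return user_chats, background


if __name__ == "__main__":
    rows = [
        ("main_1", "chat-42"),
        ("bg_1", "bg:chat-42"),
        ("cron_1", "cron:chat-42"),
        ("old_bg", None),
    ]
    print(partition_sessions(rows))
```

Matching against a fixed set of namespaces, rather than splitting on any colon, is a deliberate choice in this sketch: it keeps existing chat ids intact even if some platform already uses colons in them.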

Why this matters now

There's no immediate functional bug — orphan rows accumulate but no UI surfaces them. The risk is the next person who "tidies up" _ensure_db_session() (it's an obvious code smell as written) will silently break every UI that lists sessions and assumes user_id identifies the originating chat. A proactive fix with the namespace prefix breaks no consumers and gives downstream UIs a clean primitive.

Happy to send a PR for either option if there's interest. Sidekick (Reimagine Robotics) carries the canonical "this is load-bearing" smoke test as scripts/smoke/background-session-isolation.mjs, so we'll catch any regression on upgrade.
