hermes: Session trajectory JSON files carry no session_type or lineage metadata


Summary

Session trajectory JSON files written by _save_session_log() (run_agent.py:5245–5256) serialize exactly ten keys. None identify the type of agent instance that produced the file. A post-conversation review agent, a delegate_task subagent, a compression continuation, and the original user conversation produce structurally identical JSON — same ten keys, same shape, no distinguishing field.

The agent object carries lineage information at runtime: self._parent_session_id (line 1934) is set for review agents and subagents. SQLite stores parent_session_id and end_reason for all session types. But _save_session_log() writes neither.

Why this matters

During diagnosis of #25839, a review agent session (session_20260515_141502_b42fde.json) could not be classified as review, subagent, or compression continuation from the JSON payload alone. Classification required reading message content for system-injected prompt strings. A session_type field would have eliminated this forensic archaeology.

What is known

Current JSON schema (lines 5245–5256)

Every trajectory file written by _save_session_log() contains exactly:

{
  "session_id": "...",
  "model": "...",
  "base_url": "...",
  "platform": "...",
  "session_start": "...",
  "last_updated": "...",
  "system_prompt": "...",
  "tools": [...],
  "message_count": 187,
  "messages": [...]
}

Verified across 2,726 trajectory JSON files in ~/.hermes/sessions/. Zero files contain session_type, parent_session_id, agent_context, spawned_by, or any lineage field. (The directory also contains 55 request_dump_*.json files with a different schema, and one sessions.json index — these are not trajectory files.)
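A check of this kind can be reproduced with a short script. The following is a minimal sketch, not the script used to produce the counts above: the function name find_lineage_fields and the session_*.json glob pattern are assumptions made for illustration, while the field names are the ones listed in the paragraph above.

```python
import json
from pathlib import Path

# Lineage-related keys named in this report.
LINEAGE_KEYS = {"session_type", "parent_session_id", "agent_context", "spawned_by"}


def find_lineage_fields(sessions_dir):
    """Return {filename: sorted lineage keys found} for trajectory files.

    Assumption: trajectory files match session_*.json. That pattern does not
    match request_dump_*.json or sessions.json, so those are skipped.
    """
    hits = {}
    for path in sorted(Path(sessions_dir).glob("session_*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: skip rather than guess
        if not isinstance(payload, dict):
            continue
        found = LINEAGE_KEYS & payload.keys()
        if found:
            hits[path.name] = sorted(found)
    return hits
```

An empty result for a directory means no trajectory file in it carries any of the listed keys, which is the situation described above.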

Concrete example: review agent vs. parent session

From the #25839 incident:

| | Filename | Internal session_id | Platform | session_start |
|---|---|---|---|---|
| Parent session | session_20260515_130956_aea47d79.json | 20260515_130956_aea47d79 | telegram | 2026-05-15T13:46:21.391084 |
| Review agent | session_20260515_141502_b42fde.json | 20260515_130956_aea47d79 ✗ (parent's) | telegram | 2026-05-15T13:46:21.391084 (same) |

The review agent's filename and embedded session_id disagree. This happens because session_id is pinned to the parent's at line 4342, but session_log_file (set at line 1913 from self.session_id in __init__) is never repointed. The result: a file whose filename says one thing and whose session_id field says another — with no field explaining why.

Scale: 60 trajectory files exhibit this filename/ID mismatch across the sessions directory. Additionally, the SQLite sessions table has no row for 20260515_141502_b42fde — the file exists on disk but has no DB record, making it invisible to session search tools.
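The mismatch can be detected mechanically by comparing the ID implied by each filename with the embedded session_id. This is a hedged sketch: the helper names are hypothetical, and it assumes the filename convention session_<id>.json seen in the example above. Files that follow another convention (for instance the session_cron_ prefix, whose relationship to the embedded ID is not documented here) may be reported as mismatches and would need separate handling.

```python
import json
from pathlib import Path

PREFIX = "session_"


def filename_session_id(path):
    """Derive the ID implied by a filename of the form session_<id>.json.

    Assumption: the ID is the filename stem with the "session_" prefix removed.
    """
    stem = Path(path).stem
    return stem[len(PREFIX):] if stem.startswith(PREFIX) else None


def find_id_mismatches(sessions_dir):
    """List (filename, embedded session_id) pairs where the two disagree."""
    mismatches = []
    for path in sorted(Path(sessions_dir).glob("session_*.json")):
        try:
            payload = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # skip files that cannot be parsed
        if not isinstance(payload, dict):
            continue
        embedded = payload.get("session_id")
        if embedded != filename_session_id(path):
            mismatches.append((path.name, embedded))
    return mismatches
```

With the two files from the table, only the review agent file would be reported, paired with the parent's ID.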

Three spawn paths, three different behaviors — none recorded in JSON

| Spawn type | session_id | session_start | parent_session_id at runtime | Reaches JSON? |
|---|---|---|---|---|
| Review agent | Pinned to parent's (line 4342) | Pinned to parent's (line 4341) | Passed via constructor (line 4306), stored at line 1934 | No |
| Subagent | Auto-generated (fresh) | Auto-generated (fresh) | Passed via constructor (delegate_tool.py:1124), stored at line 1934 | No |
| Compression split | Rotated to new ID (line 10468) | NOT reset; inherits lineage-root start time | Written to SQLite only (line 10483); same agent object, no constructor call | No (SQLite only) |

These three paths handle session_id, session_start, and lineage differently. Without a metadata field, distinguishing them requires reading either message content or SQLite.

SQLite stores what JSON drops

The SessionDB sessions table (hermes_state.py:197,200) stores parent_session_id and end_reason for DB-backed sessions. Across the database, 852 session rows have a non-null parent_session_id — lineage data that exists at query time but is absent from every trajectory JSON file. (Notably, the review agent session 20260515_141502_b42fde has no DB row at all — its JSON file exists on disk with no corresponding SQLite record.) Compression splits mark the pre-rotation session as end_reason="compression" at line 10466, then create the continuation row with parent_session_id=old_session_id at line 10483.
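To illustrate what the database side holds, the sketch below queries lineage rows from a stand-in table. Only the table name sessions and the columns parent_session_id and end_reason come from this report; the id column name, the in-memory demo schema, and both function names are assumptions and may not match hermes_state.py.

```python
import sqlite3


def lineage_rows(conn):
    """Return (id, parent_session_id, end_reason) for sessions that have a parent."""
    cur = conn.execute(
        "SELECT id, parent_session_id, end_reason "
        "FROM sessions WHERE parent_session_id IS NOT NULL ORDER BY id"
    )
    return cur.fetchall()


def demo_connection():
    """Build an in-memory stand-in for the sessions table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions ("
        "id TEXT PRIMARY KEY, parent_session_id TEXT, end_reason TEXT)"
    )
    # Mimic a compression split: the old session ends with
    # end_reason="compression" and the continuation points back at it.
    conn.executemany(
        "INSERT INTO sessions VALUES (?, ?, ?)",
        [("old_session", None, "compression"), ("new_session", "old_session", None)],
    )
    return conn
```

Against a real database, the same query could be run on a connection opened with sqlite3.connect on the state file, provided the column names are adjusted to the actual schema.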

Platform provides partial identification, but not enough

Some spawn types write identifiable platform values: "cron" (229 files), "curator" (3), "acp" (3), "api_server", "tui", "cli". Additionally, most cron files (225 of 229) use the session_cron_ filename prefix. But platform does not distinguish review agents, subagents, or compression continuations — all of which inherit "telegram" (or whatever the parent used).

Proposed solution

Add two flat top-level fields to the JSON entry dict at _save_session_log():

{
  "session_id": "...",
  "session_type": "user" | "cron" | "review_agent" | "subagent" | "compression_split",
  "parent_session_id": "20260515_130956_aea47d79" | null,
  ...
}

Flat fields are preferred over a nested object: they are easier to grep, less migration-heavy, and consistent with the existing top-level key style. parent_session_id mirrors the existing SQLite column name.

Implementation: add a session_type attribute or constructor parameter, defaulting to "user". Set it at construction for user, cron, review-agent (line 4306), and subagent (delegate_tool.py:1124) sessions. For compression splits, no constructor runs, so update session_type on the existing object during rotation, alongside the session_id and session_log_file updates (lines 10468–10476). self._parent_session_id already exists on the object at line 1934 and only needs to be written out.
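The serialization side of the proposal could look roughly like the sketch below. It is not the code in run_agent.py: with_lineage and SESSION_TYPES are hypothetical names, and the entry dict stands in for whatever _save_session_log() builds today. The allowed values are the five listed in the proposed schema.

```python
SESSION_TYPES = {"user", "cron", "review_agent", "subagent", "compression_split"}


def with_lineage(entry, session_type="user", parent_session_id=None):
    """Return a copy of a log entry with the two proposed flat fields added.

    The fields are placed directly after session_id, as in the proposed
    schema; if session_id is absent they are appended at the end.
    """
    if session_type not in SESSION_TYPES:
        raise ValueError(f"unknown session_type: {session_type!r}")
    out = {}
    for key, value in entry.items():
        out[key] = value
        if key == "session_id":
            out["session_type"] = session_type
            out["parent_session_id"] = parent_session_id
    # Fallback for entries without a session_id key.
    out.setdefault("session_type", session_type)
    out.setdefault("parent_session_id", parent_session_id)
    return out
```

A review agent would then call something like with_lineage(entry, "review_agent", parent_id) just before the entry is dumped, while an ordinary user session could rely on the defaults and serialize parent_session_id as null.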

Alternatives considered

  1. Filename prefixes (session_review_...) — Fragile: if a session file is copied out of the sessions directory for incident analysis, the filename context is lost. JSON metadata travels with the file.
  2. Separate manifest / sidecar — Introduces a synchronization problem: the session file and manifest can disagree. The JSON file is the canonical record; metadata should live with it.
  3. Content heuristics — Scan message content for tool calls or system prompts to guess session type. This is what #25839 forced us to do. It is fragile, slow, and unreliable.

Related

  • #25839 — system impersonation of role: "user" to trick review agents; diagnosis blocked by missing lineage metadata
  • #20527 — hook-level turn_source proposal (complementary: plugin hooks vs. serialized trajectory files)

Environment

  • Hermes Agent commit: 9fb40e6a3d6338b6a6a616010de7a16672148924
  • Session files sampled: 2,726 trajectory JSON files in ~/.hermes/sessions/
  • SQLite: ~/.hermes/state.db, 852 rows with non-null parent_session_id

(Generated by AI Agent)
