hermes - ✅(Solved) Fix `session_search` does not index `tool_calls` or `tool_name` [1 pull requests, 1 participants]

bradleylab · 2026-04-28T01:13:25Z



GitHub stats

NousResearch/hermes-agent#16751 • Fetched 2026-04-28 06:50:59


Author

bradleylab

Participants

bradleylab

Timeline (top)

labeled ×3, referenced ×3, cross-referenced ×1

Root Cause

User-facing symptom: Incomplete recall. session_search returns nothing for a token the user knows was in a prior session, because the token only ever appeared in tool_calls or tool_name.

Fix Action

Fixed

Fixed by PR: fix(session-search): index tool_calls and tool_name columns in FTS5 (#16751) (https://github.com/NousResearch/hermes-agent/pull/16770)

PR fix notes

PR #16770: fix(session-search): index tool_calls and tool_name columns in FTS5 (#16751)

Repository: NousResearch/hermes-agent
Author: briandevans
State: open | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/16770

Description (problem / solution / changelog)

Summary

Extend messages_fts and messages_fts_trigram with the tool_calls and tool_name columns from messages so session_search finds tokens that only appear in serialized tool-call args or tool names.
Add v11 migration that drops + recreates both FTS tables and backfills from messages.
Update triggers and the snippet column index (iCol = -1) to keep the snippet centered on the actual hit.
Mirror the broadened column set in the short-CJK LIKE fallback so all query paths share the same searchable surface.

The bug (#16751)

messages_fts is an external-content FTS5 table with a single content column. The triggers only insert new.content, so any token that lives in messages.tool_calls (TEXT JSON of function name + arguments) or messages.tool_name is never indexed. messages_fts_trigram (added in v10 for CJK substring search) has the same single-column shape.

The reporter showed that even with ASCII tokens, db.search_messages("FUNCNAMEMARKER") returns 0 hits when the marker only appears inside a tool call — although the row is in the DB. This is not the CJK tokenizer issue tracked in #14829 / #15500; the gap is at the schema/trigger layer.
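The gap can be reproduced without the Hermes codebase. The following is a minimal sketch using only the standard-library `sqlite3` module; the table layout is a reduced stand-in for the real `messages` schema, and the helper names `fts_hits` and `like_hits` are hypothetical. It assumes the local SQLite build includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    role TEXT,
    content TEXT,
    tool_calls TEXT,   -- serialized JSON, as described in the report
    tool_name TEXT
);
-- Single-column external-content index, matching the shape quoted in the issue.
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content,
    content=messages,
    content_rowid=id
);
-- The trigger forwards only new.content, so the other columns never reach the index.
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
""")

conn.executemany(
    "INSERT INTO messages(role, content, tool_calls, tool_name) VALUES (?, ?, ?, ?)",
    [
        ("assistant", "Uploading to CONTENTMARKER.", None, None),
        ("assistant", "", '{"name": "FUNCNAMEMARKER"}', "TOOLNAMEMARKER"),
    ],
)


def fts_hits(token: str) -> int:
    """Count rows the FTS index returns for a token."""
    return conn.execute(
        "SELECT COUNT(*) FROM messages_fts WHERE messages_fts MATCH ?", (token,)
    ).fetchone()[0]


def like_hits(token: str) -> int:
    """Count rows that actually contain the token in any of the three columns."""
    pattern = f"%{token}%"
    return conn.execute(
        "SELECT COUNT(*) FROM messages "
        "WHERE content LIKE ? OR tool_calls LIKE ? OR tool_name LIKE ?",
        (pattern, pattern, pattern),
    ).fetchone()[0]


for tok in ("CONTENTMARKER", "FUNCNAMEMARKER", "TOOLNAMEMARKER"):
    print(f"{tok}: fts={fts_hits(tok)} like={like_hits(tok)}")
```

Only the token stored in `content` is found through the index, while all three tokens are present in the table itself.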

The fix

External-content FTS5 columns are fixed at CREATE time, so this requires drop-and-recreate plus a backfill, not an ALTER TABLE.

- FTS_SQL and FTS_TRIGRAM_SQL now declare three columns (content, tool_calls, tool_name) matching the source columns in messages. INSERT/DELETE/UPDATE triggers populate all three.
- _init_schema gains a v11 step that drops the old triggers + tables, recreates them via the updated SQL, and backfills with INSERT INTO messages_fts(rowid, content, tool_calls, tool_name) SELECT id, content, tool_calls, tool_name FROM messages (no content IS NOT NULL filter so tool-only assistant messages with empty content are also indexed).
- snippet(messages_fts, 0, …) → snippet(messages_fts, -1, …) (and the same for the trigram path). Per FTS5 docs, iCol = -1 lets the function pick the column with the highest score, so a hit inside tool_calls produces a snippet of the JSON args instead of an empty/wrong slice of content. Content-only hits still pick column 0 (no behavior change for the common path).
- The 1–2 char CJK LIKE fallback now ORs the same three columns so the short-query path stays consistent with the FTS path.
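The drop, recreate, and backfill sequence described above can be sketched with plain `sqlite3`. This is an illustration under assumptions, not the PR's actual code: `OLD_FTS_SQL`, `NEW_FTS_SQL`, `migrate`, and `search` are hypothetical names, the trigram table is left out, and the delete/update trigger names are inferred from the `messages_fts_insert` naming in the issue.

```python
import sqlite3

OLD_FTS_SQL = """
CREATE VIRTUAL TABLE messages_fts USING fts5(content, content=messages, content_rowid=id);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
"""

NEW_FTS_SQL = """
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, tool_calls, tool_name,
    content=messages, content_rowid=id
);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content, tool_calls, tool_name)
    VALUES (new.id, new.content, new.tool_calls, new.tool_name);
END;
CREATE TRIGGER messages_fts_delete AFTER DELETE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content, tool_calls, tool_name)
    VALUES ('delete', old.id, old.content, old.tool_calls, old.tool_name);
END;
CREATE TRIGGER messages_fts_update AFTER UPDATE ON messages BEGIN
    INSERT INTO messages_fts(messages_fts, rowid, content, tool_calls, tool_name)
    VALUES ('delete', old.id, old.content, old.tool_calls, old.tool_name);
    INSERT INTO messages_fts(rowid, content, tool_calls, tool_name)
    VALUES (new.id, new.content, new.tool_calls, new.tool_name);
END;
"""


def migrate(conn: sqlite3.Connection) -> None:
    """Drop the one-column index, recreate it with three columns, then backfill."""
    conn.executescript("""
        DROP TRIGGER IF EXISTS messages_fts_insert;
        DROP TRIGGER IF EXISTS messages_fts_delete;
        DROP TRIGGER IF EXISTS messages_fts_update;
        DROP TABLE IF EXISTS messages_fts;
    """)
    conn.executescript(NEW_FTS_SQL)
    # No "content IS NOT NULL" filter: tool-only rows must be indexed too.
    conn.execute(
        "INSERT INTO messages_fts(rowid, content, tool_calls, tool_name) "
        "SELECT id, content, tool_calls, tool_name FROM messages"
    )
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> list:
    """Return (rowid, snippet) pairs; iCol = -1 lets FTS5 choose the snippet column."""
    return conn.execute(
        "SELECT rowid, snippet(messages_fts, -1, '>>', '<<', '...', 8) "
        "FROM messages_fts WHERE messages_fts MATCH ?",
        (query,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT,"
    " tool_calls TEXT, tool_name TEXT);" + OLD_FTS_SQL
)
conn.execute(
    "INSERT INTO messages(role, content, tool_calls, tool_name) VALUES (?, ?, ?, ?)",
    ("assistant", "", '{"name": "FUNCNAMEMARKER"}', "TOOLNAMEMARKER"),
)
conn.commit()

before = search(conn, "FUNCNAMEMARKER")   # old index: the row is invisible
migrate(conn)
after = search(conn, "FUNCNAMEMARKER")    # new index: the row is found
print(len(before), len(after), after[0][1] if after else None)
```

The snippet returned after the migration is taken from `tool_calls`, since that is the only column containing the match.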

Test plan

New TestFTS5SearchToolCallsAndToolName regression suite (tests/test_hermes_state.py):
- token only in tool-call function name → found
- token only in tool-call arguments → found
- token only in tool_name → found
- content-only match still works (invariant preserved)
- role_filter still applies to tool-call matches
- v11 migration: hand-built v10 DB with tool_calls/tool_name rows is searchable after SessionDB(...) opens it
Full tests/test_hermes_state.py (230 tests): 230 passed
Adjacent suites that consume SessionDB: tests/tools/test_session_search.py, tests/hermes_state/, tests/gateway/test_resume_command.py, tests/plugins/memory/test_hindsight_provider.py — 328 passed total
Regression guard: stashed hermes_state.py, reran the new tests, 5/6 fail with the expected "0 hits" symptom; restored, 6/6 pass

Contract protected

Invariant: every token persisted in messages.content, messages.tool_calls, or messages.tool_name is reachable through SessionDB.search_messages(...) for the matching row.
Known-bad inputs (now covered): assistant messages whose content is empty but whose tool-call function name or argument JSON contains the search token; tool-result messages whose tool_name is the only place the token appears.
Future-input coverage: the column list is keyed off the messages schema, so adding new tool-related text columns in the future only requires extending the FTS column list + triggers (the migration pattern is already in place).
Negative case: test_role_filter_applies_to_tool_call_matches — even with the broader index, role_filter=["user"] still excludes assistant tool-call rows. (Existing test_search_special_chars_do_not_crash, test_search_quoted_phrase_preserved, etc. still pass — the FTS5 query syntax surface is unchanged.)
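The negative case can be illustrated outside the test suite. The sketch below is not the `SessionDB.search_messages` implementation; `search_ids` is a hypothetical helper that applies a role filter on top of a three-column FTS match, assuming the role lives on the `messages` row and is joined in by rowid.

```python
import sqlite3
from typing import Optional, Sequence


def search_ids(conn: sqlite3.Connection, query: str,
               role_filter: Optional[Sequence[str]] = None) -> list:
    """Match against the FTS index, then restrict by role on the source table."""
    sql = (
        "SELECT m.id FROM messages_fts "
        "JOIN messages AS m ON m.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ?"
    )
    params = [query]
    if role_filter:
        placeholders = ", ".join("?" for _ in role_filter)
        sql += f" AND m.role IN ({placeholders})"
        params.extend(role_filter)
    sql += " ORDER BY m.id"
    return [row[0] for row in conn.execute(sql, params)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, role TEXT, content TEXT,
                       tool_calls TEXT, tool_name TEXT);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, tool_calls, tool_name, content=messages, content_rowid=id
);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content, tool_calls, tool_name)
    VALUES (new.id, new.content, new.tool_calls, new.tool_name);
END;
""")
conn.executemany(
    "INSERT INTO messages(role, content, tool_calls, tool_name) VALUES (?, ?, ?, ?)",
    [
        ("user", "please run the upload", None, None),
        ("assistant", "", '{"name": "ROLEMARKER"}', None),
    ],
)

print(search_ids(conn, "ROLEMARKER"))                  # assistant tool-call row
print(search_ids(conn, "ROLEMARKER", ["user"]))        # filtered out
print(search_ids(conn, "ROLEMARKER", ["assistant"]))   # still found
```

Broadening the index changes which rows can match, but the role restriction is evaluated against the source row, so it keeps excluding assistant tool-call rows when only user messages are requested.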

Fixes #16751
Out of scope (and unchanged): #14829 / #15500 (CJK tokenizer behavior on content)

Changed files

hermes_state.py (modified, +91/-19)
tests/test_hermes_state.py (modified, +207/-3)


Summary: messages_fts is a virtual FTS5 table with a single content column whose external-content source is messages.content. The triggers populating it read only new.content / old.content. The tool_calls (TEXT, serialized JSON args) and tool_name columns on messages are not reached by the search_messages path, so tokens that only appear in those columns aren't found by session_search even when the row is in the DB. On main, messages_fts_trigram (added in schema v10) has the same single-column, content-only-trigger pattern.

This is not the CJK tokenizer issue tracked in #14829 (and #15500, closed as a duplicate of #14829). The repro below uses ASCII tokens, so the gap is at the schema and trigger layer, not in tokenization.

Reproduction (clean clone of main HEAD 46b4cf8d, schema v10; same behavior reproduces on v0.11.0 bf196a3fc, schema v8):

import sys, tempfile
from pathlib import Path
sys.path.insert(0, "/path/to/hermes-agent")  # repo root of clean clone
import hermes_state

with tempfile.TemporaryDirectory() as td:
    db = hermes_state.SessionDB(db_path=Path(td) / "state.db")
    db.create_session(session_id="s1", source="cli")
    db.create_session(session_id="s2", source="cli")
    db.append_message("s1", role="assistant",
                      content="Uploading to BUCKETMARKER_CONTENT.")
    db.append_message("s2", role="assistant", content="",
        tool_calls=[{"id": "c1", "type": "function",
                     "function": {"name": "FUNCNAMEMARKER",
                                  "arguments": '{"cmd": "aws s3 cp x s3://BUCKETMARKER_TOOLCALL/"}'}}],
        tool_name="TOOLNAMEMARKER")

    print("via search_messages:")
    for tok in ("BUCKETMARKER_CONTENT", "BUCKETMARKER_TOOLCALL",
                "FUNCNAMEMARKER", "TOOLNAMEMARKER"):
        print(f"  {tok}: {len(db.search_messages(tok))} hit(s)")

    # Direct SQL check of persisted columns, not a public API path.
    print("via direct LIKE on messages table:")
    for tok in ("BUCKETMARKER_CONTENT", "BUCKETMARKER_TOOLCALL",
                "FUNCNAMEMARKER", "TOOLNAMEMARKER"):
        n = db._conn.execute(
            "SELECT COUNT(*) FROM messages "
            "WHERE content LIKE ? OR tool_calls LIKE ? OR tool_name LIKE ?",
            (f"%{tok}%", f"%{tok}%", f"%{tok}%"),
        ).fetchone()[0]
        print(f"  {tok}: {n} hit(s)")

Expected (assuming session_search is meant to cover searchable message context stored in the messages table): all four tokens find the relevant row.

Actual (identical on v0.11.0 bf196a3fc and main HEAD 46b4cf8d):

via search_messages:
  BUCKETMARKER_CONTENT:  1 hit(s)
  BUCKETMARKER_TOOLCALL: 0 hit(s)
  FUNCNAMEMARKER:        0 hit(s)
  TOOLNAMEMARKER:        0 hit(s)
via direct LIKE on messages table:
  BUCKETMARKER_CONTENT:  1 hit(s)
  BUCKETMARKER_TOOLCALL: 1 hit(s)
  FUNCNAMEMARKER:        1 hit(s)
  TOOLNAMEMARKER:        1 hit(s)

On main the trigram table was probed separately with SELECT COUNT(*) FROM messages_fts_trigram WHERE messages_fts_trigram MATCH ? for each token, with the same result (1, 0, 0, 0). search_messages routes to the trigram table only for CJK queries with cjk_count >= 3, so non-CJK tokens like the ASCII markers above never hit either FTS table.

Where the gap is (hermes_state.py, identical on both refs; main also has messages_fts_trigram with the same single-column, content-only-trigger pattern):

CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
    content,
    content=messages,
    content_rowid=id
);

CREATE TRIGGER IF NOT EXISTS messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content) VALUES (new.id, new.content);
END;
-- _delete and _update triggers also read only old.content / new.content.

User-facing symptom: Incomplete recall. session_search returns nothing for a token the user knows was in a prior session, because the token only ever appeared in tool_calls or tool_name.

Possible implementation directions. A naive "concat tool_calls and tool_name into the existing FTS content column" approach breaks the external-content invariant: content=messages ties the FTS column to messages.content, so any divergence between indexed text and the source column desyncs FTS rebuild and snippet() / highlight(). Three sound alternatives:

Add tool_calls and tool_name as additional FTS5 columns alongside content, with matching source columns so the external-content contract holds. Enables column-scoped queries.
Add a denormalized messages.search_text column maintained from content, tool_calls, and tool_name, and have FTS mirror that column. Requires a backfill plus INSERT INTO messages_fts(messages_fts) VALUES('rebuild').
Switch to an internal-content or contentless FTS5 table whose indexed text the triggers manage directly. Trades off some snippet() / highlight() ergonomics.
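For comparison, the second direction (a denormalized search_text column mirrored by FTS) might look like the sketch below. It is an assumption-laden illustration of an alternative the PR did not take: the `search_text` column and the `hits` helper are hypothetical, and the triggers needed to keep `search_text` current on later writes are omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    content TEXT,
    tool_calls TEXT,
    tool_name TEXT,
    search_text TEXT          -- hypothetical denormalized column
);
INSERT INTO messages(content, tool_calls, tool_name)
VALUES ('', '{"name": "FUNCNAMEMARKER"}', 'TOOLNAMEMARKER');

-- The FTS table mirrors exactly one source column, so the indexed text and
-- the external-content column stay identical.
CREATE VIRTUAL TABLE messages_fts USING fts5(
    search_text,
    content=messages,
    content_rowid=id
);
""")

# Backfill the denormalized column from the three source columns.
conn.execute(
    "UPDATE messages SET search_text = "
    "COALESCE(content, '') || ' ' || COALESCE(tool_calls, '') || ' ' || COALESCE(tool_name, '')"
)
# Rebuild the index from the external-content table.
conn.execute("INSERT INTO messages_fts(messages_fts) VALUES('rebuild')")
conn.commit()


def hits(token: str) -> int:
    return conn.execute(
        "SELECT COUNT(*) FROM messages_fts WHERE messages_fts MATCH ?", (token,)
    ).fetchone()[0]


print(hits("FUNCNAMEMARKER"), hits("TOOLNAMEMARKER"))
```

This keeps a single indexed column at the cost of storing the text twice and losing the ability to scope a query to one source column.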

I searched for prior work and didn't find an existing report on tool_calls / tool_name indexing. #15500 (closed as duplicate of open #14829) is about CJK tokenization and is unrelated to this gap.

extent analysis

TL;DR

The most likely fix is to add tool_calls and tool_name as additional FTS5 columns alongside content to enable complete recall of search queries.

Guidance

Identify the current FTS5 table structure and triggers in the hermes_state.py file to understand the existing implementation.
Consider the three proposed implementation directions: adding tool_calls and tool_name as additional FTS5 columns, creating a denormalized messages.search_text column, or switching to an internal-content or contentless FTS5 table.
Evaluate the trade-offs of each approach, including the impact on snippet() and highlight() functionality, data consistency, and query performance.
Choose the most suitable approach based on the specific requirements and constraints of the project.

Example

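-- Add tool_calls and tool_name as additional FTS5 columns.
-- content=messages names the external-content table once; each FTS column
-- is read from the messages column with the same name.
-- An existing one-column table must be dropped first, because
-- IF NOT EXISTS leaves it untouched.
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts USING fts5(
    content,
    tool_calls,
    tool_name,
    content=messages,
    content_rowid=id
);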

Notes

The chosen solution should ensure data consistency and query performance while addressing the incomplete recall issue. It is essential to consider the implications of each approach on the existing codebase and user experience.

Recommendation

Add tool_calls and tool_name as additional FTS5 columns. This approach keeps the external-content contract intact and enables column-scoped queries. Because FTS5 columns are fixed at CREATE time, it requires a drop-and-recreate migration with a backfill, and that migration should be tested against a database created with the previous schema.
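Column-scoped queries use the FTS5 column-filter syntax (`column : term`, or `{col1 col2} : term` for several columns). The sketch below assumes the three-column layout discussed above; the sample rows and the `match_ids` helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE messages (id INTEGER PRIMARY KEY, content TEXT,
                       tool_calls TEXT, tool_name TEXT);
CREATE VIRTUAL TABLE messages_fts USING fts5(
    content, tool_calls, tool_name, content=messages, content_rowid=id
);
CREATE TRIGGER messages_fts_insert AFTER INSERT ON messages BEGIN
    INSERT INTO messages_fts(rowid, content, tool_calls, tool_name)
    VALUES (new.id, new.content, new.tool_calls, new.tool_name);
END;
""")
conn.executemany(
    "INSERT INTO messages(content, tool_calls, tool_name) VALUES (?, ?, ?)",
    [
        ("deploy finished", None, None),              # id 1: hit in content
        ("", '{"name": "deploy"}', "terminal"),       # id 2: hit in tool_calls
    ],
)


def match_ids(query: str) -> list:
    """Return matching rowids for an FTS5 query string, in rowid order."""
    rows = conn.execute(
        "SELECT rowid FROM messages_fts WHERE messages_fts MATCH ? ORDER BY rowid",
        (query,),
    )
    return [r[0] for r in rows]


print(match_ids("deploy"))                              # any column
print(match_ids("content : deploy"))                    # message text only
print(match_ids("tool_calls : deploy"))                 # tool-call JSON only
print(match_ids("{tool_calls tool_name} : terminal"))   # either tool column
```

An unscoped query covers all three columns, which is the behavior the fix relies on, while the filters let a caller narrow a search to message text or tool activity.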
