hermes - ✅(Solved) Fix Background review notification includes stale tool results from conversation history [3 pull requests, 2 comments, 3 participants]

hermes2026-04-24 06:27:03

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#14944•Fetched 2026-04-24 10:44:08

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

labeled ×5cross-referenced ×3commented ×2closed ×1

Root Cause

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

Fix Action

Workaround

Setting:

memory:
  nudge_interval: 0

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

The issue can still occur when automatic memory review is enabled, for example:

memory:
  nudge_interval: 10

PR fix notes

PR #14967: fix(agent): exclude prior-history tool messages from background review summary

Repository: NousResearch/hermes-agent
Author: luyao618
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14967

Description (problem / solution / changelog)

What does this PR do?

Fixes a bug where the background memory/skill review's user-visible summary (💾 ...) re-surfaces stale tool successes from the prior conversation as if they had just happened.

_spawn_background_review forks a new AIAgent initialized with conversation_history=messages_snapshot. The forked agent's _session_messages therefore contains tool messages copied from the prior conversation. The post-review scan that builds the summary walked the entire _session_messages list and reported every successful tool result it found, so historical actions (e.g. an earlier Cron job '...' created.) were re-announced — sometimes repeatedly across unrelated background-review runs.

Related Issue

Fixes #14944

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

Extracted the scan into a new AIAgent._summarize_background_review_actions staticmethod for testability.
Before scanning, collect every tool_call_id already present in messages_snapshot and skip review messages whose tool_call_id matches — those are inherited from the prior conversation, not new actions.
For tool messages without a tool_call_id, fall back to content-equality against the prior snapshot's anonymous tool messages.
Hardened data handling so a non-dict JSON payload no longer raises in the data.get("success") branch.

How to Test

Repro per the issue: in a gateway session create a one-shot cron reminder, then continue chatting until the background memory/profile review fires. Before this fix, the next review's 💾 notification could include Cron job '<reminder>' created. even though no cron was created during that review. After the fix it doesn't.
Run the targeted tests: pytest tests/run_agent/test_background_review_summary.py -v
Run the broader run_agent suite: pytest tests/run_agent/test_run_agent.py -q

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix
I've run pytest tests/ -q and all tests pass
I've added tests for my changes
I've tested on my platform: macOS (Darwin 25.4.0, Apple Silicon), Python 3.11

Documentation & Housekeeping

Updated relevant documentation — N/A
Updated cli-config.yaml.example — N/A
Updated contributing / agents docs — N/A
Considered cross-platform impact — N/A (logic-only, no platform-specific paths)
Updated tool descriptions/schemas — N/A

Changed files

run_agent.py (modified, +72/-26)
tests/run_agent/test_background_review_summary.py (added, +130/-0)

PR #14969: fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)

Repository: NousResearch/hermes-agent
Author: Bartok9
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/14969

Description (problem / solution / changelog)

Problem

Background review notifications were surfacing old successful tool results from the conversation history as if they had just been performed by the review agent.

For example, after a user created a cron job in an earlier turn, a later unrelated background review would repeat:

💾 Cron job '<reminder name>' created. · User profile updated

...even though the review agent never touched cron — it only updated the profile.

Reported in #14944. Related code path also noted in #9055 and #9696.

Root Cause

_spawn_background_review passes conversation_history=messages_snapshot to the review agent, which populates _session_messages with the full prior history (including old role=tool messages). The post-run scan then iterated over all of _session_messages — treating historical tool results as new review actions.

# Before: scans entire _session_messages including snapshot history
for msg in getattr(review_agent, "_session_messages", []):
    ...

Fix

Record the snapshot length before run_conversation(), then slice to only the messages the review agent actually appended:

snapshot_len = len(messages_snapshot)
all_msgs = getattr(review_agent, "_session_messages", [])
new_msgs = all_msgs[snapshot_len:]  # only messages added by the review agent

for msg in new_msgs:
    ...

This is a minimal, zero-risk change — it only affects which messages are summarised in the notification, not the review agent's behaviour or memory writes.

Tests

Added tests/agent/test_bg_review_stale_tool_results.py with 7 unit tests:

Test	Verifies
`test_stale_tool_result_not_surfaced`	Historical tool results ignored
`test_new_tool_result_is_surfaced`	Review agent's own results reported
`test_mixed_stale_and_new_only_new_surfaced`	Mixed history: only new results show
`test_empty_snapshot_all_msgs_scanned`	No snapshot → all messages are new
`test_failed_tool_results_excluded`	Failures never shown
`test_deduplication_preserved`	Duplicate strings collapsed
`test_regression_issue_14944`	Exact scenario from the bug report

All 7 pass ✅

Closes #14944

Changed files

run_agent.py (modified, +15/-3)
tests/agent/test_bg_review_stale_tool_results.py (added, +216/-0)

PR #15057: fix(agent): exclude prior-history tool messages from background review summary (salvage #14967)

Repository: NousResearch/hermes-agent
Author: teknium1
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/15057

Description (problem / solution / changelog)

Salvage of #14967 by @luyao618 onto current main. Chosen over the parallel #14969 for robustness.

Closes #14944.

What this PR does

Stops the background memory/skill review from re-surfacing stale tool results from the prior conversation as if they just happened. After e.g. creating a cron reminder, subsequent 💾 background-review notifications would include Cron job '<name>' created. again on every run, even though cron wasn't touched.

How

The review agent forks with conversation_history=messages_snapshot, so its _session_messages contains inherited tool messages. The scan that builds the 💾 summary walked the whole list and treated historical tool successes as new review actions.

@luyao618 extracts the scan into a testable AIAgent._summarize_background_review_actions staticmethod that:

Records every tool_call_id in the snapshot and skips review messages whose tool_call_id matches
Falls back to content-equality for tool messages that lack a tool_call_id
Hardens the data.get('success') branch against non-dict JSON payloads (latent bug — bare-string/list content previously raised)

Why this over #14969

#14969 used a slice approach (_session_messages[len(snapshot):]) which is smaller but brittle: if any future init step reorders, filters, or deduplicates the history (compression, prefix-cache replay, future hydration logic), the slice boundary silently drifts and stale results leak through again. ID-based matching is immune. #14967 also matches the issue author's explicit suggested approach verbatim and fixes the non-dict JSON crash.

Validation

	Before	After
Stale 'Cron created' surfaced in later review	yes	skipped
New 'User profile updated' action surfaced	yes	yes
Non-dict JSON tool content (bare string/list)	crash	gracefully skipped

tests/run_agent/test_background_review_summary.py — 8/8 pass (new file)
Full tests/run_agent/ — 940/940 pass (the 2 other failures are pre-existing on current main, unrelated)
E2E: reproduced the exact #14944 scenario; fix filters stale cron + preserves new review action

Co-authored-by: @luyao618

Changed files

run_agent.py (modified, +72/-26)
tests/run_agent/test_background_review_summary.py (added, +130/-0)

Code Example

💾 Cron job '<reminder name>' created. · User profile updated

---

💾 User profile updated

---

review_agent.run_conversation(
    user_message=prompt,
    conversation_history=messages_snapshot,
)

# Scan the review agent's messages for successful tool actions
# and surface a compact summary to the user.
actions = []
for msg in getattr(review_agent, "_session_messages", []):
    if not isinstance(msg, dict) or msg.get("role") != "tool":
        continue
    ...
    message = data.get("message", "")
    ...
    if "created" in message.lower():
        actions.append(message)

---

conversation_history=messages_snapshot

---

Create a one-shot cron reminder.

---

💾 Cron job '<reminder name>' created. · User profile updated

---

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

...

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    ...

---

memory:
  nudge_interval: 0

---

memory:
  nudge_interval: 10

RAW_BUFFERClick to expand / collapse

Bug Description

The background memory/skill review can repeatedly surface old successful tool actions as if they had just happened.

For example, after creating a one-shot cron reminder, later unrelated background review notifications may include the old cron creation success message again:

💾 Cron job '<reminder name>' created. · User profile updated

This can happen multiple times even though the cron job was not recreated.

Expected Behavior

Background memory/skill review notifications should only summarize actions performed by the background review agent itself.

For example:

💾 User profile updated

They should not include successful tool results that already existed in the conversation history before the background review started.

Actual Behavior

The background review appears to scan successful tool messages from the review agent’s full _session_messages, including tool messages copied from the prior conversation history.

As a result, older tool results such as cron job creation can be included again in a new background notification summary.

Diagnosis

The relevant logic appears to be in run_agent.py, around _spawn_background_review():

review_agent.run_conversation(
    user_message=prompt,
    conversation_history=messages_snapshot,
)

# Scan the review agent's messages for successful tool actions
# and surface a compact summary to the user.
actions = []
for msg in getattr(review_agent, "_session_messages", []):
    if not isinstance(msg, dict) or msg.get("role") != "tool":
        continue
    ...
    message = data.get("message", "")
    ...
    if "created" in message.lower():
        actions.append(message)

Since the review agent is initialized with:

conversation_history=messages_snapshot

its _session_messages may include old tool messages from the main conversation. The summarizer then treats those historical tool results as newly performed actions.

Steps to Reproduce

In a gateway session, perform a tool action that returns a successful message containing "created".

Example:
```
Create a one-shot cron reminder.
```
Continue chatting until background memory/profile review triggers.
Trigger or allow a user profile or memory update.
Observe the background review notification.

Observed Result

The notification may include a stale tool result:

💾 Cron job '<reminder name>' created. · User profile updated

even though the cron job was created earlier and was not recreated.

Suggested Fix

When summarizing actions from the background review agent, ignore tool messages that were already present in messages_snapshot.

Possible approaches:

Record existing tool_call_ids before running the background review, then skip tool messages with those IDs.
Record existing tool message contents as a fallback for messages without tool_call_id.
Alternatively, collect only tool messages appended after review_agent.run_conversation() starts, instead of scanning the full _session_messages.

Example:

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

...

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    ...

Workaround

Setting:

memory:
  nudge_interval: 0

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

The issue can still occur when automatic memory review is enabled, for example:

memory:
  nudge_interval: 10

Affected Area

Gateway sessions
Background memory/skill review
Background review notification summaries
run_agent.py / _spawn_background_review()

extent analysis

TL;DR

The most likely fix is to modify the background review agent to ignore tool messages that were already present in the conversation history before the review started.

Guidance

Identify the existing tool messages in the conversation history before running the background review, and skip them when summarizing actions.
Consider recording existing tool_call_ids or tool message contents to filter out historical tool results.
Modify the run_agent.py file, specifically the _spawn_background_review() function, to implement the suggested fix.
Test the changes by reproducing the issue and verifying that the background review notification no longer includes stale tool results.

Example

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    # Process the message

Notes

The suggested fix assumes that the tool_call_id is unique for each tool action. If this is not the case, an alternative approach may be needed. Additionally, the workaround of setting nudge_interval to 0 is not recommended as it disables useful automatic memory review behavior.

Recommendation

Apply the suggested fix by modifying the run_agent.py file to ignore tool messages that were already present in the conversation history. This approach addresses the root cause of the issue and allows for the background review to function as intended.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#conversation history #tool integration #LLM response #prompt template #agent execution

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Background review notification includes stale tool results from conversation history [3 pull requests, 2 comments, 3 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Workaround

PR fix notes

PR #14967: fix(agent): exclude prior-history tool messages from background review summary

Description (problem / solution / changelog)

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

Changed files

PR #14969: fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)

Description (problem / solution / changelog)

Problem

Root Cause

Fix

Tests

Changed files

PR #15057: fix(agent): exclude prior-history tool messages from background review summary (salvage #14967)

Description (problem / solution / changelog)

What this PR does

How

Why this over #14969

Validation

Changed files

Code Example

Bug Description

Expected Behavior

Actual Behavior

Diagnosis

Steps to Reproduce

Observed Result

Suggested Fix

Workaround

Affected Area

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING