hermes - ✅(Solved) Fix Background review notification includes stale tool results from conversation history [3 pull requests, 2 comments, 3 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#14944Fetched 2026-04-24 10:44:08
View on GitHub
Comments
2
Participants
3
Timeline
11
Reactions
0
Timeline (top)
labeled ×5cross-referenced ×3commented ×2closed ×1

Root Cause

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

Fix Action

Workaround

Setting:

memory:
  nudge_interval: 0

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

The issue can still occur when automatic memory review is enabled, for example:

memory:
  nudge_interval: 10

PR fix notes

PR #14967: fix(agent): exclude prior-history tool messages from background review summary

Description (problem / solution / changelog)

What does this PR do?

Fixes a bug where the background memory/skill review's user-visible summary (💾 ...) re-surfaces stale tool successes from the prior conversation as if they had just happened.

_spawn_background_review forks a new AIAgent initialized with conversation_history=messages_snapshot. The forked agent's _session_messages therefore contains tool messages copied from the prior conversation. The post-review scan that builds the summary walked the entire _session_messages list and reported every successful tool result it found, so historical actions (e.g. an earlier Cron job '...' created.) were re-announced — sometimes repeatedly across unrelated background-review runs.

Related Issue

Fixes #14944

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • Extracted the scan into a new AIAgent._summarize_background_review_actions staticmethod for testability.
  • Before scanning, collect every tool_call_id already present in messages_snapshot and skip review messages whose tool_call_id matches — those are inherited from the prior conversation, not new actions.
  • For tool messages without a tool_call_id, fall back to content-equality against the prior snapshot's anonymous tool messages.
  • Hardened data handling so a non-dict JSON payload no longer raises in the data.get("success") branch.

How to Test

  1. Repro per the issue: in a gateway session create a one-shot cron reminder, then continue chatting until the background memory/profile review fires. Before this fix, the next review's 💾 notification could include Cron job '<reminder>' created. even though no cron was created during that review. After the fix it doesn't.
  2. Run the targeted tests: pytest tests/run_agent/test_background_review_summary.py -v
  3. Run the broader run_agent suite: pytest tests/run_agent/test_run_agent.py -q

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS (Darwin 25.4.0, Apple Silicon), Python 3.11

Documentation & Housekeeping

  • Updated relevant documentation — N/A
  • Updated cli-config.yaml.example — N/A
  • Updated contributing / agents docs — N/A
  • Considered cross-platform impact — N/A (logic-only, no platform-specific paths)
  • Updated tool descriptions/schemas — N/A

Changed files

  • run_agent.py (modified, +72/-26)
  • tests/run_agent/test_background_review_summary.py (added, +130/-0)

PR #14969: fix(agent): scan only new tool results in bg review, skip snapshot history (#14944)

Description (problem / solution / changelog)

Problem

Background review notifications were surfacing old successful tool results from the conversation history as if they had just been performed by the review agent.

For example, after a user created a cron job in an earlier turn, a later unrelated background review would repeat:

💾 Cron job '<reminder name>' created. · User profile updated

...even though the review agent never touched cron — it only updated the profile.

Reported in #14944. Related code path also noted in #9055 and #9696.

Root Cause

_spawn_background_review passes conversation_history=messages_snapshot to the review agent, which populates _session_messages with the full prior history (including old role=tool messages). The post-run scan then iterated over all of _session_messages — treating historical tool results as new review actions.

# Before: scans entire _session_messages including snapshot history
for msg in getattr(review_agent, "_session_messages", []):
    ...

Fix

Record the snapshot length before run_conversation(), then slice to only the messages the review agent actually appended:

snapshot_len = len(messages_snapshot)
all_msgs = getattr(review_agent, "_session_messages", [])
new_msgs = all_msgs[snapshot_len:]  # only messages added by the review agent

for msg in new_msgs:
    ...

This is a minimal, zero-risk change — it only affects which messages are summarised in the notification, not the review agent's behaviour or memory writes.

Tests

Added tests/agent/test_bg_review_stale_tool_results.py with 7 unit tests:

TestVerifies
test_stale_tool_result_not_surfacedHistorical tool results ignored
test_new_tool_result_is_surfacedReview agent's own results reported
test_mixed_stale_and_new_only_new_surfacedMixed history: only new results show
test_empty_snapshot_all_msgs_scannedNo snapshot → all messages are new
test_failed_tool_results_excludedFailures never shown
test_deduplication_preservedDuplicate strings collapsed
test_regression_issue_14944Exact scenario from the bug report

All 7 pass ✅

Closes #14944

Changed files

  • run_agent.py (modified, +15/-3)
  • tests/agent/test_bg_review_stale_tool_results.py (added, +216/-0)

PR #15057: fix(agent): exclude prior-history tool messages from background review summary (salvage #14967)

Description (problem / solution / changelog)

Salvage of #14967 by @luyao618 onto current main. Chosen over the parallel #14969 for robustness.

Closes #14944.

What this PR does

Stops the background memory/skill review from re-surfacing stale tool results from the prior conversation as if they just happened. After e.g. creating a cron reminder, subsequent 💾 background-review notifications would include Cron job '<name>' created. again on every run, even though cron wasn't touched.

How

The review agent forks with conversation_history=messages_snapshot, so its _session_messages contains inherited tool messages. The scan that builds the 💾 summary walked the whole list and treated historical tool successes as new review actions.

@luyao618 extracts the scan into a testable AIAgent._summarize_background_review_actions staticmethod that:

  • Records every tool_call_id in the snapshot and skips review messages whose tool_call_id matches
  • Falls back to content-equality for tool messages that lack a tool_call_id
  • Hardens the data.get('success') branch against non-dict JSON payloads (latent bug — bare-string/list content previously raised)

Why this over #14969

#14969 used a slice approach (_session_messages[len(snapshot):]) which is smaller but brittle: if any future init step reorders, filters, or deduplicates the history (compression, prefix-cache replay, future hydration logic), the slice boundary silently drifts and stale results leak through again. ID-based matching is immune. #14967 also matches the issue author's explicit suggested approach verbatim and fixes the non-dict JSON crash.

Validation

BeforeAfter
Stale 'Cron created' surfaced in later reviewyesskipped
New 'User profile updated' action surfacedyesyes
Non-dict JSON tool content (bare string/list)crashgracefully skipped
  • tests/run_agent/test_background_review_summary.py — 8/8 pass (new file)
  • Full tests/run_agent/ — 940/940 pass (the 2 other failures are pre-existing on current main, unrelated)
  • E2E: reproduced the exact #14944 scenario; fix filters stale cron + preserves new review action

Co-authored-by: @luyao618

Changed files

  • run_agent.py (modified, +72/-26)
  • tests/run_agent/test_background_review_summary.py (added, +130/-0)

Code Example

💾 Cron job '<reminder name>' created. · User profile updated

---

💾 User profile updated

---

review_agent.run_conversation(
    user_message=prompt,
    conversation_history=messages_snapshot,
)

# Scan the review agent's messages for successful tool actions
# and surface a compact summary to the user.
actions = []
for msg in getattr(review_agent, "_session_messages", []):
    if not isinstance(msg, dict) or msg.get("role") != "tool":
        continue
    ...
    message = data.get("message", "")
    ...
    if "created" in message.lower():
        actions.append(message)

---

conversation_history=messages_snapshot

---

Create a one-shot cron reminder.

---

💾 Cron job '<reminder name>' created. · User profile updated

---

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

...

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    ...

---

memory:
  nudge_interval: 0

---

memory:
  nudge_interval: 10
RAW_BUFFERClick to expand / collapse

Bug Description

The background memory/skill review can repeatedly surface old successful tool actions as if they had just happened.

For example, after creating a one-shot cron reminder, later unrelated background review notifications may include the old cron creation success message again:

💾 Cron job '<reminder name>' created. · User profile updated

This can happen multiple times even though the cron job was not recreated.

Expected Behavior

Background memory/skill review notifications should only summarize actions performed by the background review agent itself.

For example:

💾 User profile updated

They should not include successful tool results that already existed in the conversation history before the background review started.

Actual Behavior

The background review appears to scan successful tool messages from the review agent’s full _session_messages, including tool messages copied from the prior conversation history.

As a result, older tool results such as cron job creation can be included again in a new background notification summary.

Diagnosis

The relevant logic appears to be in run_agent.py, around _spawn_background_review():

review_agent.run_conversation(
    user_message=prompt,
    conversation_history=messages_snapshot,
)

# Scan the review agent's messages for successful tool actions
# and surface a compact summary to the user.
actions = []
for msg in getattr(review_agent, "_session_messages", []):
    if not isinstance(msg, dict) or msg.get("role") != "tool":
        continue
    ...
    message = data.get("message", "")
    ...
    if "created" in message.lower():
        actions.append(message)

Since the review agent is initialized with:

conversation_history=messages_snapshot

its _session_messages may include old tool messages from the main conversation. The summarizer then treats those historical tool results as newly performed actions.

Steps to Reproduce

  1. In a gateway session, perform a tool action that returns a successful message containing "created".

    Example:

    Create a one-shot cron reminder.
  2. Continue chatting until background memory/profile review triggers.

  3. Trigger or allow a user profile or memory update.

  4. Observe the background review notification.

Observed Result

The notification may include a stale tool result:

💾 Cron job '<reminder name>' created. · User profile updated

even though the cron job was created earlier and was not recreated.

Suggested Fix

When summarizing actions from the background review agent, ignore tool messages that were already present in messages_snapshot.

Possible approaches:

  1. Record existing tool_call_ids before running the background review, then skip tool messages with those IDs.
  2. Record existing tool message contents as a fallback for messages without tool_call_id.
  3. Alternatively, collect only tool messages appended after review_agent.run_conversation() starts, instead of scanning the full _session_messages.

Example:

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

...

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    ...

Workaround

Setting:

memory:
  nudge_interval: 0

prevents automatic background memory review from triggering, but this is not ideal because it disables useful automatic memory review behavior.

The issue can still occur when automatic memory review is enabled, for example:

memory:
  nudge_interval: 10

Affected Area

  • Gateway sessions
  • Background memory/skill review
  • Background review notification summaries
  • run_agent.py / _spawn_background_review()

extent analysis

TL;DR

The most likely fix is to modify the background review agent to ignore tool messages that were already present in the conversation history before the review started.

Guidance

  • Identify the existing tool messages in the conversation history before running the background review, and skip them when summarizing actions.
  • Consider recording existing tool_call_ids or tool message contents to filter out historical tool results.
  • Modify the run_agent.py file, specifically the _spawn_background_review() function, to implement the suggested fix.
  • Test the changes by reproducing the issue and verifying that the background review notification no longer includes stale tool results.

Example

existing_tool_call_ids = {
    msg.get("tool_call_id")
    for msg in messages_snapshot
    if isinstance(msg, dict)
    and msg.get("role") == "tool"
    and msg.get("tool_call_id")
}

for msg in getattr(review_agent, "_session_messages", []):
    if msg.get("role") != "tool":
        continue
    if msg.get("tool_call_id") in existing_tool_call_ids:
        continue
    # Process the message

Notes

The suggested fix assumes that the tool_call_id is unique for each tool action. If this is not the case, an alternative approach may be needed. Additionally, the workaround of setting nudge_interval to 0 is not recommended as it disables useful automatic memory review behavior.

Recommendation

Apply the suggested fix by modifying the run_agent.py file to ignore tool messages that were already present in the conversation history. This approach addresses the root cause of the issue and allows for the background review to function as intended.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

hermes - ✅(Solved) Fix Background review notification includes stale tool results from conversation history [3 pull requests, 2 comments, 3 participants]