hermes - 💡(How to fix) Fix [Resilience] REASONING_SCRATCHPAD false positive detection triggers retry loop [2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#11620
Fetched 2026-04-18 05:59:48
Comments: 2
Participants: 2
Timeline: 2
Reactions: 0

Problem

The has_incomplete_scratchpad check in trajectory.py uses a simple string-contains check for <REASONING_SCRATCHPAD>. This causes false positives in several scenarios:

  1. Context contamination: When the conversation context contains code examples, error logs, or tool outputs that include the literal string <REASONING_SCRATCHPAD> (e.g., from searching source code), the model may reference this text in its response, triggering the detector.

  2. Retry loop: After the first false positive triggers a retry, the context is unchanged, so the model tends to produce the same output and the check fires again. Once verification_gate_retries attempts are exhausted, the entire response is dropped.

  3. No XML structure validation: The check only looks for the opening tag, not whether it is actually an incomplete XML block at the root level of the response.
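To make the first failure mode concrete, here is a minimal sketch. The function `naive_has_incomplete_scratchpad` and its body are assumptions standing in for the string-contains check described above; the actual code in trajectory.py may differ.

```python
TAG = "<REASONING_SCRATCHPAD>"


def naive_has_incomplete_scratchpad(text: str) -> bool:
    # Hypothetical stand-in for the described check: flag any response
    # that contains the opening tag anywhere, quoted or not.
    return TAG in text


# A complete response that only quotes the tag inside inline code.
quoted = "The detector in trajectory.py looks for `<REASONING_SCRATCHPAD>` in the output."
# A response that really was cut off inside a scratchpad.
truncated = "<REASONING_SCRATCHPAD>\nLet me think about"

print(naive_has_incomplete_scratchpad(quoted))     # True: false positive
print(naive_has_incomplete_scratchpad(truncated))  # True: intended detection
```

Both inputs are flagged, so a substring check of this kind cannot tell a quoted tag from a truncated scratchpad.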

Suggested Fix

Strip fenced code blocks and inline code spans before checking, then compare the counts of opening and closing tags, so that quoted source code no longer triggers the detector:

import re

def has_incomplete_scratchpad(text: str) -> bool:
    # Drop fenced code blocks, then inline code spans, so quoted tags are ignored.
    cleaned = re.sub(r'```.*?```', '', text, flags=re.DOTALL)
    cleaned = re.sub(r'`[^`]*`', '', cleaned)
    # The scratchpad is incomplete if more opening than closing tags remain.
    opens = cleaned.count('<REASONING_SCRATCHPAD>')
    closes = cleaned.count('</REASONING_SCRATCHPAD>')
    return opens > closes

Environment

  • Hermes Agent (latest main)
  • Providers affected: zai (glm-5.1), MiniMax M2.7

Impact

Medium - causes intermittent response failures when context contains the literal string.

Extent analysis

TL;DR

Replacing the substring check with one that strips code blocks and compares opening and closing tag counts, as in the suggested fix, should reduce false positives caused by context contamination. Note that this is a tag-balance check, not full XML structure validation.

Guidance

  • Modify the has_incomplete_scratchpad function to strip code blocks and compare opening and closing tag counts, as suggested, to reduce false positives.
  • Verify the effectiveness of the fix by testing scenarios that previously triggered false positives, such as conversations with code examples or error logs containing the literal string <REASONING_SCRATCHPAD>.
  • Consider adding additional logging or monitoring to track the frequency of false positives before and after implementing the fix to measure its impact.
  • Review the verification_gate_retries mechanism to ensure it is appropriately configured to handle legitimate retries without dropping responses unnecessarily.
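The verification step in the guidance above could be sketched as a small table of regression cases. The function body follows the fix suggested in the issue; the case descriptions, sample texts, and the `failing_cases` helper are illustrative assumptions, not part of the Hermes Agent test suite.

```python
import re

OPEN_TAG = "<REASONING_SCRATCHPAD>"
CLOSE_TAG = "</REASONING_SCRATCHPAD>"


def has_incomplete_scratchpad(text: str) -> bool:
    # Same logic as the fix suggested in the issue.
    cleaned = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    cleaned = re.sub(r"`[^`]*`", "", cleaned)
    return cleaned.count(OPEN_TAG) > cleaned.count(CLOSE_TAG)


# (description, response text, expected result); all cases are made up.
CASES = [
    ("tag quoted in a fenced block",
     "See:\n```\nif '<REASONING_SCRATCHPAD>' in text:\n```\nDone.", False),
    ("tag quoted inline",
     "The check looks for `<REASONING_SCRATCHPAD>`.", False),
    ("balanced scratchpad",
     "<REASONING_SCRATCHPAD>plan</REASONING_SCRATCHPAD>Answer.", False),
    ("truncated scratchpad",
     "<REASONING_SCRATCHPAD>plan that never", True),
]


def failing_cases() -> list[str]:
    # Hypothetical helper: names of cases where the detector disagrees
    # with the expected result.
    return [name for name, text, expected in CASES
            if has_incomplete_scratchpad(text) != expected]


print(failing_cases())
```

An empty list means every scenario behaves as expected; the same table could be extended with real responses captured from the affected providers.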

Example

The snippet under Suggested Fix in the issue body shows one possible solution. It strips fenced code blocks and inline code spans, then compares the number of opening and closing tags to decide whether a scratchpad is incomplete.

Notes

The suggested fix assumes that the <REASONING_SCRATCHPAD> tag always appears in its exact literal form and that the false positives come mainly from context contamination and the missing structure check. It compares tag counts only and does not verify that a tag sits at the root level of the response. Additional testing may be necessary to ensure the fix does not introduce new issues or affect other parts of the system.
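One case that such testing could cover is sketched below. The inline-code pattern from the suggested fix can match across newlines, so two unrelated stray backticks may swallow a real opening tag between them. The `SINGLE_LINE_INLINE` pattern and both helper functions are hypothetical and shown only for comparison; they are not part of the issue.

```python
import re

OPEN_TAG = "<REASONING_SCRATCHPAD>"
CLOSE_TAG = "</REASONING_SCRATCHPAD>"

ISSUE_INLINE = r"`[^`]*`"          # pattern from the suggested fix; may span lines
SINGLE_LINE_INLINE = r"`[^`\n]*`"  # hypothetical variant limited to one line


def strip_code(text: str, inline_pattern: str) -> str:
    # Remove fenced blocks first, then inline spans using the given pattern.
    without_fences = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    return re.sub(inline_pattern, "", without_fences)


def is_incomplete(text: str, inline_pattern: str) -> bool:
    cleaned = strip_code(text, inline_pattern)
    return cleaned.count(OPEN_TAG) > cleaned.count(CLOSE_TAG)


# A truncated scratchpad that happens to sit between two stray backticks.
text = ("Escape the ` character.\n"
        "<REASONING_SCRATCHPAD>\n"
        "unfinished thought about the ` character")

print(is_incomplete(text, ISSUE_INLINE))        # False: the real tag is stripped
print(is_incomplete(text, SINGLE_LINE_INLINE))  # True: the tag survives
```

Whether the single-line variant is preferable depends on how often model output contains unbalanced backticks, which would need to be measured on real responses.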

Recommendation

Apply the suggested workaround by modifying the has_incomplete_scratchpad function as described. It directly targets the false positives caused by context contamination, which should also prevent the retry loop those false positives trigger.
