hermes - ✅(Solved) Fix Bug: read_file tool's dedup mechanism pollutes file content with cache hint text [2 pull requests]


Error Message

Change the dedup return to use the error field instead of content, making it clear this is a skip signal rather than file content.

  1. Clear semantics: error field indicates this is not file content
  2. Consistent: Matches the pattern used elsewhere in file_tools.py for error returns

Second read → ✅ Returns error field (dedup: true)
Third read → ✅ Still returns error, no accumulation

  • Behavior change: Dedup hits return error instead of content, LLM receives explicit "skip" signal

Root Cause

Root Cause Analysis

Fix Action

Fixed

PR fix notes

PR #13098: fix(tools): return read_file dedup hint in error field, not content (#13079)

Description (problem / solution / changelog)

Problem

When the same read_file call hit the in-memory dedup cache, the tool returned the skip hint in the content field. Downstream tooling — in particular _subdirectory_hints.check_tool_call in run_agent.py — treats content as literal file text with line-number prefixes, which allowed the hint to splice into the file payload and nest across repeated reads, eventually corrupting what the model sees.

Fixes #13079.

Fix

Return the dedup hint in error instead. The dedup: True flag is preserved so callers that want to distinguish "skip" from a real error can still do so, but consumers that route content and error differently (including anything that line-number-prefixes file text) can no longer confuse the stub for file content.
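
The distinction described above can be sketched from the caller's side. This is a minimal illustration, not code from the PR: the helper name `classify_read_result` and its three labels are hypothetical, and only the `content`, `error`, and `dedup` keys are taken from the PR description.

```python
import json


def classify_read_result(raw: str) -> str:
    """Label a read_file JSON result as 'content', 'skip', or 'error'.

    Hypothetical helper: only the key names come from the PR text.
    """
    result = json.loads(raw)
    if result.get("dedup") is True:
        # Dedup hit: the hint text sits in "error" and there is no file payload.
        return "skip"
    if "error" in result:
        return "error"
    return "content"


first = json.dumps({"content": "     1|hello", "total_lines": 1})
second = json.dumps(
    {"error": "SKIP: File unchanged since last read.", "path": "a.md", "dedup": True}
)
missing = json.dumps({"error": "File not found: a.md"})

print(classify_read_result(first))
print(classify_read_result(second))
print(classify_read_result(missing))
```

Checking `dedup` before `error` is the design choice that matters here: after the fix a skip also carries an `error` key, so the order of the two checks decides how it is labelled.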

Test plan

  • Updated test_second_read_returns_dedup_stub to assert the new contract (no content key, hint text lives in error).
  • pytest tests/tools/test_file_read_guards.py tests/tools/test_file_operations.py — 67 passed.
  • Reran the Python reproducer from the issue: second read returns {dedup: true, error: "SKIP: File unchanged..."} with no content key; first reads and post-modification reads still return normal content.
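
The contract asserted in the test plan can be written down as a small checker. This is a sketch only: `check_dedup_stub` is a hypothetical helper, not the updated `test_second_read_returns_dedup_stub`, and the sample dictionaries are hand-written to mirror the shapes quoted in this report.

```python
def check_dedup_stub(result: dict) -> None:
    """Assert the post-fix contract for a dedup hit (hypothetical helper)."""
    assert result.get("dedup") is True
    assert "content" not in result, "dedup stub must not carry a content key"
    assert isinstance(result.get("error"), str) and result["error"]


# Shape after the fix: hint text lives in "error".
new_stub = {"error": "SKIP: File unchanged since last read.", "path": "notes.md", "dedup": True}
check_dedup_stub(new_stub)

# Shape before the fix: hint text lives in "content" and should be rejected.
old_stub = {"content": "File unchanged since last read.", "path": "notes.md", "dedup": True}
try:
    check_dedup_stub(old_stub)
    old_contract_rejected = False
except AssertionError:
    old_contract_rejected = True
```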

Changed files

  • tests/tools/test_file_read_guards.py (modified, +6/-2)
  • tools/file_tools.py (modified, +11/-4)

PR #13210: fix(file_tools): use error field instead of content in dedup stub

Description (problem / solution / changelog)

Summary

Dedup stub was returning a message in the 'content' field, which pollutes downstream processing that expects line-numbered file content. Now uses the 'error' field to signal a skip without corrupting content.

Changes

  • tools/file_tools.py: Change dedup stub to use 'error' field instead of 'content' to avoid corrupting downstream content processing
  • tests/tools/test_file_read_guards.py: Update test assertion to check 'error' field instead of 'content'

Fixes #13079

Changed files

  • tests/tools/test_file_read_guards.py (modified, +1/-378)
  • tools/file_tools.py (modified, +1/-858)


title: "Bug Report: read_file tool dedup cache pollution corrupts file content"
created: 2026-04-20
updated: 2026-04-20
type: query
tags: [bug, tool, web]
sources: []

GitHub Issue Draft for Hermes Agent

Issue Title

Bug: read_file tool's dedup mechanism pollutes file content with cache hint text

Issue Body

## Describe the bug

The `read_file` tool's deduplication mechanism (in `tools/file_tools.py`) returns cache hint text in the `content` field when a file is re-read with the same parameters. This text gets appended to the actual file content, corrupting the file with pollution like:

```
     1|---
     2|title: "..."
     1|File unchanged since last read. The content from the earlier read_file result...
```

After multiple re-reads, the file accumulates multiple copies of this hint text, making it unusable.

## Root Cause Analysis

### Normal read return format:
```json
{
  "content": "     1|---\n     2|title: \"...\"\n     3|...",
  "total_lines": 50,
  "file_size": 1234
}
```

### Dedup hit return format (❌ broken):

```json
{
  "content": "File unchanged since last read. The content from the earlier read_file result in this conversation is still current — refer to that instead of re-reading.",
  "path": "...",
  "dedup": true
}
```

**Problem**: The `content` field contains plain text instead of line-numbered format. When downstream logic (e.g., `subdir_hints` in `run_agent.py:7624-7625`) appends to this content, the cache hint becomes part of the file content and gets line numbers added on subsequent reads, creating nested pollution:

```
     1|---
     1|     1|---
     2|     2|title: "..."
```
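
The nesting can be reproduced in isolation. The sketch below makes two assumptions: `add_line_numbers` is a hypothetical stand-in for whatever applies the `N|` prefixes (the width of 6 is inferred from the samples above), and the polluted payload is assumed to be what gets written back and read again.

```python
def add_line_numbers(text: str) -> str:
    """Prefix each line with a right-aligned number and '|' (assumed format)."""
    return "\n".join(
        f"{i:6d}|{line}" for i, line in enumerate(text.splitlines(), start=1)
    )


original = '---\ntitle: "..."'
hint = "File unchanged since last read."

first_read = add_line_numbers(original)

# Pre-fix failure mode (assumed): the hint arrives in "content" and is stored
# together with the already numbered text from the earlier read.
polluted_file = first_read + "\n" + hint

# Reading the polluted text numbers every line again, nesting the prefixes.
next_read = add_line_numbers(polluted_file)
print(next_read)
```

Each further round trip adds another layer of prefixes and another copy of the hint, consistent with the accumulation the issue reports.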

## Affected Code

**File**: `tools/file_tools.py`  
**Lines**: 347-355

```python
if current_mtime == cached_mtime:
    return json.dumps({
        "content": (
            "File unchanged since last read. The content from "
            "the earlier read_file result in this conversation is "
            "still current — refer to that instead of re-reading."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```

## Proposed Fix

Change the dedup return to use the `error` field instead of `content`, making it clear this is a skip signal rather than file content:

```python
if current_mtime == cached_mtime:
    # Use 'error' field instead of 'content' to avoid pollution.
    # This signals to the LLM that this is a skip/warning, not file content.
    return json.dumps({
        "error": (
            "SKIP: File unchanged since last read. You already have this content "
            "from the earlier read_file call in this conversation. "
            "STOP re-reading and proceed with your task."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```
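
The fragment above depends on surrounding state in `tools/file_tools.py` that this report does not show. For experimenting with the behavior, here is a self-contained sketch under stated assumptions: `read_file_sketch` and `_read_cache` are hypothetical names, the cache is keyed by path only, and the hint text is shortened. It is not the project's implementation.

```python
import json
import os
import tempfile

# Hypothetical in-memory cache: path -> mtime observed at the last read.
_read_cache: dict[str, float] = {}


def read_file_sketch(path: str) -> str:
    """Return a JSON string shaped like the results quoted in the issue."""
    current_mtime = os.path.getmtime(path)
    if _read_cache.get(path) == current_mtime:
        # Dedup hit: signal a skip through "error", never through "content".
        return json.dumps({
            "error": "SKIP: File unchanged since last read.",
            "path": path,
            "dedup": True,
        }, ensure_ascii=False)
    _read_cache[path] = current_mtime
    with open(path, encoding="utf-8") as fh:
        lines = fh.read().splitlines()
    content = "\n".join(f"{i:6d}|{line}" for i, line in enumerate(lines, start=1))
    return json.dumps({"content": content, "total_lines": len(lines)}, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "note.md")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("alpha\nbeta\n")
    first = json.loads(read_file_sketch(target))
    second = json.loads(read_file_sketch(target))
    third = json.loads(read_file_sketch(target))
    # Bump the mtime explicitly so the change does not depend on clock resolution.
    st = os.stat(target)
    os.utime(target, (st.st_atime, st.st_mtime + 10))
    after_change = json.loads(read_file_sketch(target))
```

The second and third reads return the same stub, so nothing accumulates, and the read after the mtime change returns normal content again.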

## Benefits of This Fix

1. **Clear semantics**: `error` field indicates this is not file content
2. **No pollution**: The hint won't be appended to file content
3. **LLM-friendly**: Explicitly tells the LLM to stop re-reading
4. **Consistent**: Matches the pattern used elsewhere in file_tools.py for error returns

## Testing

I've verified this fix works correctly:

```
First read  → ✅ Returns normal content (with line numbers)
Second read → ✅ Returns error field (dedup: true)
Third read  → ✅ Still returns error, no accumulation
```

## Impact

- **Affected scenarios**: Re-reading the same file region within a session
- **Behavior change**: Dedup hits return `error` instead of `content`, LLM receives explicit "skip" signal
- **Backward compatibility**: No impact on first reads or reads after file changes

## Additional Context

This bug was discovered during a Wiki knowledge base audit where 10 files were corrupted with 395 lines of cache pollution text. The files have been cleaned, and the fix has been applied locally with successful verification.

## Environment

- Hermes Agent version: Latest (as of 2026-04-20)
- Python version: 3.x
- OS: Linux/Ubuntu

**Labels**: `bug`, `tools`, `high-priority`


## Reference

- Fix documentation: `/root/wiki/plans/fix-read-file-dedup.md`
- Audit report: `/root/wiki/logs/lint/audit-2026-04-20.md`

extent analysis

TL;DR

Change the read_file tool's deduplication mechanism to return a skip signal using the error field instead of the content field to avoid cache pollution.

Guidance

  • Identify the affected code in tools/file_tools.py (lines 347-355) and modify the dedup return to use the error field.
  • Verify the fix by testing multiple reads of the same file and checking that the cache hint text is not appended to the file content.
  • Review the run_agent.py file (lines 7624-7625) to ensure that the downstream logic handles the new error field correctly.
  • Test the fix in different scenarios, including re-reading the same file region within a session, to ensure that the behavior change does not introduce any regressions.

Example

```python
if current_mtime == cached_mtime:
    return json.dumps({
        "error": (
            "SKIP: File unchanged since last read. You already have this content "
            "from the earlier read_file call in this conversation. "
            "STOP re-reading and proceed with your task."
        ),
        "path": path,
        "dedup": True,
    }, ensure_ascii=False)
```

Notes

The proposed fix assumes that the error field is handled correctly by the downstream logic in run_agent.py. Additional testing and verification may be necessary to ensure that the fix works as expected in all scenarios.
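
One way to make that assumption explicit is a guard in the downstream step. The sketch below is hypothetical throughout: `append_hints` stands in for the hint-appending logic attributed to `run_agent.py`, whose real code is not shown in this report, and the sample hint string is invented for illustration.

```python
import json


def append_hints(raw_result: str, hints: str) -> str:
    """Append hint text only to results that carry real file content.

    Hypothetical stand-in for the downstream step; not the run_agent.py code.
    """
    result = json.loads(raw_result)
    if "error" in result or "content" not in result:
        # Skips and errors pass through untouched.
        return raw_result
    result["content"] = result["content"] + "\n" + hints
    return json.dumps(result, ensure_ascii=False)


normal = json.dumps({"content": "     1|alpha", "total_lines": 1})
skip = json.dumps({"error": "SKIP: File unchanged since last read.", "dedup": True})

with_hints = json.loads(append_hints(normal, "[hint] see docs/README.md"))
untouched = append_hints(skip, "[hint] see docs/README.md")
```

With a guard like this, a dedup stub never gains a `content` key on its way to the model, whichever field the hint text arrives in.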

Recommendation

Apply the proposed fix to change the dedup return to use the error field instead of the content field, as it provides clear semantics and avoids cache pollution.
