hermes - ✅(Solved) Fix read_file returns line-numbered content that pollutes files when written back via write_file [1 pull request, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#19798 (fetched 2026-05-05 06:05:09)
Comments: 2
Participants: 2
Timeline events: 6
Reactions: 0
Timeline (top): labeled ×3, commented ×2, cross-referenced ×1

PR fix notes

PR #19820: fix(file): reject read_file line-numbered writeback

Description (problem / solution / changelog)

Summary

  • Rejects write_file calls when the content is dominated by read_file's LINE_NUM|CONTENT display format.
  • Keeps sparse literal pipe content allowed so legitimate files with an occasional 1|value line still write normally.
  • Adds regression coverage for issue #19798.

Why

read_file intentionally shows line numbers to the model. When those display prefixes are echoed back into write_file, the prefixes are persisted into real source/config files. This adds a narrow write-side guard for consecutive line-numbered content before file operations run.
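
The PR description mentions a guard for "consecutive line-numbered content" but the page does not show its code. The following is only a minimal sketch of what such a check could look like, not the PR's implementation: the helper names `longest_consecutive_run` and `looks_like_read_file_echo` and the `min_run=3` threshold are assumptions made for illustration.

```python
import re

# Optional whitespace, a line number, then "|" (read_file's display prefix).
_PREFIX = re.compile(r"^\s*(\d+)\|")


def longest_consecutive_run(content: str) -> int:
    """Length of the longest run of lines whose NUM| prefixes count up by one."""
    best = run = 0
    previous = None
    for line in content.split("\n"):
        match = _PREFIX.match(line)
        if match is None:
            # A line without a prefix breaks the run.
            run, previous = 0, None
            continue
        number = int(match.group(1))
        if previous is not None and number == previous + 1:
            run += 1
        else:
            run = 1
        previous = number
        best = max(best, run)
    return best


def looks_like_read_file_echo(content: str, min_run: int = 3) -> bool:
    """Hypothetical guard: flag content with min_run or more consecutive numbered lines."""
    return longest_consecutive_run(content) >= min_run


echoed = "     1|a: 1\n     2|b: 2\n     3|c: 3"
sparse = "1|value\nplain line\n7|another"
print(looks_like_read_file_echo(echoed), looks_like_read_file_echo(sparse))
```

Requiring the numbers to increase by one is one way to keep an occasional literal `1|value` line writable while still catching an echoed block.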

Test plan

  • scripts/run_tests.sh tests/tools/test_file_tools.py::TestWriteFileHandler -q
  • scripts/run_tests.sh tests/tools/test_file_tools.py tests/tools/test_file_write_safety.py -q

Fixes #19798

Changed files

  • tests/tools/test_file_tools.py (modified, +27/-0)
  • tools/file_tools.py (modified, +47/-3)


What happens

When the LLM reads a file with read_file and later writes modified content back with write_file, the LINE_NUM|CONTENT display format from read_file can end up persisted in the actual file. This silently corrupts configuration files, source code, and especially .env files by embedding line-number prefixes like 1| into real file content.

Steps to reproduce

  1. Have the agent read any text file: read_file("config.yaml")
  2. The agent receives:
         1|setting: value
         2|other: thing
  3. Agent modifies the content and calls: write_file("config.yaml", " 1|setting: new_value\n 2|other: thing")
  4. File on disk now literally starts with 1|setting: new_value — the 1| prefix is permanent.

This is not hypothetical. It happens in practice when models echo read_file output into write_file content, especially with weaker models or complex multi-step transformations where the model loses track of the display format vs. actual content distinction.
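
The reproduction steps can be simulated without the agent. This is a minimal sketch: `add_line_numbers` is a stand-in that imitates the `{i:6d}|` format described in this issue, and `roundtrip_without_stripping` is a hypothetical helper, not code from the repository.

```python
import tempfile
from pathlib import Path


def add_line_numbers(text: str, start: int = 1) -> str:
    """Stand-in for read_file's display format: '{i:6d}|' before each line."""
    return "\n".join(
        f"{i:6d}|{line}" for i, line in enumerate(text.split("\n"), start)
    )


def roundtrip_without_stripping(path: Path) -> str:
    """Read with line numbers, then write the displayed text back verbatim."""
    displayed = add_line_numbers(path.read_text())
    path.write_text(displayed)  # nothing strips the prefixes on the way back
    return path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config.yaml"
    config.write_text("setting: value\nother: thing")
    on_disk = roundtrip_without_stripping(config)

print(on_disk)
```

After the round trip, every line of the file on disk carries the display prefix as literal content.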

Root cause

read_file always line-numbers its output (tools/file_operations.py:594-595):

return ReadResult(
    content=self._add_line_numbers(read_output, offset),  # ← always line-numbered
    ...
)

Here _add_line_numbers() (lines 440-451) prepends {i:6d}| to every line.

write_file writes content as-is with zero sanitization (tools/file_operations.py:749):

write_cmd = f"cat > {self._escape_shell_arg(path)}"
write_result = self._exec(write_cmd, stdin_data=content)  # ← no stripping

The existing guard only catches one specific case (tools/file_tools.py:798-802):

if _is_internal_file_status_text(content):
    return tool_error(
        "Refusing to write internal read_file status text as file content. ..."
    )

_is_internal_file_status_text() (lines 274-302) checks only for _READ_DEDUP_STATUS_MESSAGE and has no awareness of line-number pollution. Content like " 1|REAL_CONFIG=value" passes through undetected.

read_file_raw exists but is NOT exposed to the LLM (tools/file_operations.py:654). It reads without line numbers using cat, but there is no read_file_raw_tool wrapper, no READ_FILE_RAW_SCHEMA, and no registration in the tool registry. It is only used internally by patch_parser.py.

Evidence — exact code locations (commit a11aed1ac)

| File | Lines | What |
|---|---|---|
| tools/file_operations.py | 440-451 | _add_line_numbers(): prepends {i:6d}\| to every line |
| tools/file_operations.py | 594-595 | read_file() calls _add_line_numbers() unconditionally |
| tools/file_operations.py | 749-750 | write_file() pipes content to cat with NO stripping |
| tools/file_tools.py | 274-302 | _is_internal_file_status_text(): only checks dedup message |
| tools/file_tools.py | 798-802 | write_file_tool guard: only calls above function |
| tools/file_operations.py | 654-680 | read_file_raw(): exists, reads without line numbers, but NOT registered as an LLM tool |

Suggested fix

A non-aggressive approach (mirroring the existing _is_internal_file_status_text pattern): add a detection function that recognizes when content appears to be line-number-polluted, and reject the write with a clear error message telling the model to strip the prefixes.

import re

# Matches read_file's display prefix: optional whitespace, digits, then "|".
_LINE_NUMBER_PREFIX = re.compile(r'^\s*\d+\|')


def _is_line_number_polluted(content: str) -> bool:
    """Detect content that looks like it was copy-pasted from read_file output."""
    if not isinstance(content, str) or not content.strip():
        return False
    non_empty = [line for line in content.split('\n') if line.strip()]
    # Too few lines to tell display output apart from literal pipe content.
    if len(non_empty) < 3:
        return False
    # Flag the content when more than 80% of non-empty lines start with NUMBER|.
    polluted = sum(1 for line in non_empty if _LINE_NUMBER_PREFIX.match(line))
    return polluted / len(non_empty) > 0.8

Then call it alongside _is_internal_file_status_text in write_file_tool with a similar rejection message.

The alternative — unconditionally stripping ^\s*\d+\| prefixes in write_file — is too aggressive since legitimate files could start with that pattern.
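
To make the risk concrete, here is a small sketch of unconditional stripping applied to a legitimate pipe-delimited file. The function name `strip_prefixes_unconditionally` and the sample data are made up for illustration.

```python
import re

_PREFIX = re.compile(r"^\s*\d+\|")


def strip_prefixes_unconditionally(content: str) -> str:
    """Remove a leading NUM| from every line, whether or not it is display noise."""
    return "\n".join(
        _PREFIX.sub("", line, count=1) for line in content.split("\n")
    )


# A legitimate file whose first column happens to be a numeric id.
legit = "1|alice|admin\n2|bob|viewer"
stripped = strip_prefixes_unconditionally(legit)
print(stripped)
```

The id column is silently dropped, which is the kind of damage a reject-with-error guard avoids.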

Impact

  • Data corruption: .env, config.yaml, and other config files get silently polluted
  • Hard to notice: The pollution is prefix-only; the rest of the file looks normal at a glance
  • Cascading: Once a file is polluted, subsequent read_file calls show double pollution ( 1| 1|real content)
  • Security: If .env gets polluted, API keys may fail to parse, leading to confusing auth failures
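
The cascading and parsing effects listed above can be sketched as follows. `add_line_numbers` imitates the `{i:6d}|` format from this issue, and `naive_env_parse` is a deliberately simple hypothetical parser; real .env loaders may behave differently.

```python
def add_line_numbers(text: str, start: int = 1) -> str:
    """Stand-in for read_file's display format: '{i:6d}|' before each line."""
    return "\n".join(
        f"{i:6d}|{line}" for i, line in enumerate(text.split("\n"), start)
    )


def naive_env_parse(text: str) -> dict:
    """Hypothetical KEY=VALUE parser used only to illustrate the failure mode."""
    pairs = (line.split("=", 1) for line in text.split("\n") if "=" in line)
    return {key: value for key, value in pairs}


original = "API_KEY=abc123"
once = add_line_numbers(original)   # what a polluted write leaves on disk
twice = add_line_numbers(once)      # what the next read_file would display

parsed = naive_env_parse(once)
print(repr(twice))
print(parsed)
```

In this sketch the key is no longer `API_KEY` but the prefixed string, so a lookup by the expected name fails.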

extent analysis

TL;DR

To prevent line-number pollution in files, implement a detection function _is_line_number_polluted and use it to reject writes with a clear error message.

Guidance

  • Add a detection function _is_line_number_polluted to recognize line-number-polluted content and reject writes.
  • Call _is_line_number_polluted alongside _is_internal_file_status_text in write_file_tool with a similar rejection message.
  • Consider exposing read_file_raw to the LLM to avoid line numbering issues.
  • Test the detection function with various file types and contents to ensure it does not produce false positives.

Notes

  • The suggested fix is non-aggressive and mirrors the existing _is_internal_file_status_text pattern.
  • The alternative approach of unconditionally stripping prefixes in write_file is too aggressive and may cause issues with legitimate files.
  • Exposing read_file_raw to the LLM may provide a more robust solution, but its implementation and registration are not detailed in the issue.

Recommendation

Implement the _is_line_number_polluted detection function and use it to reject polluted writes with a clear error message. This prevents line-number pollution without modifying content that is written legitimately.
