hermes - ✅(Solved) Fix read_file returns line-numbered content that pollutes files when written back via write_file [1 pull requests, 2 comments, 2 participants]

hermes · 2026-05-04 14:43:44


GitHub stats

NousResearch/hermes-agent#19798 · Fetched 2026-05-05 06:05:09


Author: eiraho

Participants: alt-glitch, eiraho

Timeline (top): labeled ×3, commented ×2, cross-referenced ×1

Error Message

A non-aggressive approach (mirroring the existing _is_internal_file_status_text pattern): add a detection function that recognizes when content appears to be line-number-polluted, and reject the write with a clear error message telling the model to strip the prefixes.

Root Cause

read_file always line-numbers its output (tools/file_operations.py:594-595):

return ReadResult(
    content=self._add_line_numbers(read_output, offset),  # ← always line-numbered
    ...
)

Here, _add_line_numbers() (lines 440-451) prepends {i:6d}| to every line: the line number right-aligned in a six-character field, followed by a pipe.
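A hypothetical re-creation of that display format, useful for reasoning about exactly what the model sees. The real _add_line_numbers() may treat trailing newlines or the offset differently; here offset is assumed to be the 1-based number of the first line shown.

```python
def add_line_numbers(text: str, offset: int = 1) -> str:
    """Hypothetical re-creation of read_file's display format: '{i:6d}|' + line."""
    # Assumption: offset is the 1-based number of the first line shown.
    return "\n".join(
        f"{i:6d}|{line}" for i, line in enumerate(text.split("\n"), start=offset)
    )

# Each line gets a right-aligned, six-character number and a pipe.
shown = add_line_numbers("setting: value\nother: thing")
# shown == '     1|setting: value\n     2|other: thing'
later = add_line_numbers("a\nb", offset=40)
# later == '    40|a\n    41|b'
```

Because the prefix is pure display decoration, anything that copies shown verbatim into a write carries the "     1|" text along with it.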

write_file writes content as-is with zero sanitization (tools/file_operations.py:749):

write_cmd = f"cat > {self._escape_shell_arg(path)}"
write_result = self._exec(write_cmd, stdin_data=content)  # ← no stripping

The existing guard only catches one specific case (tools/file_tools.py:798-802):

if _is_internal_file_status_text(content):
    return tool_error(
        "Refusing to write internal read_file status text as file content. ..."
    )

_is_internal_file_status_text() (lines 274-302) only checks for _READ_DEDUP_STATUS_MESSAGE; it has no notion of line-number pollution, so content like " 1|REAL_CONFIG=value" passes through undetected.

read_file_raw exists but is NOT exposed to the LLM (tools/file_operations.py:654). It reads without line numbers using cat, but there is no read_file_raw_tool wrapper, no READ_FILE_RAW_SCHEMA, and no registration in the tool registry. It is only used internally by patch_parser.py.
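If read_file_raw were exposed, it would need a model-facing schema. The dict below is a hypothetical sketch of a READ_FILE_RAW_SCHEMA in the common JSON-Schema function-calling shape; the issue does not show hermes' registry format or read_file_raw's parameters, so only a path argument is assumed.

```python
# Hypothetical schema for exposing read_file_raw as a model-facing tool.
# Field layout follows the widely used JSON-Schema "function calling" shape;
# hermes' actual registry format is not shown in the issue and may differ.
READ_FILE_RAW_SCHEMA = {
    "name": "read_file_raw",
    "description": (
        "Read a file's exact contents WITHOUT line-number prefixes. "
        "Use this when the content will be written back with write_file."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            # Assumption: a single path argument; offset/limit support is unknown.
            "path": {"type": "string", "description": "Path of the file to read."},
        },
        "required": ["path"],
    },
}
```

The description text matters as much as the schema: it is what steers the model toward the raw variant when it intends to write content back.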

Fix Action

Fix / Workaround

PR fix notes

PR #19820: fix(file): reject read_file line-numbered writeback

Repository: NousResearch/hermes-agent
Author: Beandon13
State: open (not merged)
Link: https://github.com/NousResearch/hermes-agent/pull/19820

Description (problem / solution / changelog)

Summary

Rejects write_file calls when the content is dominated by read_file's LINE_NUM|CONTENT display format.
Still allows sparse literal pipe content, so legitimate files with an occasional 1|value line write normally.
Adds regression coverage for issue #19798.

Why

read_file intentionally shows line numbers to the model. When those display prefixes are echoed back into write_file, the prefixes are persisted into real source/config files. This adds a narrow write-side guard for consecutive line-numbered content before file operations run.
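The PR's diff is not reproduced here, so the following is only a hypothetical sketch of what a check for "consecutive line-numbered content" could look like. The function name, the minimum run length of 3, and the 50% share threshold are assumptions, not values taken from PR #19820.

```python
import re

# A read_file-style prefix such as "     1|"; group 1 captures the number.
PREFIX = re.compile(r"^\s*(\d+)\|")


def looks_like_read_file_echo(content: str, min_run: int = 3, min_share: float = 0.5) -> bool:
    """Hypothetical guard: flag content dominated by consecutively numbered NUMBER| lines."""
    lines = [line for line in content.split("\n") if line.strip()]
    if len(lines) < min_run:
        return False
    covered = 0  # non-empty lines that sit inside a qualifying consecutive run
    run = 0      # length of the run currently being tracked
    prev = None  # number on the previous line, or None if it had no prefix
    for line in lines:
        m = PREFIX.match(line)
        num = int(m.group(1)) if m else None
        if num is not None and prev is not None and num == prev + 1:
            run += 1  # numbering continues: extend the current run
        else:
            if run >= min_run:
                covered += run  # close out a long-enough run
            run = 1 if num is not None else 0
        prev = num
    if run >= min_run:
        covered += run
    return covered / len(lines) > min_share


print(looks_like_read_file_echo("    40|x\n    41|y\n    42|z"))  # True
print(looks_like_read_file_echo("key: v\n1|value\nother: x"))     # False
```

Requiring consecutive numbers is stricter than a plain ratio: scattered values such as 1|alice, 5|bob, 9|carol pass, although a data file whose rows are numbered 1, 2, 3 would still be flagged.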

Test plan

scripts/run_tests.sh tests/tools/test_file_tools.py::TestWriteFileHandler -q
scripts/run_tests.sh tests/tools/test_file_tools.py tests/tools/test_file_write_safety.py -q

Fixes #19798

Changed files

tests/tools/test_file_tools.py (modified, +27/-0)
tools/file_tools.py (modified, +47/-3)


What happens

When the LLM reads a file with read_file and later writes modified content back with write_file, the LINE_NUM|CONTENT format from read_file can end up persisted in the actual file. This silently corrupts configuration files, source code, and especially .env — embedding line-number prefixes like 1| into real file content.

Steps to reproduce

Have the agent read any text file: read_file("config.yaml")

The agent receives:

     1|setting: value
     2|other: thing

The agent modifies the content and calls write_file("config.yaml", " 1|setting: new_value\n 2|other: thing").
The file on disk now literally starts with 1|setting: new_value; the 1| prefix is permanent.

This is not hypothetical. It happens in practice when models echo read_file output into write_file content, especially with weaker models or complex multi-step transformations where the model loses track of the display format vs. actual content distinction.
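The round trip can be simulated locally. This sketch uses plain Python file I/O instead of hermes' tools: add_line_numbers() is a hypothetical stand-in for read_file's {i:6d}| display format, and write_as_is() mimics write_file writing content verbatim. It also shows the cascading effect listed under Impact, where re-reading the polluted file stacks a second prefix.

```python
import os
import tempfile


def add_line_numbers(text: str, start: int = 1) -> str:
    # Hypothetical stand-in for read_file's display format ({i:6d}|).
    return "\n".join(f"{i:6d}|{line}" for i, line in enumerate(text.split("\n"), start=start))


def write_as_is(path: str, content: str) -> None:
    # Mirrors write_file's behaviour: content is written verbatim, no stripping.
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(content)


def read_raw(path: str) -> str:
    with open(path, encoding="utf-8") as fh:
        return fh.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    write_as_is(path, "setting: value\nother: thing")
    shown = add_line_numbers(read_raw(path))                        # what the model sees
    edited = shown.replace("setting: value", "setting: new_value")  # model edits the display text
    write_as_is(path, edited)                                       # echoed back verbatim
    on_disk = read_raw(path)

print(repr(on_disk.splitlines()[0]))                    # '     1|setting: new_value'
print(repr(add_line_numbers(on_disk).splitlines()[0]))  # '     1|     1|setting: new_value'
```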


Evidence: exact code locations (commit a11aed1ac)

| File | Lines | What |
|---|---|---|
| `tools/file_operations.py` | 440-451 | `_add_line_numbers()`: prepends `{i:6d}\|` to every line |
| `tools/file_operations.py` | 594-595 | `read_file()` calls `_add_line_numbers()` unconditionally |
| `tools/file_operations.py` | 749-750 | `write_file()` pipes content to `cat` with NO stripping |
| `tools/file_tools.py` | 274-302 | `_is_internal_file_status_text()`: only checks the dedup message |
| `tools/file_tools.py` | 798-802 | `write_file_tool` guard: only calls the function above |
| `tools/file_operations.py` | 654-680 | `read_file_raw()`: exists and reads without line numbers, but is NOT registered as an LLM tool |

Suggested fix

import re

# Matches a read_file display prefix such as "     1|" at the start of a line.
_LINE_NUMBER_PREFIX = re.compile(r'^\s*\d+\|')

def _is_line_number_polluted(content: str) -> bool:
    """Detect content that looks like it was copy-pasted from read_file output."""
    if not isinstance(content, str) or not content.strip():
        return False
    lines = content.split('\n')
    if len(lines) < 2:
        return False
    non_empty = [line for line in lines if line.strip()]
    # Too few lines to judge reliably; let short writes through.
    if len(non_empty) < 3:
        return False
    # Reject only when a significant portion (>80%) of non-empty lines start with NUMBER|
    polluted = sum(1 for line in non_empty if _LINE_NUMBER_PREFIX.match(line))
    return polluted / len(non_empty) > 0.8

Then call it alongside _is_internal_file_status_text in write_file_tool with a similar rejection message.
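A minimal sketch of that wiring follows. It assumes a tool_error() helper that returns a JSON error string; the real hermes helper is not shown in the issue, so the stand-in below is hypothetical, as is the guard_write() name. In write_file_tool the new check would sit right after the existing _is_internal_file_status_text() check, which is omitted here.

```python
import json
import re

_PREFIX = re.compile(r"^\s*\d+\|")


def _is_line_number_polluted(content: str) -> bool:
    # Same heuristic as the suggestion: >80% of at least 3 non-empty lines start with NUMBER|
    if not isinstance(content, str):
        return False
    non_empty = [line for line in content.split("\n") if line.strip()]
    if len(non_empty) < 3:
        return False
    hits = sum(1 for line in non_empty if _PREFIX.match(line))
    return hits / len(non_empty) > 0.8


def tool_error(message: str) -> str:
    # Hypothetical stand-in for hermes' tool_error(); the real one may format errors differently.
    return json.dumps({"error": message})


def guard_write(content: str):
    """Hypothetical pre-write check: return an error payload, or None to let the write proceed."""
    if _is_line_number_polluted(content):
        return tool_error(
            "Refusing to write content that still contains read_file line-number "
            "prefixes (e.g. '     1|'). Strip the NUMBER| prefixes and retry."
        )
    return None


print(guard_write("plain text\nmore\nlines"))        # None: the write goes ahead
print(guard_write("     1|a\n     2|b\n     3|c"))  # JSON error telling the model to strip prefixes
```

Rejecting rather than repairing keeps the decision with the model, which can retry with clean content; nothing is silently rewritten on disk.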

The alternative, unconditionally stripping ^\s*\d+\| prefixes in write_file, is too aggressive: legitimate files can contain lines that start with that pattern, and stripping would silently alter them.
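A short illustration of that risk, using Python's re module with the same pattern. The pipe-delimited sample file is made up for the example.

```python
import re

# The rejected alternative: remove every leading NUMBER| prefix, on every line.
_STRIP = re.compile(r"^\s*\d+\|", re.MULTILINE)


def strip_unconditionally(content: str) -> str:
    return _STRIP.sub("", content)


polluted = "     1|setting: value\n     2|other: thing"  # echoed read_file output
legit = "id|name\n1|alice\n2|bob"                       # a genuine pipe-delimited data file

cleaned = strip_unconditionally(polluted)  # 'setting: value\nother: thing' (intended)
damaged = strip_unconditionally(legit)     # 'id|name\nalice\nbob' (ID column silently lost)
```

The same transformation that repairs the echoed output destroys real data, which is why the issue prefers rejecting the write over rewriting it.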

Impact

Data corruption: .env, config.yaml, and other config files get silently polluted
Hard to notice: The pollution is prefix-only; the rest of the file looks normal at a glance
Cascading: Once a file is polluted, subsequent read_file calls show double pollution ( 1| 1|real content)
Security: If .env gets polluted, API keys may fail to parse, leading to confusing auth failures
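The .env failure mode can be illustrated with a deliberately minimal KEY=VALUE parser. It is a hypothetical stand-in, not python-dotenv or whatever loader hermes uses, but any loader that splits on the first = ends up with a key that no longer matches the name the code looks up.

```python
def parse_env(text: str) -> dict:
    """Minimal KEY=VALUE parser for illustration (not python-dotenv's exact rules)."""
    env = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = stripped.partition("=")
        env[key.strip()] = value.strip()
    return env


clean = parse_env("API_KEY=abc123\nDEBUG=false")
polluted = parse_env("     1|API_KEY=abc123\n     2|DEBUG=false")
print(clean.get("API_KEY"))     # abc123
print(polluted.get("API_KEY"))  # None: the key is now '1|API_KEY'
```

The value is still present, just under the wrong key, which is why the resulting auth failure is confusing rather than obviously a parse error.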

extent analysis

TL;DR

To prevent line-number pollution in files, implement a detection function _is_line_number_polluted and use it to reject writes with a clear error message.

Guidance

Add a detection function _is_line_number_polluted to recognize line-number-polluted content and reject writes.
Call _is_line_number_polluted alongside _is_internal_file_status_text in write_file_tool with a similar rejection message.
Consider exposing read_file_raw to the LLM to avoid line numbering issues.
Test the detection function with various file types and contents to ensure it does not produce false positives.


Notes

The suggested fix is non-aggressive and mirrors the existing _is_internal_file_status_text pattern.
The alternative approach of unconditionally stripping prefixes in write_file is too aggressive and may cause issues with legitimate files.
Exposing read_file_raw to the LLM may be a more robust fix, but the issue does not detail how it would be wrapped, described in a schema, and registered as a tool.

Recommendation

Implement the _is_line_number_polluted detection function and use it to reject polluted writes with a clear error message. It blocks the corruption without rewriting any content, and it matches the direction of PR #19820, which was still open and unmerged when this page was fetched.
