- write_file with bytes_written should count as landed even when lint diagnostics are present. - patch with success true should count as landed even when LSP diagnostics are present. - Genuine top level mutation failures should still produce the verifier footer.

hermes - ✅(Solved) Fix Fix file mutation verifier false positive for diagnostic-bearing writes [3 pull requests, 1 participants]

hermes2026-05-13 09:35:13

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

NousResearch/hermes-agent#24927•Fetched 2026-05-14 03:50:31

View on GitHub

Comments

Participants

Timeline

Reactions

Author

GodsBoy

Participants

GodsBoy

Timeline (top)

labeled ×4cross-referenced ×3closed ×1

Successful file mutations can be reported as failed when their tool result includes nested diagnostics that contain an error status or error text.

Error Message

Successful file mutations can be reported as failed when their tool result includes nested diagnostics that contain an error status or error text. The file mutation verifier and related tool failure classification currently look for generic error markers in the whole tool result. A successful write_file result with bytes_written plus lint diagnostics, or a successful patch result with LSP diagnostics, can therefore be treated as a failed mutation even though the file was changed on disk.

Root Cause

Successful file mutations can be reported as failed when their tool result includes nested diagnostics that contain an error status or error text.

Fix Action

Fix / Workaround

The file mutation verifier and related tool failure classification currently look for generic error markers in the whole tool result. A successful write_file result with bytes_written plus lint diagnostics, or a successful patch result with LSP diagnostics, can therefore be treated as a failed mutation even though the file was changed on disk.

write_file with bytes_written should count as landed even when lint diagnostics are present.
patch with success true should count as landed even when LSP diagnostics are present.
Genuine top level mutation failures should still produce the verifier footer.

Add regression coverage for write_file, patch, display failure tags, and tool guardrail classification.

PR fix notes

PR #24929: fix: classify landed file mutations with diagnostics

Repository: NousResearch/hermes-agent
Author: GodsBoy
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/24929

Description (problem / solution / changelog)

What does this PR do?

Fixes a false positive in file mutation result handling where successful write_file and patch calls could be treated as failed when their JSON result contained nested diagnostic text such as lint errors or LSP diagnostics. The change centralizes landed mutation classification so verifier footers, CLI tool status, and tool guardrails agree on the same success contract.

Related Issue

Fixes #24927

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Security fix
Documentation update
Tests (adding or improving test coverage)
Refactor (no behavior change)
New skill (bundled or hub)

Changes Made

Added shared file mutation result classification in agent/tool_result_classification.py.
Updated the turn-end file mutation verifier to clear stale failures when a diagnostic-bearing mutation result proves the write landed.
Updated CLI display and tool guardrail failure classification to avoid marking landed file mutations as failed because of nested diagnostics.
Added regression coverage for verifier recovery, display status, guardrail classification, and the shared classifier helper.

How to Test

uv run --extra dev python -m pytest -q tests/run_agent/test_file_mutation_verifier.py tests/agent/test_display.py::TestCuteToolMessagePreviewLength tests/agent/test_tool_guardrails.py tests/agent/test_tool_result_classification.py
uv run --extra dev python -m ruff check agent/tool_result_classification.py agent/display.py agent/tool_guardrails.py run_agent.py tests/agent/test_tool_result_classification.py tests/agent/test_display.py tests/agent/test_tool_guardrails.py tests/run_agent/test_file_mutation_verifier.py
git diff --check

Checklist

Code

I've read the Contributing Guide
My commit messages follow Conventional Commits
I searched for existing PRs to make sure this isn't a duplicate
My PR contains only changes related to this fix/feature
I've run pytest tests/ -q and all tests pass
I've added tests for my changes
I've tested on my platform: Windows 11

Documentation & Housekeeping

I've updated relevant documentation (README, docs/, docstrings) or N/A
I've updated cli-config.yaml.example if I added/changed config keys or N/A
I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows or N/A
I've considered cross-platform impact (Windows, macOS) per the compatibility guide or N/A
I've updated tool descriptions/schemas if I changed tool behavior or N/A

For New Skills

N/A

Screenshots / Logs

Targeted verifier, display, guardrail, and classifier tests passed: 55 passed. Ruff passed on changed Python files. Whitespace check passed with git diff --check.

Changed files

agent/display.py (modified, +3/-0)
agent/tool_guardrails.py (modified, +3/-0)
agent/tool_result_classification.py (added, +26/-0)
run_agent.py (modified, +3/-1)
tests/agent/test_display.py (modified, +22/-0)
tests/agent/test_tool_guardrails.py (modified, +16/-0)
tests/agent/test_tool_result_classification.py (added, +30/-0)
tests/run_agent/test_file_mutation_verifier.py (modified, +50/-0)

PR #24960: fix(agent): prevent mutation verifier false positives from lint diagnostics

Repository: NousResearch/hermes-agent
Author: zccyman
State: closed | merged: False
Link: https://github.com/NousResearch/hermes-agent/pull/24960

Description (problem / solution / changelog)

Summary

Fixes #24927

Successful file mutations (write_file, patch) were incorrectly reported as failures when their tool results included nested lint or LSP diagnostics containing "error" or "failed" keys.

Problem

The generic heuristic in _detect_tool_failure() and classify_tool_failure() searches the entire result string for '"error"' or '"failed"' substrings. This causes false positives when:

write_file returns {"bytes_written": 100, "lint": {"errors": [{"message": "unused variable", "severity": "error"}]}} — the nested "error" in lint diagnostics triggers the heuristic
patch returns {"success": true, "diagnostics": {"error": "type mismatch"}} — the nested "error" in LSP diagnostics triggers the heuristic

Root Cause

Both _detect_tool_failure() in agent/display.py and its mirror classify_tool_failure() in agent/tool_guardrails.py apply a generic substring heuristic that doesn't account for file-mutation tools returning structured diagnostic payloads alongside success markers.

Fix

Added an early-exit check for write_file and patch tools before the generic heuristic:

write_file: if bytes_written is present and no top-level error → return success
patch: if success is True → return success regardless of diagnostics

Genuine failures (top-level error key, or no success markers) still fall through to the heuristic correctly.

Testing

Added 12 regression tests across both files:

TestDetectToolFailure in test_display.py (8 tests):
- write_file with bytes_written → success
- write_file with lint diagnostics → success
- write_file with top-level error → failure
- patch with success=true → success
- patch without success → failure
- None result → success
- terminal non-zero exit → failure
- generic tools with error → failure
TestClassifyToolFailureMutationFalsePositive in test_tool_guardrails.py (4 tests):
- Mirror tests for the guardrail classifier

All 53 tests pass (41 existing + 12 new), confirming zero regression.

Files Changed

agent/display.py — early-exit for mutation tools in _detect_tool_failure()
agent/tool_guardrails.py — matching early-exit in classify_tool_failure()
tests/agent/test_display.py — 8 new tests
tests/agent/test_tool_guardrails.py — 4 new tests

Changed files

agent/display.py (modified, +13/-0)
agent/tool_guardrails.py (modified, +11/-0)
tests/agent/test_display.py (modified, +87/-0)
tests/agent/test_tool_guardrails.py (modified, +45/-0)

PR #25011: salvage: batch 2 LSP-related fixes (#24738 lsp gating + #24929 mutation diagnostics false positive)

Repository: NousResearch/hermes-agent
Author: kshitijk4poor
State: closed | merged: True
Link: https://github.com/NousResearch/hermes-agent/pull/25011

Description (problem / solution / changelog)

Salvage of two independent LSP-related fixes

Cherry-picks two small, unrelated bug fixes from open PRs onto current main. Both authors are preserved as commit authors and are already in AUTHOR_MAP.

Original PR	Author	Subsystem	Fix
#24738	@briandevans	CLI startup gating	Adds `lsp` to `_BUILTIN_SUBCOMMANDS` so `hermes lsp …` skips the ~500-650ms plugin-discovery pass.
#24929	@GodsBoy	Tool result classification	Stops misclassifying successful `write_file`/`patch` results as failures when their JSON payload contains nested lint or LSP diagnostics with `"error"` keys (issue #24927).

Two independent fixes are batched here because each is small, the diffs touch disjoint files, and neither has interactions with the other. Each retains its original contributor as commit author.

#24738 — `lsp` missing from `_BUILTIN_SUBCOMMANDS`

The parity guard added in #22120 (tests/hermes_cli/test_startup_plugin_gating.py::test_builtin_set_covers_every_registered_subcommand) has been failing on main since lsp shipped via #24168. Verified empirically:

$ bash scripts/run_tests.sh tests/hermes_cli/test_startup_plugin_gating.py
FAILED test_builtin_set_covers_every_registered_subcommand
  AssertionError: _BUILTIN_SUBCOMMANDS is missing these live subcommands: ['lsp'].

After the fix: 37 passed.

#24929 — File mutation false positive on diagnostic-bearing results (#24927)

_detect_tool_failure() and classify_tool_failure() substring-match '"error"' and '"failed"' over the full result string. After PR #24168 wired post-write LSP diagnostics into write_file and patch, any successful mutation whose result includes nested lint/LSP errors gets a [error] failure tag in the CLI display, plus a turn-end verifier footer claiming the write didn't land — even though it did.

This PR introduces a shared agent/tool_result_classification.py::file_mutation_result_landed() helper and patches three callsites:

agent/display.py::_detect_tool_failure() — CLI failure tag
agent/tool_guardrails.py::classify_tool_failure() — guardrail mirror
run_agent.py::_record_file_mutation_result() — turn-end verifier upstream

A duplicate PR (#24960) was filed against the same issue. It has a real bug (data.get("bytes_written") truthiness check, which misclassifies an empty 0-byte write with lint diagnostics as a failure). #24929 uses "bytes_written" in data (presence check) which is correct. We are closing #24960 in favor of this salvage.

Test plan

bash scripts/run_tests.sh \
  tests/hermes_cli/test_startup_plugin_gating.py \
  tests/agent/test_display.py \
  tests/agent/test_tool_guardrails.py \
  tests/agent/test_tool_result_classification.py \
  tests/run_agent/test_file_mutation_verifier.py

Result: 117 passed.

E2E sanity:

'lsp' in _BUILTIN_SUBCOMMANDS → True
_detect_tool_failure("write_file", '{"bytes_written": 0, "lint": {"errors": [{"severity": "error"}]}}') → (False, "") (would be (True, " [error]") without the fix; this is the case PR #24960 mishandles)
_detect_tool_failure("write_file", '{"error": "Permission denied"}') → (True, " [error]") (genuine failures still detected)
_detect_tool_failure("patch", '{"success": true, "lsp_diagnostics": "<diagnostics>ERROR foo</diagnostics>"}') → (False, "")

Closes #24927 (via #24929). Supersedes #24738, #24929, #24960.

cc @briandevans @GodsBoy — your commits are preserved as-is in this salvage; original authorship intact in git log.

Changed files

agent/display.py (modified, +3/-0)
agent/tool_guardrails.py (modified, +3/-0)
agent/tool_result_classification.py (added, +26/-0)
hermes_cli/main.py (modified, +4/-4)
run_agent.py (modified, +7/-2)
tests/agent/test_display.py (modified, +22/-0)
tests/agent/test_tool_guardrails.py (modified, +16/-0)
tests/agent/test_tool_result_classification.py (added, +30/-0)
tests/run_agent/test_file_mutation_verifier.py (modified, +50/-0)

RAW_BUFFERClick to expand / collapse

Summary

Successful file mutations can be reported as failed when their tool result includes nested diagnostics that contain an error status or error text.

Problem

Expected behavior

write_file with bytes_written should count as landed even when lint diagnostics are present.
patch with success true should count as landed even when LSP diagnostics are present.
Genuine top level mutation failures should still produce the verifier footer.

Verification target

Add regression coverage for write_file, patch, display failure tags, and tool guardrail classification.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

write_file with bytes_written should count as landed even when lint diagnostics are present.
patch with success true should count as landed even when LSP diagnostics are present.
Genuine top level mutation failures should still produce the verifier footer.

#request error #file not found #serialization error #model compatibility #GPU setup

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

hermes - ✅(Solved) Fix Fix file mutation verifier false positive for diagnostic-bearing writes [3 pull requests, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #24929: fix: classify landed file mutations with diagnostics

Description (problem / solution / changelog)

What does this PR do?

Related Issue

Type of Change

Changes Made

How to Test

Checklist

Code

Documentation & Housekeeping

For New Skills

Screenshots / Logs

Changed files

PR #24960: fix(agent): prevent mutation verifier false positives from lint diagnostics

Description (problem / solution / changelog)

Summary

Problem

Root Cause

Fix

Testing

Files Changed

Changed files

PR #25011: salvage: batch 2 LSP-related fixes (#24738 lsp gating + #24929 mutation diagnostics false positive)

Description (problem / solution / changelog)

Salvage of two independent LSP-related fixes

#24738 — lsp missing from _BUILTIN_SUBCOMMANDS

#24929 — File mutation false positive on diagnostic-bearing results (#24927)

Test plan

Changed files

Summary

Problem

Expected behavior

Verification target

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING

#24738 — `lsp` missing from `_BUILTIN_SUBCOMMANDS`