hermes - ✅(Solved) Fix bug: _read_json_file doesn't catch UnicodeDecodeError — corrupted gateway_state.json causes persistent warnings [3 pull requests, 1 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#28579 (fetched 2026-05-20 04:03:17)
Comments: 1
Participants: 2
Timeline events: 7
Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, commented ×1

Error Message

WARNING gateway.platforms.base: Failed to write runtime status (connected) for feishu: 'utf-8' codec can't decode byte 0xc7 in position 787: invalid continuation byte

Root Cause

In gateway/status.py lines 225-231:

def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
    if not path.exists():
        return None
    try:
        raw = path.read_text(encoding="utf-8").strip()
    except OSError:          # ← Only catches OSError
        return None          #    UnicodeDecodeError is NOT a subclass of OSError
    ...

UnicodeDecodeError inherits from ValueError, not OSError. When the file contains non-UTF-8 bytes (e.g. TLS record data accidentally appended), read_text(encoding="utf-8") raises UnicodeDecodeError which escapes the except block.

This propagates to _write_runtime_status_safe() in gateway/platforms/base.py which catches it but only logs a warning — the write is skipped, so the corrupted file is never overwritten with a clean version.
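The hierarchy claim is easy to check in isolation. The following is a minimal sketch, not Hermes code: read_catching_oserror_only is a hypothetical stand-in that copies only the pre-fix except clause, and the file lives in a temporary directory rather than ~/.hermes.

```python
import tempfile
from pathlib import Path

# UnicodeDecodeError sits under ValueError in the built-in exception hierarchy.
assert issubclass(UnicodeDecodeError, ValueError)
assert not issubclass(UnicodeDecodeError, OSError)

# The 24 trailing bytes quoted in the issue report.
TRAILER = bytes.fromhex(
    "17 03 03 00 13 68 c7 13 3d c6 99 a7 2c a2 ef ee 11 3d 1a bf f5 ce 19 8c"
)


def read_catching_oserror_only(path: Path):
    """Hypothetical stand-in that mirrors the pre-fix except clause."""
    try:
        return path.read_text(encoding="utf-8").strip()
    except OSError:
        return None


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "gateway_state.json"
    state.write_bytes(b'{"ok": true}' + TRAILER)  # valid JSON plus binary trailer
    try:
        read_catching_oserror_only(state)
        escaped = None
    except UnicodeDecodeError as exc:
        escaped = exc  # the decode error was not absorbed by `except OSError`

print(type(escaped).__name__)
```

The decode error surfaces in the caller even though the function looks like it handles read failures.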

Fix Action

Fixed

PR fix notes

PR #28599: fix: catch UnicodeDecodeError in _read_json_file for corrupted gateway_state.json

Description (problem / solution / changelog)

Summary

Fixes a bug where _read_json_file() in gateway/status.py only caught OSError when reading JSON files, but not UnicodeDecodeError. When gateway_state.json contains non-UTF-8 bytes (e.g. TLS record data accidentally appended via a file descriptor leak), the read_text(encoding="utf-8") call raises UnicodeDecodeError which is NOT a subclass of OSError, escaping the except block and propagating as a persistent warning.

Root Cause

# gateway/status.py L229-231
try:
    raw = path.read_text(encoding="utf-8").strip()
except OSError:      # UnicodeDecodeError is a subclass of ValueError, NOT OSError!
    return None

UnicodeDecodeError inherits from ValueError, so when a corrupted file is read, the error propagates to _write_runtime_status_safe() which logs a warning but skips the write — leaving the corrupted file permanently in place.

Fix

except (OSError, UnicodeDecodeError):

When the file is treated as missing (return None), the next write_runtime_status() call will rebuild it cleanly via _build_runtime_status_record().
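To illustrate why returning None is enough to heal the file, here is a rough sketch of a read-then-rewrite cycle. The names read_state and write_state and the default record are assumptions made for this example; they are not the real write_runtime_status() or _build_runtime_status_record().

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def read_state(path: Path) -> Optional[dict[str, Any]]:
    """Hypothetical reader: any unreadable file is reported as None."""
    try:
        raw = path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


def write_state(path: Path, **updates: Any) -> dict[str, Any]:
    """Hypothetical writer: merge updates into the old record or a fresh one."""
    record = read_state(path) or {"gateway_state": "unknown"}  # stand-in defaults
    record.update(updates)
    path.write_text(json.dumps(record), encoding="utf-8")
    return record


with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "gateway_state.json"
    state.write_bytes(b'{"gateway_state": "running"}\x17\x03\x03\x00\x13\x68\xc7\x13')
    first_read = read_state(state)          # corrupted file reads as None
    write_state(state, platform="feishu")   # the write proceeds and replaces the file
    recovered = read_state(state)           # the next read sees clean JSON

print(first_read, recovered)
```

One consequence of this design is that whatever was in the corrupted file is discarded; only the rebuilt defaults and the new update survive.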

Changes

File | Change
gateway/status.py | except OSError → except (OSError, UnicodeDecodeError)
tests/gateway/test_status.py | +3 unit tests for _read_json_file

Test Results

tests/gateway/test_status.py — 53 passed (including 3 new)

Closes #28579

Changed files

  • gateway/status.py (modified, +1/-1)
  • tests/gateway/test_status.py (modified, +29/-0)

PR #28600: fix(gateway): recover from corrupt runtime state (#28579)

Description (problem / solution / changelog)

What does this PR do?

Summary

This teaches gateway runtime-state reads to treat non-UTF-8 corruption the same way they already treat missing or invalid JSON.

If gateway_state.json contains stray non-UTF-8 bytes, Hermes now falls back to rebuilding the runtime status payload instead of surfacing the decode error on every status write.

Problem

The gateway status reader currently catches OSError and JSONDecodeError, but not UnicodeDecodeError.

That leaves one awkward recovery path open: when gateway_state.json is present but has trailing non-UTF-8 bytes, every runtime status update tries to read the file, fails during UTF-8 decoding, and bubbles the exception back to the platform writer. The write gets skipped, so the bad file never gets replaced with a healthy one.
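The loop described above can be sketched as follows. This is an illustration under assumptions, not the Hermes implementation: write_status, write_status_safe, and the logger name are hypothetical, and the wrapper simply logs and swallows any exception.

```python
import json
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("sketch.platforms.base")  # hypothetical logger name
warnings_seen: list[str] = []


class _Collect(logging.Handler):
    """Capture log messages in a list so they can be counted."""

    def emit(self, record: logging.LogRecord) -> None:
        warnings_seen.append(record.getMessage())


logger.addHandler(_Collect())
logger.propagate = False


def write_status(path: Path, state: str) -> None:
    """Hypothetical read-modify-write that lets decode errors escape."""
    record = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    record["state"] = state
    path.write_text(json.dumps(record), encoding="utf-8")


def write_status_safe(path: Path, state: str) -> None:
    """Hypothetical wrapper: log a warning and skip the write on any failure."""
    try:
        write_status(path, state)
    except Exception as exc:
        logger.warning("Failed to write runtime status (%s): %s", state, exc)


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "gateway_state.json"
    corrupted = b'{"state": "starting"}\x17\x03\x03\x00\x13\x68\xc7\x13'
    state_file.write_bytes(corrupted)
    for _ in range(3):  # three status updates
        write_status_safe(state_file, "connected")
    unchanged = state_file.read_bytes() == corrupted

print(len(warnings_seen), unchanged)
```

Each update produces one warning and the file bytes never change, which is why the warning persists until something else replaces the file.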

What this changes

  • catches UnicodeDecodeError in _read_json_file()
  • treats that corrupted file as unreadable state instead of a hard failure
  • lets write_runtime_status() rebuild the payload and overwrite the corrupted file on the next write
  • adds a regression test that writes raw non-UTF-8 bytes into gateway_state.json and verifies Hermes recovers cleanly

Why this shape

This stays intentionally small.

The rest of the runtime status pipeline already knows how to rebuild a missing or invalid state file. The missing piece was simply classifying UTF-8 corruption as another unreadable-state case so the existing recovery path can do its job.

Tests

  • python -m pytest tests/gateway/test_status.py -q -n 4

Fixes #28579.

Solution Sketch

  • fix the root cause in the touched subsystem instead of layering a broad workaround around the symptom
  • keep surrounding behavior stable and avoid unrelated refactors while the area is under review
  • prove the change with focused checks on the exact path that regressed

Related Issue

Fixes #28579.

Changes Made

  • preserved the existing technical rationale and validation notes inside the template body
  • scoped this PR description to the implementation already present on the branch
  • aligned the delivery format with .github/PULL_REQUEST_TEMPLATE.md

How to Test

  1. Review the existing validation notes preserved in this PR body.
  2. Run the focused checks for the touched area.
  3. Confirm the scoped change still behaves as described above.

Changed files

  • gateway/status.py (modified, +1/-1)
  • tests/gateway/test_status.py (modified, +13/-0)

PR #28607: fix(gateway): catch UnicodeDecodeError in _read_json_file for corrupted state files (#28579)

Description (problem / solution / changelog)

Problem

_read_json_file() in gateway/status.py only catches OSError, but UnicodeDecodeError (which inherits from ValueError, not OSError) escapes unhandled when gateway_state.json is corrupted with non-UTF-8 bytes.

Result: persistent WARNING spam on every platform status write as long as the corrupted file exists — the downstream _write_runtime_status_safe() catches the exception and logs it, but never overwrites the corrupted file with clean data.

Corruption pattern from the bug report: a 24-byte TLS 1.2 Application Data record trailer was appended to gateway_state.json (presumably from a file-descriptor leak in a Feishu/WeChat connection), causing:

WARNING gateway.platforms.base: Failed to write runtime status (connected) for feishu:
'utf-8' codec can't decode byte 0xc7 in position 787: invalid continuation byte

Fix

Add UnicodeDecodeError to the except clause alongside OSError:

except (OSError, UnicodeDecodeError):
    return None

Both are treated as "file unreadable" — the function returns None and the caller falls back to a fresh status record, allowing the file to be overwritten cleanly.

Tests

Six new regression tests in TestReadJsonFileRobustness:

Test | Verifies
test_returns_none_for_missing_file | baseline
test_returns_none_for_valid_json_object | sanity / no regression
test_returns_none_for_empty_file | empty file → None
test_returns_none_for_invalid_json | malformed JSON → None
test_returns_none_for_non_utf8_bytes | the exact TLS-trailer from the bug report → None (not UnicodeDecodeError)
test_returns_none_for_json_array | JSON arrays are not dicts → None

All 56 existing test_status.py tests continue to pass.
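The six cases in the table can be exercised with a small table-driven check. This is a sketch only: read_json_object is a hypothetical reader written for this example (the body of the real _read_json_file past the except clause is not shown in this report), and the sketch assumes a valid JSON object is returned as a dict.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Optional


def read_json_object(path: Path) -> Optional[dict[str, Any]]:
    """Hypothetical reader: return a dict, or None for anything unusable."""
    if not path.exists():
        return None
    try:
        raw = path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):
        return None
    if not raw:
        return None
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    return data if isinstance(data, dict) else None


TRAILER = bytes.fromhex("17 03 03 00 13 68 c7 13")  # start of the reported trailer
CASES: dict[str, Optional[bytes]] = {
    "missing": None,                      # file is never created
    "valid_object": b'{"pid": 1}',
    "empty": b"",
    "invalid_json": b"{not json",
    "non_utf8": b'{"pid": 1}' + TRAILER,
    "json_array": b"[1, 2, 3]",
}

results: dict[str, Optional[dict[str, Any]]] = {}
with tempfile.TemporaryDirectory() as tmp:
    for name, content in CASES.items():
        target = Path(tmp) / f"{name}.json"
        if content is not None:
            target.write_bytes(content)
        results[name] = read_json_object(target)

for name, value in results.items():
    print(f"{name}: {value!r}")
```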

Closes #28579

Changed files

  • gateway/status.py (modified, +6/-1)
  • tests/gateway/test_status.py (modified, +58/-0)

Original issue

Bug Description

_read_json_file() in gateway/status.py only catches OSError when reading files, but not UnicodeDecodeError. When gateway_state.json gets corrupted with non-UTF-8 bytes, the error propagates up and causes persistent warnings on every platform status write.

Steps to Reproduce

  1. Corrupt ~/.hermes/gateway_state.json by appending non-UTF-8 bytes after valid JSON:
    echo -ne '\x17\x03\x03\x00\x13\x68\xc7' >> ~/.hermes/gateway_state.json
  2. Start the gateway
  3. Observe repeated warnings in errors.log:
    WARNING gateway.platforms.base: Failed to write runtime status (connected) for feishu: 'utf-8' codec can't decode byte 0xc7 in position 787: invalid continuation byte
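The position in the warning lines up with the layout described in the report (JSON ending at byte 780, trailer starting right after it). A small sketch, assuming a padded placeholder payload of the same length in place of the real gateway_state.json contents:

```python
# The 24 trailing bytes quoted in the issue report.
trailer = bytes.fromhex(
    "17 03 03 00 13 68 c7 13 3d c6 99 a7 2c a2 ef ee 11 3d 1a bf f5 ce 19 8c"
)

# Placeholder JSON occupying bytes 0..780, so the closing brace is at byte 780.
json_part = b'{"pad": "' + b"x" * 770 + b'"}'
assert len(json_part) == 781

try:
    (json_part + trailer).decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is unbound after the except block

# start is the offset of the first undecodable byte; reason is the short description.
print(error.start, hex(error.object[error.start]), error.reason)
```

The first six trailer bytes are plain ASCII, so decoding only fails at 0xc7, the seventh trailer byte, which is why the reported position is 787 rather than 781.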

Root Cause

In gateway/status.py lines 225-231:

def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
    if not path.exists():
        return None
    try:
        raw = path.read_text(encoding="utf-8").strip()
    except OSError:          # ← Only catches OSError
        return None          #    UnicodeDecodeError is NOT a subclass of OSError
    ...

UnicodeDecodeError inherits from ValueError, not OSError. When the file contains non-UTF-8 bytes (e.g. TLS record data accidentally appended), read_text(encoding="utf-8") raises UnicodeDecodeError which escapes the except block.

This propagates to _write_runtime_status_safe() in gateway/platforms/base.py which catches it but only logs a warning — the write is skipped, so the corrupted file is never overwritten with a clean version.

How the File Got Corrupted

I found 2 corrupted files in state-snapshots/ (both have identical 24 trailing bytes that are a TLS 1.2 Application Data record):

JSON ends at byte 780 with `}`
Trailing garbage (24 bytes): 17 03 03 00 13 68 c7 13 3d c6 99 a7 2c a2 ef ee 11 3d 1a bf f5 ce 19 8c
                             ^^^^^^^^^^^^
                             TLS record: type=0x17 app_data, ver=TLS 1.2, length=0x0013

The TLS data likely leaked from a file descriptor belonging to a gateway platform connection (Feishu/WeChat/Yuanbao). The exact mechanism of how it got appended to gateway_state.json is unclear, but it happened between 2026-05-12 and 2026-05-16.
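The annotation on the first five bytes can be reproduced by unpacking them as a TLS record header (1-byte content type, 2-byte version, 2-byte length, all big-endian). This is only a sketch over the bytes quoted above; it does not attempt to decrypt or validate the payload.

```python
import struct

# The 24 trailing bytes quoted in the issue report.
trailer = bytes.fromhex(
    "17 03 03 00 13 68 c7 13 3d c6 99 a7 2c a2 ef ee 11 3d 1a bf f5 ce 19 8c"
)

# Record header: content type (B), protocol version (H), payload length (H).
content_type, version, length = struct.unpack(">BHH", trailer[:5])
payload = trailer[5:]

print(hex(content_type), hex(version), length, len(payload))
# prints: 0x17 0x303 19 19
```

The declared length (19) matches the number of bytes that follow the header, so the trailer is one complete record: 5 header bytes plus 19 payload bytes gives the 24 bytes observed. Note that TLS 1.3 also writes 0x0303 in this field, so the version bytes alone do not pin down the negotiated protocol version.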

Suggested Fix

def _read_json_file(path: Path) -> Optional[dict[str, Any]]:
    if not path.exists():
        return None
    try:
        raw = path.read_text(encoding="utf-8").strip()
    except (OSError, UnicodeDecodeError):  # ← Also catch UnicodeDecodeError
        return None
    ...

This way, a corrupted file is treated as missing, and write_runtime_status() will rebuild it from defaults via _build_runtime_status_record().

Impact

  • Severity: Low (warnings only, gateway continues to function)
  • Recovery: Gateway restart overwrites the corrupted file, clearing the issue
  • Frequency: Rare (corruption appears to be a one-time event from a TLS fd leak)

Environment

  • Hermes Agent v0.14.0 (2026.5.16)
  • Python 3.11.15
  • WSL (Windows Subsystem for Linux)
