hermes - ✅(Solved) Fix [Bug]: save_trajectory() has no file locking — concurrent processes corrupt JSONL output [3 pull requests, 1 participant]

GitHub stats
NousResearch/hermes-agent#12684 (fetched 2026-04-20 12:17:28)
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): cross-referenced ×3, referenced ×2, labeled ×1

Fix Action

Fixed

PR fix notes

PR #12685: fix(agent): add fcntl.flock to save_trajectory to prevent JSONL corru…

Description (problem / solution / changelog)

What does this PR do?

save_trajectory() in agent/trajectory.py appends to JSONL files using plain open(filename, "a") + f.write() with no file locking. When multiple processes write concurrently (e.g. during batch trajectory generation with batch_runner.py), writes can interleave and produce corrupt JSONL lines that fail json.loads().

This adds fcntl.flock(LOCK_EX) around the write path to serialise concurrent appends. The import is guarded with try/except ImportError for Windows compatibility where fcntl is unavailable, per the CONTRIBUTING.md cross-platform guidelines.
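A minimal sketch of the guarded-import pattern described above, not the PR's actual diff. The function name append_jsonl_locked and its parameters are hypothetical, and falling back to an unlocked write when fcntl is missing is an assumption about how the guard behaves.

```python
import json

try:
    import fcntl  # POSIX only
except ImportError:  # e.g. Windows, where fcntl does not exist
    fcntl = None


def append_jsonl_locked(path, entry):
    """Append one JSON line (hypothetical helper, not the real save_trajectory)."""
    line = json.dumps(entry, ensure_ascii=False) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        if fcntl is not None:
            fcntl.flock(f, fcntl.LOCK_EX)  # blocks until the lock is free
        try:
            f.write(line)
            f.flush()  # hand the line to the OS while the lock is still held
        finally:
            if fcntl is not None:
                fcntl.flock(f, fcntl.LOCK_UN)
```

With this shape, platforms without fcntl keep the old unlocked behaviour, so the guard avoids an import crash but does not by itself protect concurrent writers there.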

Related Issue

Fixes #12684

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • agent/trajectory.py — add fcntl.flock(LOCK_EX/LOCK_UN) around the write in save_trajectory(), with try/except ImportError guard for Windows
  • tests/agent/test_trajectory_file_locking.py — two reproduction tests:
    • test_interleaved_writes_corrupt_jsonl: 4 concurrent processes × 15 writes with monkeypatched split-writes — verifies no JSONL corruption
    • test_save_trajectory_ignores_exclusive_flock: proves save_trajectory respects an exclusive flock held by another process
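The multi-process check can be approximated with a short standalone script. This is a sketch under stated assumptions, not the contents of the PR's test file: the child script, run_writers, count_lines and the 20000-character payload are all illustrative, and it is POSIX-only because the child imports fcntl.

```python
import json
import subprocess
import sys

# Child script (illustrative): append 15 JSON lines under an exclusive flock.
WRITER = r"""
import fcntl, json, sys
path, tag = sys.argv[1], sys.argv[2]
for i in range(15):
    line = json.dumps({"tag": tag, "i": i, "content": "A" * 20000}) + "\n"
    with open(path, "a", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
"""


def run_writers(path, n_procs=4):
    """Start n_procs concurrent writers and return their exit codes."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WRITER, str(path), f"P{n}"])
        for n in range(n_procs)
    ]
    return [p.wait() for p in procs]


def count_lines(path):
    """Return (valid, corrupt) line counts for a JSONL file."""
    good = bad = 0
    with open(path, encoding="utf-8") as f:
        for raw in f:
            try:
                json.loads(raw)
                good += 1
            except json.JSONDecodeError:
                bad += 1
    return good, bad
```

With locking in place, 4 writers × 15 lines should yield 60 valid lines and none corrupt.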

How to Test

  1. Run pytest tests/agent/test_trajectory_file_locking.py -v -s -o "addopts="
  2. Both tests should PASS
  3. To verify the bug existed: revert the agent/trajectory.py changes and re-run — both tests will FAIL

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(agent): ...)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes
  • I've tested on my platform: macOS (Apple Silicon, Python 3.11.9)

Documentation & Housekeeping

  • I've updated relevant documentation — N/A
  • I've updated cli-config.yaml.example — N/A
  • I've updated CONTRIBUTING.md or AGENTS.md — N/A
  • I've considered cross-platform impact — Yes, fcntl import is guarded with try/except ImportError for Windows
  • I've updated tool descriptions/schemas — N/A

Screenshots / Logs

# With fix applied (PASS):
tests/agent/test_trajectory_file_locking.py::test_interleaved_writes_corrupt_jsonl PASSED
tests/agent/test_trajectory_file_locking.py::test_save_trajectory_ignores_exclusive_flock PASSED

# Without fix (FAIL):
BUG CONFIRMED: 29/60 lines corrupted by interleaved writes
FAILED - save_trajectory() wrote 1 line(s) while another process held an exclusive flock

Changed files

  • agent/trajectory.py (modified, +11/-0)
  • tests/agent/test_trajectory_file_locking.py (added, +246/-0)

PR #12694: fix(trajectory): add file locking to save_trajectory to prevent JSONL corruption on concurrent writes

Description (problem / solution / changelog)

Fixes #12684. This PR introduces cross-platform file locking (fcntl on POSIX, msvcrt on Windows) to save_trajectory(). Without locking, concurrent execution could interleave writes, corrupting the JSONL output. By ensuring exclusive access before appending, it guarantees atomic line writes.
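One way such a cross-platform lock could look is sketched below. The PR's actual code is not shown on this page, so exclusive_lock and append_line are hypothetical names, locking byte 0 on Windows is an assumed design choice, and the Windows branch is untested here.

```python
import contextlib
import json
import os

if os.name == "nt":
    import msvcrt  # Windows
else:
    import fcntl  # POSIX


@contextlib.contextmanager
def exclusive_lock(f):
    """Hold an exclusive lock on the open file f (hypothetical helper)."""
    if os.name == "nt":
        # msvcrt.locking locks nbytes from the current position, so pin byte 0.
        f.seek(0)
        msvcrt.locking(f.fileno(), msvcrt.LK_LOCK, 1)
        try:
            yield
        finally:
            f.seek(0)  # return to byte 0 so the same region is unlocked
            msvcrt.locking(f.fileno(), msvcrt.LK_UNLCK, 1)
    else:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)


def append_line(path, entry):
    """Append one JSON line while holding the lock (illustrative usage)."""
    with open(path, "a", encoding="utf-8") as f:
        with exclusive_lock(f):
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            f.flush()  # flush before the lock is released
```

Note that msvcrt.LK_LOCK retries for a limited time and then raises OSError, whereas fcntl.flock with LOCK_EX blocks indefinitely, so the two branches do not behave identically under long contention.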

Changed files

  • agent/trajectory.py (modified, +30/-1)

PR #12713: fix(trajectory): add file locking to prevent JSONL corruption under concurrent writes

Description (problem / solution / changelog)

Fixes #12684

Problem

save_trajectory() uses plain open('a') + write() with no file locking, causing JSONL corruption under concurrent writes.

Solution

  • Added fcntl.flock(LOCK_EX) around the write operation to serialize concurrent appends
  • try/finally ensures the lock is always released
  • msvcrt.locking fallback for Windows
  • Full JSON line including newline written and flushed atomically within the lock

Tests

  • Added tests/test_trajectory_locking.py — 20 concurrent threads verify all lines are valid JSONL (2 tests passing)

Changed files

  • agent/trajectory.py (modified, +19/-1)
  • tests/test_trajectory_locking.py (added, +51/-0)


Bug Description

save_trajectory() in agent/trajectory.py appends to a JSONL file using a plain open(filename, "a") + f.write() with no file locking. When multiple processes call it concurrently (e.g. during batch trajectory generation), writes interleave and produce corrupt JSONL lines that cannot be parsed.

Steps to Reproduce

  1. Call save_trajectory() from two or more processes simultaneously, pointing at the same output file.
  2. Use large trajectory payloads (600+ byte content fields) to make write-splitting more likely.
  3. Inspect the output JSONL file — lines will contain truncated or merged JSON objects that fail json.loads().
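For the inspection step, a small checker along these lines can list the offending lines. It is a sketch: find_corrupt_lines is a hypothetical name, and treating any line that is not a JSON object as corrupt is an assumption about what a valid trajectory line looks like.

```python
import json


def find_corrupt_lines(path):
    """Return (line_number, text) for each line that is not a JSON object."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for n, raw in enumerate(f, start=1):
            text = raw.rstrip("\n")
            try:
                obj = json.loads(text)
            except json.JSONDecodeError:
                bad.append((n, text))  # truncated or merged fragment
                continue
            if not isinstance(obj, dict):
                bad.append((n, text))  # parses, but is not an object
    return bad
```

Usage: `for n, text in find_corrupt_lines("out.jsonl"): print(n, text[:60])`, where `out.jsonl` stands in for whatever file the writers shared.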

Expected Behavior

Each call to save_trajectory() produces exactly one complete, valid JSON line. Concurrent calls serialize their writes so no line is interleaved with another.

Actual Behavior

29/60 JSONL lines are corrupted — JSON is split mid-object across two lines. Additionally, save_trajectory() writes through an exclusive fcntl.flock held by another process, proving there is zero lock coordination.

Example corrupt output:
Line 1: {"conversations": [..., "content": "P1S0:AAAAAAA    ← truncated mid-write
Line 2: AAAAAAAAAAAAAAAA..."}    ← continuation of line 1

Affected Component

Agent Core (conversation loop, context compression, memory)

Messaging Platform (if gateway-related)

No response

Debug Report

N/A — this is a code-level race condition reproducible with the test suite,
not dependent on a running Hermes instance.

Operating System

macOS 15 (also affects Linux)

Python Version

3.11.9

Hermes Version

0.10.0

Additional Logs / Traceback (optional)

Root Cause Analysis (optional)

agent/trajectory.py, lines 53-57:

with open(filename, "a", encoding="utf-8") as f:
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")

No fcntl.flock() is called before the write. On macOS/APFS, individual write() syscalls are atomic for small payloads, hiding the bug in normal use. Under load, or on Linux ext4/NFS where large writes are not guaranteed atomic, writes from concurrent processes interleave and split JSON objects across multiple lines, producing invalid JSONL.

Proposed Fix (optional)

import fcntl

with open(filename, "a", encoding="utf-8") as f:
    fcntl.flock(f, fcntl.LOCK_EX)
    f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    f.flush()
    fcntl.flock(f, fcntl.LOCK_UN)

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

extent analysis

TL;DR

To fix the issue, take an exclusive fcntl.flock() around each append in save_trajectory() so that concurrent writes to the JSONL file are serialized.

Guidance

  • The proposed fix in the issue body is a good starting point, which involves acquiring an exclusive lock on the file before writing and releasing it afterwards.
  • To implement this fix, modify the save_trajectory() function in agent/trajectory.py to include file locking using fcntl.flock().
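  • Flush the file before releasing the lock so the buffered line is handed to the OS while the lock is still held. Note that flush() alone does not guarantee the data is on disk; that would additionally require os.fsync().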
  • Consider adding error handling for cases where the lock cannot be acquired or the write fails.
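The error-handling suggestion could be sketched as a bounded wait instead of an indefinite block. This is an illustration only: append_with_timeout, the timeout and poll parameters, and the choice to raise TimeoutError are all assumptions, and the code is POSIX-only because it relies on fcntl.

```python
import fcntl
import json
import time


def append_with_timeout(path, entry, timeout=5.0, poll=0.05):
    """Append one JSON line, giving up if the lock is not free in time."""
    line = json.dumps(entry, ensure_ascii=False) + "\n"
    deadline = time.monotonic() + timeout
    with open(path, "a", encoding="utf-8") as f:
        while True:
            try:
                # LOCK_NB makes flock fail immediately instead of blocking.
                fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path} within {timeout}s")
                time.sleep(poll)
        try:
            f.write(line)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```

A caller can then decide whether a timeout should drop the trajectory, retry later, or abort the batch, rather than hanging behind a stuck lock holder.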

Example

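import fcntl  # POSIX only; guard this import on Windows
import json

with open(filename, "a", encoding="utf-8") as f:
    fcntl.flock(f, fcntl.LOCK_EX)  # block until the exclusive lock is held
    try:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
        f.flush()  # push the buffered line to the OS before unlocking
    finally:
        fcntl.flock(f, fcntl.LOCK_UN)  # release even if the write raises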

Notes

  • This fix assumes that the issue is solely due to concurrent writes to the same file. If there are other factors contributing to the corruption, additional debugging may be necessary.
  • The use of fcntl.flock() may have performance implications, especially if the file is being written to frequently. Alternative synchronization mechanisms, such as using a queue or a thread-safe writer, may be worth exploring.
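The queue-based alternative mentioned in the notes might look like the sketch below, in which a single writer thread owns the file and producers only enqueue entries. The names writer_loop and _STOP are hypothetical, and this variant only covers threads inside one process; separate processes would need something like multiprocessing.Queue or the file lock itself.

```python
import json
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to exit


def writer_loop(path, q):
    """Drain q and append each entry as one JSON line (single writer)."""
    with open(path, "a", encoding="utf-8") as f:
        while True:
            entry = q.get()
            if entry is _STOP:
                break
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")
            f.flush()


def start_writer(path):
    """Start the writer thread and return (queue, thread)."""
    q = queue.Queue()
    t = threading.Thread(target=writer_loop, args=(path, q))
    t.start()
    return q, t
```

Producers call `q.put(entry)`; on shutdown, `q.put(_STOP)` followed by `t.join()` ensures every queued entry is written before the file closes.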

Recommendation

Apply the proposed fix using fcntl.flock() to prevent concurrent writes and ensure data integrity. This approach is straightforward and directly addresses the identified root cause.
