hermes - 💡(How to fix) Fix Checkpoint backends: shadow-git vs CoW clonefile — has this tradeoff been considered?

I maintain a small project (cowback) that does roughly what tools/checkpoint_manager.py does — transparent per-turn snapshots of the working directory, rollback on demand — but backed by filesystem Copy-on-Write (APFS clonefile() on macOS, FICLONE ioctl on BTRFS/XFS) rather than shadow git repos.

Writing this up as an issue rather than a PR because I don't want to propose replacing something that works; I'm more curious whether a backend choice has been considered, and whether sharing numbers would be useful.

Quick comparison

| | Shadow git (Hermes today) | CoW clonefile |
|---|---|---|
| Portability | Any filesystem | APFS / BTRFS / XFS (fallback to copy elsewhere) |
| Dependency | Git binary | Kernel syscall (already present) |
| Snapshot cost | O(files × file-scan + hash + blob write) | O(files × metadata-only clone) |
| Dedup | Content-addressable blobs, great across turns | Block-sharing via CoW, transparent to FS |
| Restore cost | git checkout / worktree reset | Re-clone from snapshot dir |
| Cross-FS repo (e.g. /tmp → $HOME) | Works | Falls back to copy when source and dest differ |
| Huge binaries | Git has to hash/store them each time they change | CoW shares blocks, clonefile is constant time |
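
As a rough illustration of the CoW column, the sketch below clones files with the Linux FICLONE ioctl and falls back to a regular copy when the filesystem refuses or the platform is not Linux. The names `clone_file`, `snapshot_tree`, and `restore_tree` are hypothetical; they are not taken from cowback or checkpoint_manager.py. macOS would need `clonefile()` instead, which is not shown, and symlinks and ignore rules are left out.

```python
import fcntl
import os
import shutil
import sys

# FICLONE request number on Linux: _IOW(0x94, 9, int).
# Python 3.12+ exposes fcntl.FICLONE; the literal is a fallback for older versions.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def clone_file(src: str, dst: str) -> bool:
    """Copy src to dst. Return True if a reflink clone was made,
    False if a regular byte copy was used instead."""
    if sys.platform.startswith("linux"):
        try:
            with open(src, "rb") as s, open(dst, "wb") as d:
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            shutil.copystat(src, dst)  # carry over mode and timestamps
            return True
        except OSError:
            pass  # unsupported filesystem, cross-filesystem clone, etc.
    shutil.copy2(src, dst)  # portable fallback; overwrites the empty dst
    return False


def snapshot_tree(src_root: str, dst_root: str) -> int:
    """Mirror src_root into dst_root (which must live outside src_root).
    Return the number of files that were cloned via CoW."""
    cloned = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.join(dst_root, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            if clone_file(os.path.join(dirpath, name),
                          os.path.join(target_dir, name)):
                cloned += 1
    return cloned


def restore_tree(snapshot_root: str, work_root: str) -> None:
    """Roll back by discarding work_root and re-cloning it from the snapshot."""
    shutil.rmtree(work_root)
    snapshot_tree(snapshot_root, work_root)
```

The return value of `snapshot_tree` makes the fallback visible: a count of zero on a non-empty tree means every file was copied rather than cloned.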

Rough numbers (from my benchmark on APFS, M-series Mac)

| Project size | cp -r | shadow git commit | cowback CoW |
|---|---|---|---|
| 100 files × 10KB | 51ms | ~200ms | 13ms |
| 1,000 files × 10KB | 426ms | ~1.2s | 122ms |
| 5,000 files × 10KB | 1,970ms | ~5s | 487ms |

(Shadow-git numbers are rough — I haven't run them against checkpoint_manager.py exactly; those are from running git add -A && git commit from scratch on synthetic trees. A real comparison would need to match the shadow-repo setup.)
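
For anyone who wants to reproduce numbers of this kind, here is a minimal harness sketch. It is not the benchmark used above: the tree layout, the helper names (`make_tree`, `time_ms`, `git_commit_from_scratch`, `run_benchmark`), and the use of `shutil.copytree` as a stand-in for `cp -r` are all assumptions. The git step mirrors the "git add -A && git commit from scratch" description and does not replicate the shadow-repo setup in checkpoint_manager.py.

```python
import os
import shutil
import subprocess
import tempfile
import time
from typing import Callable, Dict


def make_tree(root: str, n_files: int, size: int = 10 * 1024) -> None:
    """Create n_files files of `size` random bytes, spread over 10 subdirectories."""
    for i in range(n_files):
        sub = os.path.join(root, f"dir{i % 10}")
        os.makedirs(sub, exist_ok=True)
        with open(os.path.join(sub, f"file{i}.bin"), "wb") as f:
            f.write(os.urandom(size))


def time_ms(fn: Callable[[], object]) -> float:
    """Run fn once and return the elapsed wall-clock time in milliseconds."""
    start = time.perf_counter()
    fn()
    return (time.perf_counter() - start) * 1000.0


def git_commit_from_scratch(tree: str) -> None:
    """git init + add -A + commit in a fresh repo (adds a .git directory to tree)."""
    def run(*args: str) -> None:
        subprocess.run(["git", *args], cwd=tree, check=True, capture_output=True)

    run("init", "-q")
    run("add", "-A")
    run("-c", "user.name=bench", "-c", "user.email=bench@example.invalid",
        "-c", "commit.gpgsign=false", "commit", "-q", "-m", "snapshot")


def run_benchmark(n_files: int) -> Dict[str, float]:
    """Time a plain recursive copy and, if git is installed, a from-scratch commit."""
    results: Dict[str, float] = {}
    with tempfile.TemporaryDirectory() as tmp:
        tree = os.path.join(tmp, "tree")
        os.makedirs(tree)
        make_tree(tree, n_files)
        results["copytree_ms"] = time_ms(
            lambda: shutil.copytree(tree, os.path.join(tmp, "copy")))
        if shutil.which("git"):
            results["git_commit_ms"] = time_ms(
                lambda: git_commit_from_scratch(tree))
    return results
```

A CoW snapshot function can be timed the same way by passing it to `time_ms`. Single runs are noisy, so repeating each measurement and reporting a median would give more trustworthy figures.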

Where each wins

Shadow git is clearly the right default for a cross-platform agent: it works on every filesystem, brings dedup across long sessions, and uses well-understood tooling. For a project of Hermes's size this matters: you don't want to ship a backend that breaks on someone's ext4.

CoW wins when:

  • The user has an APFS / BTRFS / XFS-reflink filesystem (most Mac devs, many Linux setups)
  • Projects contain large binaries / assets / lockfiles that change frequently
  • Snapshot latency per turn matters (LLM-speed loops benefit from ms-level snapshots)
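
Whether the first condition holds can be checked at runtime rather than guessed from the OS. The sketch below probes a directory by attempting a FICLONE clone between two throwaway files. `supports_reflink` is a hypothetical helper, it covers Linux only, and a macOS probe would have to call `clonefile()` instead.

```python
import fcntl
import os
import sys
import tempfile

# FICLONE request number on Linux: _IOW(0x94, 9, int).
# Python 3.12+ exposes fcntl.FICLONE; the literal is a fallback for older versions.
FICLONE = getattr(fcntl, "FICLONE", 0x40049409)


def supports_reflink(directory: str) -> bool:
    """Return True if cloning one file to another inside `directory` succeeds."""
    if not sys.platform.startswith("linux"):
        return False  # other platforms are not probed in this sketch
    with tempfile.TemporaryDirectory(dir=directory) as probe_dir:
        src = os.path.join(probe_dir, "src")
        dst = os.path.join(probe_dir, "dst")
        with open(src, "wb") as f:
            f.write(b"probe")
        try:
            with open(src, "rb") as s, open(dst, "wb") as d:
                fcntl.ioctl(d.fileno(), FICLONE, s.fileno())
            return True
        except OSError:
            return False  # the filesystem (or kernel) rejected the clone
```

Because the probe files live inside the target directory, the result reflects the filesystem that would actually hold the snapshots, which matters when the working tree and the snapshot directory sit on different mounts.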

Possible directions (if useful)

  1. Nothing changes. The current impl is well-engineered and the perf is fine for most users. Totally reasonable answer.
  2. Optional backend. A HERMES_CHECKPOINT_BACKEND=cow env / config flag that uses copyFileSync(..., COPYFILE_FICLONE_FORCE) when supported, falls back to shadow git otherwise. Extra ~100 lines of code.
  3. Just the perf learnings. Even without a backend switch, some of the optimizations (e.g., skipping hash on large files) might be worth porting the other direction.
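
Option 2 boils down to a small selection rule: use CoW only when it is both requested and supported, otherwise stay on shadow git. A minimal sketch, assuming the flag is read from the environment and that the caller has already determined reflink support; `select_backend` and the returned strings `"cow"` and `"git"` are hypothetical, and only the variable name comes from the proposal.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "HERMES_CHECKPOINT_BACKEND"  # name from the proposal; not an existing Hermes setting


def select_backend(reflink_supported: bool,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Return "cow" only when explicitly requested and supported; otherwise "git"."""
    env = os.environ if env is None else env
    requested = env.get(ENV_VAR, "git").strip().lower()
    if requested == "cow" and reflink_supported:
        return "cow"
    return "git"  # default, and the fallback when CoW is unavailable
```

Keeping shadow git as the default means users who never set the flag see no behavior change, and a user who sets it on an unsupported filesystem still gets working checkpoints.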

Happy to write up (2) as a PR against optional-skills/ or tools/ if there's interest — but only if the direction makes sense to the maintainers. If not, I'm happy to hear why; shadow-git has real advantages (hash dedup over long conversations) that I might be under-weighting.

Thanks for the well-written checkpoint_manager.py — reading it was the clearest "here's what agent checkpointing should look like" document I've seen.

extent analysis

TL;DR

Consider adding an optional CoW backend for improved performance on supported filesystems, using an environment variable or config flag to switch between shadow git and CoW.

Guidance

  • Evaluate the trade-offs between shadow git and CoW backends, considering factors like portability, dependency, snapshot cost, and deduplication.
  • Assess the potential benefits of using CoW for projects with large binaries or assets that change frequently, and for users who require low-latency snapshots.
  • Investigate the feasibility of adding an optional CoW backend behind a flag like HERMES_CHECKPOINT_BACKEND=cow, which the issue author estimates at roughly 100 extra lines of code.
  • Consider porting optimizations from the CoW approach to the shadow git implementation, such as skipping hash on large files, to improve performance without switching backends.

Example

No code example is provided, as the issue focuses on discussing the trade-offs and potential directions for improvement rather than implementing a specific solution.

Notes

The decision to add a CoW backend or stick with the current shadow git implementation depends on the project's priorities and user needs. The CoW approach offers performance benefits on supported filesystems, but may not be suitable for all users or projects.

Recommendation

Consider an optional CoW backend: it may improve snapshot performance on supported filesystems and, by the issue author's estimate, needs only about 100 lines of code, while shadow git remains the default and the fallback where CoW is unavailable.
