openclaw - ✅ (Solved) openclaw backup create fails with "did not encounter expected EOF" on live installations [1 pull request, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#72249 · Fetched 2026-04-27 05:32:34
Comments: 1
Participants: 2
Timeline: 3
Reactions: 0
Timeline (top): closed ×1, commented ×1, cross-referenced ×1

Error Message

Error: did not encounter expected EOF
    at WriteEntry.<anonymous> (.../node_modules/tar/dist/...)
    ...
Process exited with code 1

Root Cause

node-tar's WriteEntry records the file size from the initial lstat(). It then streams file contents via fs.read(). If the file grows between the lstat() and the end of the read, the header size no longer matches the byte count actually consumed, the stream errors with "did not encounter expected EOF", and tar.c() rejects. Because the backup is a single transaction, the whole archive is aborted — partial writes are not salvaged.

This is not an OpenClaw bug in the strict sense: it's a fundamental mismatch between how tar packs files and how any live service writes logs. But every user who runs openclaw backup create against a running install will eventually hit it, so the CLI needs to handle it.
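
The stat-then-read gap described above can be shown without tar at all. The following is a minimal sketch, not node-tar's code: `demonstrateSizeRace` is a hypothetical helper that performs the same three steps in a fixed order (stat, concurrent append, read), so the mismatch shows up every time rather than intermittently.

```typescript
import { appendFileSync, lstatSync, mkdtempSync, readFileSync, rmSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: reproduces the size mismatch deterministically.
function demonstrateSizeRace(): { sizeAtStat: number; bytesRead: number } {
  const dir = mkdtempSync(join(tmpdir(), "eof-race-"));
  const file = join(dir, "live.log");
  try {
    writeFileSync(file, "line 1\n");

    // Step 1: the archiver stats the file and commits this size to the tar header.
    const sizeAtStat = lstatSync(file).size;

    // Step 2: a live writer appends before the archiver has finished reading.
    appendFileSync(file, "line 2\n");

    // Step 3: the archiver reads the contents and finds more bytes than the header promised.
    const bytesRead = readFileSync(file).length;

    return { sizeAtStat, bytesRead };
  } finally {
    // Clean up the scratch directory whether or not the steps succeeded.
    rmSync(dir, { recursive: true, force: true });
  }
}

const race = demonstrateSizeRace();
console.log(`size recorded at stat: ${race.sizeAtStat}, bytes available at read: ${race.bytesRead}`);
```

Any archiver that writes the size into a header before streaming the body has to decide what to do when the two numbers disagree; node-tar chooses to error.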

Fix Action

Fix / Workaround

  • Severity: High for users with a running gateway and a non-trivial state directory — backups can fail repeatedly until the gateway is stopped.
  • Workaround today: stop the gateway before running backup create. Not viable for scheduled/automated backups.

PR fix notes

PR #72251: fix(backup): retry on tar EOF race and skip known volatile files

Description (problem / solution / changelog)

Problem

openclaw backup create fails with Error: did not encounter expected EOF on any live installation. Root cause: node-tar's WriteEntry records size from the initial lstat(); if the file grows between lstat() and the end of fs.read(), the header no longer matches the consumed bytes, the stream errors, and tar.c() aborts the whole archive. Common culprits are append-only JSONL transcripts and .log files under the state directory.

Details and generic reproducer: #72249. Related but distinct races: #67417 (ENOENT when a session file is deleted mid-backup). Broader exclude-rule proposal: #67990.

Impact

Affects any live install where backup source files are written concurrently — session transcripts, cron run logs, gateway logs. Today the workaround is to stop the gateway before backing up, which is not viable for scheduled backups.

Fix

  1. Default volatile-path filter in the backup archiver — no user config required:

    • {stateDir}/sessions/**/*.{jsonl,log}
    • {stateDir}/cron/runs/**/*.log
    • {stateDir}/logs/**/*.{jsonl,log}
    • *.{sock,pid,tmp,lock} anywhere

    Rationale: these files are either actively appended to (logs), transient (sockets/pid/lock), or have no restoration value from a partial tail. Snapshotting them racily has no upside.

  2. Retry on EOF-class errors (up to 3 attempts, 10s / 20s backoff) for residual races on non-filtered files. The partial temp archive is removed between attempts, so no half-written artifact escapes.

  3. Error enrichment: on final failure, surface err.path (the offending file) and attempt count. Turns an opaque error into an actionable one.

  4. skippedVolatileCount observability: extend BackupCreateResult with the skip count; surface it in both the stdout summary and the --json output.
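
The four volatile-path rules in item 1 can be sketched as a plain predicate. This is an illustration under stated assumptions, not the contents of src/infra/backup-volatile-filter.ts: the function name `isVolatilePath`, the rule table, and the use of POSIX-style paths are all hypothetical, and no glob library is used.

```typescript
import * as path from "node:path";

// Extensions treated as volatile wherever they appear: *.{sock,pid,tmp,lock}
const ANYWHERE_EXTENSIONS = new Set([".sock", ".pid", ".tmp", ".lock"]);

// Directory-scoped rules: a subtree relative to the state dir, plus the extensions skipped in it.
const SCOPED_RULES: ReadonlyArray<{ subtree: string[]; extensions: Set<string> }> = [
  { subtree: ["sessions"], extensions: new Set([".jsonl", ".log"]) },
  { subtree: ["cron", "runs"], extensions: new Set([".log"]) },
  { subtree: ["logs"], extensions: new Set([".jsonl", ".log"]) },
];

// Hypothetical predicate; assumes POSIX-style paths for both arguments.
function isVolatilePath(stateDir: string, filePath: string): boolean {
  const ext = path.posix.extname(filePath);
  if (ANYWHERE_EXTENSIONS.has(ext)) return true;

  const rel = path.posix.relative(stateDir, filePath);
  // Scoped rules only apply to files inside the state directory.
  if (rel === ".." || rel.startsWith("../")) return false;

  // Directory segments of the relative path, without the file name itself.
  const dirSegments = rel.split("/").slice(0, -1);
  return SCOPED_RULES.some(
    (rule) =>
      rule.extensions.has(ext) &&
      rule.subtree.every((name, i) => dirSegments[i] === name),
  );
}

console.log(isVolatilePath("/state", "/state/sessions/abc/transcript.jsonl"));
console.log(isVolatilePath("/state", "/state/config/settings.json"));
```

A predicate of this shape could be negated and passed as node-tar's `filter` option, which is called per entry and includes the entry when it returns true. Counting the rejected paths in the same place would give the skip count mentioned in item 4.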

Backwards compatibility

  • Filter is additive; no previously-included path stops being included unless it matches a volatile rule. The volatile rules cover files that already produced errors or had no snapshot value.
  • --json output gains a new optional field skippedVolatileCount. Existing consumers that ignore unknown fields are unaffected.
  • Error message format changes only on failure, and only to add diagnostic context.

Testing

Added:

  • src/infra/backup-volatile-filter.test.ts — predicate unit tests for each rule and the anywhere-patterns, plus negative cases.
  • src/infra/backup-create.test.ts — retry tests using a mocked tar runner:
    • Throws EOF twice then succeeds → 3 calls, 10s + 20s backoff, logs emitted.
    • Throws EOF on every attempt → final error includes offending path + attempt count.
    • Throws non-EOF → no retry, no sleep.
  • formatBackupCreateSummary updated to include the skip-count line when > 0.
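
To make the skip-count behaviour concrete, here is a hedged sketch of the result type and summary formatter. Only the names `BackupCreateResult`, `skippedVolatileCount`, and `formatBackupCreateSummary` come from the PR description; the other fields, the return type, and the wording of each line are assumptions made for illustration.

```typescript
// Sketch only: fields other than skippedVolatileCount are hypothetical.
interface BackupCreateResult {
  archivePath: string; // hypothetical field
  fileCount: number; // hypothetical field
  // Optional, so existing --json consumers that ignore unknown fields are unaffected.
  skippedVolatileCount?: number;
}

function formatBackupCreateSummary(result: BackupCreateResult): string[] {
  const lines = [
    `Backup archive: ${result.archivePath}`,
    `Files archived: ${result.fileCount}`,
  ];
  // The skip-count line appears only when something was actually skipped.
  const skipped = result.skippedVolatileCount ?? 0;
  if (skipped > 0) {
    lines.push(`Skipped volatile files: ${skipped}`);
  }
  return lines;
}

console.log(
  formatBackupCreateSummary({
    archivePath: "backup.tar.gz",
    fileCount: 120,
    skippedVolatileCount: 7,
  }).join("\n"),
);
```

Keeping the field optional and the line conditional means output for installs with nothing to skip stays the same as before.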

Verified locally: a previously-failing live-install backup (large state tree with actively written session and cron logs) now completes cleanly; the skipped-volatile count is reported and the archive is usable.

Future work (not in this PR)

A --quiesce flag that pauses cron and drains in-flight work before archival, for users who want stronger consistency guarantees than "retry-plus-filter".

Fixes #72249. See also #67417, #67990.

Changed files

  • src/commands/backup.ts (modified, +4/-1)
  • src/infra/backup-create.test.ts (modified, +160/-2)
  • src/infra/backup-create.ts (modified, +137/-15)
  • src/infra/backup-volatile-filter.test.ts (added, +49/-0)
  • src/infra/backup-volatile-filter.ts (added, +81/-0)


Bug Description

openclaw backup create aborts with Error: did not encounter expected EOF when the archiver reads a file that is being actively appended to during the tar.c() stream. On any live OpenClaw installation, this is reliably reproducible against session transcript .jsonl files, cron run .log files, and state logs/*.jsonl — all of which are append-only on a running gateway.

Exact Error

Error: did not encounter expected EOF
    at WriteEntry.<anonymous> (.../node_modules/tar/dist/...)
    ...
Process exited with code 1

err.path points at the file being appended to mid-stream (typically a live session transcript or gateway log).

Steps to Reproduce

Generic reproducer (no OpenClaw state needed):

mkdir -p /tmp/backup-eof-repro/src
cd /tmp/backup-eof-repro

# Generate a non-trivial file that will be appended to during archiving.
yes "$(head -c 2000 /dev/urandom | base64)" | head -c 100M > src/live.log

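# Keep appending in the background:
while true; do date >> src/live.log; sleep 0.05; done &
APPENDER=$!

# While the appender runs, archive the directory. The error text below is node-tar's,
# so run node-tar c({ file, gzip: true }, ['src']) here to see it verbatim; GNU tar
# flags the same race with its own warning ("file changed as we read it").
tar -czf out.tar.gz src/
# Observe: intermittent "did not encounter expected EOF" error.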

kill $APPENDER

In a real OpenClaw install, just run openclaw backup create against a state directory that has any active session (i.e. a running gateway). With a state tree in the tens of GB and dozens of live sessions, failure is nearly deterministic.

Environment

  • OpenClaw: observed on 2026.4.x line
  • Node: v22–v25
  • OS: macOS and Linux both reproduce
  • Triggering files: {stateDir}/sessions/**/*.{jsonl,log}, {stateDir}/cron/runs/**/*.log, {stateDir}/logs/**/*.{jsonl,log}

Expected Behavior

openclaw backup create should complete on a running install. Files known to be volatile (live logs, sockets, pid/lock markers) are not meaningful to snapshot anyway and can safely be skipped; transient races on other files should be retried.

Proposed Fix

  1. Default-exclude known volatile paths in the backup archiver:
    • {stateDir}/sessions/**/*.{jsonl,log}
    • {stateDir}/cron/runs/**/*.log
    • {stateDir}/logs/**/*.{jsonl,log}
    • *.{sock,pid,tmp,lock} anywhere
  2. Retry tar.c() on EOF-class errors (up to 3 attempts, 10s/20s backoff) for residual races on other files. Clean the partial temp archive between attempts.
  3. On final failure, include err.path and attempt count in the thrown message so users get an actionable report.
  4. Surface the skipped-volatile count in stdout and in --json output for observability.
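
The retry policy in items 2 and 3 might look like the sketch below. Only the policy itself comes from the issue (3 attempts, 10s/20s backoff, temp-archive cleanup, path and attempt count in the final error). Every name here (`createArchiveWithRetry`, `ArchiveRunner`, `isEofRaceError`) is hypothetical, and detecting the error by its message text is an assumption. The archiver and the sleep function are injected so the logic can be exercised without real tar runs or real waits.

```typescript
import { rm } from "node:fs/promises";

// Hypothetical types: the runner writes an archive to the given temp path.
type ArchiveRunner = (tempArchivePath: string) => Promise<void>;
type Sleep = (ms: number) => Promise<void>;

const MAX_ATTEMPTS = 3;
const BACKOFF_MS = [10000, 20000]; // wait before attempt 2, then before attempt 3

// Assumption: EOF-class errors are recognised by node-tar's message text.
function isEofRaceError(err: unknown): err is Error & { path?: string } {
  return err instanceof Error && /did not encounter expected EOF/i.test(err.message);
}

const defaultSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Resolves with the number of attempts used; rejects on non-EOF errors or after the last attempt.
async function createArchiveWithRetry(
  runArchive: ArchiveRunner,
  tempArchivePath: string,
  sleep: Sleep = defaultSleep,
): Promise<number> {
  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    try {
      await runArchive(tempArchivePath);
      return attempt;
    } catch (err) {
      // Remove the partial archive so no half-written artifact survives a failed attempt.
      await rm(tempArchivePath, { force: true });

      // Errors outside the EOF class are rethrown immediately: no retry, no sleep.
      if (!isEofRaceError(err)) throw err;

      if (attempt === MAX_ATTEMPTS) {
        const offender = err.path ?? "unknown file";
        throw new Error(
          `backup failed after ${attempt} attempts: ${err.message} (file: ${offender})`,
        );
      }
      await sleep(BACKOFF_MS[attempt - 1]);
    }
  }
  throw new Error("unreachable: retry loop exited without a result");
}
```

Injecting `sleep` mirrors the mocked-runner style of testing: a fake sleep records the requested delays instead of waiting, so the 10s + 20s sequence can be asserted instantly.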

This is distinct from #67417 (ENOENT when a session file is deleted mid-backup — same race family, different failure mode and different fix). A broader exclude-rule system is proposed in #67990; this bug asks for the minimum viable built-in filter that makes backup create work out of the box on any live install, without requiring user configuration.

A PR implementing the above is on the way.

extent analysis

TL;DR

To fix the openclaw backup create issue, exclude volatile paths from the backup archiver and implement retry logic for residual races on other files.

Guidance

  • Exclude known volatile paths in the backup archiver by default: {stateDir}/sessions/**/*.{jsonl,log}, {stateDir}/cron/runs/**/*.log, {stateDir}/logs/**/*.{jsonl,log}, and *.{sock,pid,tmp,lock} anywhere.
  • Retry tar.c() on EOF-class errors, up to 3 attempts with 10s/20s backoff.
  • Remove the partial temp archive between retry attempts so that no half-written artifact escapes.
  • On final failure, include err.path and the attempt count in the thrown message so the report is actionable.
  • Surface the skipped-volatile count in stdout and in --json output.

Example

The issue includes a generic shell reproducer (see Steps to Reproduce) but no snippet of the fix itself; the implementation is in PR #72251.

Notes

The proposed fix is specific to the openclaw backup create command and may not apply to other backup tools or scenarios. The fix is also distinct from other issues, such as #67417, which deals with a different failure mode.

Recommendation

Apply the proposed fix (PR #72251): exclude volatile paths by default and retry on EOF-class errors. Until that fix is available in your install, the only workaround is to stop the gateway before running backup create.
