openclaw - 💡(How to fix) Fix Bug: `saveCronStore` overwrites jobs.json from partial in-memory state after restart, causing silent job loss [1 participants]

GitHub stats
openclaw/openclaw#53746 · Fetched 2026-04-08 01:24:01
Comments: 0 · Participants: 1 · Timeline events: 2 (cross-referenced ×2) · Reactions: 0

Summary

When the gateway restarts and a cron fires before full state is loaded, saveCronStore writes a partial in-memory job list over the full on-disk job list, silently wiping every job not present in memory at that moment. This happened to us twice in 24 hours, with about 86 jobs on disk at the first incident and about 52 at the second.

Root Cause

File: src/cron/store.ts (dist: config-runtime-BYNizC50.js)
Function: saveCronStore(storePath, store, opts)

Current write path:

in-memory state → write to .tmp → rename to jobs.json

The function writes whatever is currently in memory as the complete canonical job list. If an isolated cron session fires 1–2 seconds after a gateway restart, only its own jobs exist in its memory scope. When it writes, it replaces all 50+ other jobs with its 1–2 job state.

OpenClaw already uses atomic writes (tmp → rename), which prevent file corruption, but an atomic write of the wrong data still causes silent data loss.
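
To make the failure mode concrete, here is a minimal sketch of a tmp-then-rename write using Node's built-in fs module. It assumes a `{ jobs: [...] }` file layout, which is a guess (the real jobs.json schema is not shown in this issue), and `atomicWriteJson` is an illustrative name rather than OpenClaw's actual helper.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical helper: write JSON to a sibling .tmp file, then rename it over
// the target, so readers see either the old file or the new one, never a mix.
function atomicWriteJson(filePath, data) {
  const tmpPath = `${filePath}.tmp`;
  fs.writeFileSync(tmpPath, JSON.stringify(data, null, 2));
  fs.renameSync(tmpPath, filePath);
}

// Simulate the race in a scratch directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cron-store-"));
const storePath = path.join(dir, "jobs.json");

// Full store on disk before the restart (52 jobs).
const allJobs = Array.from({ length: 52 }, (_, i) => ({ id: `job-${i}` }));
atomicWriteJson(storePath, { jobs: allJobs });

// An isolated session that only knows about one job saves its view.
atomicWriteJson(storePath, { jobs: [{ id: "job-7" }] });

const remaining = JSON.parse(fs.readFileSync(storePath, "utf8")).jobs.length;
console.log(`jobs on disk after partial save: ${remaining}`); // 1
```

The file is valid JSON at every point, yet 51 of the 52 jobs are gone, which is why nothing downstream reports an error.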

Reproduction

  1. Create 50+ cron jobs
  2. Restart the gateway
  3. Within ~5 seconds of restart, a cron fires in an isolated session
  4. That session has only 1–2 jobs in memory
  5. saveCronStore writes those 1–2 jobs to disk
  6. All other jobs are gone — no error, no warning

This is amplified by any crash loop or rapid restart cycle (e.g., watchdog, config changes).

Proposed Fix: Read-Merge-Write Pattern

Instead of writing in-memory state directly, saveCronStore should:

  1. Read current jobs.json from disk
  2. Merge: apply only the in-memory delta (add/modify/remove the specific job that changed)
  3. Backup: copy current jobs.json → jobs.json.bak (already done, keep this)
  4. Write: write merged result to .tmp, then rename to jobs.json

// Proposed change to saveCronStore
async function saveCronStore(storePath, store, opts) {
  // Read current disk state
  const diskJobs = await readJobsFromDisk(storePath) ?? [];
  
  // Merge: apply delta from in-memory store onto disk state
  const merged = mergeJobStates(diskJobs, store.jobs);
  
  // Backup + atomic write (existing behavior, preserved)
  await backupAndAtomicWrite(storePath, merged);
}

This ensures:

  • A session with 1 job in memory cannot wipe 51 jobs from disk
  • Add/modify/delete operations apply as deltas, not full replacements
  • Behavior is identical to current for the normal (non-restart-race) case
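
One way to make "apply only the in-memory delta" explicit is to have callers describe the change instead of handing over their whole job list. The sketch below assumes jobs are identified by an `id` field; the `applyJobDelta` name and the `upsert`/`remove` delta shapes are hypothetical and not part of OpenClaw's API.

```javascript
// Hypothetical delta format (an assumption for illustration):
//   { type: "upsert", job: { id, ...fields } }  -> add or modify one job
//   { type: "remove", id }                      -> delete one job
function applyJobDelta(diskJobs, delta) {
  switch (delta.type) {
    case "upsert": {
      const exists = diskJobs.some((job) => job.id === delta.job.id);
      return exists
        ? diskJobs.map((job) =>
            job.id === delta.job.id ? { ...job, ...delta.job } : job
          )
        : [...diskJobs, delta.job];
    }
    case "remove":
      return diskJobs.filter((job) => job.id !== delta.id);
    default:
      throw new Error(`unknown delta type: ${delta.type}`);
  }
}

// Usage: disk holds 51 jobs, the session only touches two of them.
const diskJobs = Array.from({ length: 51 }, (_, i) => ({
  id: `job-${i}`,
  enabled: true,
}));
const afterUpdate = applyJobDelta(diskJobs, {
  type: "upsert",
  job: { id: "job-3", enabled: false },
});
const afterRemove = applyJobDelta(afterUpdate, { type: "remove", id: "job-10" });
console.log(afterUpdate.length, afterRemove.length); // 51 50
```

Because removal is an explicit operation here, a job that is merely absent from a session's memory is never interpreted as a deletion.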

Current Workaround

External watchdog (cron-guardian.sh via launchd) running every 5 minutes:

  • Detects count regression (< 10 jobs)
  • Auto-restores from rotating timestamped backups
  • Sends Telegram alert
  • Preserves forensics file

This mitigates impact but does not prevent the race. Restoration window is ~5 minutes worst-case.
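
The detection and restore steps of such a watchdog can be sketched as follows. The real guardian is a shell script whose contents are not shown in this issue, so this is only an illustrative JavaScript rendering. It assumes a `{ jobs: [...] }` file layout, a backup directory whose file names sort lexicographically by age (for example, ISO-style timestamps), and the threshold of 10 jobs mentioned above. All function names are hypothetical, and the Telegram alert is left out.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Count jobs in a store file; a missing or unparsable file counts as 0 jobs.
function countJobs(storePath) {
  try {
    const parsed = JSON.parse(fs.readFileSync(storePath, "utf8"));
    return Array.isArray(parsed.jobs) ? parsed.jobs.length : 0;
  } catch {
    return 0;
  }
}

// If the live store has fewer than minJobs jobs, keep a forensics copy and
// restore the newest backup that still meets the threshold.
// Returns the path of the backup that was used, or null if nothing was done.
function restoreIfRegressed(storePath, backupDir, minJobs = 10) {
  if (countJobs(storePath) >= minJobs) return null;

  const newestFirst = fs
    .readdirSync(backupDir)
    .sort()
    .reverse()
    .map((name) => path.join(backupDir, name));
  const good = newestFirst.find((p) => countJobs(p) >= minJobs);
  if (!good) return null;

  if (fs.existsSync(storePath)) {
    fs.copyFileSync(storePath, `${storePath}.forensics`);
  }
  fs.copyFileSync(good, storePath);
  return good;
}
```

Skipping backups that are themselves below the threshold matters in a rapid restart cycle, where a backup taken after the loss would otherwise be restored over a good one.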

Environment

  • OpenClaw version: 2026.3.23-2
  • OS: macOS 15.x (Darwin arm64)
  • Gateway: launchd-managed, self-heal watchdog enabled
  • Cron jobs at time of loss: ~86 (first incident), ~52 (second incident)

Related

  • Issue #53481 — cron.onChange webhook (filed separately — a registry hook would also help detect this faster)

extent analysis

Fix Plan

To address the issue of silent data loss when the gateway restarts and a cron fires before full state is loaded, we will implement a read-merge-write pattern in the saveCronStore function.

Here are the concrete steps:

  • Read the current jobs.json from disk using readJobsFromDisk.
  • Merge the in-memory job list with the disk state using mergeJobStates.
  • Backup the current jobs.json to jobs.json.bak.
  • Write the merged job list to a temporary file and then rename it to jobs.json using backupAndAtomicWrite.

Example code:

async function saveCronStore(storePath, store, opts) {
  // Read current disk state
  const diskJobs = await readJobsFromDisk(storePath) ?? [];
  
  // Merge: apply delta from in-memory store onto disk state
  const merged = mergeJobStates(diskJobs, store.jobs);
  
  // Backup + atomic write (existing behavior, preserved)
  await backupAndAtomicWrite(storePath, merged);
}

// Helper function to merge job states (jobs are matched by id).
// Assumes a job removed in memory is kept as a tombstone with `deleted: true`.
function mergeJobStates(diskJobs, memoryJobs) {
  // Index disk jobs by id; copy each one so the caller's objects are not mutated
  const merged = new Map(diskJobs.map((job) => [job.id, { ...job }]));

  for (const job of memoryJobs) {
    if (job.deleted) {
      // Remove deleted jobs
      merged.delete(job.id);
    } else {
      // Add a new job, or update an existing one field by field
      merged.set(job.id, { ...merged.get(job.id), ...job });
    }
  }

  // Jobs that exist only on disk are kept untouched
  return [...merged.values()];
}
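
The example above calls readJobsFromDisk and backupAndAtomicWrite without defining them. A minimal sketch of what they might look like follows, assuming Node's fs/promises API and a `{ jobs: [...] }` file layout. Both the layout and the function bodies are assumptions, since the issue does not show OpenClaw's actual implementations.

```javascript
const fs = require("node:fs/promises");

// Return the job array stored at storePath, or null if the file does not
// exist yet (the caller falls back to [] via `?? []`).
async function readJobsFromDisk(storePath) {
  try {
    const parsed = JSON.parse(await fs.readFile(storePath, "utf8"));
    return Array.isArray(parsed.jobs) ? parsed.jobs : [];
  } catch (err) {
    if (err.code === "ENOENT") return null;
    throw err; // surface corrupt JSON or permission errors instead of hiding them
  }
}

// Copy the current file to .bak (if there is one), then write via tmp + rename.
async function backupAndAtomicWrite(storePath, jobs) {
  try {
    await fs.copyFile(storePath, `${storePath}.bak`);
  } catch (err) {
    if (err.code !== "ENOENT") throw err; // no existing file: nothing to back up
  }
  const tmpPath = `${storePath}.tmp`;
  await fs.writeFile(tmpPath, JSON.stringify({ jobs }, null, 2));
  await fs.rename(tmpPath, storePath);
}
```

Rethrowing on parse errors is deliberate: treating an unreadable file as an empty job list would recreate the same silent loss. Note also that read-merge-write on its own does not serialize two sessions saving at the same instant; that would likely need a lock or a single writer.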

Verification

To verify that the fix worked, you can:

  • Restart the gateway and trigger a cron job within 5 seconds of restart.
  • Check the jobs.json file to ensure that all jobs are still present.
  • Verify that the jobs.json.bak file is updated correctly.
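
The manual check above can also be approximated by an automated regression test that does not need a real gateway restart. The sketch below assumes a `{ jobs: [...] }` file layout and a synchronous `save(storePath, store)` function under test. The names `jobsAfterPartialSave`, `overwritingSave`, and `mergingSave` are hypothetical; the two save functions are simplified stand-ins for the current and patched saveCronStore, not the real code.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Seed a store with `total` jobs, then call `save` with a store that contains
// only one of them (mimicking the post-restart session). Returns the number
// of jobs left on disk afterwards.
function jobsAfterPartialSave(save, total = 52) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "cron-verify-"));
  const storePath = path.join(dir, "jobs.json");
  const jobs = Array.from({ length: total }, (_, i) => ({ id: `job-${i}` }));
  fs.writeFileSync(storePath, JSON.stringify({ jobs }));

  save(storePath, { jobs: [{ id: "job-0", lastRun: "just-now" }] });

  return JSON.parse(fs.readFileSync(storePath, "utf8")).jobs.length;
}

// Stand-in for the current behavior: memory replaces disk wholesale.
function overwritingSave(storePath, store) {
  fs.writeFileSync(storePath, JSON.stringify({ jobs: store.jobs }));
}

// Stand-in for the patched behavior: read disk, upsert by id, write back.
function mergingSave(storePath, store) {
  const disk = JSON.parse(fs.readFileSync(storePath, "utf8")).jobs;
  const byId = new Map(disk.map((job) => [job.id, job]));
  for (const job of store.jobs) {
    byId.set(job.id, { ...byId.get(job.id), ...job });
  }
  fs.writeFileSync(storePath, JSON.stringify({ jobs: [...byId.values()] }));
}

console.log(jobsAfterPartialSave(overwritingSave)); // 1
console.log(jobsAfterPartialSave(mergingSave)); // 52
```

A test of this shape should fail against the overwrite behavior and pass once the save path merges, which makes it a reasonable guard against the bug returning.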

Extra Tips

  • Test mergeJobStates against the edge cases that matter here: an empty or missing jobs.json, a job present only in memory, a job present only on disk, and a job marked as deleted in memory.
  • Consider adding additional logging or monitoring to detect any issues with the new implementation.
  • Review the cron-guardian.sh script to ensure it is still necessary and effective with the new implementation.
