openclaw - Cron: lastRunAtMs not persisted until Phase 3; gateway restart causes duplicate job execution

openclaw/openclaw#61343 (fetched 2026-04-08 02:59:41)

Bug

When a cron job executes successfully but the gateway restarts before the outcome persist (Phase 3) completes, lastRunAtMs is never written to jobs.json. On restart, the stale lastRunAtMs (potentially days old) causes runMissedJobs to consider the job overdue and execute it again.

Result: Duplicate delivery (e.g., two morning briefings).

Reproduction

  1. Cron job fires at 08:00, Phase 1 persists runningAtMs
  2. Job executes and delivers successfully (Phase 2, no lock held)
  3. Gateway restarts (SIGUSR1, crash, deploy) before Phase 3 persist
  4. On restart: runningAtMs is cleared (ops.ts:102-114), but lastRunAtMs is still stale
  5. isRunnableJob (timer.ts:784-801) sees previousRunAtMs > lastRunAtMs → job re-runs
  6. User receives a second delivery

This happened in production on 2025-04-05: a daily-summary job delivered at 08:00, then again at 12:06 after a restart cascade.
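
The re-run in steps 4 and 5 can be illustrated with a minimal sketch. The function name `isOverdue` and the reduced comparison are assumptions made for illustration; the real check lives in isRunnableJob and may look at more state than this.

```typescript
// Hypothetical, simplified model of the missed-job comparison described above.
// The real isRunnableJob in timer.ts may check additional conditions.
interface JobState {
  lastRunAtMs?: number;
  runningAtMs?: number;
}

// True when the most recent scheduled slot is newer than the last recorded run.
function isOverdue(state: JobState, previousRunAtMs: number): boolean {
  return state.lastRunAtMs === undefined || previousRunAtMs > state.lastRunAtMs;
}

const HOUR_MS = 60 * 60 * 1000;
const scheduledSlot = Date.UTC(2025, 3, 5, 8, 0, 0); // the 08:00 slot
const yesterdayRun = scheduledSlot - 24 * HOUR_MS;   // last run that reached Phase 3

// On-disk state after a crash before Phase 3, once startup has cleared runningAtMs.
const afterCrash: JobState = { lastRunAtMs: yesterdayRun };
// On-disk state if Phase 3 had been reached.
const afterPhase3: JobState = { lastRunAtMs: scheduledSlot };

const rerunsAfterCrash = isOverdue(afterCrash, scheduledSlot);
const rerunsAfterPhase3 = isOverdue(afterPhase3, scheduledSlot);
console.log({ rerunsAfterCrash, rerunsAfterPhase3 });
```

In this model the job that already delivered is indistinguishable from one that never ran, because the only surviving evidence of the run (runningAtMs) was discarded at startup.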

Root cause

onTimer in timer.ts uses three phases:

| Phase | What it does | Persisted? |
|---|---|---|
| 1 (L611-616) | Sets runningAtMs | await persist(state) |
| 2 (L624-669) | Executes job (unbounded time, no lock) | |
| 3 (L675-689) | Calls applyOutcomeToStoredJob → sets lastRunAtMs | ✓ but only if reached |

If the gateway dies between Phase 1 and Phase 3, lastRunAtMs is never updated. The runningAtMs marker is cleaned up on restart (ops.ts:102-114), but nothing prevents the missed-job check from using the stale lastRunAtMs.

The codebase already acknowledges Phase-1 crash scenarios — see the comment at ops.ts:390-392 referencing #17554 — but only for the runningAtMs marker, not for lastRunAtMs.
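
To make the crash window concrete, here is a small in-memory model of the three phases. Everything in it (`FakeDisk`, `runPhases`, `startupCleanup`, and the assumption that Phase 3 also clears runningAtMs) is hypothetical scaffolding, not the actual openclaw code; it only mirrors the ordering of writes described above.

```typescript
// Hypothetical model: which fields reach "disk" at each phase.
interface JobState {
  runningAtMs?: number;
  lastRunAtMs?: number;
}

class FakeDisk {
  private json: string;
  constructor(initial: JobState) {
    this.json = JSON.stringify(initial);
  }
  write(state: JobState): void {
    this.json = JSON.stringify(state);
  }
  read(): JobState {
    return JSON.parse(this.json) as JobState;
  }
}

function runPhases(disk: FakeDisk, startedAt: number, crashBeforePhase3: boolean): void {
  const state = disk.read();
  // Phase 1: mark the job as running and persist.
  state.runningAtMs = startedAt;
  disk.write(state);
  // Phase 2: the job executes here; nothing is written to disk.
  if (crashBeforePhase3) return; // simulated gateway death
  // Phase 3: record the outcome (assumed here to also clear the marker).
  state.lastRunAtMs = startedAt;
  delete state.runningAtMs;
  disk.write(state);
}

// Startup cleanup as described in the issue: the marker is cleared, nothing else changes.
function startupCleanup(disk: FakeDisk): void {
  const state = disk.read();
  delete state.runningAtMs;
  disk.write(state);
}

const crashed = new FakeDisk({ lastRunAtMs: 1000 });
runPhases(crashed, 5000, true);
startupCleanup(crashed);

const clean = new FakeDisk({ lastRunAtMs: 1000 });
runPhases(clean, 5000, false);

const crashedState = crashed.read();
const cleanState = clean.read();
console.log({ crashedState, cleanState });
```

After the simulated crash and cleanup, the stored state is identical to what it was before the run started, which is why the restart logic cannot tell that the job executed.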

Suggested fix

Persist lastRunAtMs immediately after job execution completes, before the delivery attempt:

// After executeJobCoreWithTimeout returns, before delivery:
await locked(state, async () => {
  await ensureLoaded(state, { forceReload: true, skipRecompute: true });
  const stored = state.store.jobs.find(j => j.id === job.id);
  if (stored) {
    stored.state.lastRunAtMs = startedAt;
    // Optionally: stored.state.lastRunStatus = "executed-pending-delivery"
  }
  await persist(state);
});

This way, even if the gateway crashes during or after delivery, the job won't be considered missed on restart. Delivery status can still be updated in Phase 3.

Alternative: on restart, if a runningAtMs marker is found and cleared, also set lastRunAtMs = runningAtMs before clearing — treating interrupted jobs as "ran but status unknown" rather than "never ran."
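
The alternative could look roughly like the sketch below. The helper name `recoverInterruptedJobs` and the status string `"interrupted-unknown"` are made up for illustration, and the `Job` shape is reduced to the fields mentioned in this issue; the real startup cleanup in ops.ts may be structured differently.

```typescript
// Hypothetical startup recovery: promote a leftover runningAtMs marker to lastRunAtMs
// before clearing it, so the missed-job check does not treat the job as never run.
interface JobState {
  runningAtMs?: number;
  lastRunAtMs?: number;
  lastRunStatus?: string;
}

interface Job {
  id: string;
  state: JobState;
}

function recoverInterruptedJobs(jobs: Job[]): string[] {
  const recovered: string[] = [];
  for (const job of jobs) {
    const marker = job.state.runningAtMs;
    if (marker === undefined) continue; // job was not running at shutdown
    // Never move lastRunAtMs backwards if a newer value is already stored.
    job.state.lastRunAtMs = Math.max(job.state.lastRunAtMs ?? 0, marker);
    job.state.lastRunStatus = "interrupted-unknown"; // hypothetical status value
    delete job.state.runningAtMs;
    recovered.push(job.id);
  }
  return recovered;
}

const jobs: Job[] = [
  { id: "daily-summary", state: { runningAtMs: 5000, lastRunAtMs: 1000 } },
  { id: "idle-job", state: { lastRunAtMs: 4000 } },
];
const recoveredIds = recoverInterruptedJobs(jobs);
console.log(recoveredIds, jobs);
```

The trade-off is that a job interrupted before it did any useful work is also marked as run and will wait for its next scheduled slot, so this variant favors at-most-once delivery over at-least-once.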

Affected files

  • src/cron/service/timer.ts: onTimer (L572-731), applyJobResult (L296-474), isRunnableJob (L784-801)
  • src/cron/service/ops.ts: startup cleanup (L102-114), missed job detection

extent analysis

TL;DR

Persist lastRunAtMs immediately after job execution completes to prevent duplicate deliveries due to stale lastRunAtMs values.

Guidance

  • Update the onTimer function in timer.ts to persist lastRunAtMs after job execution, as suggested in the issue.
  • Perform the extra write inside the locked(state, ...) wrapper, as the suggested fix does, so it does not race with other store updates; Phase 2 itself runs without a lock.
  • Review the isRunnableJob function to handle cases where lastRunAtMs is not updated due to a gateway restart.
  • Test the change by restarting the gateway after job execution but before Phase 3, then confirming that lastRunAtMs is already updated and the job is not re-run or delivered twice.

Example

await locked(state, async () => {
  await ensureLoaded(state, { forceReload: true, skipRecompute: true });
  const stored = state.store.jobs.find(j => j.id === job.id);
  if (stored) {
    stored.state.lastRunAtMs = startedAt;
    await persist(state);
  }
});

Notes

The suggested fix assumes that the persist function is correctly implemented and updates the lastRunAtMs value in the storage. Additionally, the alternative approach of setting lastRunAtMs = runningAtMs on restart may require further testing to ensure it handles all edge cases correctly.

Recommendation

Apply the suggested fix by persisting lastRunAtMs immediately after job execution completes, as it directly addresses the root cause of the issue and prevents duplicate deliveries.
