openclaw - 💡 (How to fix) Cron: gateway restart loop should pause/defer scheduled jobs [1 participant]

GitHub stats
openclaw/openclaw#59301 · Fetched 2026-04-08 02:26:16
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

Problem

When the gateway enters a restart loop (e.g., after a version downgrade), cron jobs can fire during a brief ~5-minute window, only to be killed by SIGTERM before they complete. No run record is written, and the next gateway instance schedules the job for the next occurrence — silently dropping the current one.

Observed Behavior

  1. Gateway restarts every ~5 minutes due to instability
  2. A cron job fires at its scheduled time (e.g., 15 19 * * 0)
  3. ~35 seconds later, SIGTERM kills the gateway mid-execution
  4. New gateway starts, sees scheduled time has passed, moves nextRunAtMs forward
  5. No run entry is written — the job is silently lost
  6. openclaw cron runs --id <job-id> shows 0 entries
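
Steps 4 and 5 can be reproduced with plain date arithmetic. The sketch below is not OpenClaw's scheduler: `next_sunday_1915` is a hypothetical helper hard-coded to the `15 19 * * 0` example (Sundays at 19:15), the date is an arbitrary Sunday, and it assumes the restarted gateway computes the next occurrence strictly after the current time.

```python
from datetime import datetime, timedelta


def next_sunday_1915(after: datetime) -> datetime:
    """Next occurrence of `15 19 * * 0` strictly after `after` (naive local time)."""
    candidate = after.replace(hour=19, minute=15, second=0, microsecond=0)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    candidate += timedelta(days=(6 - candidate.weekday()) % 7)
    if candidate <= after:
        candidate += timedelta(days=7)
    return candidate


fired_at = datetime(2026, 3, 15, 19, 15, 0)      # a Sunday; the job fires on time
restarted_at = fired_at + timedelta(seconds=35)  # SIGTERM, then a new gateway starts
next_run = next_sunday_1915(restarted_at)
print(next_run)  # 2026-03-22 19:15:00
```

Because the new instance only looks forward from its own start time, the interrupted 19:15 run is never reconsidered unless something on disk records that it was attempted.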

Expected Behavior

The gateway should detect restart instability (e.g., >3 restarts in 15 minutes) and defer cron execution until the gateway has been stable for a minimum window (e.g., 5 minutes). This prevents jobs from firing in doomed windows.
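
A minimal sketch of that detection rule, assuming the gateway can persist its start timestamps across restarts (how they are stored is left out). `RestartTracker` and its parameters are hypothetical names, not an OpenClaw API, and for simplicity every recorded start counts as a restart.

```python
from collections import deque


class RestartTracker:
    """Sliding-window restart counter (hypothetical helper)."""

    def __init__(self, max_restarts=3, window_s=15 * 60, stable_s=5 * 60):
        self.max_restarts = max_restarts  # more than this many starts => unstable
        self.window_s = window_s          # look-back window in seconds
        self.stable_s = stable_s          # required uptime before crons may fire
        self.starts = deque()             # start timestamps, oldest first

    def record_start(self, now: float) -> None:
        self.starts.append(now)

    def is_unstable(self, now: float) -> bool:
        # Drop starts that have fallen out of the look-back window.
        while self.starts and now - self.starts[0] > self.window_s:
            self.starts.popleft()
        return len(self.starts) > self.max_restarts

    def should_defer(self, now: float) -> bool:
        if not self.starts:
            return False
        uptime = now - self.starts[-1]  # time since the most recent start
        return self.is_unstable(now) and uptime < self.stable_s


tracker = RestartTracker()
for t in (0, 240, 480, 720):             # a start every 4 minutes
    tracker.record_start(t)
defer_early = tracker.should_defer(755)   # 35 s after the latest start
defer_late = tracker.should_defer(1020)   # 5 min after the latest start
print(defer_early, defer_late)            # True False
```

Note that with restarts roughly every 5 minutes, a 15-minute window holds only about three or four starts, so a threshold of ">3" sits right at the edge; the numbers would likely need tuning.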

Alternatively, the cron scheduler could write a "started" record before execution, so the next instance knows a job was interrupted and can retry.
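
The write side of that idea might look like the sketch below. The record fields (`jobId`, `runId`, `status`, and the timestamps) and both function names are assumptions for illustration; the issue does not specify the runs JSONL schema.

```python
import json
import os
import time


def append_record(path, record):
    """Append one JSON object as a single line and force it to disk."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make the record survive an abrupt kill


def run_with_journal(path, job_id, run_id, job):
    """Write a 'started' record, run the job, then write a terminal record."""
    base = {"jobId": job_id, "runId": run_id}
    append_record(path, {**base, "status": "started", "startedAt": time.time()})
    try:
        job()
    except Exception as exc:
        append_record(path, {**base, "status": "failed", "error": str(exc)})
        raise
    append_record(path, {**base, "status": "completed", "completedAt": time.time()})
```

If the process is killed between the two writes, the file ends with a `started` record that has no terminal counterpart, which is the signal the next instance would look for.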

Environment

  • OpenClaw v2026.3.11
  • macOS (arm64)
  • Cron job using opus model (slow to initialize, making the kill window especially dangerous)

Suggested Approaches

  1. Stability gate: Track gateway start time. Don't fire cron jobs until uptime > N minutes.
  2. Pre-execution journaling: Write a started record to the runs JSONL before firing. On startup, check for started without completed and retry.
  3. Restart detection: If the gateway detects it's been restarted >3 times in 15 min, enter a "degraded" mode that defers non-critical crons.
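
Approach 1 can be sketched as an uptime check plus a small holding queue, so that jobs that come due while the gate is closed are deferred rather than dropped. `StabilityGate` and `DeferringScheduler` are hypothetical names, and timestamps are passed in explicitly to keep the example deterministic; a real gateway would more likely read a monotonic clock.

```python
class StabilityGate:
    """Closed until the process has been up for longer than min_uptime_s."""

    def __init__(self, started_at: float, min_uptime_s: float = 5 * 60):
        self.started_at = started_at
        self.min_uptime_s = min_uptime_s

    def is_open(self, now: float) -> bool:
        return now - self.started_at > self.min_uptime_s


class DeferringScheduler:
    """Holds due jobs while the gate is closed and releases them afterwards."""

    def __init__(self, gate: StabilityGate):
        self.gate = gate
        self.deferred = []

    def on_due(self, job_id: str, now: float) -> list:
        """Return the job ids that should run right now."""
        if not self.gate.is_open(now):
            if job_id not in self.deferred:
                self.deferred.append(job_id)
            return []
        return self.release(now) + [job_id]

    def release(self, now: float) -> list:
        """Return and clear deferred jobs once the gate has opened."""
        if not self.gate.is_open(now):
            return []
        ready, self.deferred = self.deferred, []
        return ready


sched = DeferringScheduler(StabilityGate(started_at=0))
print(sched.on_due("weekly-report", now=35))  # [] (gate closed, job held)
print(sched.release(now=301))                 # ['weekly-report']
```

The deferred list here lives in memory, so on its own it does not survive another restart; combining it with the journaling approach would cover that case.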

extent analysis

TL;DR

Implement a stability gate or pre-execution journaling mechanism to prevent cron jobs from firing during gateway restart loops.

Guidance

  • Consider implementing a stability gate that tracks gateway uptime and defers cron job execution until the gateway has been stable for a minimum window (e.g., 5 minutes).
  • Alternatively, implement pre-execution journaling by writing a "started" record to the runs JSONL before firing a cron job, and retrying incomplete jobs on startup.
  • To mitigate the issue, focus on detecting restart instability (e.g., >3 restarts in 15 minutes) and deferring non-critical cron jobs during this time.
  • Review how the scheduler advances nextRunAtMs on startup: a scheduled time that has passed without a run record should be treated as a missed run to recover, not skipped silently.
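
For the startup side of journaling, a recovery pass could scan the runs JSONL for `started` records with no terminal record. This is a sketch under assumptions: the `runId` and `status` fields and the `completed`/`failed` status values are hypothetical, since the issue does not define the file's schema.

```python
import json

TERMINAL_STATUSES = {"completed", "failed"}


def find_interrupted(lines):
    """Return 'started' records that never received a terminal record."""
    open_runs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a line torn by a kill in the middle of a write
        if not isinstance(rec, dict):
            continue
        run_id = rec.get("runId")
        status = rec.get("status")
        if status == "started":
            open_runs[run_id] = rec
        elif status in TERMINAL_STATUSES:
            open_runs.pop(run_id, None)
    return list(open_runs.values())


journal = [
    '{"jobId": "a", "runId": "r1", "status": "started"}',
    '{"jobId": "a", "runId": "r1", "status": "completed"}',
    '{"jobId": "b", "runId": "r2", "status": "started"}',
    '{"jobId": "b", "runId": "r2", "sta',  # truncated final line
]
print([r["runId"] for r in find_interrupted(journal)])  # ['r2']
```

Whether an interrupted run should be retried automatically depends on the job being safe to run twice, so a retry limit or a per-job opt-in is probably worth considering.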

Example

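A "started" record in the runs JSONL might look like this (field names are illustrative; pretty-printed here, whereas a JSONL file stores each record on one line):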
{
  "jobId": "<job-id>",
  "startedAt": "<timestamp>",
  "status": "started"
}

Notes

The suggested approaches require modifications to the gateway's cron job scheduling logic and may involve additional error handling and retry mechanisms.

Recommendation

The issue does not mention a fixed version, so upgrading is not an option here. The remedy is a change to the gateway itself: add a stability gate or pre-execution journaling so that cron jobs do not fire during a restart loop and interrupted runs are not silently dropped.
