feat(cron): add `catchup` option to execute missed recurring jobs instead of silently skipping them


Feature Request: Cron Job Catchup / Compensation for Missed Runs

Problem

When the gateway is down (machine reboot, Docker container restart, crash) or the scheduler is blocked by long-running jobs, recurring cron jobs silently miss their scheduled executions.

Current behavior in cron/jobs.py → get_due_jobs():

grace = _compute_grace_seconds(schedule)
if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > grace:
    new_next = compute_next_run(schedule, now.isoformat())
    # Job is simply skipped — never executes, no warning

For a daily job with a 2-hour grace window, if the gateway was down for 3 hours, the job is permanently missed. The user sees no error, no log entry — the job just doesn't run.
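To make the arithmetic concrete, here is a minimal sketch of the staleness check. The helper `is_stale` is hypothetical and only mirrors the comparison quoted above; the 2-hour figure is the daily grace value mentioned in this issue, and how Hermes derives it inside _compute_grace_seconds() is not shown here.

```python
from datetime import datetime, timedelta, timezone


# Hypothetical helper for illustration only; it mirrors the comparison
# (now - next_run_dt).total_seconds() > grace from the snippet above.
def is_stale(next_run_dt: datetime, now: datetime, grace_seconds: float) -> bool:
    """Return True when the scheduled run is older than the grace window."""
    return (now - next_run_dt).total_seconds() > grace_seconds


scheduled = datetime(2025, 1, 1, 7, 0, tzinfo=timezone.utc)  # daily 07:00 run
grace = 2 * 60 * 60                                          # 2-hour grace for a daily job

back_after_1h = scheduled + timedelta(hours=1)
back_after_3h = scheduled + timedelta(hours=3)

print(is_stale(scheduled, back_after_1h, grace))  # False: still inside the window
print(is_stale(scheduled, back_after_3h, grace))  # True: stale, so the run is fast-forwarded
```

With a 3-hour outage the run is 10800 seconds late against a 7200-second window, which is why it is dropped rather than executed.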

This is increasingly common for Hermes users running in Docker containers on personal computers that aren't always on.

Related Issues

  • #9086 — Serial cron tick causes silent job skipping (jobs blocked by long-running siblings)
  • #5518 / #18722 — Null next_run_at recovery (related gap, fixed for null case but not for missed runs)
  • #3396 (PR) — Added advance_next_run() to prevent crash-loop re-firing (at-most-once semantics)
  • #16265 — Recurring jobs silently marked completed when croniter missing

Proposed Solution

Add a per-job catchup: boolean option (default false for backward compatibility).

Behavior

| catchup | Gateway was down, job missed | Scheduler was busy, job missed |
|---|---|---|
| false (default) | Fast-forward to next run (current behavior) | Fast-forward to next run (current behavior) |
| true | Execute immediately on next tick, then resume normal schedule | Execute immediately on next tick, then resume normal schedule |
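The behavior above can be read as a small decision function. The sketch below is illustrative only: `Action` and `decide` are hypothetical names, and it assumes that a run still inside its grace window executes as usual. Note that the cause of the miss (gateway down or scheduler busy) is not an input, because both columns behave the same.

```python
from enum import Enum


class Action(Enum):
    RUN_NOW = "run now"
    FAST_FORWARD = "fast-forward to next run"


def decide(seconds_late: float, grace_seconds: float, catchup: bool = False) -> Action:
    """Map a due job to an action; the cause of the miss is deliberately not an input."""
    if seconds_late <= grace_seconds:
        # Assumption: a run still inside its grace window executes as usual.
        return Action.RUN_NOW
    # Past the grace window: only the per-job flag changes the outcome.
    return Action.RUN_NOW if catchup else Action.FAST_FORWARD


three_hours = 3 * 3600
daily_grace = 2 * 3600
print(decide(three_hours, daily_grace).name)                # FAST_FORWARD
print(decide(three_hours, daily_grace, catchup=True).name)  # RUN_NOW
```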

Configuration

# In the cron job config (jobs.json)
{
  "id": "abc123",
  "name": "daily_backup",
  "catchup": true,
  "schedule": { "kind": "cron", "expr": "0 7 * * *" },
  ...
}

CLI:

hermes cron create --schedule "0 7 * * *" --catchup --script backup.sh
hermes cron edit <job_id> --catchup true

Tool API (cronjob):

cronjob(action="create", schedule="0 7 * * *", catchup=True, script="backup.sh")
cronjob(action="update", job_id="abc123", catchup=True)

Implementation

The change is minimal and localized to cron/jobs.py → get_due_jobs():

# Current code (line ~880), with the proposed catchup branch added:
grace = _compute_grace_seconds(schedule)
if kind in ("cron", "interval") and (now - next_run_dt).total_seconds() > grace:
    # Job is past its catch-up grace window — this is a stale missed run.
    # Grace scales with schedule period: daily=2h, hourly=30m, 10min=5m.

    if job.get("catchup", False):
        # Catchup enabled — execute the missed run immediately.
        logger.warning(
            "Job '%s' missed its scheduled time (%s, grace=%ds). "
            "Catchup enabled — executing now.",
            job.get("name", job["id"]),
            next_run,
            grace,
        )
        due.append(job)
        continue

    new_next = compute_next_run(schedule, now.isoformat())
    # ... fast-forward logic unchanged
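For readers who want to run the idea in isolation, here is a self-contained sketch of the same branch. It is not the Hermes implementation: the job fields `interval_s` and `grace_s` are hypothetical stand-ins for the real schedule object and _compute_grace_seconds(), and the fast-forward step is simplified to "now plus one interval".

```python
import logging
from datetime import datetime, timedelta, timezone

logger = logging.getLogger("cron.sketch")


def get_due_jobs(jobs: list, now: datetime) -> list:
    """Simplified stand-in for the due-job scan, with the proposed catchup branch."""
    due = []
    for job in jobs:
        next_run_dt = datetime.fromisoformat(job["next_run_at"])
        if next_run_dt > now:
            continue  # not due yet
        late_s = (now - next_run_dt).total_seconds()
        if late_s > job["grace_s"]:  # hypothetical per-job grace field
            if job.get("catchup", False):
                logger.warning(
                    "Job '%s' missed its scheduled time (%s, grace=%ds). "
                    "Catchup enabled, executing now.",
                    job.get("name", job["id"]), job["next_run_at"], job["grace_s"],
                )
                due.append(job)
                continue
            # Catchup disabled: fast-forward (simplified) and skip the stale run.
            job["next_run_at"] = (now + timedelta(seconds=job["interval_s"])).isoformat()
            continue
        due.append(job)  # due and still inside the grace window
    return due


now = datetime(2025, 1, 1, 10, 0, tzinfo=timezone.utc)
missed_at = (now - timedelta(hours=3)).isoformat()
jobs = [
    {"id": "a", "name": "daily_backup", "catchup": True,
     "interval_s": 86400, "grace_s": 7200, "next_run_at": missed_at},
    {"id": "b", "name": "daily_digest",
     "interval_s": 86400, "grace_s": 7200, "next_run_at": missed_at},
]
due = get_due_jobs(jobs, now)
print([j["id"] for j in due])  # ['a']
```

Only the job that opted in is returned; the other is fast-forwarded exactly as it is today, which is what keeps the default backward compatible.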

Files to modify:

  1. cron/jobs.py — Add catchup field to create_job(), _get_due_jobs_locked()
  2. tools/cronjob_tools.py — Add catchup parameter to cronjob() tool, pass to create_job() and update_job()
  3. cron/jobs.py — Add catchup to update_job() updates dict

UX Considerations

  • Default false: backward compatible, existing behavior unchanged
  • Per-job opt-in: users choose which jobs need catchup (not every job should catch up; a "send daily digest" job, for example, makes little sense to run 5 hours late)
  • Log clearly: when catchup fires, log at WARNING level so users know a missed run was compensated
  • No duplicate risk: advance_next_run() already advances next_run_at before execution, so catchup jobs won't re-fire on the next tick

Test Cases

  1. Daily cron with catchup=true, gateway down for 3h → job fires on restart, next run scheduled normally
  2. Daily cron with catchup=false (default), gateway down for 3h → job skipped, next run scheduled normally
  3. Hourly cron with catchup=true, gateway down for 2h → job fires once (not twice — single catchup)
  4. Multiple missed periods with catchup=true → fires once for the most recent miss, not N times
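Cases 3 and 4 can be exercised with a small tick simulation. This is a sketch under stated assumptions, not Hermes code: `advance_past` and `tick` are hypothetical helpers, the job is a plain interval job, and the advance-before-execute step is modeled on the at-most-once behavior attributed to advance_next_run() above.

```python
from datetime import datetime, timedelta, timezone


def advance_past(next_run: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled slot strictly after `now` (requires next_run <= now)."""
    slots_elapsed = (now - next_run) // interval  # timedelta // timedelta yields an int
    return next_run + (slots_elapsed + 1) * interval


def tick(job: dict, now: datetime, runs: list) -> None:
    """One scheduler tick for a single interval job; appends to `runs` when it fires."""
    scheduled = job["next_run_at"]
    if scheduled > now:
        return  # not due yet
    stale = (now - scheduled) > job["grace"]
    # Advance before executing, so a second tick (or a crash) cannot re-fire the run.
    job["next_run_at"] = advance_past(scheduled, job["interval"], now)
    if not stale or job.get("catchup", False):
        runs.append(now)


start = datetime(2025, 1, 1, 7, 0, tzinfo=timezone.utc)
job = {
    "next_run_at": start,
    "interval": timedelta(hours=1),
    "grace": timedelta(minutes=30),
    "catchup": True,
}
runs = []

# Gateway returns at 09:45, having missed the 07:00, 08:00 and 09:00 slots.
tick(job, start + timedelta(hours=2, minutes=45), runs)
tick(job, start + timedelta(hours=2, minutes=46), runs)  # next tick: nothing is due
print(len(runs))           # 1
print(job["next_run_at"])  # 2025-01-01 10:00:00+00:00
```

Three slots were missed, but the job fires once and the schedule resumes at the next regular slot, which is the single-catchup behavior the test cases ask for.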

Workaround (Current)

Users who need this today must implement external catchup logic — e.g., a script that runs at container startup and checks last run timestamps, or use a separate cron system (systemd timers, host crontab) outside of Hermes.
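One possible shape for such a startup script is sketched below. Everything in it is an assumption for illustration: the stamp-file approach, the function name `run_if_overdue`, and the demo command are not part of Hermes, and the script tracks its own last-run time rather than reading Hermes state.

```python
import subprocess
import sys
import time
from pathlib import Path


def run_if_overdue(stamp_file, max_age_s, command, now=None):
    """Run `command` if the last recorded success is older than `max_age_s` seconds."""
    now = time.time() if now is None else now
    try:
        last = float(Path(stamp_file).read_text().strip())
    except (FileNotFoundError, ValueError):
        last = 0.0  # never ran, or unreadable stamp: treat as overdue
    if now - last < max_age_s:
        return False
    subprocess.run(command, check=True)  # raises CalledProcessError on failure
    Path(stamp_file).write_text(str(now))  # record success only after the command ran
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        stamp = Path(tmp) / "daily_backup.stamp"
        cmd = [sys.executable, "-c", "print('backup ran')"]
        print(run_if_overdue(stamp, 24 * 3600, cmd))  # no stamp yet, so the command runs
        print(run_if_overdue(stamp, 24 * 3600, cmd))  # stamp is fresh, so it is skipped
```

In a container this would be called from the entrypoint with a stamp file on a persistent volume and the real job command in place of the demo command. Because the stamp is written only after a successful run, a failed catchup is retried on the next start.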


Motivation

I run Hermes in a Docker container on my personal computer. When the machine is off overnight, daily cron jobs are silently missed. A catchup option would let critical jobs (backups, health checks, data sync) compensate for missed runs automatically.

This is analogous to systemd timer Persistent=true or AWS CloudWatch "catch up" behavior for scheduled events.
