hermes - [Bug]: Cron job fires 3 hours early after gateway restart, causing duplicate execution

Summary

After a gateway restart, cron jobs intermittently fire about 3 hours before their scheduled time. The scheduler then fires the same job again at the correct time, so one scheduled run produces two outputs.

Environment

  • Hermes version: 0.13.0
  • OS: Ubuntu (WSL2)
  • System timezone: UTC (timedatectl confirms Etc/UTC)
  • Hermes timezone config: timezone: '' (empty — falls back to system UTC)
  • Python: system venv

Steps to Reproduce

  1. Create a recurring cron job, e.g. 0 9 * * * (daily at 09:00 UTC)
  2. Restart the Hermes gateway (systemctl restart hermes or SIGTERM)
  3. Observe the job fires ~3 hours early (e.g. at 06:00 UTC instead of 09:00 UTC)
  4. The job fires again at the correct 09:00 UTC

Evidence

Daily Briefing (0 9 * * *) — output files

Date     | Runs   | Times (UTC)  | Duplicate?
May 1    | 2      | 08:08, 09:02 | Yes
May 2–11 | 1 each | ~09:XX       | No
May 12   | 2      | 06:13, 09:20 | Yes

All jobs double-firing on May 12

Job             | Schedule     | Correct Time | Actual Fire Times
Phantom         | 0 10 * * 1-5 | 10:00 UTC    | 02:00, 03:51, (correct at 10:00 not logged yet)
Ecosystem Watch | 0 7 * * *    | 07:00 UTC    | 04:17, 07:01
Daily Briefing  | 0 9 * * *    | 09:00 UTC    | 06:00, 09:00
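
To make the offsets in the table above easier to compare, the following minimal sketch computes how many minutes before the scheduled time each logged run fired. The helper name minutes_early and the observed dictionary are illustrative only; the times are copied from the table, and nothing here comes from the Hermes codebase.

```python
from datetime import datetime

def minutes_early(scheduled: str, actual: str) -> int:
    """Minutes by which `actual` precedes `scheduled` (negative if late)."""
    fmt = "%H:%M"
    delta = datetime.strptime(scheduled, fmt) - datetime.strptime(actual, fmt)
    return int(delta.total_seconds() // 60)

# Scheduled time and logged fire times (UTC) for May 12, from the table.
observed = {
    "Phantom": ("10:00", ["02:00", "03:51"]),
    "Ecosystem Watch": ("07:00", ["04:17", "07:01"]),
    "Daily Briefing": ("09:00", ["06:00", "09:00"]),
}

offsets = {
    job: [minutes_early(scheduled, actual) for actual in fired]
    for job, (scheduled, fired) in observed.items()
}

for job, values in offsets.items():
    print(job, values)
```

On these figures the early runs are between 163 and 480 minutes ahead of schedule, with Daily Briefing exactly 180 minutes early.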

Gateway restart pattern

The gateway restarted multiple times on May 11 evening:

21:03:49 - SIGTERM
21:05:14 - Started
21:06:37 - SIGTERM
21:06:49 - Started
21:08:26 - SIGTERM
21:08:39 - Started
21:10:55 - SIGTERM
21:11:06 - Started
21:16:14 - SIGTERM
21:16:20 - Started
21:24:25 - SIGTERM
21:24:29 - Started
21:43:24 - SIGTERM
21:43:28 - Started
22:45:06 - SIGTERM
22:46:31 - Started
22:58:24 - SIGTERM
22:58:52 - Started  ← final restart before the bug manifests

After the final restart at 22:58 UTC, the cron ticker started and all subsequent overnight jobs fired 3 hours early.
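
As a rough aid for reading the restart log, this sketch pairs each SIGTERM with the following Started line and reports the downtime in seconds. It assumes the log strictly alternates between the two events, as it does in the excerpt above; the function name downtimes is hypothetical.

```python
from datetime import datetime

LOG = """\
21:03:49 - SIGTERM
21:05:14 - Started
21:06:37 - SIGTERM
21:06:49 - Started
21:08:26 - SIGTERM
21:08:39 - Started
21:10:55 - SIGTERM
21:11:06 - Started
21:16:14 - SIGTERM
21:16:20 - Started
21:24:25 - SIGTERM
21:24:29 - Started
21:43:24 - SIGTERM
21:43:28 - Started
22:45:06 - SIGTERM
22:46:31 - Started
22:58:24 - SIGTERM
22:58:52 - Started
"""

def downtimes(log: str) -> list[int]:
    """Seconds between each SIGTERM and the Started line that follows it."""
    events = []
    for line in log.strip().splitlines():
        stamp, event = line.split(" - ", 1)
        events.append((datetime.strptime(stamp, "%H:%M:%S"), event.strip()))
    gaps = []
    # Assumes strict SIGTERM/Started alternation, as in the excerpt.
    for (stopped, stop_event), (started, start_event) in zip(events[::2], events[1::2]):
        if stop_event != "SIGTERM" or start_event != "Started":
            raise ValueError("log does not alternate SIGTERM/Started")
        gaps.append(int((started - stopped).total_seconds()))
    return gaps

gaps = downtimes(LOG)
print(len(gaps), "restarts, downtime per restart (s):", gaps)
```

The excerpt contains nine restarts within about two hours, each with a downtime under 90 seconds.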

Code Analysis

The scheduler calls advance_next_run() before execution (line 1702 of scheduler.py), which should guarantee at-most-once semantics:

# scheduler.py L1699-1702
# Advance next_run_at for all recurring jobs FIRST, under the file lock,
# before any execution begins.  This preserves at-most-once semantics.
for job in due_jobs:
    advance_next_run(job["id"])

The grace window for a daily cron (0 9 * * *) computes to 43,200s // 2 = 21,600s, clamped to MAX_GRACE = 7,200s (2 hours). So a job due at 09:00 UTC should NOT be considered "due" at 06:00 UTC (3 hours = 10,800s > 7,200s grace).
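
The arithmetic in this paragraph can be checked with a short sketch. The figures are the ones quoted above; the function name clamp_grace is hypothetical and only mirrors the described clamp, not the actual Hermes implementation.

```python
MAX_GRACE_S = 7_200  # 2 hours, as quoted in the issue

def clamp_grace(raw_grace_s: int, max_grace_s: int = MAX_GRACE_S) -> int:
    """Cap a computed grace window at the configured maximum."""
    return min(raw_grace_s, max_grace_s)

raw_grace_s = 43_200 // 2           # figure quoted in the issue: 21,600 s
grace_s = clamp_grace(raw_grace_s)  # clamped to 7,200 s
early_by_s = 3 * 3600               # 06:00 vs 09:00 is 10,800 s

outside_grace = early_by_s > grace_s
print(raw_grace_s, grace_s, early_by_s, outside_grace)
```

Under these numbers a fire time three hours ahead of schedule lies outside the clamped window, which is why the early run is unexpected.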

Yet get_due_jobs() returned the job as due at 06:00 UTC. This suggests that next_run_at was incorrectly set to a value between 04:00 and 06:00 UTC during or after the restart sequence, bypassing the grace window check.

The _ensure_aware() / _normalize_aware_dt() code (jobs.py L274-289) interprets naive datetimes as system-local time and converts them to the Hermes timezone. If advance_next_run() or compute_next_run() ever produces a naive datetime that is misinterpreted during a restart window, that could explain the 3-hour offset, although no obvious code path produces naive datetimes in the current version.
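
To illustrate the kind of misinterpretation hypothesized here, the sketch below shows how a naive 09:00 timestamp becomes 06:00 UTC if it is treated as local time in a UTC+3 zone. Everything in it is an assumption for illustration: ensure_aware is a hypothetical stand-in rather than the Hermes function, the UTC+3 zone is invented (the reported environment is UTC), and the year is arbitrary.

```python
from datetime import datetime, timedelta, timezone

def ensure_aware(dt: datetime, assumed_local: timezone, target: timezone) -> datetime:
    """Hypothetical normalizer: treat naive values as `assumed_local`, then convert."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=assumed_local)
    return dt.astimezone(target)

# A next-run value that lost its tzinfo; it was meant to be 09:00 UTC.
naive_next_run = datetime(2026, 5, 12, 9, 0)  # year is illustrative

# Assumption for illustration only: the naive value is read as UTC+3.
utc_plus_3 = timezone(timedelta(hours=3))

misread = ensure_aware(naive_next_run, utc_plus_3, timezone.utc)
intended = naive_next_run.replace(tzinfo=timezone.utc)

print(misread.isoformat())  # 2026-05-12T06:00:00+00:00
print(intended - misread)   # 3:00:00
```

This only shows that a single mislabelled timezone is enough to produce a 3-hour shift; it does not establish that such a path exists in Hermes.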

Workaround

Currently no workaround besides restarting the gateway only during non-scheduled hours, which is impractical for 24/7 operation.

Expected Behavior

A gateway restart should never cause a job to fire before its scheduled next_run_at. The advance_next_run under-lock pattern should prevent duplicates.
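
The expected behavior can be sketched as an invariant: a job is claimed only when now has reached next_run_at, and next_run_at is advanced under a lock before the job runs. This is a minimal in-memory model with hypothetical names (jobs, claim_due_jobs) and a threading.Lock standing in for the file lock; it is not the Hermes scheduler.

```python
import threading
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
_store_lock = threading.Lock()  # stands in for the file lock mentioned in the issue

# Hypothetical in-memory job store; the year is illustrative.
jobs = {
    "daily-briefing": {
        "next_run_at": datetime(2026, 5, 12, 9, 0, tzinfo=UTC),
        "period": timedelta(days=1),
    }
}

def claim_due_jobs(now: datetime) -> list[str]:
    """Return IDs of jobs due at `now`, advancing next_run_at before returning."""
    claimed = []
    with _store_lock:
        for job_id, job in jobs.items():
            if job["next_run_at"] > now:
                continue  # not yet due: never fire early
            # Advance first so a crash or restart cannot replay this run.
            while job["next_run_at"] <= now:
                job["next_run_at"] += job["period"]
            claimed.append(job_id)
    return claimed

early = claim_due_jobs(datetime(2026, 5, 12, 6, 0, tzinfo=UTC))
on_time = claim_due_jobs(datetime(2026, 5, 12, 9, 0, 30, tzinfo=UTC))
repeat = claim_due_jobs(datetime(2026, 5, 12, 9, 1, 30, tzinfo=UTC))
print(early, on_time, repeat)
```

In this model the 06:00 tick claims nothing, the 09:00 tick claims the job once, and the next tick finds it already advanced to the following day.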

Related Issues

  • #9086 — Serial cron tick execution causing silent job skips (fixed in v2026.4.23)
