hermes - [Bug]: Cron job fires 3 hours early after gateway restart, causing duplicate execution

hermes, 2026-05-12 09:52:51

After a gateway restart, cron jobs intermittently fire about 3 hours before their scheduled time and then fire again at the correct time, so a single scheduled run produces two outputs.

Environment

Hermes version: 0.13.0
OS: Ubuntu (WSL2)
System timezone: UTC (timedatectl confirms Etc/UTC)
Hermes timezone config: timezone: '' (empty — falls back to system UTC)
Python: system venv

Steps to Reproduce

1. Create a recurring cron job, e.g. 0 9 * * * (daily at 09:00 UTC).
2. Restart the Hermes gateway (systemctl restart hermes or SIGTERM).
3. Observe that the job fires ~3 hours early (e.g. at 06:00 UTC instead of 09:00 UTC).
4. Observe that the job fires again at the correct 09:00 UTC.

Evidence

Daily Briefing (`0 9 * * *`) — output files

| Date | Runs | Times (UTC) | Duplicate? |
|---|---|---|---|
| May 1 | 2 | 08:08, 09:02 | ✅ |
| May 2–11 | 1 each | ~09:XX | ❌ |
| May 12 | 2 | 06:13, 09:20 | ✅ |

All jobs double-firing on May 12

| Job | Schedule | Correct Time | Actual Fire Times |
|---|---|---|---|
| Phantom | `0 10 * * 1-5` | 10:00 UTC | 02:00, 03:51, (correct at 10:00 not logged yet) |
| Ecosystem Watch | `0 7 * * *` | 07:00 UTC | 04:17, 07:01 |
| Daily Briefing | `0 9 * * *` | 09:00 UTC | 06:00, 09:00 |
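
To make the size of each early fire explicit, the following sketch computes how many minutes before the scheduled time each logged run occurred. It assumes all times in the table above are on May 12 in UTC; the helper name `minutes_early` is hypothetical and not part of Hermes.

```python
from datetime import datetime

# (job, scheduled time, observed fire times), all UTC, copied from the table above.
observations = [
    ("Phantom", "10:00", ["02:00", "03:51"]),
    ("Ecosystem Watch", "07:00", ["04:17", "07:01"]),
    ("Daily Briefing", "09:00", ["06:00", "09:00"]),
]


def minutes_early(scheduled: str, actual: str) -> int:
    """Minutes between an observed run and its scheduled time (negative if late)."""
    fmt = "%H:%M"
    delta = datetime.strptime(scheduled, fmt) - datetime.strptime(actual, fmt)
    return int(delta.total_seconds() // 60)


offsets = {
    job: [minutes_early(scheduled, fired_at) for fired_at in fired]
    for job, scheduled, fired in observations
}

for job, minutes in offsets.items():
    print(job, minutes)
```

On this data the offset is not uniform: Daily Briefing fired 180 minutes early and Ecosystem Watch 163 minutes early, but the two Phantom runs were 480 and 369 minutes early, which may be worth keeping in mind when evaluating a fixed 3-hour timezone explanation.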

Gateway restart pattern

The gateway restarted multiple times on May 11 evening:

21:03:49 - SIGTERM
21:05:14 - Started
21:06:37 - SIGTERM
21:06:49 - Started
21:08:26 - SIGTERM
21:08:39 - Started
21:10:55 - SIGTERM
21:11:06 - Started
21:16:14 - SIGTERM
21:16:20 - Started
21:24:25 - SIGTERM
21:24:29 - Started
21:43:24 - SIGTERM
21:43:28 - Started
22:45:06 - SIGTERM
22:46:31 - Started
22:58:24 - SIGTERM
22:58:52 - Started  ← final restart before the bug manifests

After the final restart at 22:58 UTC, the cron ticker started and all subsequent overnight jobs fired 3 hours early.

Code Analysis

The scheduler correctly implements advance_next_run() before execution (line 1702 of scheduler.py), which should guarantee at-most-once semantics:

# scheduler.py L1699-1702
# Advance next_run_at for all recurring jobs FIRST, under the file lock,
# before any execution begins.  This preserves at-most-once semantics.
for job in due_jobs:
    advance_next_run(job["id"])
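
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of "advance first, execute second". It is not the Hermes implementation: the `tick` function, the in-memory `jobs` dict, the `period` field, and the use of `threading.Lock` in place of the file lock are all assumptions made for illustration.

```python
import threading
from datetime import datetime, timedelta, timezone

_state_lock = threading.Lock()  # in-process stand-in for the scheduler's file lock


def tick(jobs, now, execute):
    """Advance every due job under the lock, then execute outside the lock."""
    with _state_lock:
        due = [job for job in jobs.values() if job["next_run_at"] <= now]
        for job in due:
            # Advance first, so a crash or restart after this point cannot re-run the job.
            while job["next_run_at"] <= now:
                job["next_run_at"] += job["period"]
    for job in due:
        execute(job)
    return [job["id"] for job in due]


jobs = {
    "briefing": {
        "id": "briefing",
        "next_run_at": datetime(2026, 5, 12, 9, 0, tzinfo=timezone.utc),
        "period": timedelta(days=1),
    }
}
ran = []
early = tick(jobs, datetime(2026, 5, 12, 6, 0, tzinfo=timezone.utc), ran.append)
on_time = tick(jobs, datetime(2026, 5, 12, 9, 0, tzinfo=timezone.utc), ran.append)
repeat = tick(jobs, datetime(2026, 5, 12, 9, 0, 30, tzinfo=timezone.utc), ran.append)
```

With a correct `next_run_at`, this pattern yields no run at 06:00 and exactly one run at 09:00. The guarantee only holds if the stored `next_run_at` is itself correct, which is the property the report questions.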

The grace window is half the schedule period, clamped to MAX_GRACE. For a daily cron (0 9 * * *) the period is 86,400s, so the window computes to 86,400s // 2 = 43,200s, clamped to MAX_GRACE = 7,200s (2 hours). So a job due at 09:00 UTC should NOT be considered "due" at 06:00 UTC (3 hours = 10,800s > 7,200s grace).
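
The arithmetic can be checked with a short sketch. It assumes the window is half the schedule period capped at `MAX_GRACE` and that a daily schedule has a period of 86,400 s; the function name `grace_seconds` is hypothetical, and any lower bound the real code may apply is omitted. Because of the cap, any half-period above 7,200 s gives the same result.

```python
MAX_GRACE = 7_200  # 2 hours, as stated in the report


def grace_seconds(period_seconds: int) -> int:
    """Half the schedule period, capped at MAX_GRACE (assumed formula)."""
    return min(period_seconds // 2, MAX_GRACE)


DAILY_PERIOD = 24 * 60 * 60      # 86,400 s between runs of a daily cron
grace = grace_seconds(DAILY_PERIOD)

early_by = 3 * 60 * 60           # 06:00 vs 09:00 -> 10,800 s
within_grace = early_by <= grace

print(grace, early_by, within_grace)
```

Under these assumptions a 3-hour gap falls outside the 2-hour window, which matches the report's reasoning that the grace logic alone should not have made the job due at 06:00 UTC.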

Yet get_due_jobs() returned the job as due at 06:00 UTC. This suggests next_run_at was incorrectly set to a value between 04:00 and 06:00 UTC during or after the restart sequence, bypassing the grace window check.

The _ensure_aware() / _normalize_aware_dt() function (jobs.py L274-289) interprets naive datetimes as system-local time and converts them to the Hermes timezone. If advance_next_run() or compute_next_run() ever produced a naive datetime that was misinterpreted during a restart window, that could explain the 3-hour offset, though no obvious code path produces naive datetimes in the current version.
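
The mechanism behind this hypothesis can be illustrated in isolation. The sketch below is not Hermes code: `to_utc_assuming_local` and the UTC+3 zone are hypothetical, and the reported system is configured as UTC, so the example shows only how a 3-hour shift could arise if a naive value were ever read against a UTC+3 local zone.

```python
from datetime import datetime, timedelta, timezone


def to_utc_assuming_local(naive: datetime, local_tz: timezone) -> datetime:
    """Treat a naive datetime as local wall-clock time and convert it to UTC."""
    return naive.replace(tzinfo=local_tz).astimezone(timezone.utc)


# The intended next run: 09:00 UTC.
intended = datetime(2026, 5, 12, 9, 0, tzinfo=timezone.utc)

# Hypothetical failure: the tzinfo is lost, leaving a naive 09:00.
naive = intended.replace(tzinfo=None)

# Hypothetical local zone of UTC+3 used to reinterpret the naive value.
utc_plus_3 = timezone(timedelta(hours=3))
misread = to_utc_assuming_local(naive, utc_plus_3)

shift = intended - misread
print(misread.isoformat(), shift)
```

A naive 09:00 read as UTC+3 wall-clock time becomes 06:00 UTC, so a job stored this way would look due three hours early.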

Workaround

Currently no workaround besides restarting the gateway only during non-scheduled hours, which is impractical for 24/7 operation.

Expected Behavior

A gateway restart should never cause a job to fire before its scheduled next_run_at. The advance_next_run under-lock pattern should prevent duplicates.
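
One way to enforce this expectation independently of the stored next_run_at is to key each execution on the scheduled occurrence it belongs to, derived from the schedule itself. The sketch below is an assumption-laden illustration, not a proposed patch: `RunLedger` and `latest_daily_occurrence` are hypothetical names, only daily HH:MM schedules are handled, and the ledger lives in memory, whereas a real guard would have to persist across restarts.

```python
from datetime import datetime, timedelta, timezone


def latest_daily_occurrence(now: datetime, hour: int, minute: int) -> datetime:
    """Most recent HH:MM occurrence at or before `now` (daily schedules only)."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate > now:
        candidate -= timedelta(days=1)
    return candidate


class RunLedger:
    """Records which scheduled occurrences have already been executed."""

    def __init__(self):
        self._claimed = set()

    def claim(self, job_id: str, now: datetime, hour: int, minute: int) -> bool:
        """Return True only the first time an occurrence is claimed."""
        key = (job_id, latest_daily_occurrence(now, hour, minute))
        if key in self._claimed:
            return False
        self._claimed.add(key)
        return True


def utc(day: int, hour: int, minute: int) -> datetime:
    return datetime(2026, 5, day, hour, minute, tzinfo=timezone.utc)


ledger = RunLedger()
results = [
    ledger.claim("briefing", utc(11, 9, 2), 9, 0),   # May 11 run
    ledger.claim("briefing", utc(12, 6, 0), 9, 0),   # early fire maps to May 11 09:00
    ledger.claim("briefing", utc(12, 9, 0), 9, 0),   # correct May 12 run
    ledger.claim("briefing", utc(12, 9, 20), 9, 0),  # repeat of the same occurrence
]
print(results)
```

Because the 06:00 attempt maps to the previous day's 09:00 occurrence, which is already claimed, it is rejected even when next_run_at is wrong. A fresh ledger has no history, so it would need seeding to reject an early fire on first use.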

Related Issues

#9086 — Serial cron tick execution causing silent job skips (fixed in v2026.4.23)
