hermes - 💡(How to fix) Fix cron: no health monitoring for cron ticker — can silently stop for hours [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
NousResearch/hermes-agent#20302Fetched 2026-05-06 06:37:29
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
0
Participants
Timeline (top)
labeled ×4

Error Message

There is no mechanism to detect when the cron ticker stops processing jobs. If scheduler.py has a syntax error, an import failure, or the ticker encounters an unhandled exception, all scheduled jobs silently stop running — but the gateway continues to operate normally with no alert.

  1. Introduce a syntax error in cron/scheduler.py (e.g. a bad f-string)
  2. Check gateway.log — no critical error about cron ticker failure High severity for production deployments. All automated tasks (backups, health checks, scheduled reports, data cleanup) silently stop. In one observed case, a syntax error caused the cron ticker to stop for 7.5 hours with zero alerts — all cron jobs did not run during that window.
RAW_BUFFERClick to expand / collapse

Bug Description

There is no mechanism to detect when the cron ticker stops processing jobs. If scheduler.py has a syntax error, an import failure, or the ticker encounters an unhandled exception, all scheduled jobs silently stop running — but the gateway continues to operate normally with no alert.

Steps to Reproduce

  1. Introduce a syntax error in cron/scheduler.py (e.g. a bad f-string)
  2. Start or restart the gateway
  3. Observe that the gateway starts successfully with no critical errors
  4. Check cron job execution — all jobs stop running silently
  5. Check gateway.log — no critical error about cron ticker failure

Expected Behavior

Two layers of protection:

  1. Preflight at startup: Gateway should verify that cron/scheduler.py can be compiled AND imported before starting the cron ticker. If either fails, emit a critical log and refuse to start the ticker.
  2. Runtime heartbeat monitoring: During ticker operation, periodically check that the ticker is actually running (e.g. by checking a heartbeat timestamp). If the ticker appears dead for more than ~30 minutes, emit a critical alert.

Actual Behavior

No preflight check and no runtime monitoring. The cron ticker can stop for any reason and all scheduled jobs silently cease with no alert.

Impact

High severity for production deployments. All automated tasks (backups, health checks, scheduled reports, data cleanup) silently stop. In one observed case, a syntax error caused the cron ticker to stop for 7.5 hours with zero alerts — all cron jobs did not run during that window.

extent analysis

TL;DR

Implement preflight checks at startup and runtime heartbeat monitoring to detect and alert when the cron ticker stops processing jobs.

Guidance

  • Introduce a preflight check in the gateway startup sequence to verify that cron/scheduler.py can be compiled and imported without errors before starting the cron ticker.
  • Implement a runtime heartbeat monitoring mechanism to periodically check the cron ticker's status and emit a critical alert if it appears dead for an extended period (e.g., 30 minutes).
  • Modify the gateway.log to include critical error messages when the cron ticker fails to start or stops running unexpectedly.
  • Consider adding a fallback mechanism to restart the cron ticker or trigger an alert when it fails to run for an extended period.

Example

import importlib.util
import os

def check_cron_scheduler():
    file_path = 'cron/scheduler.py'
    if not os.path.exists(file_path):
        print("Critical: Cron scheduler file not found.")
        return False
    
    spec = importlib.util.spec_from_file_location("scheduler", file_path)
    if spec is None:
        print("Critical: Failed to import cron scheduler.")
        return False
    
    try:
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
    except Exception as e:
        print(f"Critical: Error importing cron scheduler: {e}")
        return False
    
    return True

Notes

The provided example is a basic illustration of how to implement a preflight check for the cron scheduler. The actual implementation may vary depending on the specific requirements and the existing codebase.

Recommendation

Apply workaround: Implement the suggested preflight checks and runtime monitoring to detect and alert when the cron ticker stops processing jobs, as this will provide an immediate solution to the problem.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING