openclaw - ✅ (Solved) Fix cron: failureAlert never fires — all error jobs show deliveryStatus 'not-requested' [1 pull request, 1 participant]

GitHub stats: openclaw/openclaw#60845 (fetched 2026-04-08 02:46:33)
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): cross-referenced ×3, referenced ×1

Cron job failureAlert is configured but never fires because deliveryStatus is always "not-requested" on error runs. The consecutiveErrors counter increments correctly, but the delivery path is skipped entirely. No alert reaches Discord, Slack, or any configured channel.

Error Message

Every error run reported by openclaw cron runs <id> shows deliveryStatus: "not-requested", and no delivery attempt is logged.

Root Cause

The failureAlert delivery path appears to be decoupled from the job runner error handling. When a cron job fails (either agentTurn timeout or script error), the consecutiveErrors counter increments but the delivery request is never triggered. This is distinct from #56521 (feature request for agent-turn alerts) — this bug means even the baseline announce delivery mechanism does not fire.

Fix Action

Fixed

PR fix notes

PR #60876: fix: cron failureAlert fires correctly on error and skipped runs

Description (problem / solution / changelog)

Summary

Two related cron monitoring failures fixed in applyJobResult (src/cron/service/timer.ts).


Fix: #60845 — failureAlert never fires on error runs

Root cause: The local fork added an extra condition to the isBestEffort guard in applyJobResult:

// Before (broken):
const isBestEffort =
  job.delivery?.bestEffort === true ||
  (job.payload.kind === "agentTurn" && job.payload.bestEffortDeliver === true);

payload.bestEffortDeliver is a legacy field that gates output delivery, not failure alerting. Any agentTurn job that had bestEffortDeliver: true in its payload (set during migration from the old top-level format) would silently skip failureAlert forever — consecutiveErrors incremented correctly but emitFailureAlert was never called.

Fix: Revert to the upstream guard:

// After (correct):
const isBestEffort = job.delivery?.bestEffort === true;
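The following self-contained sketch isolates the two guards so the difference is visible. The Job and Payload types are minimal stand-ins that contain only the fields the guards read (the real types in src/cron/types.ts have more), and both function names are hypothetical. Per the PR notes, a true result from the guard means failureAlert is skipped.

```typescript
// Hypothetical minimal job shape: only the fields the two guards inspect.
type Payload =
  | { kind: "agentTurn"; bestEffortDeliver?: boolean }
  | { kind: "systemEvent" };

interface Job {
  delivery?: { bestEffort?: boolean };
  payload: Payload;
}

// Guard from the local fork: the legacy payload flag also counts as best-effort.
function isBestEffortBroken(job: Job): boolean {
  return (
    job.delivery?.bestEffort === true ||
    (job.payload.kind === "agentTurn" && job.payload.bestEffortDeliver === true)
  );
}

// Upstream guard restored by PR #60876: only delivery.bestEffort counts.
function isBestEffortFixed(job: Job): boolean {
  return job.delivery?.bestEffort === true;
}

// An agentTurn job that kept bestEffortDeliver: true after migration.
const migrated: Job = { payload: { kind: "agentTurn", bestEffortDeliver: true } };

console.log(isBestEffortBroken(migrated)); // true  (failureAlert skipped)
console.log(isBestEffortFixed(migrated)); // false (failureAlert evaluated)
```

Only jobs with an explicit delivery.bestEffort setting are treated as best-effort under the restored guard. The payload flag goes back to affecting output delivery only.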

Fix: #60846 — failureAlert never evaluated for "skipped" runs

Root cause: The else branch in applyJobResult handled both "ok" and "skipped" results identically — resetting consecutiveErrors to 0 and never calling resolveFailureAlert. A job that is permanently stuck in "skipped" state (e.g. gateway-restart health-check jobs, jobs with empty systemEvent text) generated zero alerts regardless of failureAlert configuration.

Fix: Split "skipped" into its own branch with:

  • A new consecutiveSkips counter (mirrors consecutiveErrors)
  • A new lastSkipAlertAtMs cooldown timestamp (mirrors lastFailureAlertAtMs)
  • resolveFailureAlert evaluation against consecutiveSkips >= alertConfig.after
  • emitFailureAlert with isSkip: true for a distinct message ("skipped N times / Reason: ...")
  • consecutiveErrors + lastFailureAlertAtMs still reset on "skipped" (a skip is not an error)
  • Both counters reset on "ok" (clean run clears all alert state)
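The branch logic above can be modeled in a few lines. This is a simplified, hypothetical model, not the PR's code. The names applyResultSketch, passesGate, and AlertDecision and the cooldownMs parameter are invented for illustration, because the notes name the cooldown timestamps but not the cooldown length. The model also leaves out the isBestEffort guard and message formatting. Clearing the cooldown timestamps on "ok" is an assumed reading of "clean run clears all alert state".

```typescript
type RunStatus = "ok" | "error" | "skipped";

// Subset of CronJobState; the two skip fields are the ones the PR adds.
interface CronJobState {
  consecutiveErrors?: number;
  lastFailureAlertAtMs?: number;
  consecutiveSkips?: number;
  lastSkipAlertAtMs?: number;
}

interface AlertDecision {
  alert: boolean; // would emitFailureAlert be called?
  isSkip: boolean; // selects the "skipped N times" message variant
}

// Threshold plus cooldown check shared by the error and skip branches.
function passesGate(
  count: number,
  lastAlertAtMs: number | undefined,
  after: number,
  cooldownMs: number,
  nowMs: number,
): boolean {
  const coolingDown = lastAlertAtMs !== undefined && nowMs - lastAlertAtMs < cooldownMs;
  return count >= after && !coolingDown;
}

function applyResultSketch(
  state: CronJobState,
  status: RunStatus,
  after: number,
  cooldownMs: number,
  nowMs: number,
): AlertDecision {
  if (status === "ok") {
    // A clean run clears the counters and, in this sketch, the cooldowns too.
    state.consecutiveErrors = 0;
    state.consecutiveSkips = 0;
    delete state.lastFailureAlertAtMs;
    delete state.lastSkipAlertAtMs;
    return { alert: false, isSkip: false };
  }

  if (status === "skipped") {
    // A skip is not an error: the error streak and its cooldown reset.
    state.consecutiveErrors = 0;
    delete state.lastFailureAlertAtMs;
    const skips = (state.consecutiveSkips ?? 0) + 1;
    state.consecutiveSkips = skips;
    const alert = passesGate(skips, state.lastSkipAlertAtMs, after, cooldownMs, nowMs);
    if (alert) state.lastSkipAlertAtMs = nowMs;
    return { alert, isSkip: true };
  }

  // status === "error": the skip counter is left untouched here because the
  // PR notes do not say how an error affects it.
  const errors = (state.consecutiveErrors ?? 0) + 1;
  state.consecutiveErrors = errors;
  const alert = passesGate(errors, state.lastFailureAlertAtMs, after, cooldownMs, nowMs);
  if (alert) state.lastFailureAlertAtMs = nowMs;
  return { alert, isSkip: false };
}

// Usage: with after = 2, the second consecutive skip raises one skip alert.
const s: CronJobState = {};
console.log(applyResultSketch(s, "skipped", 2, 60_000, 0)); // { alert: false, isSkip: true }
console.log(applyResultSketch(s, "skipped", 2, 60_000, 1_000)); // { alert: true, isSkip: true }
console.log(applyResultSketch(s, "ok", 2, 60_000, 2_000)); // { alert: false, isSkip: false }
```

Keeping skips in their own counter means a stuck "skipped" job can reach the threshold. Under the old shared "ok"/"skipped" branch, the error streak was reset on every skip.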

Files Changed

  • src/cron/service/timer.ts: emitFailureAlert + applyJobResult
  • src/cron/types.ts: added consecutiveSkips?: number and lastSkipAlertAtMs?: number to CronJobState
  • src/gateway/protocol/schema/cron.ts: added consecutiveSkips and lastSkipAlertAtMs to CronJobStateSchema

Fixes openclaw/openclaw#60845
Fixes openclaw/openclaw#60846

Changed files

  • src/config/io.ts (modified, +49/-3)
  • src/cron/service/timer.ts (modified, +64/-26)
  • src/cron/types.ts (modified, +4/-0)
  • src/gateway/protocol/schema/cron.ts (modified, +2/-0)
Original issue report

Bug type

Bug / Silent Failure

OpenClaw version

OpenClaw 2026.4.2 (d74a122)

OS and install method

Linux 6.8.0-106-generic (x64), Node.js v24.14.1, npm global, systemd user service.

Summary

Cron job failureAlert is configured but never fires because deliveryStatus is always "not-requested" on error runs. The consecutiveErrors counter increments correctly, but the delivery path is skipped entirely. No alert reaches Discord, Slack, or any configured channel.

Steps to reproduce

  1. Create a cron job with failureAlert: { after: 1, channel: discord, to: channel:xxx }
  2. Trigger the job to fail (script error or timeout)
  3. Run openclaw cron list — job shows error status with consecutiveErrors: 1 (or higher)
  4. Query job runs: openclaw cron runs <id> — every error run shows deliveryStatus: not-requested
  5. Check the Discord channel — no alert message was posted

Expected behavior

After consecutiveErrors >= failureAlert.after (e.g., 1), the gateway should deliver the failure alert to the configured channel.

Actual behavior

Every error run shows deliveryStatus: "not-requested". No delivery attempt is logged. The failureAlert.after threshold has no effect — alerts never fire.

Logs and evidence

Two jobs confirmed on a live instance running 2026.4.2:

  • pr47305-monitor: 28 errors today, all "not-requested"
  • memory-health-unified: 2 errors today, all "not-requested"

Gateway logs (/tmp/openclaw/openclaw-2026-04-04.log) show zero delivery request entries for any failureAlert-configured job.

Root cause analysis

The failureAlert delivery path appears to be decoupled from the job runner error handling. When a cron job fails (either agentTurn timeout or script error), the consecutiveErrors counter increments but the delivery request is never triggered. This is distinct from #56521 (feature request for agent-turn alerts) — this bug means even the baseline announce delivery mechanism does not fire.

Impact

  • All cron failure alerts are silently dropped — operators receive zero notification when jobs fail, regardless of failureAlert configuration
  • This affects systemEvent and agentTurn jobs alike
  • The failureAlert.after setting is effectively a no-op

Proposed fix

  1. Audit the cron error handling path to confirm requestDelivery() is not called on job failure
  2. Wire up failureAlert delivery to fire when consecutiveErrors >= failureAlert.after
  3. Add a deliveryStatus value of "failed" (delivery attempted but failed) distinct from "not-requested" (never attempted)
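The status distinction proposed in step 3 can be sketched as a small mapping from what happened to a status value. This is a hypothetical illustration, not OpenClaw's actual type. "not-requested" comes from the issue, "unknown" comes from #54834, "failed" is the proposed value, and "delivered" is an assumed name for the success case. DeliveryAttempt and toDeliveryStatus are likewise invented for the example.

```typescript
// Hypothetical status union; only "not-requested" and "unknown" are observed values.
type DeliveryStatus = "not-requested" | "delivered" | "failed" | "unknown";

interface DeliveryAttempt {
  requested: boolean; // was a delivery request made at all?
  succeeded?: boolean; // outcome, if it was observed
}

function toDeliveryStatus(a: DeliveryAttempt): DeliveryStatus {
  if (!a.requested) return "not-requested"; // never attempted
  if (a.succeeded === undefined) return "unknown"; // attempted, outcome not observed
  return a.succeeded ? "delivered" : "failed"; // attempted, outcome known
}

console.log(toDeliveryStatus({ requested: true, succeeded: false })); // failed
```

With this split, the runs in this issue would still show "not-requested", because no request was ever made. A channel outage would show "failed" instead, so the two failure modes could be told apart from openclaw cron runs output.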

Related

  • #56521 — Feature: Route failure alerts as agent-turn events (this bug blocks that feature from working at all)
  • #54834 — Cron isolated agentTurn announce delivery can complete with deliveryStatus: "unknown"

extent analysis

TL;DR

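Error runs increment consecutiveErrors but never reach the failureAlert delivery path. PR #60876 traced this to the isBestEffort guard in applyJobResult (src/cron/service/timer.ts). That guard also counted agentTurn jobs carrying the legacy payload.bestEffortDeliver: true flag as best-effort, so failureAlert was never evaluated for them.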

Guidance

  • Review the cron error handling code to ensure requestDelivery() is called when a job fails, specifically when consecutiveErrors >= failureAlert.after.
  • Verify that the deliveryStatus is updated correctly to reflect the delivery attempt, such as adding a "failed" status.
  • Check the gateway logs for any delivery request entries related to failureAlert-configured jobs to confirm the issue.
  • Test the failureAlert functionality with a simple cron job to isolate the problem and verify the fix.

Example

The issue itself includes no code references. The relevant guard code from applyJobResult appears in the PR #60876 notes.

Notes

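The issue's proposed fix was to audit the cron error handling path and wire up failureAlert delivery. PR #60876 instead located the problem in applyJobResult. The isBestEffort guard skipped alerting for affected agentTurn jobs, and "skipped" runs shared the "ok" branch, so resolveFailureAlert was never called for them.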

Recommendation

Until the fix is available, one workaround is a custom script that periodically checks each job's consecutiveErrors against its failureAlert.after value and sends the alert itself when the threshold is crossed.
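The following is a minimal sketch of the threshold check a watchdog script could run. It assumes the script can already obtain each job's id, consecutiveErrors, and failureAlert.after value, for example by reading openclaw cron list. The parsing is not shown because that command's output format is not documented here. JobSnapshot, jobsToAlert, and the alertAfter field name are hypothetical. Posting the alert to Discord or Slack is left to the caller.

```typescript
// Hypothetical per-job snapshot assembled by the watchdog.
interface JobSnapshot {
  id: string;
  consecutiveErrors: number;
  alertAfter: number; // the job's failureAlert.after value
}

// Returns ids of jobs that newly crossed their threshold. `alerted` remembers
// jobs already reported in the current error streak, so one streak yields one
// alert; a job leaves the set once its counter drops below the threshold
// (for example after a successful run resets it).
function jobsToAlert(jobs: JobSnapshot[], alerted: Set<string>): string[] {
  const out: string[] = [];
  for (const job of jobs) {
    if (job.consecutiveErrors >= job.alertAfter) {
      if (!alerted.has(job.id)) {
        alerted.add(job.id);
        out.push(job.id);
      }
    } else {
      alerted.delete(job.id);
    }
  }
  return out;
}

// Usage: the first check reports the job; a repeat check in the same streak does not.
const alerted = new Set<string>();
const snapshot = [{ id: "pr47305-monitor", consecutiveErrors: 28, alertAfter: 1 }];
console.log(jobsToAlert(snapshot, alerted)); // [ 'pr47305-monitor' ]
console.log(jobsToAlert(snapshot, alerted)); // []
```

The `alerted` set lives in memory, so this sketch re-alerts on every open streak after a script restart. Persisting it is a possible refinement.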


