openclaw - (Solved) [Bug]: Recurring main-session cron jobs ghost (status:ok, durationMs ~16ms) when first heartbeat hits a retryable busy skip: early-exit fire-and-forgets without consuming queued cron event [2 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#75964 (fetched 2026-05-03 04:43:48)
Comments: 1
Participants: 2
Timeline events: 6
Reactions: 2
Timeline (top): cross-referenced ×3, closed ×1, commented ×1, unsubscribed ×1

Root Cause

executeMainSessionCronJob in src/cron/service/timer.ts has an early-exit path that returns status: ok after fire-and-forgetting a heartbeat wake when:

  1. wakeMode === "now"
  2. The first runHeartbeatOnce call returns skipped with a retryable busy reason (requests-in-flight, cron-in-progress, or lanes-busy)
  3. The job is recurring (schedule.kind !== "at")

In that case the cron records status: ok with durationMs ~16ms even though the agent never processed the cron event. This is functionally a "ghost run."

This is distinct from #73189 (closed): that fix made buildCronEventPrompt work when the heartbeat does run. This bug is about the case where the heartbeat is skipped and a follow-up wake is fire-and-forgotten — requestHeartbeatNow is not awaited, and the queued cron event is not guaranteed to be consumed.
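The three conditions above can be restated as a single predicate. This is only a reading aid, not code from the repository: the reason strings, `wakeMode`, and `schedule.kind` values are quoted from the issue, while `HeartbeatResultSketch`, `isRetryableBusySkip`, `hitsGhostPath`, and the `"ran"` status are hypothetical names chosen for illustration.

```typescript
// Reason strings are quoted from the issue; every identifier here is illustrative.
const RETRYABLE_BUSY_REASONS: readonly string[] = [
  "requests-in-flight",
  "cron-in-progress",
  "lanes-busy",
];

interface HeartbeatResultSketch {
  status: "ran" | "skipped"; // "ran" is a placeholder for any non-skipped status
  reason?: string;
}

// Condition 2: the heartbeat was skipped for a temporary, retryable busy reason.
function isRetryableBusySkip(result: HeartbeatResultSketch): boolean {
  return (
    result.status === "skipped" &&
    result.reason !== undefined &&
    RETRYABLE_BUSY_REASONS.includes(result.reason)
  );
}

// True when all three conditions from the issue hold at once.
function hitsGhostPath(
  wakeMode: string,
  firstHeartbeat: HeartbeatResultSketch,
  scheduleKind: string,
): boolean {
  const isRecurringJob = scheduleKind !== "at"; // condition 3
  return wakeMode === "now" && isRetryableBusySkip(firstHeartbeat) && isRecurringJob;
}
```

Under these assumptions, a recurring `wakeMode: "now"` job whose first heartbeat is skipped with `requests-in-flight` satisfies the predicate, while a one-shot (`schedule.kind: "at"`) job does not.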

Fix Action

Fixed

PR fix notes

PR #76083: fix(cron): retry recurring wake-now jobs on temporary busy skips

Description (problem / solution / changelog)

Summary

  • retry recurring wakeMode=now main-session cron jobs on temporary heartbeat busy skips instead of recording an immediate ok ghost run
  • keep cron-in-progress on the immediate deferred heartbeat path because the active cron marker prevents synchronous wake-now from succeeding
  • update regression coverage and changelog for #75964
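To make the behavior change easier to compare, here is a minimal sketch of the branch decision before and after this PR, reduced to pure functions. The two conditions mirror the diff quoted in the issue; the `Decision` labels and the function names are hypothetical and do not appear in the source.

```typescript
// Reason strings come from the issue; the Decision labels are illustrative only.
type BusyReason = "requests-in-flight" | "cron-in-progress" | "lanes-busy";
type Decision = "fire-and-forget" | "retry-with-wait";

// Before the fix: any recurring job takes the early exit on a retryable busy skip.
function decideBeforeFix(isRecurringJob: boolean, reason: BusyReason): Decision {
  return isRecurringJob || reason === "cron-in-progress"
    ? "fire-and-forget"
    : "retry-with-wait";
}

// After the fix: only cron-in-progress keeps the immediate deferred heartbeat path.
function decideAfterFix(_isRecurringJob: boolean, reason: BusyReason): Decision {
  return reason === "cron-in-progress" ? "fire-and-forget" : "retry-with-wait";
}
```

The only inputs whose outcome changes are recurring jobs skipped with `requests-in-flight` or `lanes-busy`, which now enter the retry path that one-shot jobs already used.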

Fixes #75964

Test plan

  • pnpm test src/cron/service/timer.regression.test.ts src/cron/service.runs-one-shot-main-job-disables-it.test.ts src/cron/service.main-job-passes-heartbeat-target-last.test.ts
  • pnpm test src/infra/heartbeat-runner.ghost-reminder.test.ts src/infra/heartbeat-wake.test.ts
  • pnpm exec oxfmt --check --threads=1 CHANGELOG.md src/cron/service/timer.ts src/cron/service/timer.regression.test.ts src/cron/service.main-job-passes-heartbeat-target-last.test.ts
  • pnpm check:changed

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/cron/service.main-job-passes-heartbeat-target-last.test.ts (modified, +3/-2)
  • src/cron/service/timer.regression.test.ts (modified, +12/-15)
  • src/cron/service/timer.ts (modified, +2/-6)

PR #72677: fix(cron): warn on main heartbeat handoff ghost runs

Description (problem / solution / changelog)

Fixes #63106.

Summary

  • add a scoped warning for fast ok main-session systemEvent jobs using wakeMode=next-heartbeat
  • annotate cron run logs with possible-main-next-heartbeat-ghost-run for that handoff path
  • add cron.ghostRunWarningThresholdMs config with schema/help metadata
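The warning described in this summary could plausibly be expressed as a small classifier over a finished run. The sketch below is an assumption about the shape of that check, not the PR's actual code: only the annotation string, the `wakeMode=next-heartbeat` scope, and the existence of a threshold setting come from the PR notes. The `CronRunSketch` fields, the function name, and the strict `<` comparison are hypothetical.

```typescript
// Field names are hypothetical; only the annotation string is quoted from the PR notes.
interface CronRunSketch {
  status: "ok" | "error" | "skipped";
  durationMs: number;
  sessionTarget: string;
  payloadKind: string;
  wakeMode: string;
}

const GHOST_RUN_ANNOTATION = "possible-main-next-heartbeat-ghost-run";

// Returns the annotation for a suspiciously fast ok run on the scoped handoff path.
// thresholdMs stands in for cron.ghostRunWarningThresholdMs; its default is not stated in the source.
function ghostRunAnnotation(run: CronRunSketch, thresholdMs: number): string | undefined {
  const inScope =
    run.sessionTarget === "main" &&
    run.payloadKind === "systemEvent" &&
    run.wakeMode === "next-heartbeat";
  const fastOk = run.status === "ok" && run.durationMs < thresholdMs; // "<" is an assumption
  return inScope && fastOk ? GHOST_RUN_ANNOTATION : undefined;
}
```

With an arbitrary threshold of 1000 ms, a 16 ms ok run on that path would be annotated, while a run lasting several seconds would not.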

Tests

  • node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.gateway.config.ts src/gateway/server-cron.test.ts
  • node --no-maglev node_modules/vitest/vitest.mjs run --config test/vitest/vitest.config.ts src/config/zod-schema.cron-retention.test.ts src/config/schema.base.generated.test.ts src/config/schema.help.quality.test.ts src/cron/run-log.test.ts
  • node --import tsx scripts/generate-base-config-schema.ts --check
  • node_modules/.bin/oxlint --tsconfig tsconfig.oxlint.core.json src/gateway/server-cron.ts src/gateway/server-cron.test.ts src/config/types.cron.ts src/config/zod-schema.ts src/config/schema.help.ts src/config/schema.labels.ts src/cron/run-log.ts

Note: local tsgo checks currently fail before this change on unrelated dependency/type issues (missing typebox and existing model compat errors).

Changed files

  • extensions/discord/src/voice/manager.e2e.test.ts (modified, +1/-3)
  • extensions/line/src/monitor.lifecycle.test.ts (modified, +1/-1)
  • src/channels/plugins/setup-wizard-helpers.ts (modified, +3/-3)
  • src/config/schema.base.generated.ts (modified, +13/-0)
  • src/config/schema.help.ts (modified, +2/-0)
  • src/config/schema.labels.ts (modified, +1/-0)
  • src/config/types.cron.ts (modified, +5/-0)
  • src/config/zod-schema.agent-defaults.ts (modified, +1/-3)
  • src/config/zod-schema.ts (modified, +1/-0)
  • src/cron/run-log.ts (modified, +10/-0)
  • src/gateway/server-cron.test.ts (modified, +129/-0)
  • src/gateway/server-cron.ts (modified, +66/-1)
  • src/plugins/provider-validation.ts (modified, +4/-3)


Environment

  • OpenClaw 2026.4.29 (also reproduces on 4.27, 4.28 per code inspection)
  • Linux, Node v25.6.1
  • Cron job: sessionTarget: "main", payload.kind: "systemEvent", wakeMode: "now", schedule.kind: "cron" (i.e., recurring)
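For orientation, the job shape listed above can be written out as a typed object. Only `sessionTarget`, `payload.kind`, `wakeMode`, and `schedule.kind` (with the values shown) come from the issue; the interface name, the `expr` and `text` fields, and their values are hypothetical placeholders rather than OpenClaw's confirmed schema.

```typescript
// Hypothetical interface: it covers only the four fields the issue names,
// plus two placeholder fields (expr, text) added for readability.
interface MainSessionCronJobSketch {
  sessionTarget: "main";
  wakeMode: "now" | "next-heartbeat";
  schedule: { kind: "cron" | "at"; expr?: string };
  payload: { kind: "systemEvent"; text?: string };
}

const dailyDebrief: MainSessionCronJobSketch = {
  sessionTarget: "main",
  wakeMode: "now",
  schedule: { kind: "cron", expr: "0 20 * * *" }, // assumed expression for a daily 8 PM run
  payload: { kind: "systemEvent", text: "Run the daily debrief." }, // placeholder text
};

// The issue defines "recurring" as schedule.kind !== "at".
const isRecurring = dailyDebrief.schedule.kind !== "at";
```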

Evidence (real workload)

Daily 8 PM PT debrief cron, run history (last ~3 weeks). Config has been stable the whole time (main + systemEvent + wakeMode:"now").

| Date | Status | Duration (ms) | Delivery | Notes |
|---|---|---|---|---|
| Apr 12 (config switched to main+systemEvent) | ok | 12314 | not-requested | first run on new shape |
| Apr 19 20:35 | ok | 14 | not-requested | LATE-FIRE recovery (different) |
| Apr 20–22 | ok | 8–11k | not-requested | OK |
| Apr 23 20:20 | ok | 17 | not-requested | LATE-FIRE recovery (different) |
| Apr 24–30 | ok | 5–9k | not-requested | OK |
| May 01 20:00 | ok | 16 | not-requested | on-time GHOST (this bug) |

18 of 19 on-time runs since Apr 12 produced full model turns. The May 01 ghost fired exactly on schedule, so it is not a late-fire-recovery skip (those legitimately short-circuit). The session trajectory at 20:00:27 shows the heartbeat ran with prompt [OpenClaw heartbeat poll] (i.e., the bare-poll fallback), not the expected cron-event provider with the payload text.

Steps to reproduce

  1. Configure a recurring cron with sessionTarget: "main", payload.kind: "systemEvent", wakeMode: "now", schedule.kind: "cron".
  2. Arrange for another heartbeat to be in-flight at the cron's scheduled time (e.g., a 6h cadence main-agent heartbeat that lands within the same window; common in real deployments).
  3. The first runHeartbeatOnce call from executeMainSessionCronJob returns skipped with reason: "requests-in-flight" (or "lanes-busy").
  4. The early-exit fires: requestHeartbeatNow(...) (fire-and-forget) + return { status: "ok", summary: text }.
  5. The follow-up heartbeat may or may not actually run before the queued cron event is consumed by something else; if it does not, the cron records status: ok even though no agent turn occurred.

Code analysis

Offending branch (4.29 dist/hook-client-ip-config-BCNYpeHn.js, the bundled executeMainSessionCronJob):

if (heartbeatResult.status !== "skipped" || !isRetryableHeartbeatBusySkipReason(heartbeatResult.reason)) break;
if (isRecurringJob || heartbeatResult.reason === "cron-in-progress") {
    state.deps.requestHeartbeatNow({ ... });
    return { status: "ok", summary: text };  // ← ghost
}

The isRecurringJob || short-circuit means recurring jobs never enter the retry-with-wait loop that one-shot jobs use. They always fire-and-forget on busy.

Proposed fix

Drop the isRecurringJob || from the condition so recurring jobs use the same retry loop as one-shot jobs. The existing 2-minute cap (wakeNowHeartbeatBusyMaxWaitMs ?? 2 * 6e4) preserves the fire-and-forget fallback for the genuinely-stuck case. Worst case = same as today's behavior; best case = the busy heartbeat finishes within the cap and the cron actually delivers.

- if (isRecurringJob || heartbeatResult.reason === "cron-in-progress") {
+ if (heartbeatResult.reason === "cron-in-progress") {
      state.deps.requestHeartbeatNow({ ... });
      return { status: "ok", summary: text };
  }
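The retry-with-wait behavior that recurring jobs would inherit can be sketched as follows. This is a simplified model under stated assumptions, not the contents of timer.ts: `runHeartbeatOnce`, `requestHeartbeatNow`, the reason strings, and the `wakeNowHeartbeatBusyMaxWaitMs ?? 2 * 6e4` default come from the issue, while the `WakeNowDeps` shape, the injected `sleep` and `now`, the `retryDelayMs` interval, the `"ran"` status, and the string outcomes are hypothetical.

```typescript
// Hypothetical result and dependency shapes; see the lead-in for what is sourced.
type HeartbeatResult = { status: "ran" } | { status: "skipped"; reason: string };

interface WakeNowDeps {
  runHeartbeatOnce: () => Promise<HeartbeatResult>;
  requestHeartbeatNow: () => void; // fire-and-forget fallback
  sleep: (ms: number) => Promise<void>; // injected so the loop is testable
  now: () => number;
  maxWaitMs?: number; // stands in for wakeNowHeartbeatBusyMaxWaitMs
  retryDelayMs?: number; // assumed polling interval, not from the source
}

const RETRYABLE = ["requests-in-flight", "cron-in-progress", "lanes-busy"];

async function wakeNowWithRetry(
  deps: WakeNowDeps,
): Promise<"delivered" | "deferred" | "not-retryable"> {
  const maxWaitMs = deps.maxWaitMs ?? 2 * 6e4; // 2-minute cap quoted in the issue
  const retryDelayMs = deps.retryDelayMs ?? 250;
  const startedAt = deps.now();
  for (;;) {
    const result = await deps.runHeartbeatOnce();
    if (result.status !== "skipped") return "delivered";
    if (!RETRYABLE.includes(result.reason)) return "not-retryable";
    const outOfTime = deps.now() - startedAt >= maxWaitMs;
    // cron-in-progress cannot clear synchronously, so it defers immediately;
    // every other busy reason defers only once the cap is reached.
    if (result.reason === "cron-in-progress" || outOfTime) {
      deps.requestHeartbeatNow();
      return "deferred";
    }
    await deps.sleep(retryDelayMs);
  }
}
```

Injecting `sleep` and `now` keeps the sketch deterministic under a fake clock. In this model the fire-and-forget fallback is reached only for `cron-in-progress` or after the cap expires, which matches the worst-case and best-case reasoning in the proposal.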

Related

  • #73189 (closed; fixed payload-injection in 4.27, but doesn't cover the retryable-busy-skip code path)
  • #63106 (open; ghost-run detection — would have surfaced this as a warning)

Extent analysis

TL;DR

The proposed fix is to drop the isRecurringJob || condition in the executeMainSessionCronJob function to ensure recurring jobs use the same retry loop as one-shot jobs.

Guidance

  • Review the executeMainSessionCronJob function in src/cron/service/timer.ts to understand the current implementation and the proposed fix.
  • Apply the proposed fix by removing the isRecurringJob || condition from the if statement, as shown in the provided diff.
  • Verify that the fix resolves the issue by checking the cron job history and ensuring that the ghost runs are no longer occurring.
  • Test the fix with different scenarios, including recurring and one-shot jobs, to ensure that it works as expected.

Example

The proposed fix can be applied by changing the following code:

- if (isRecurringJob || heartbeatResult.reason === "cron-in-progress") {
+ if (heartbeatResult.reason === "cron-in-progress") {
      state.deps.requestHeartbeatNow({ ... });
      return { status: "ok", summary: text };
  }

Notes

The proposed fix assumes that the wakeNowHeartbeatBusyMaxWaitMs cap is sufficient to prevent infinite waiting in case of a genuinely stuck heartbeat. Additionally, this fix may not address other potential issues related to ghost runs, such as those detected by #63106.

Recommendation

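Apply the proposed fix by dropping the isRecurringJob || condition; it is a targeted change that addresses the specific code path described in the issue. Recurring jobs then use the same retry loop as one-shot jobs, which should prevent ghost runs whenever the busy heartbeat clears within the wakeNowHeartbeatBusyMaxWaitMs cap. If the cap expires first, the fire-and-forget fallback still applies, so the worst case matches today's behavior.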
