openclaw - ✅(Solved) Fix cron: ghost runs recorded as ok when gateway is down (durationMs < 50ms) [1 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#63106, fetched 2026-04-09 07:58:30
Comments: 0 · Participants: 1 · Timeline: 2 · Reactions: 0
Timeline (top): cross-referenced ×1, referenced ×1

Error Message

When the OpenClaw gateway goes down (e.g. due to a config error or model routing failure), cron jobs configured with sessionTarget: "main" continue to fire on schedule and record status: "ok" in the run log. The runs complete in 1–8 ms (e.g. durationMs: 1) because executeMainSessionCronJob returns immediately after calling enqueueSystemEvent + requestHeartbeatNow; no actual agent turn is awaited.

To reproduce, bring the gateway into a state where the main session cannot process requests (e.g. invalid model config, provider auth error).

Proposed fix: add a ghost-run detector in the onEvent handler in src/gateway/server-cron.ts. When a finished event has status: "ok" and durationMs < GHOST_RUN_THRESHOLD_MS (e.g. 50 ms) for a job where sessionTarget !== "none" and payload.kind === "systemEvent", log a warning and/or record the run with an additional warn flag or summary note so operators can identify silent failures.

Root Cause

In src/cron/service/timer.ts, executeMainSessionCronJob enqueues a system event and returns { status: "ok" } without waiting for the agent to actually process the heartbeat. If the gateway or agent session is unhealthy, the event is silently dropped and the cron run is still recorded as ok.

The wakeMode: "now" path (runHeartbeatOnce) at least checks the heartbeat result, but wakeMode: "next-heartbeat" (the default for main-session jobs) has no health check at all.
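The fast-return behavior described above can be modeled in a few lines. This is a minimal sketch, not the OpenClaw source: the function names mirror the issue text, but their bodies, the pendingEvents queue, and the agentHealthy parameter are assumptions made only to show why the recorded status cannot reflect agent health.

```typescript
// Hypothetical model of the fast-return path. Names mirror the issue text,
// but every body here is an assumption, not the OpenClaw implementation.
type CronRunResult = { status: "ok" | "error"; durationMs: number };

// Assumed in-memory queue standing in for the real system-event queue.
const pendingEvents: string[] = [];

// Stand-in for enqueueSystemEvent: it only buffers the event.
function enqueueSystemEvent(text: string): void {
  pendingEvents.push(text);
}

// Stand-in for executeMainSessionCronJob with wakeMode "next-heartbeat".
// Nothing awaits an agent turn, so the result is "ok" regardless of health.
function executeMainSessionCronJobSketch(text: string, agentHealthy: boolean): CronRunResult {
  const startedAt = Date.now();
  enqueueSystemEvent(text);
  // agentHealthy is deliberately ignored: the return value never depends on it.
  void agentHealthy;
  return { status: "ok", durationMs: Date.now() - startedAt };
}

const healthyRun = executeMainSessionCronJobSketch("daily-report", true);
const downRun = executeMainSessionCronJobSketch("daily-report", false);
```

In this model both runs report status "ok", and the only observable difference between a healthy and an unhealthy gateway is whether the queued event is ever consumed, which the run log does not capture.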

Fix Action

Fixed

PR fix notes

PR #63111: fix(cron): detect ghost runs on main-session systemEvent jobs

Description (problem / solution / changelog)

Summary

  • When the OpenClaw gateway is unhealthy, cron jobs using sessionTarget: "main" + payload.kind: "systemEvent" complete in < 50 ms and are recorded as status: ok even though no agent turn was executed
  • Adds a GHOST_RUN_THRESHOLD_MS = 50 constant and a structured warn log in the onEvent handler in src/gateway/server-cron.ts when a run completes suspiciously fast
  • The warning includes jobId, durationMs, sessionTarget, payloadKind, and ghostRunThresholdMs so operators can identify silent failures
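As a rough illustration of the summary above, the check and its structured fields might look like the following. The PR diff is not reproduced on this page, so the FinishedCronEvent shape, the detectGhostRun helper, and the "skipped" status value are assumptions; only the constant name, the threshold of 50, the log message, and the field names come from the PR description.

```typescript
const GHOST_RUN_THRESHOLD_MS = 50;

// Assumed shape of a "finished" cron event; the real OpenClaw type may differ.
interface FinishedCronEvent {
  jobId: string;
  status: "ok" | "error" | "skipped";
  durationMs: number;
  sessionTarget: string;
  payloadKind: string;
}

// Structured fields listed in the PR summary.
interface GhostRunWarning {
  message: string;
  jobId: string;
  durationMs: number;
  sessionTarget: string;
  payloadKind: string;
  ghostRunThresholdMs: number;
}

// Hypothetical helper: returns the warning payload, or null for a normal run.
function detectGhostRun(
  event: FinishedCronEvent,
  thresholdMs: number = GHOST_RUN_THRESHOLD_MS,
): GhostRunWarning | null {
  const suspicious =
    event.status === "ok" &&
    event.durationMs < thresholdMs &&
    event.sessionTarget === "main" &&
    event.payloadKind === "systemEvent";
  if (!suspicious) return null;
  return {
    message: "cron: possible ghost run detected",
    jobId: event.jobId,
    durationMs: event.durationMs,
    sessionTarget: event.sessionTarget,
    payloadKind: event.payloadKind,
    ghostRunThresholdMs: thresholdMs,
  };
}

// A 1 ms "ok" run is flagged; a run that took several seconds is not.
const ghostWarning = detectGhostRun({
  jobId: "job-1", status: "ok", durationMs: 1, sessionTarget: "main", payloadKind: "systemEvent",
});
const normalRun = detectGhostRun({
  jobId: "job-1", status: "ok", durationMs: 4200, sessionTarget: "main", payloadKind: "systemEvent",
});
```

Returning the payload instead of logging directly keeps the check easy to test; a caller would pass the non-null result to whatever logger the gateway uses.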

Fixes #63106

Root Cause

executeMainSessionCronJob in src/cron/service/timer.ts calls enqueueSystemEvent() and returns { status: "ok" } immediately without waiting for the agent to process the heartbeat. When the gateway is down or the agent session is unhealthy, the event is silently dropped but the run is still logged as ok.

Changes

  • src/gateway/server-cron.ts: Added GHOST_RUN_THRESHOLD_MS constant and ghost-run detection warning in the onEvent / appendCronRunLog path

Test plan

  • With a healthy gateway: main-session cron job runs produce no warning
  • With a downed gateway (e.g. invalid model config): runs that complete in < 50 ms emit cron: possible ghost run detected in the log with the structured fields
  • Existing cron service tests pass unchanged

🤖 Generated with Claude Code

Changed files

  • src/gateway/server-cron.ts (modified, +31/-0)

Problem

When the OpenClaw gateway goes down (e.g. due to a config error or model routing failure), cron jobs configured with sessionTarget: "main" continue to fire on schedule and record status: "ok" in the run log. The runs complete in 1–8 ms (durationMs: 1) because executeMainSessionCronJob returns immediately after calling enqueueSystemEvent + requestHeartbeatNow — no actual agent turn is awaited.

There is no alerting, no health check failure, and no indication in the log that the jobs are not actually executing. The cron scheduler marks runs as successful even though no agent turn occurred and no script was run.

Reproduction:

  1. Configure a cron job with sessionTarget: "main", payload.kind: "systemEvent", wakeMode: "next-heartbeat"
  2. Bring the gateway into a state where the main session cannot process requests (e.g. invalid model config, provider auth error)
  3. Observe run log: status: "ok", durationMs: 1
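For orientation, the reproduction setup could be written down as follows. The field names sessionTarget, payload.kind, wakeMode, status, and durationMs come from the issue; the surrounding object layout, the id, the schedule string, and the payload text are placeholders, since the issue does not show the actual job definition format.

```typescript
// Hypothetical job definition. Only sessionTarget, wakeMode, and payload.kind
// are taken from the issue; id, schedule, and text are placeholder values.
const reproJob = {
  id: "ghost-run-repro",
  schedule: "*/5 * * * *",
  sessionTarget: "main",
  wakeMode: "next-heartbeat",
  payload: { kind: "systemEvent", text: "run nightly script" },
};

// Run-log entry reported in the issue while the gateway was unhealthy.
const observedRun = { status: "ok", durationMs: 1 };
```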

Root Cause

In src/cron/service/timer.ts, executeMainSessionCronJob enqueues a system event and returns { status: "ok" } without waiting for the agent to actually process the heartbeat. If the gateway or agent session is unhealthy, the event is silently dropped and the cron run is still recorded as ok.

The wakeMode: "now" path (runHeartbeatOnce) at least checks the heartbeat result, but wakeMode: "next-heartbeat" (the default for main-session jobs) has no health check at all.

Proposed Fix

Add a ghost-run detector in the onEvent handler in src/gateway/server-cron.ts: when a finished event has status: "ok" and durationMs < GHOST_RUN_THRESHOLD_MS (e.g. 50 ms) for a job where sessionTarget !== "none" and payload.kind === "systemEvent", log a warning and/or record the run with an additional warn flag or summary note so operators can identify silent failures.

At a minimum, the ghost-run check should:

  • Apply only to sessionTarget: "main" + payload.kind: "systemEvent" + wakeMode: "next-heartbeat" jobs (the fast-return code path)
  • Not treat legitimately fast jobs (e.g. no-op system events) as errors — use a configurable threshold (default 50 ms)
  • Emit a structured log warning that can be surfaced in openclaw doctor or cron logs
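The scoping rules in this list can be sketched as a small predicate plus an annotation step. This assumes a simplified job shape (CronJobLike) and hypothetical helper names (isFastReturnPath, annotateRun); it also relies on the issue's statement that "next-heartbeat" is the default wakeMode for main-session jobs.

```typescript
// Assumed, simplified job shape; not the real OpenClaw type.
interface CronJobLike {
  sessionTarget: string;
  wakeMode?: string;
  payload: { kind: string };
}

// True only for the fast-return code path named in the issue.
function isFastReturnPath(job: CronJobLike): boolean {
  // Per the issue text, "next-heartbeat" is the default for main-session jobs.
  const wakeMode = job.wakeMode ?? "next-heartbeat";
  return (
    job.sessionTarget === "main" &&
    job.payload.kind === "systemEvent" &&
    wakeMode === "next-heartbeat"
  );
}

type RunAnnotation = { status: string; warn?: string };

// A fast run keeps its status and only gains a warning note,
// so legitimately fast jobs are never turned into errors.
function annotateRun(job: CronJobLike, durationMs: number, thresholdMs: number = 50): RunAnnotation {
  if (isFastReturnPath(job) && durationMs < thresholdMs) {
    return { status: "ok", warn: "cron: possible ghost run detected" };
  }
  return { status: "ok" };
}

const mainJob: CronJobLike = { sessionTarget: "main", payload: { kind: "systemEvent" } };
const wakeNowJob: CronJobLike = { sessionTarget: "main", wakeMode: "now", payload: { kind: "systemEvent" } };

const flagged = annotateRun(mainJob, 1);
const unflagged = annotateRun(wakeNowJob, 1);
```

Here the wakeMode: "now" job is left alone even at 1 ms, because that path already checks the heartbeat result according to the root-cause description.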

See executeMainSessionCronJob in src/cron/service/timer.ts and the onEvent handler in src/gateway/server-cron.ts.

Acceptance Criteria

  • When a main-session cron job completes in < 50 ms, a warning is logged (cron: possible ghost run detected)
  • The warning includes jobId, durationMs, sessionTarget, and payloadKind
  • The behavior is gated behind a threshold that can be adjusted via cron config
  • Existing tests for executeMainSessionCronJob are not broken
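The third criterion asks for a threshold adjustable via cron config, but the issue does not say which key holds it. The sketch below assumes a hypothetical ghostRunThresholdMs key and shows one way to resolve it with the 50 ms default.

```typescript
const DEFAULT_GHOST_RUN_THRESHOLD_MS = 50;

// Hypothetical config slice; the key name ghostRunThresholdMs is an assumption.
interface CronConfigLike {
  ghostRunThresholdMs?: number;
}

// Resolve the threshold, falling back to the default when the configured
// value is missing, non-finite, or negative.
function resolveGhostRunThresholdMs(config: CronConfigLike | undefined): number {
  const value = config?.ghostRunThresholdMs;
  if (typeof value !== "number" || !Number.isFinite(value) || value < 0) {
    return DEFAULT_GHOST_RUN_THRESHOLD_MS;
  }
  return value;
}

const defaultThreshold = resolveGhostRunThresholdMs(undefined);
const customThreshold = resolveGhostRunThresholdMs({ ghostRunThresholdMs: 200 });
const invalidThreshold = resolveGhostRunThresholdMs({ ghostRunThresholdMs: -1 });
```

With this design a configured value of 0 would effectively disable the check, since no duration is below 0 ms.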

extent analysis

TL;DR

Implement a ghost-run detector in the onEvent handler to identify and log warnings for potentially silent cron job failures.

Guidance

  • Review the executeMainSessionCronJob function in src/cron/service/timer.ts to understand the current behavior and identify areas for improvement.
  • Add a ghost-run detector in the onEvent handler in src/gateway/server-cron.ts to log warnings for jobs with status: "ok" and durationMs < GHOST_RUN_THRESHOLD_MS.
  • Configure the threshold value (e.g., 50 ms) to distinguish between legitimate fast jobs and potential ghost runs.
  • Ensure the warning log includes relevant details such as jobId, durationMs, sessionTarget, and payloadKind for easier debugging.

Example

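// src/gateway/server-cron.ts
// Illustrative sketch: the event shape and the onEvent registration are assumed.
const GHOST_RUN_THRESHOLD_MS = 50;

// ...

onEvent((event) => {
  // An "ok" systemEvent run that finishes this quickly probably never reached the agent.
  const isPossibleGhostRun =
    event.status === 'ok' &&
    event.durationMs < GHOST_RUN_THRESHOLD_MS &&
    event.sessionTarget !== 'none' &&
    event.payload.kind === 'systemEvent';
  if (isPossibleGhostRun) {
    // Structured fields keep the warning searchable in cron logs.
    console.warn('cron: possible ghost run detected', {
      jobId: event.jobId, durationMs: event.durationMs,
      sessionTarget: event.sessionTarget, payloadKind: event.payload.kind,
    });
  }
});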

Notes

The proposed fix focuses on detecting and logging potential ghost runs, but it may not prevent them entirely. Additional changes to the executeMainSessionCronJob function or the cron job scheduling mechanism might be necessary to fully address the issue.

Recommendation

Apply the workaround by implementing the ghost-run detector, as it provides a way to identify and log potential silent failures, allowing for further investigation and debugging.
