openclaw - ✅(Solved) Fix [Bug]: Isolated cron job completes successfully but scheduler reports "cron: job execution timed out" [1 pull request, 2 comments, 1 participant]

openclaw/openclaw#68091, fetched 2026-04-18 05:54:00

When an isolated agentTurn cron job completes its work and exits cleanly, the cron scheduler still logs status: error with error: "cron: job execution timed out". The session itself exits with stopReason: "stop" and all work is delivered — the timeout error is false.

Error Message

status: error
error: "cron: job execution timed out"
durationMs: 600065

Root Cause

executeJobCoreWithTimeout in src/cron/service/timer.ts raced the job against a setTimeout deadline using Promise.race. When the job finished at almost the same moment the timer fired, the timeout rejection settled the race first, so a run that exited cleanly was recorded as timed out.

Fix Action

Fixed

PR fix notes

PR #68098: fix(cron): avoid false timeout report when job finishes cleanly before deadline

Description (problem / solution / changelog)

Summary

When a long-running isolated cron job (sessionTarget: "isolated", payload.kind: "agentTurn") completes cleanly just before its setTimeout deadline, the timer fires and aborts the session, marking it as status: error with error: "cron: job execution timed out" even though stopReason: "stop" and all output was delivered.

Root Cause

executeJobCoreWithTimeout in timer.ts used Promise.race:

  • The setTimeout fires → aborts the session → rejects with the timeout error
  • The job completes cleanly immediately after
  • The reject wins because Promise.race already settled

The result is a false timeout report that triggers consecutiveErrors counting and alert logic.
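
The sequence above can be sketched as follows. This is a minimal illustration, not the actual timer.ts source: raceWithTimeout, RunResult, and the millisecond values are hypothetical stand-ins, and only the error string is taken from the issue.

```typescript
// Hypothetical stand-in for the pre-fix pattern; NOT the real executeJobCoreWithTimeout.
type RunResult = { status: "ok" | "error"; error?: string };

function raceWithTimeout(job: Promise<RunResult>, timeoutMs: number): Promise<RunResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<RunResult>((_, reject) => {
    timer = setTimeout(() => reject(new Error("cron: job execution timed out")), timeoutMs);
  });
  return Promise.race([job, timeout])
    // Whichever promise settles first decides the outcome; later settlements are ignored.
    .catch((err: unknown): RunResult => ({
      status: "error",
      error: err instanceof Error ? err.message : String(err),
    }))
    .finally(() => clearTimeout(timer));
}

// A job that finishes cleanly 20 ms after a 50 ms deadline (scaled-down timings).
const lateJob = new Promise<RunResult>((resolve) =>
  setTimeout(() => resolve({ status: "ok" }), 70),
);

raceWithTimeout(lateJob, 50).then((run) => {
  console.log(run.status, run.error); // error cron: job execution timed out
});
```

The job's own result is { status: "ok" }, but it never reaches the caller because the race has already settled with the timeout rejection.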

Fix

After the timeout fires, give the job a 150ms grace period. If the job resolves within that window, use its actual result. Only abort and report timeout if the grace period expires with no job resolution.

This eliminates the race by effectively merging the two close events (timeout tick + job completion) into one coherent outcome.
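
A rough sketch of the grace-period idea, under stated assumptions: runWithGrace, RunResult, and the AbortSignal wiring are hypothetical and are not taken from the PR diff. Only the 150 ms window and the error string come from the description above.

```typescript
// Hypothetical sketch of a post-deadline grace window; not the code merged in the PR.
type RunResult = { status: "ok" | "error"; error?: string };

const GRACE_MS = 150; // grace window named in the PR description

async function runWithGrace(
  startJob: (signal: AbortSignal) => Promise<RunResult>,
  timeoutMs: number,
  graceMs: number = GRACE_MS,
): Promise<RunResult> {
  const controller = new AbortController();
  const job = startJob(controller.signal);
  const timers: Array<ReturnType<typeof setTimeout>> = [];
  const sleep = (ms: number) =>
    new Promise<"expired">((resolve) => {
      timers.push(setTimeout(() => resolve("expired"), ms));
    });

  try {
    // Phase 1: the normal deadline.
    const first = await Promise.race([job, sleep(timeoutMs)]);
    if (first !== "expired") return first;

    // Phase 2: the deadline passed; wait briefly before aborting anything.
    const second = await Promise.race([job, sleep(graceMs)]);
    if (second !== "expired") return second;

    // Only now abort the session and report a timeout.
    controller.abort();
    return { status: "error", error: "cron: job execution timed out" };
  } catch (err) {
    return { status: "error", error: err instanceof Error ? err.message : String(err) };
  } finally {
    timers.forEach((t) => clearTimeout(t));
  }
}
```

With scaled-down timings, a job that resolves 20 ms after a 50 ms deadline is reported with its own result, while a job that is still running when the grace window closes is aborted and reported as timed out.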

Files

  • src/cron/service/timer.ts — added grace period logic to executeJobCoreWithTimeout

Testing

Manually verified with an isolated agentTurn cron job set to timeoutSeconds: 600. Job completes in ~11 minutes. Without the fix: status: error, error: "cron: job execution timed out", durationMs: 600065. With the fix: status: ok, stopReason: "stop".

Fixes #68091.

Changed files

  • extensions/memory-core/src/dreaming-narrative.ts (modified, +8/-3)
  • src/agents/pi-embedded-runner/run/incomplete-turn.ts (modified, +2/-1)
  • src/agents/run-wait.ts (modified, +48/-0)
  • src/agents/tools/sessions-send-tool.ts (modified, +29/-0)
  • src/cli/plugins-update-selection.ts (modified, +5/-0)
  • src/cron/service/timer.ts (modified, +37/-12)
  • src/logging/subsystem.ts (modified, +2/-2)

Summary

When an isolated agentTurn cron job completes its work and exits cleanly, the cron scheduler still logs status: error with error: "cron: job execution timed out". The session itself exits with stopReason: "stop" and all work is delivered — the timeout error is false.

Environment

  • OpenClaw: v2026.4.15
  • Cron job: sessionTarget: "isolated", payload.kind: "agentTurn", timeoutSeconds: 600

Steps to Reproduce

  1. Create an isolated agentTurn cron job that does meaningful work (e.g., system status report, Discord delivery)
  2. Set timeoutSeconds: 600
  3. Job completes in ~10-12 minutes
  4. Check openclaw cron runs --id <job-id> — shows status: error, error: "cron: job execution timed out", durationMs: 600XXX
  5. But the session transcript shows the agent exited with stopReason: "stop" and posted all content to Discord before the timeout fired

Expected Behavior

When the embedded isolated session exits cleanly with stopReason: "stop" and the agent has delivered all expected output, the cron run should be recorded as status: ok — not status: error.

Actual Behavior

  • Session transcript shows: agent completed all work, posted to Discord, exited with stopReason: "stop"
  • Cron run log shows: status: "error", error: "cron: job execution timed out", durationMs: 600XXX
  • Discord: report was delivered successfully
  • deliveryStatus: unknown (announce delivery not recorded, but agent-side delivery via exec/curl worked)

Evidence

Session transcript shows:

  • Session started normally, ran checks
  • Agent composed and posted full report to Discord via exec curl calls (two parts posted successfully)
  • Agent exited with stopReason: "stop" and HEARTBEAT_OK message
  • Total session time: ~11 minutes

But cron run history shows:

status: error
error: "cron: job execution timed out"  
durationMs: 600065
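
The mismatch between the two records can be checked mechanically. The sketch below is illustrative only: CronRunRecord, SessionOutcome, and isFalseTimeout are hypothetical names, and the field names simply mirror those quoted in this issue rather than a documented OpenClaw schema.

```typescript
// Hypothetical record shapes; field names mirror the ones quoted in the issue.
interface CronRunRecord {
  status: "ok" | "error";
  error?: string;
  durationMs: number;
}

interface SessionOutcome {
  stopReason: string;
}

const TIMEOUT_ERROR = "cron: job execution timed out";

// True when the scheduler recorded a timeout although the session stopped cleanly.
function isFalseTimeout(run: CronRunRecord, session: SessionOutcome): boolean {
  return (
    run.status === "error" &&
    run.error === TIMEOUT_ERROR &&
    session.stopReason === "stop"
  );
}

const reported: CronRunRecord = { status: "error", error: TIMEOUT_ERROR, durationMs: 600065 };
console.log(isFalseTimeout(reported, { stopReason: "stop" })); // true
```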

Hypothesis

The job-level setTimeout in executeJobCoreWithTimeout fires at exactly 600,000ms and races against the embedded runner's clean completion. When the embedded runner finishes first and exits cleanly, the job-level timeout has already been scheduled and fires afterward — incorrectly marking the job as timed out.

OR: The job runner's completion signal and timeout signal race, with the timeout error overwriting the clean completion status.
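
For the second variant of the hypothesis, one common guard is a settle-once latch that cancels the timer as soon as the job completes, so a timeout can never overwrite a recorded completion. The sketch below is an assumption-laden illustration: withCancellableTimeout and its onTimeout callback are hypothetical and do not describe how timer.ts is written. It also does not help when the job finishes after the deadline; that case is what the grace period in PR #68098 addresses.

```typescript
// Hypothetical helper: the first outcome wins, and completion cancels the pending timer.
function withCancellableTimeout<T>(
  run: () => Promise<T>,
  timeoutMs: number,
  onTimeout: () => void,
): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let settled = false;

    const timer = setTimeout(() => {
      if (settled) return; // completion already recorded; do nothing
      settled = true;
      onTimeout(); // e.g. abort the session
      reject(new Error("cron: job execution timed out"));
    }, timeoutMs);

    run().then(
      (value) => {
        if (settled) return; // timeout already reported
        settled = true;
        clearTimeout(timer); // the timer can no longer fire after a clean completion
        resolve(value);
      },
      (err) => {
        if (settled) return;
        settled = true;
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```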

Impact

  • Cron run history shows false errors
  • consecutiveErrors counter increments incorrectly
  • Monitoring/alerting based on lastRunStatus produces false positives
  • The actual work is not lost (agent delivers via exec/curl), but the scheduler's record is wrong

Related

  • Issue #63805 (300s job-level timeout) — same root cause area
  • Issue #61913 (Cron Announce Delivery Reports False Success) — opposite pattern
  • Issue #64005 (Cron lastError persists when later retry delivers successfully) — related false-state pattern

extent analysis

TL;DR

The cron job is likely being marked as timed out due to a race condition between the embedded runner's clean completion and the job-level timeout.

Guidance

  • Review the executeJobCoreWithTimeout function to understand how the job-level timeout is implemented and how it interacts with the embedded runner's completion signal.
  • Consider increasing the timeoutSeconds value to a higher number to reduce the likelihood of the timeout firing before the embedded runner completes.
  • Investigate the possibility of using a more robust synchronization mechanism to ensure that the job-level timeout is cancelled when the embedded runner completes cleanly.
  • Check the cron run history to see if there are any other jobs that are experiencing similar issues, which could indicate a broader problem with the cron scheduler.

Notes

The issue stems from a race condition, which can be hard to reproduce and debug. The hypothesis in the issue is consistent with the root cause described in PR #68098: the timeout rejection settles Promise.race before the job's clean completion is observed.

Recommendation

Where a build containing PR #68098 is not yet available, increase timeoutSeconds (for example, to 900) so the deadline is less likely to fire close to the job's completion. This is a temporary workaround that reduces the chance of a false timeout; it does not remove the race.
