openclaw - ✅(Solved) Fix [Bug]: Isolated cron sessions falsely report status:ok despite incomplete work [1 pull requests, 1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#65950Fetched 2026-04-14 05:39:36
View on GitHub
Comments
0
Participants
1
Timeline
6
Reactions
0
Author
Participants
Timeline (top)
referenced ×3labeled ×2cross-referenced ×1

Isolated cron sessions report status: ok and delivered: true in the cron run history even when the agent never completes its intended workflow.

Root Cause

Root Cause Hypothesis

Infrastructure wall-clock timeout (~2 min) is independent of timeoutSeconds: 900. Session is killed, agent hasn't finished, but parent saw first message and marks ok.

Fix Action

Fixed

PR fix notes

PR #65988: fix: report error when cron subagents produce no final output

Description (problem / solution / changelog)

Summary

Fixes #65950 — Isolated cron sessions falsely report status: ok despite incomplete work.

Root cause: In finalizeTextDelivery, when descendant subagents have been observed but die without producing output (e.g., killed by infrastructure timeout), the code falls into a branch that returns status: "ok" with deliveryAttempted: true. The branch's intended semantics were "subagents are still running, don't deliver yet" — but at that point activeSubagentRuns === 0, meaning the subagents already finished or died. The distinction between "still running" and "died without output" was missing.

Note: isAborted() is NOT the right signal here — OpenClaw's own 60-minute timeout hasn't triggered. The subagents were killed by the infrastructure layer (Docker/VPS wall-clock limit), which does not set the abort controller.

Changes

  • src/cron/isolated-agent/delivery-dispatch.ts — The hadDescendants + interim text + zero active subagents branch now returns status: "error" instead of status: "ok". The activeSubagentRuns > 0 branch above it is unchanged (still-running = ok is correct).
  • src/cron/isolated-agent/run.ts — After the continuation prompt, if output is still interim, sets runResult.meta.error so the error propagates through resolveCronPayloadOutcome / hasFatalErrorPayload naturally.

Tests

  • src/cron/isolated-agent/delivery-dispatch.subagent-died.test.ts (new) — 6 tests: dead subagents → error; still-running subagents → ok; subagents with output → delivers; fallback reply → delivers; multiple dead subagents → error
  • src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts (updated) — existing stale-interim test now asserts status: "error" and the error message

All 516 cron tests pass.

Note on infra alignment

This fix handles the observable symptom correctly. As a separate operational recommendation: Docker/VPS wall-clock timeouts should be set to timeoutSeconds + buffer to give OpenClaw's own timeout mechanism a chance to exit cleanly before the infra kills the process.

AI Disclosure

  • AI-assisted (Claude Code via OpenClaw)
  • Fully tested (516 cron tests pass, build succeeds)
  • Root cause verified by tracing complete call chain through source
  • Understands what the code does — semantic fix to status judgment, not an abort-path patch

Changed files

  • src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts (modified, +11/-2)
  • src/cron/isolated-agent/delivery-dispatch.subagent-died.test.ts (added, +287/-0)
  • src/cron/isolated-agent/delivery-dispatch.ts (modified, +6/-4)
  • src/cron/isolated-agent/run-executor.ts (modified, +29/-0)

Code Example

- Subagent run 2026-04-12 12:52 UTC: 3,914 output tokens, timed out after 1m59s — only produced opening message
- Cron run 2026-04-12 09:30 UTC: 591 output tokens, status `ok`, delivery `delivered` — same pattern
- Cron run 2026-04-12 17:30 UTC: 438 output tokens, status `ok`, delivery `delivered` — same pattern
- Manual in-session execution of the same workflow: completes successfully in ~8 minutes with full output
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Isolated cron sessions report status: ok and delivered: true in the cron run history even when the agent never completes its intended workflow.

Steps to reproduce

  1. Create a cron job with sessionTarget: "isolated" and a long-running agentTurn payload (~6 parallel web searches + file writes + SMTP send)
  2. Set the job to run daily at a fixed time
  3. Observe the run history — status shows ok, delivery shows delivered, but actual output tokens are only 400-600 (a partial opening message), not the full intended output

Expected behavior

If the isolated session times out or is killed before completing the workflow, the run status should reflect a failure or at minimum delivered: false.

Actual behavior

The parent session marks the run as status: ok and delivered: true based solely on receiving the first transmission from the child (a single "👀 Starting..." message). The actual work (web searches, email, Telegram) never completes.

OpenClaw version

2026.4.8

Operating system

Ubuntu

Install method

docker

Model

minimax M2.7

Provider / routing chain

minimax

Additional provider/model setup details

Logs, screenshots, and evidence

- Subagent run 2026-04-12 12:52 UTC: 3,914 output tokens, timed out after 1m59s — only produced opening message
- Cron run 2026-04-12 09:30 UTC: 591 output tokens, status `ok`, delivery `delivered` — same pattern
- Cron run 2026-04-12 17:30 UTC: 438 output tokens, status `ok`, delivery `delivered` — same pattern
- Manual in-session execution of the same workflow: completes successfully in ~8 minutes with full output

Impact and severity

Cron jobs appear to succeed but don't deliver output. Users trusting status: ok miss failures.

Additional information

Root Cause Hypothesis

Infrastructure wall-clock timeout (~2 min) is independent of timeoutSeconds: 900. Session is killed, agent hasn't finished, but parent saw first message and marks ok.

extent analysis

TL;DR

Adjust the infrastructure wall-clock timeout to match or exceed the timeoutSeconds value to ensure accurate cron job status reporting.

Guidance

  • Review the current infrastructure configuration to identify where the wall-clock timeout is set and adjust it to be at least as long as the timeoutSeconds value (900 seconds in this case).
  • Verify that the adjusted timeout allows the agent to complete its workflow within the allotted time, ensuring that the cron job status accurately reflects the outcome.
  • Consider implementing additional logging or monitoring to detect cases where the agent is killed due to timeouts, providing clearer insights into job failures.
  • Evaluate the workflow's performance and optimize it if necessary to complete within the desired timeframe, reducing the likelihood of timeouts.

Example

No code snippet is provided as the issue is related to configuration and infrastructure rather than code.

Notes

The solution assumes that the infrastructure wall-clock timeout can be adjusted. If this is not possible, alternative solutions such as rearchitecting the workflow or using a different execution model may be necessary.

Recommendation

Apply workaround: Adjust the infrastructure timeout to match or exceed the timeoutSeconds value, as this directly addresses the identified root cause and should resolve the issue of inaccurate cron job status reporting.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

If the isolated session times out or is killed before completing the workflow, the run status should reflect a failure or at minimum delivered: false.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING