openclaw - ✅(Solved) Fix [Bug]: Cron jobs with isolated session target timing out [2 pull requests, 1 comments, 2 participants]

GitHub stats
openclaw/openclaw#43850 · Fetched 2026-04-08 00:18:39
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): cross-referenced ×4, labeled ×2, commented ×1

Cron jobs with sessionTarget=isolated timeout even when task is simple (regression)

Error Message

Job times out with "cron: job execution timed out" error. Duration shows 60-121s even though the same task completes in ~26s when run via sessions_spawn.

Root Cause

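Isolated cron agentTurn runs could wait on stale descendant subagent runs from earlier executions that were still recorded as active under the same session key. That wait consumed the full job timeout window, so the run ended as cron: job execution timed out (see PR #43883 and PR #44002 below).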

Fix Action

Workaround

Use sessions_spawn from main session instead of cron agentTurn payload for scheduled tasks.

PR fix notes

PR #43883: Cron: prevent isolated jobs from waiting on stale descendants

Description (problem / solution / changelog)

Summary

  • Problem: isolated cron agentTurn runs could wait on stale descendant subagent runs from earlier executions, consuming the full job timeout window and ending as cron: job execution timed out.
  • Why it matters: scheduled jobs that normally finish quickly were failing consistently when stale active descendant records existed under the same session key.
  • What changed: scoped descendant wait candidates to the current cron run window (runStartedAt) in waitForDescendantSubagentSummary, and passed runStartedAt from delivery dispatch.
  • What did NOT change (scope boundary): cron timeout policy values and normal descendant wait behavior for descendants started by the current run.
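
The run-window scoping described in this summary can be pictured with a minimal sketch. It is not the code from the PR: the DescendantRun shape and the function names startedInRunWindow and selectWaitCandidates are hypothetical, and timestamps are assumed to be epoch milliseconds. Only runStartedAt and the startedAt/createdAt fallback come from the PR notes.

```typescript
// Hypothetical shape of a descendant run record; the real registry type may differ.
interface DescendantRun {
  runId: string;
  createdAt: number;  // epoch milliseconds (assumed unit)
  startedAt?: number; // may be absent
  endedAt?: number;   // absent while the run is still recorded as active
}

// A run belongs to the current window if it started at or after runStartedAt.
// Falls back to createdAt when startedAt is absent, as the PR notes describe.
function startedInRunWindow(run: DescendantRun, runStartedAt: number): boolean {
  const started = run.startedAt ?? run.createdAt;
  return started >= runStartedAt;
}

// Keep only active descendants that were started by the current run.
function selectWaitCandidates(runs: DescendantRun[], runStartedAt: number): DescendantRun[] {
  return runs.filter(
    (run) => run.endedAt === undefined && startedInRunWindow(run, runStartedAt),
  );
}

const runStartedAt = 1_000_000;
const runs: DescendantRun[] = [
  { runId: "stale", createdAt: 400_000, startedAt: 400_500 },       // left active by an earlier execution
  { runId: "current", createdAt: 1_000_200, startedAt: 1_000_300 }, // started by this run
  { runId: "finished", createdAt: 1_000_100, endedAt: 1_000_900 },  // already ended
];

console.log(selectWaitCandidates(runs, runStartedAt).map((r) => r.runId)); // [ 'current' ]
```

With a filter of this kind, a run that only has stale descendants gets an empty candidate list and has nothing to wait on.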

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #43850
  • Related #29774

User-visible / Behavior Changes

  • Isolated cron runs no longer block on stale active descendant runs from previous executions; they wait only for descendants started in the current run window.

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: local macOS dev environment
  • Runtime/container: Node 22 + pnpm
  • Model/provider: N/A (unit tests)
  • Integration/channel (if any): cron isolated agent delivery path
  • Relevant config (redacted): N/A

Steps

  1. Create an isolated cron run scenario with stale active descendant runs from an earlier execution.
  2. Trigger descendant follow-up wait logic for a new cron run.
  3. Verify stale descendants are ignored for wait candidates, while current-run descendants still use agent.wait.

Expected

  • New cron run should not spend its timeout budget waiting on stale descendants.

Actual

  • With this patch, stale descendants are ignored; follow-up wait returns immediately (or only tracks current-run descendants).

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • Ran focused tests after patch:
      • pnpm test src/cron/isolated-agent/subagent-followup.test.ts (pass)
      • pnpm test src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts (pass)
  • Edge cases checked:
    • Added regression coverage for stale active descendants older than runStartedAt.
  • What you did not verify:
    • Full end-to-end live cron execution against real providers/channels.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

If a bot review conversation is addressed by this PR, resolve that conversation yourself. Do not leave bot review conversation cleanup for maintainers.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)
  • If yes, exact upgrade steps:

Failure Recovery (if this breaks)

  • How to disable/revert this change quickly:
    • Revert commit Cron: scope isolated follow-up waits to current run.
  • Files/config to restore:
    • src/cron/isolated-agent/subagent-followup.ts
    • src/cron/isolated-agent/delivery-dispatch.ts
  • Known bad symptoms reviewers should watch for:
    • Current-run descendants not being awaited when runStartedAt handling is incorrect.

Risks and Mitigations

  • Risk: Over-filtering could skip legitimate descendants if timestamps are malformed.
    • Mitigation: fallback uses createdAt when startedAt is absent; existing descendant wait tests remain green and regression test added.
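
One way to reason about the malformed-timestamp risk is sketched below. The RunTimestamps shape and the helper names effectiveStart and isStale are hypothetical. The policy of keeping a run that has no usable timestamp is an assumption made for illustration, not something the PR notes confirm.

```typescript
// Hypothetical record shape; timestamps are assumed to be epoch milliseconds.
interface RunTimestamps {
  createdAt?: number;
  startedAt?: number;
}

// Return the first usable timestamp, preferring startedAt over createdAt.
function effectiveStart(run: RunTimestamps): number | undefined {
  for (const candidate of [run.startedAt, run.createdAt]) {
    if (typeof candidate === "number" && Number.isFinite(candidate) && candidate > 0) {
      return candidate;
    }
  }
  return undefined;
}

// Assumption: a run with no usable timestamp is NOT treated as stale,
// so a malformed record cannot cause a legitimate descendant to be skipped.
function isStale(run: RunTimestamps, runStartedAt: number): boolean {
  const started = effectiveStart(run);
  return started !== undefined && started < runStartedAt;
}

console.log(isStale({ startedAt: 100 }, 1_000));                   // true
console.log(isStale({ startedAt: NaN, createdAt: 1_200 }, 1_000)); // false
console.log(isStale({}, 1_000));                                   // false
```

The trade-off is that a stale record with broken timestamps would still be awaited under this policy. Treating such records as stale instead would avoid that wait, but it would bring back the over-filtering risk named above.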

Changed files

  • src/cron/isolated-agent/delivery-dispatch.ts (modified, +1/-0)
  • src/cron/isolated-agent/subagent-followup.test.ts (modified, +26/-0)
  • src/cron/isolated-agent/subagent-followup.ts (modified, +13/-1)

PR #44002: Cron: scope isolated subagent waits to current run window (fix #43850)

Description (problem / solution / changelog)

Summary

  • Problem: Isolated cron agentTurn runs can block on stale descendant subagent runs from previous executions, causing jobs to time out even when the current run's work is short.
  • Why it matters: Scheduled proactive/heartbeat checks with sessionTarget: "isolated" can fail reliably and never deliver results, even though the underlying task completes quickly.
  • What changed:
    • Scoped isolated descendant wait in waitForDescendantSubagentSummary to the current run window (runStartedAt) so only descendants started in this run are awaited.
    • Updated cron isolated delivery dispatch to count active descendants within the current run window using listDescendantRunsForRequester, and to pass runStartedAt through to the follow-up wait.
    • Extended unit tests to cover scenarios with only stale active descendants and to keep the double-announce guard behaviour intact.
  • What did NOT change (scope boundary): Cron timeout policies and non-isolated cron delivery behaviour.
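
The counting step in this summary might look roughly like the sketch below. The array stands in for whatever listDescendantRunsForRequester returns; that function's real signature is not shown on this page, so it is not called here. RunRecord and countActiveDescendants are hypothetical names. Skipping the filter when runStartedAt is missing is an assumption about how non-cron callers behave.

```typescript
// Hypothetical record shape; timestamps are assumed to be epoch milliseconds.
interface RunRecord {
  createdAt: number;
  startedAt?: number;
  endedAt?: number; // absent while the run is still recorded as active
}

// Count active descendants, optionally restricted to the current run window.
// When runStartedAt is undefined (assumed: non-cron callers), no window filter applies.
function countActiveDescendants(runs: readonly RunRecord[], runStartedAt?: number): number {
  let active = 0;
  for (const run of runs) {
    if (run.endedAt !== undefined) continue; // finished runs never count
    const started = run.startedAt ?? run.createdAt;
    if (runStartedAt !== undefined && started < runStartedAt) continue; // stale for this run
    active += 1;
  }
  return active;
}

const registry: RunRecord[] = [
  { createdAt: 100 },                     // stale, still recorded as active
  { createdAt: 2_000, startedAt: 2_050 }, // active, inside the window
  { createdAt: 2_100, endedAt: 2_400 },   // finished
];

console.log(countActiveDescendants(registry, 2_000)); // 1
console.log(countActiveDescendants(registry));        // 2
```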

Change Type

  • Bug fix

Scope

  • Gateway / orchestration

Linked Issue/PR

  • Closes #43850

User-visible / Behavior Changes

  • Isolated cron agentTurn jobs no longer time out due to stale descendant subagent runs from previous executions; they only wait on descendants started in the current run window.
  • When only stale active descendants exist, the job completes without blocking on them or silently suppressing delivery.

Security Impact

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux (dev)
  • Runtime/container: Node 22 + pnpm
  • Model/provider: N/A (unit tests only)
  • Integration/channel: Cron isolated agent delivery path

Steps

  1. Add isolated cron scenarios with stale descendant runs in the registry.
  2. Run unit tests that exercise isolated cron delivery and descendant follow-up.
  3. Confirm that:
    • Stale-only descendants do not cause agent.wait to block.
    • Active descendants within the run window are still awaited.
    • Double-announce protection remains intact.

Expected

  • Cron isolated runs only wait on descendants started in the current run, and do not time out due to stale descendants.

Actual

  • With this patch, tests covering stale-only descendants and active-in-window descendants pass, and the double-announce guard behaviour is preserved.

Evidence

  • Failing test/log before + passing after

    • pnpm test src/cron/isolated-agent/subagent-followup.test.ts
    • pnpm test src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts

Human Verification

  • Verified scenarios:
    • Isolated descendant follow-up with active descendants in the current run window.
    • Isolated descendant follow-up with only stale active descendants before runStartedAt.
    • Cron isolated delivery paths with and without descendants, keeping the double-announce guard behaviour.
  • Edge cases checked:
    • Missing startedAt falling back to createdAt for run-window checks.
    • Behaviour when runStartedAt is unavailable (non-cron callers).
  • What you did not verify:
    • End-to-end live cron execution against real providers/channels.

Changed files

  • scripts/restart-gateway.sh (added, +54/-0)
  • src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts (modified, +43/-20)
  • src/cron/isolated-agent/delivery-dispatch.ts (modified, +22/-3)
  • src/cron/isolated-agent/subagent-followup.test.ts (modified, +29/-0)
  • src/cron/isolated-agent/subagent-followup.ts (modified, +21/-2)

Raw issue body

Bug type

Regression (worked before, now fails)

Summary

Cron jobs with sessionTarget=isolated timeout even when task is simple (regression)

Steps to reproduce

  1. Create a cron job with sessionTarget: "isolated" and payload.kind: "agentTurn"
  2. Set a simple task (e.g., proactive check that takes ~20s)
  3. Wait for cron to trigger
  4. Observe: job times out even with timeoutSeconds set to 120s
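
For orientation, a job matching these steps could be written as the object below. Only sessionTarget, payload.kind and timeoutSeconds are named in the report. The interface name, the other fields (name, schedule, message) and the placement of timeoutSeconds at the top level are assumptions for illustration, not the documented OpenClaw schema.

```typescript
// Sketch of a cron job definition. Only sessionTarget, payload.kind and
// timeoutSeconds appear in the report; every other field is hypothetical.
interface CronJobSketch {
  name: string;           // hypothetical
  schedule: string;       // hypothetical: a cron expression
  sessionTarget: string;  // "isolated" in the failing setup
  timeoutSeconds: number; // placement at this level is an assumption
  payload: {
    kind: "agentTurn";
    message: string;      // hypothetical
  };
}

const job: CronJobSketch = {
  name: "proactive-check",
  schedule: "*/5 * * * *",
  sessionTarget: "isolated",
  timeoutSeconds: 120,
  payload: {
    kind: "agentTurn",
    message: "Run the proactive check and report the result.",
  },
};

console.log(JSON.stringify(job, null, 2));
```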

Expected behavior

Cron job should complete successfully within the timeout period. Same task runs in ~26s via sessions_spawn but times out when triggered by cron.

Actual behavior

Job times out with "cron: job execution timed out" error. Duration shows 60-121s even though the same task completes in ~26s when run via sessions_spawn.

OpenClaw version

2026.3.8 (3caab92)

Operating system

Linux 6.17.0-14-generic (x64)

Install method

No response

Model

minimax/MiniMax-M2.5

Provider / routing chain

minimax (direct)

Config file / key location

No response

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

  • Affected: Users with cron jobs using isolated session target
  • Severity: High (blocks critical scheduled tasks)
  • Frequency: 100% repro since ~16:00 GMT+8 on 2026-03-12
  • Consequence: proactive-check and heartbeat-alert-check jobs fail to deliver reports

Additional information

Investigation Details

Key Findings

  • Same task runs in ~26s via sessions_spawn but times out in cron isolated session
  • Both proactive-check and heartbeat-alert-check jobs affected
  • Gateway status shows normal (running, 155 active sessions)

Timeline (GMT+8)

  • 16:04 - proactive-check first timeout (60s)
  • 16:05 - proactive-check second timeout (60s)
  • 16:08 - proactive-check third timeout (121s, timeout already increased to 120s)
  • 16:15 - Manual sessions_spawn test -> Success (26s)
  • 16:16 - heartbeat-alert-check manual trigger -> Timeout (60s)
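
The durations in this timeline track the timeout setting rather than the task length, which fits a run that is blocked until its timeout expires. The toy model below makes that arithmetic explicit. The function name is hypothetical, and the 60 s value for the early runs is inferred from the timeline rather than stated in the report.

```typescript
// Toy model: a run blocked on a wait that never finishes ends at its timeout;
// an unblocked run ends when the task does (capped by the timeout).
function expectedDurationSeconds(
  taskSeconds: number,
  timeoutSeconds: number,
  blockedOnStaleWait: boolean,
): number {
  return blockedOnStaleWait ? timeoutSeconds : Math.min(taskSeconds, timeoutSeconds);
}

const taskSeconds = 26; // measured via sessions_spawn in the report

console.log(expectedDurationSeconds(taskSeconds, 60, true));   // 60
console.log(expectedDurationSeconds(taskSeconds, 120, true));  // 120
console.log(expectedDurationSeconds(taskSeconds, 120, false)); // 26
```

The observed 121 s is about one second above the 120 s limit, which would be plausible overhead around the timeout itself.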

Possibly Related

  • GitHub Issue #42960: Cron jobs not running on schedule
  • GitHub Issue #42997: Cron job timeout handling

Workaround

Use sessions_spawn from main session instead of cron agentTurn payload for scheduled tasks.

extent analysis

Problem Summary

Cron jobs that use sessionTarget: "isolated" run until their timeout expires: about 60 s at first, then 121 s after timeoutSeconds was raised to 120 s. The same payload finishes in ~26 s when launched via sessions_spawn.

Root Cause (in a sentence)

According to PR #43883 and PR #44002, isolated cron agentTurn runs could wait on stale descendant subagent runs that earlier executions left recorded as active under the same session key, and that wait consumed the whole timeout window. Both PRs state that the cron timeout policy values were not changed, which is consistent with the 121 s failure: raising timeoutSeconds only lengthened the wait.

Fix Plan

1. Choose the preferred long-term fix

  • Option A (upgrade): Move to an openclaw build that includes the fix from PR #43883 or PR #44002.
  • Option B (patch your current version): Apply the same change locally so that descendant waits are scoped to the current run window.

Below are the steps for Option B, taken from the PR notes.

2. Locate the affected files

$REPO_ROOT/
├─ src/
│   └─ cron/
│       └─ isolated-agent/
│           ├─ subagent-followup.ts   ← descendant follow-up wait
│           └─ delivery-dispatch.ts   ← delivery dispatch for isolated cron runs

3. Scope the wait to the current run

  • In waitForDescendantSubagentSummary, await only descendants started at or after runStartedAt, falling back to createdAt when startedAt is absent.
  • Pass runStartedAt from delivery dispatch through to the follow-up wait.
  • Leave the timeoutSeconds handling unchanged.

4. Verify

  • pnpm test src/cron/isolated-agent/subagent-followup.test.ts
  • pnpm test src/cron/isolated-agent/delivery-dispatch.double-announce.test.ts
