openclaw - 💡(How to fix) Fix Cron outer timeout should emit lifecycle.error so sessions.json finalizes immediately [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#70347Fetched 2026-04-23 07:25:56
View on GitHub
Comments
0
Participants
1
Timeline
0
Reactions
0

Follow-up to #48488 and the fix that landed in #67785.

That work fixed the dispatch-ghost / cross-session lane problem, but there is still a session-finalization gap when a cron's outer wall timer fires after the session has already started.

The timeout path aborts the run and records cron: job execution timed out, but it does not emit the terminal lifecycle event that the gateway already uses to persist session end/error state. As a result, the session store is not finalized synchronously and host-side watchdog logic has to repair it later.

Error Message

Cron outer timeout should emit lifecycle.error so sessions.json finalizes immediately The timeout path aborts the run and records cron: job execution timed out, but it does not emit the terminal lifecycle event that the gateway already uses to persist session end/error state. As a result, the session store is not finalized synchronously and host-side watchdog logic has to repair it later.

  • then it immediately rejects with new Error(timeoutErrorMessage())
  • persistGatewaySessionLifecycleEvent(...) persists phase=start|end|error onto the gateway session store
  • isolated-agent runs can return sessionId and sessionKey on normal completion/error paths
  • normal lifecycle:error and lifecycle:end events finalize the session store
  • cron result becomes error / cron: job execution timed out
  • phase: "error" or phase: "end"
  • error: "cron: job execution timed out" The timeout path rejects with a plain timeout error but does not emit a lifecycle event, so the gateway session store does not finalize in-band.
  1. Before propagating the timeout error, publish a terminal lifecycle event onto the same session bus used by normal gateway lifecycle handling.
  • preferred: emit lifecycle.error directly inside the timeout callback before rejecting

Root Cause

After #67785, the affected cron sessions now dispatch and do real work. But when the outer timer kills one of those runs, jobs.json and the cron run record can show a timeout while the gateway session entry remains non-terminal until an external watchdog correlates the timeout and rewrites sessions.json.

That leaves a latency gap where:

  • session state is wrong inside the gateway
  • health monitors can still see a stale running session
  • downstream tooling has to rely on host-side repair logic instead of first-party lifecycle semantics

Fix Action

Fix / Workaround

That work fixed the dispatch-ghost / cross-session lane problem, but there is still a session-finalization gap when a cron's outer wall timer fires after the session has already started.

After #67785, the affected cron sessions now dispatch and do real work. But when the outer timer kills one of those runs, jobs.json and the cron run record can show a timeout while the gateway session entry remains non-terminal until an external watchdog correlates the timeout and rewrites sessions.json.

RAW_BUFFERClick to expand / collapse

Bounty 91204fef Upstream Issue Draft

Target repo: openclaw/openclaw Installed version audited: [email protected] Related upstream refs: #48488, #67785, #43141

Proposed issue title

Cron outer timeout should emit lifecycle.error so sessions.json finalizes immediately

Proposed issue body

Summary

Follow-up to #48488 and the fix that landed in #67785.

That work fixed the dispatch-ghost / cross-session lane problem, but there is still a session-finalization gap when a cron's outer wall timer fires after the session has already started.

The timeout path aborts the run and records cron: job execution timed out, but it does not emit the terminal lifecycle event that the gateway already uses to persist session end/error state. As a result, the session store is not finalized synchronously and host-side watchdog logic has to repair it later.

Current behavior

In the shipped [email protected] bundle:

  • src/cron/service/timer.ts / executeJobCoreWithTimeout(...) wraps executeJobCore(...) in Promise.race(...)
  • when the timer fires, it calls runAbortController.abort(timeoutErrorMessage())
  • then it immediately rejects with new Error(timeoutErrorMessage())
  • the timeout path does not emit a lifecycle event

At the same time, session-store finalization already exists elsewhere:

  • persistGatewaySessionLifecycleEvent(...) persists phase=start|end|error onto the gateway session store
  • the gateway event handler calls it for lifecycle events that flow through the normal agent event bus

So the missing piece is not session persistence itself. The missing piece is that the cron outer-timeout path never publishes the terminal lifecycle event that would drive that persistence.

Why this matters

After #67785, the affected cron sessions now dispatch and do real work. But when the outer timer kills one of those runs, jobs.json and the cron run record can show a timeout while the gateway session entry remains non-terminal until an external watchdog correlates the timeout and rewrites sessions.json.

That leaves a latency gap where:

  • session state is wrong inside the gateway
  • health monitors can still see a stale running session
  • downstream tooling has to rely on host-side repair logic instead of first-party lifecycle semantics

Concrete code path

Observed in the installed dist bundle:

  • src/cron/service/timer.ts
    • executeJobCoreWithTimeout(...)
    • timeout callback aborts + rejects with cron: job execution timed out
  • executeJobCore(...)
    • isolated-agent runs can return sessionId and sessionKey on normal completion/error paths
  • src/gateway/server-chat.ts
    • persistGatewaySessionLifecycleEvent(...)
    • normal lifecycle:error and lifecycle:end events finalize the session store

This means the timeout path is dropping the exact signal the gateway already knows how to persist.

Repro

  1. Configure an isolated cron job with an agent-turn payload and a finite timeoutSeconds.
  2. Trigger a run that genuinely starts a session and begins work.
  3. Let the outer cron wall timer fire before the agent finishes.
  4. Observe:
    • cron result becomes error / cron: job execution timed out
    • no terminal lifecycle event is emitted from the timeout path
    • session finalization is delayed until external cleanup logic repairs the session row

Expected behavior

When the outer timer fires, the cron path should emit a terminal lifecycle event immediately, for example:

  • stream: "lifecycle"
  • phase: "error" or phase: "end"
  • error: "cron: job execution timed out"
  • errorKind: "timeout"
  • stopReason: "timeout"

That event should include the active cron session key so the existing gateway lifecycle persistence path can finalize the row synchronously.

Actual behavior

The timeout path rejects with a plain timeout error but does not emit a lifecycle event, so the gateway session store does not finalize in-band.

Proposed minimal fix

In src/cron/service/timer.ts:

  1. Teach the outer-timeout path to preserve the cron session identity (job.sessionKey, and session id if already available).
  2. Before propagating the timeout error, publish a terminal lifecycle event onto the same session bus used by normal gateway lifecycle handling.
  3. Reuse the existing persistGatewaySessionLifecycleEvent(...) downstream behavior instead of adding a second bespoke session-store write path.

Two reasonable implementation shapes:

  • preferred: emit lifecycle.error directly inside the timeout callback before rejecting
  • fallback: return a structured timeout result that still carries sessionKey, then synthesize the lifecycle event in the cron result-finalization path

Non-goal / distinction from #43141

#43141 is about queued cron-lane work that times out before model invocation because the queue is not abort-aware.

This report is different:

  • the session has already started
  • the outer timer fires after the cron run is already in-flight
  • the missing behavior is terminal session lifecycle emission, not queue removal

Suggested regression coverage

  • force a timed-out isolated cron run after session start
  • assert the matching session row transitions to a terminal state within the same gateway flow, without waiting for external watchdog repair
  • assert the emitted terminal event records a timeout-specific reason and preserves the session key

extent analysis

TL;DR

The cron outer timeout should emit a lifecycle error event to finalize the session store immediately.

Guidance

  • Identify the cron session identity (job.sessionKey and session id if available) in the outer-timeout path.
  • Publish a terminal lifecycle event onto the session bus before propagating the timeout error.
  • Reuse the existing persistGatewaySessionLifecycleEvent function to finalize the session store.
  • Consider two implementation shapes: emitting lifecycle.error directly or synthesizing the lifecycle event in the cron result-finalization path.

Example

// In src/cron/service/timer.ts
executeJobCoreWithTimeout(...): Promise {
  //...
  return Promise.race([
    executeJobCore(...),
    new Promise((_, reject) => {
      setTimeout(() => {
        const sessionKey = job.sessionKey;
        const sessionId = job.sessionId;
        // Emit lifecycle error event
        emitLifecycleEvent('lifecycle', 'error', 'cron: job execution timed out', 'timeout', sessionKey, sessionId);
        reject(new Error('cron: job execution timed out'));
      }, timeoutSeconds * 1000);
    }),
  ]);
}

Notes

This fix assumes that the emitLifecycleEvent function is already implemented and available in the scope of executeJobCoreWithTimeout. Additionally, the persistGatewaySessionLifecycleEvent function should be able to handle the emitted lifecycle event and finalize the session store accordingly.

Recommendation

Apply the proposed minimal fix to emit a terminal lifecycle event in the outer-timeout path, allowing the gateway session store to finalize synchronously. This fix reuses existing functionality and ensures consistent session state handling.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

When the outer timer fires, the cron path should emit a terminal lifecycle event immediately, for example:

  • stream: "lifecycle"
  • phase: "error" or phase: "end"
  • error: "cron: job execution timed out"
  • errorKind: "timeout"
  • stopReason: "timeout"

That event should include the active cron session key so the existing gateway lifecycle persistence path can finalize the row synchronously.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Cron outer timeout should emit lifecycle.error so sessions.json finalizes immediately [1 participants]