openclaw - 💡(How to fix) Fix [Bug]: Cron pre-model watchdog kills active isolated runs — model_call_started never emitted by Pi or CLI runners

StepCodex · 2026-05-12T05:21:32Z

[openclaw] The cron pre-model watchdog CRON AGENT PRE MODEL WATCHDOG MS , 60s fires unconditionally for every isolated cron job because neither the Pi embedded… The cron pre-model watchdog (`CRON_AGENT_PRE_MODEL_WATCHDOG_MS`, 60s) fires unconditionally for every isolated cron job because neither the Pi embedded runner nor the CLI runner (Codex path) emits the `model_call_started` phase event that would clear it. Jobs that complete in under 60s survive; jobs that take longer are aborted mid-execution with the misleading error: ``` cron: isolated agent run stalled before first model call (last phase: attempt-dispatch) ``` The model call did start. The run was not stalled. ## Fix / Workaround ``` cron: isolated agent run stalled before first model call (last phase: attempt-dispatch) ``` 1. Create an isolated cron job with a task that takes >60s: ``` openclaw cron create --name "watchdog-test" --at " " \ --session isolated --timeout-seconds 900 \ --message "Run a multi-step task that takes more than 60 seconds." ``` 2. The job is killed at ~60s with `stalled before first model call (last phase: attempt-dispatch)`. 3. Session logs confirm the model was actively generating output when killed. | Line | Phase emitted | |------|--------------| | 1760 | `runner_entered` | | 1774 | `workspace` | | 1781 | `runtime_plugins` | | 1879 | `model_resolution` | | 2026 | `auth` | | 2164 | `context_engine` | | 2286 | `attempt_dispatch` | ### Bug type Behavior bug (incorrect output/state without crash) ### Beta release blocker Yes — all isolated cron jobs that take longer than 60s are unconditionally killed while actively executing. ### Summary The cron pre-model watchdog (`CRON_AGENT_PRE_MODEL_WATCHDOG_MS`, 60s) fires unconditionally for every isolated cron job because neither the Pi embedded runner nor the CLI runner (Codex path) emits the `model_call_started` phase event that would clear it. Jobs that complete in under 60s survive; jobs that take longer are aborted mid-execution with the misleading error: ``` cron: isolated agent run stalled before first model call (last phase: attempt-dispatch) ``` The model call did start. The run was not stalled. ### Steps to reproduce 1. Create an isolated cron job with a task that takes >60s: ``` openclaw cron create --name "watchdog-test" --at " " \ --session isolated --timeout-seconds 900 \ --message "Run a multi-step task that takes more than 60 seconds." ``` 2. The job is killed at ~60s with `stalled before first model call (last phase: attempt-dispatch)`. 3. Session logs confirm the model was actively generating output when killed. ### Expected behavior The pre-model watchdog should clear when the model call starts, allowing the job to run until either its configured `timeoutSeconds` or natural completion. ### Actual behavior The pre-model watchdog fires at 60s regardless of model call state, aborting all isolated runs longer than 60s. ### OpenClaw version OpenClaw 2026.5.10-beta.3 (6d7dcd9) ### Operating system macOS Darwin 24.6.0 (Apple Silicon) ### Install method npm global ### Model Observed with both `openai/gpt-5.5` (Codex/CLI path) and `opencode-go/glm-5.1` (Pi embedded path). The bug is runner-agnostic — neither path emits the required signal. ### Provider / routing chain - `openai/gpt-5.5` → CLI runner (Codex app-server) → `cli-runner-DyC31cUU.js` - `opencode-go/glm-5.1` → Pi embedded runner → `pi-embedded-CoRiLi5x.js` ### Additional provider/model setup details Default model config: ```json { "agents": { "defaults": { "model": { "primary": "openai/gpt-5.5", "fallbacks": ["opencode-go/glm-5.1", "opencode-go/kimi-k2.6"] } } } } ``` OpenAI models have `agentRuntime: { id: "codex" }` (added by the 5.10-beta.3 upgrade), routing them through the CLI runner. ### Logs, screenshots, and evidence **Watchdog code** (`server-cron-Q61S5eaf.js:344–363`): The watchdog clears only when `onExecutionPhase` delivers `phase === "model_call_started"` or `firstModelCallStarted === true`: ```javascript const startPreModelTimeout = () => { if (preModelTimeoutId || modelCallStarted) return; preModelTimeoutId = setTimeout(() => { if (!modelCallStarted) triggerTimeout(preModelTimeoutErrorMessage(activeExecution)); }, resolveCronAgentPreModelWatchdogMs(jobTimeoutMs)); // 60s }; // In noteExecutionProgress: if (info.phase === "model_call_started" || info.firstModelCallStarted) { modelCallStarted = true; clearPreModelTimeout(); } ``` **Pi embedded runner** (`pi-embedded-CoRiLi5x.js`) — emits phases via `notifyExecutionPhase()` but never `model_call_started`: | Line | Phase emitted | |------|--------------| | 1760 | `runner_entered` | | 1774 | `workspace` | | 1781 | `runtime_plugins` | | 1879 | `model_resolution` | | 2026 | `auth` | | 2164 | `context_engine` | | 2286 | `attempt_dispatch` | No `model_call_started` phase exists in this file. **CLI runner** (`cli-runner-DyC31cUU.js:52`) — calls `onExecutionStarted()` once, never references `onExecutionPhase` at all: ```javascript asyn

openclaw2026-05-12 05:21:32

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

The cron pre-model watchdog (CRON_AGENT_PRE_MODEL_WATCHDOG_MS, 60s) fires unconditionally for every isolated cron job because neither the Pi embedded runner nor the CLI runner (Codex path) emits the model_call_started phase event that would clear it. Jobs that complete in under 60s survive; jobs that take longer are aborted mid-execution with the misleading error:

cron: isolated agent run stalled before first model call (last phase: attempt-dispatch)

The model call did start. The run was not stalled.

Error Message

Consequence: Wasted model tokens (jobs run and consume resources before being killed), missing outputs, missed deliveries, misleading error messages that suggest dispatch failure when the run was healthy.

Root Cause

Fix Action

Fix / Workaround

cron: isolated agent run stalled before first model call (last phase: attempt-dispatch)

Create an isolated cron job with a task that takes >60s:

openclaw cron create --name "watchdog-test" --at "<now+5m ISO>" \
  --session isolated --timeout-seconds 900 \
  --message "Run a multi-step task that takes more than 60 seconds."

The job is killed at ~60s with stalled before first model call (last phase: attempt-dispatch).
Session logs confirm the model was actively generating output when killed.

Line	Phase emitted
1760	`runner_entered`
1774	`workspace`
1781	`runtime_plugins`
1879	`model_resolution`
2026	`auth`
2164	`context_engine`
2286	`attempt_dispatch`

Code Example

cron: isolated agent run stalled before first model call (last phase: attempt-dispatch)

---

openclaw cron create --name "watchdog-test" --at "<now+5m ISO>" \
     --session isolated --timeout-seconds 900 \
     --message "Run a multi-step task that takes more than 60 seconds."

---

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.5",
        "fallbacks": ["opencode-go/glm-5.1", "opencode-go/kimi-k2.6"]
      }
    }
  }
}

---

const startPreModelTimeout = () => {
    if (preModelTimeoutId || modelCallStarted) return;
    preModelTimeoutId = setTimeout(() => {
        if (!modelCallStarted) triggerTimeout(preModelTimeoutErrorMessage(activeExecution));
    }, resolveCronAgentPreModelWatchdogMs(jobTimeoutMs));  // 60s
};

// In noteExecutionProgress:
if (info.phase === "model_call_started" || info.firstModelCallStarted) {
    modelCallStarted = true;
    clearPreModelTimeout();
}

---

async function runCliAgent(params) {
    params.onExecutionStarted?.();
    // ... onExecutionPhase is never called
}

---

[agent/embedded] embedded abort settle timed out: runId=a3a3bc4e sessionId=a3a3bc4e timeoutMs=2000
[agent/embedded] embedded run failover decision: runId=a3a3bc4e stage=assistant decision=surface_error reason=none from=opencode-go/glm-5.1

---

if (info.phase === "model_call_started" || info.phase === "attempt_dispatch" || info.firstModelCallStarted) {
       modelCallStarted = true;
       clearPreModelTimeout();
   }

RAW_BUFFERClick to expand / collapse

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Yes — all isolated cron jobs that take longer than 60s are unconditionally killed while actively executing.

Summary

cron: isolated agent run stalled before first model call (last phase: attempt-dispatch)

The model call did start. The run was not stalled.

Steps to reproduce

Create an isolated cron job with a task that takes >60s:

openclaw cron create --name "watchdog-test" --at "<now+5m ISO>" \
  --session isolated --timeout-seconds 900 \
  --message "Run a multi-step task that takes more than 60 seconds."

The job is killed at ~60s with stalled before first model call (last phase: attempt-dispatch).
Session logs confirm the model was actively generating output when killed.

Expected behavior

The pre-model watchdog should clear when the model call starts, allowing the job to run until either its configured timeoutSeconds or natural completion.

Actual behavior

The pre-model watchdog fires at 60s regardless of model call state, aborting all isolated runs longer than 60s.

OpenClaw version

OpenClaw 2026.5.10-beta.3 (6d7dcd9)

Operating system

macOS Darwin 24.6.0 (Apple Silicon)

Install method

npm global

Model

Observed with both openai/gpt-5.5 (Codex/CLI path) and opencode-go/glm-5.1 (Pi embedded path). The bug is runner-agnostic — neither path emits the required signal.

Provider / routing chain

openai/gpt-5.5 → CLI runner (Codex app-server) → cli-runner-DyC31cUU.js
opencode-go/glm-5.1 → Pi embedded runner → pi-embedded-CoRiLi5x.js

Additional provider/model setup details

Default model config:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.5",
        "fallbacks": ["opencode-go/glm-5.1", "opencode-go/kimi-k2.6"]
      }
    }
  }
}

OpenAI models have agentRuntime: { id: "codex" } (added by the 5.10-beta.3 upgrade), routing them through the CLI runner.

Logs, screenshots, and evidence

Watchdog code (server-cron-Q61S5eaf.js:344–363):

The watchdog clears only when onExecutionPhase delivers phase === "model_call_started" or firstModelCallStarted === true:

const startPreModelTimeout = () => {
    if (preModelTimeoutId || modelCallStarted) return;
    preModelTimeoutId = setTimeout(() => {
        if (!modelCallStarted) triggerTimeout(preModelTimeoutErrorMessage(activeExecution));
    }, resolveCronAgentPreModelWatchdogMs(jobTimeoutMs));  // 60s
};

// In noteExecutionProgress:
if (info.phase === "model_call_started" || info.firstModelCallStarted) {
    modelCallStarted = true;
    clearPreModelTimeout();
}

Pi embedded runner (pi-embedded-CoRiLi5x.js) — emits phases via notifyExecutionPhase() but never model_call_started:

Line	Phase emitted
1760	`runner_entered`
1774	`workspace`
1781	`runtime_plugins`
1879	`model_resolution`
2026	`auth`
2164	`context_engine`
2286	`attempt_dispatch`

No model_call_started phase exists in this file.

CLI runner (cli-runner-DyC31cUU.js:52) — calls onExecutionStarted() once, never references onExecutionPhase at all:

async function runCliAgent(params) {
    params.onExecutionStarted?.();
    // ... onExecutionPhase is never called
}

grep -n 'onExecutionPhase' cli-runner-DyC31cUU.js returns zero matches.

Observed impact (2026-05-12):

Five isolated cron jobs killed overnight. Session logs confirm model calls were active:

Job	Time (HKT)	Model	Evidence of execution	Duration before kill
Daily Memory Review	03:00	gpt-5.5 (default)	Unknown	~57s
TRMNL T1	04:00	gpt-5.5 (default)	GPT-5.5 ran but didn't reach CLI invocation	~61s
TRMNL T2	04:15	gpt-5.5 (default)	Claude CLI ran, scene directory created	~61s
Morning Briefing	06:00	opencode-go/glm-5.1	4 messages in session, tokens consumed	~61s
TDNET Morning	08:00	gpt-5.5 (default)	Unknown	~61s

Jobs that succeeded the same day all completed in under 60s:

Job	Duration
Pre-warm Heartbeat (03:30)	39.0s
Check-in (09:00)	38.7s
TDNET Stop (16:00)	29.1s
N225 Monitor (18:00)	14.5s

Morning Briefing err.log detail:

[agent/embedded] embedded abort settle timed out: runId=a3a3bc4e sessionId=a3a3bc4e timeoutMs=2000
[agent/embedded] embedded run failover decision: runId=a3a3bc4e stage=assistant decision=surface_error reason=none from=opencode-go/glm-5.1

Impact and severity

Affected: Every isolated cron job (sessionTarget: "isolated") on any model, any provider, any runtime path (Pi or CLI/Codex).
Severity: High. Long-running cron jobs (morning briefings, scene generation, monitoring scripts) are unconditionally killed at 60s. Short jobs (<60s) mask the bug by completing before the watchdog fires.
Frequency: 100% — the model_call_started event is never emitted, so the watchdog always fires.
Consequence: Wasted model tokens (jobs run and consume resources before being killed), missing outputs, missed deliveries, misleading error messages that suggest dispatch failure when the run was healthy.

Additional information

Related: #80217 (Codex app-server: report Codex-native tool execution to diagnostics so long-running tools no longer look like stale embedded runs to the watchdog). This is the same class of issue — a watchdog that expects explicit event signaling from the runner, but the runner doesn't emit it. #80217 fixed the tool-execution watchdog for Codex; this bug is about the cron pre-model watchdog, which neither runner satisfies.

Suggested fix (two options):

Emit model_call_started from both runners. The Pi embedded runner should emit it when runEmbeddedAttemptWithBackend begins (after the attempt_dispatch phase, line ~2301). The CLI runner should emit it when the CLI session starts its first model turn.

Or clear the pre-model watchdog at attempt_dispatch. If the dispatch phase is reached, the run is not stalled. Change the watchdog condition:

if (info.phase === "model_call_started" || info.phase === "attempt_dispatch" || info.firstModelCallStarted) {
    modelCallStarted = true;
    clearPreModelTimeout();
}

Option 1 is more correct (the watchdog would still catch genuine post-dispatch stalls). Option 2 is a one-line fix that unblocks users immediately.

Separate issue (same upgrade): The Codex runtime sandbox sets HOME to the agent sandbox path (~/.openclaw/agents/main/agent/codex-home/home). Scripts that shell out to openclaw message send --channel discord fail with Channel is unavailable: discord because the CLI reads config from the sandboxed HOME. Will file separately.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

FAQ

Expected behavior

The pre-model watchdog should clear when the model call starts, allowing the job to run until either its configured timeoutSeconds or natural completion.

#docker error #permission error #memory optimization #batch processing #GPU compatibility

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix [Bug]: Cron pre-model watchdog kills active isolated runs — model_call_started never emitted by Pi or CLI runners

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

FAQ

Expected behavior

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Cron pre-model watchdog kills active isolated runs — model_call_started never emitted by Pi or CLI runners

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING