openclaw [Bug]: Cron pre-model watchdog kills active isolated runs (model_call_started never emitted by Pi or CLI runners)

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Yes — all isolated cron jobs that take longer than 60s are unconditionally killed while actively executing.

Summary

The cron pre-model watchdog (CRON_AGENT_PRE_MODEL_WATCHDOG_MS, 60s) fires unconditionally for every isolated cron job because neither the Pi embedded runner nor the CLI runner (Codex path) emits the model_call_started phase event that would clear it. Jobs that complete in under 60s survive; jobs that take longer are aborted mid-execution with the misleading error:

cron: isolated agent run stalled before first model call (last phase: attempt-dispatch)

The model call did start. The run was not stalled.

Steps to reproduce

  1. Create an isolated cron job with a task that takes >60s:
    openclaw cron create --name "watchdog-test" --at "<now+5m ISO>" \
      --session isolated --timeout-seconds 900 \
      --message "Run a multi-step task that takes more than 60 seconds."
  2. The job is killed at ~60s with the error "stalled before first model call (last phase: attempt-dispatch)".
  3. Session logs confirm the model was actively generating output when killed.
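
The `<now+5m ISO>` placeholder in step 1 can be filled in with a small helper. This is a minimal sketch: the function name `isoMinutesFromNow` is hypothetical, and it assumes the `--at` flag accepts the UTC ISO 8601 string that `Date.prototype.toISOString()` produces (with milliseconds and a trailing `Z`), which the issue does not confirm.

```javascript
// Hypothetical helper (not part of openclaw): builds an ISO 8601 timestamp
// `minutes` minutes after `now` (milliseconds since the epoch).
function isoMinutesFromNow(minutes, now = Date.now()) {
  return new Date(now + minutes * 60_000).toISOString();
}

// Value to paste into: openclaw cron create ... --at "<value>"
console.log(isoMinutesFromNow(5));
```

If the CLI expects a different timestamp format, only the final formatting step needs to change.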

Expected behavior

The pre-model watchdog should clear when the model call starts, allowing the job to run until either its configured timeoutSeconds or natural completion.

Actual behavior

The pre-model watchdog fires at 60s regardless of model call state, aborting all isolated runs longer than 60s.

OpenClaw version

OpenClaw 2026.5.10-beta.3 (6d7dcd9)

Operating system

macOS Darwin 24.6.0 (Apple Silicon)

Install method

npm global

Model

Observed with both openai/gpt-5.5 (Codex/CLI path) and opencode-go/glm-5.1 (Pi embedded path). The bug is runner-agnostic — neither path emits the required signal.

Provider / routing chain

  • openai/gpt-5.5 → CLI runner (Codex app-server) → cli-runner-DyC31cUU.js
  • opencode-go/glm-5.1 → Pi embedded runner → pi-embedded-CoRiLi5x.js

Additional provider/model setup details

Default model config:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.5",
        "fallbacks": ["opencode-go/glm-5.1", "opencode-go/kimi-k2.6"]
      }
    }
  }
}

OpenAI models have agentRuntime: { id: "codex" } (added by the 5.10-beta.3 upgrade), routing them through the CLI runner.

Logs, screenshots, and evidence

Watchdog code (server-cron-Q61S5eaf.js:344–363):

The watchdog clears only when onExecutionPhase delivers phase === "model_call_started" or firstModelCallStarted === true:

const startPreModelTimeout = () => {
    if (preModelTimeoutId || modelCallStarted) return;
    preModelTimeoutId = setTimeout(() => {
        if (!modelCallStarted) triggerTimeout(preModelTimeoutErrorMessage(activeExecution));
    }, resolveCronAgentPreModelWatchdogMs(jobTimeoutMs));  // 60s
};

// In noteExecutionProgress:
if (info.phase === "model_call_started" || info.firstModelCallStarted) {
    modelCallStarted = true;
    clearPreModelTimeout();
}
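
To illustrate why this logic fires for every run, here is a simplified, self-contained model of the watchdog. It is a sketch and not the OpenClaw source: `createPreModelWatchdog`, `createFakeTimers`, and `simulateRun` are hypothetical names, the `{ phase }` payload shape is inferred from the `info.phase` check above, and fake timers stand in for the real 60s wait.

```javascript
// Simplified model of the pre-model watchdog; not the actual OpenClaw code.
function createPreModelWatchdog({ watchdogMs, onTimeout, timers }) {
  let timeoutId = null;
  let modelCallStarted = false;
  let lastPhase = "none";
  return {
    start() {
      if (timeoutId !== null || modelCallStarted) return;
      timeoutId = timers.setTimeout(() => {
        if (!modelCallStarted) onTimeout(lastPhase);
      }, watchdogMs);
    },
    noteExecutionProgress(info) {
      if (info.phase) lastPhase = info.phase;
      // Same clearing condition as the excerpt above.
      if (info.phase === "model_call_started" || info.firstModelCallStarted) {
        modelCallStarted = true;
        if (timeoutId !== null) timers.clearTimeout(timeoutId);
        timeoutId = null;
      }
    },
  };
}

// Fake timers so the 60s delay can be simulated instantly.
function createFakeTimers() {
  let nextId = 1;
  const pending = new Map();
  return {
    setTimeout(fn, ms) {
      const id = nextId++;
      pending.set(id, { fn, ms });
      return id;
    },
    clearTimeout(id) {
      pending.delete(id);
    },
    // Runs every pending timer whose delay is <= elapsedMs.
    advance(elapsedMs) {
      for (const [id, timer] of [...pending]) {
        if (timer.ms <= elapsedMs) {
          pending.delete(id);
          timer.fn();
        }
      }
    },
  };
}

// Returns the last phase seen when the watchdog fired, or null if it was cleared.
function simulateRun(phases) {
  const timers = createFakeTimers();
  let firedAtPhase = null;
  const watchdog = createPreModelWatchdog({
    watchdogMs: 60_000,
    onTimeout: (lastPhase) => { firedAtPhase = lastPhase; },
    timers,
  });
  watchdog.start();
  for (const phase of phases) watchdog.noteExecutionProgress({ phase });
  timers.advance(60_000);
  return firedAtPhase;
}

// Phases the Pi embedded runner is reported to emit, in order.
const piPhases = [
  "runner_entered", "workspace", "runtime_plugins", "model_resolution",
  "auth", "context_engine", "attempt_dispatch",
];

console.log(simulateRun(piPhases)); // "attempt_dispatch": watchdog fires
console.log(simulateRun([...piPhases, "model_call_started"])); // null: watchdog cleared
```

In this model the watchdog fires whenever the phase list stops at `attempt_dispatch`, which matches the "last phase" reported in the error.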

Pi embedded runner (pi-embedded-CoRiLi5x.js) — emits phases via notifyExecutionPhase() but never model_call_started:

| Line | Phase emitted |
| --- | --- |
| 1760 | runner_entered |
| 1774 | workspace |
| 1781 | runtime_plugins |
| 1879 | model_resolution |
| 2026 | auth |
| 2164 | context_engine |
| 2286 | attempt_dispatch |

No model_call_started phase exists in this file.

CLI runner (cli-runner-DyC31cUU.js:52) — calls onExecutionStarted() once, never references onExecutionPhase at all:

async function runCliAgent(params) {
    params.onExecutionStarted?.();
    // ... onExecutionPhase is never called
}

grep -n 'onExecutionPhase' cli-runner-DyC31cUU.js returns zero matches.
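
For contrast, a runner that did signal the watchdog might look roughly like the sketch below. Everything here is an assumption for illustration: `runCliAgentSketch` and `startFirstModelTurn` are hypothetical names, and the `{ phase: "model_call_started" }` payload is inferred from the `info.phase` check in the watchdog excerpt rather than taken from the real `runCliAgent` body.

```javascript
// Hypothetical sketch; the real runCliAgent implementation and callback
// payload may differ.
async function runCliAgentSketch(params, startFirstModelTurn) {
  params.onExecutionStarted?.();
  // Assumed payload shape, mirroring the watchdog's `info.phase` check.
  // Emitted right before the first model turn begins.
  params.onExecutionPhase?.({ phase: "model_call_started" });
  return startFirstModelTurn();
}

// Usage: record which callbacks the runner invokes.
const seen = [];
runCliAgentSketch(
  {
    onExecutionStarted: () => seen.push("started"),
    onExecutionPhase: (info) => seen.push(info.phase),
  },
  () => "first turn result",
).then((result) => console.log(seen, result));
```

The optional-call syntax (`?.()`) keeps the sketch safe for callers that do not pass `onExecutionPhase`.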

Observed impact (2026-05-12):

Five isolated cron jobs were killed overnight. Where session logs were available, they confirm that model calls were active:

| Job | Time (HKT) | Model | Evidence of execution | Duration before kill |
| --- | --- | --- | --- | --- |
| Daily Memory Review | 03:00 | gpt-5.5 (default) | Unknown | ~57s |
| TRMNL T1 | 04:00 | gpt-5.5 (default) | GPT-5.5 ran but didn't reach CLI invocation | ~61s |
| TRMNL T2 | 04:15 | gpt-5.5 (default) | Claude CLI ran, scene directory created | ~61s |
| Morning Briefing | 06:00 | opencode-go/glm-5.1 | 4 messages in session, tokens consumed | ~61s |
| TDNET Morning | 08:00 | gpt-5.5 (default) | Unknown | ~61s |

Jobs that succeeded the same day all completed in under 60s:

| Job | Duration |
| --- | --- |
| Pre-warm Heartbeat (03:30) | 39.0s |
| Check-in (09:00) | 38.7s |
| TDNET Stop (16:00) | 29.1s |
| N225 Monitor (18:00) | 14.5s |

Morning Briefing err.log detail:

[agent/embedded] embedded abort settle timed out: runId=a3a3bc4e sessionId=a3a3bc4e timeoutMs=2000
[agent/embedded] embedded run failover decision: runId=a3a3bc4e stage=assistant decision=surface_error reason=none from=opencode-go/glm-5.1

Impact and severity

  • Affected: Every isolated cron job (sessionTarget: "isolated") on any model, any provider, any runtime path (Pi or CLI/Codex).
  • Severity: High. Long-running cron jobs (morning briefings, scene generation, monitoring scripts) are unconditionally killed at 60s. Short jobs (<60s) mask the bug by completing before the watchdog fires.
  • Frequency: 100% — the model_call_started event is never emitted, so the watchdog always fires.
  • Consequence: Wasted model tokens (jobs run and consume resources before being killed), missing outputs, missed deliveries, misleading error messages that suggest dispatch failure when the run was healthy.

Additional information

Related: #80217 (Codex app-server: report Codex-native tool execution to diagnostics so long-running tools no longer look like stale embedded runs to the watchdog). This is the same class of issue — a watchdog that expects explicit event signaling from the runner, but the runner doesn't emit it. #80217 fixed the tool-execution watchdog for Codex; this bug is about the cron pre-model watchdog, which neither runner satisfies.

Suggested fix (two options):

  1. Emit model_call_started from both runners. The Pi embedded runner should emit it when runEmbeddedAttemptWithBackend begins (after the attempt_dispatch phase, line ~2301). The CLI runner should emit it when the CLI session starts its first model turn.

  2. Or clear the pre-model watchdog at attempt_dispatch. If the dispatch phase is reached, the run is not stalled. Change the watchdog condition:

    if (info.phase === "model_call_started" || info.phase === "attempt_dispatch" || info.firstModelCallStarted) {
        modelCallStarted = true;
        clearPreModelTimeout();
    }

Option 1 is more correct (the watchdog would still catch genuine post-dispatch stalls). Option 2 is a one-line fix that unblocks users immediately.
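
The trade-off between the two options can be sketched with a small model. This is illustrative only: `watchdogFires` is a hypothetical function that reduces the watchdog to a single question, namely whether any emitted phase is one that clears the timer, and it assumes the run makes no further progress after the listed phases.

```javascript
// Simplified model: the pre-model watchdog fires unless the run emitted at
// least one phase that clears it.
function watchdogFires(emittedPhases, clearingPhases) {
  return !emittedPhases.some((phase) => clearingPhases.has(phase));
}

// Option 1: keep the condition, make runners emit model_call_started.
const option1Clears = new Set(["model_call_started"]);
// Option 2: also clear the watchdog at attempt_dispatch.
const option2Clears = new Set(["model_call_started", "attempt_dispatch"]);

// A healthy run once runners emit the event (Option 1).
const healthyRun = ["auth", "context_engine", "attempt_dispatch", "model_call_started"];
// A run that genuinely stalls after dispatch, before any model call.
const stalledAfterDispatch = ["auth", "context_engine", "attempt_dispatch"];

console.log(watchdogFires(healthyRun, option1Clears));           // false
console.log(watchdogFires(stalledAfterDispatch, option1Clears)); // true: stall is caught
console.log(watchdogFires(stalledAfterDispatch, option2Clears)); // false: stall is missed
```

Under Option 2 a post-dispatch stall would presumably still be bounded by the job's configured timeoutSeconds, but it would no longer be reported by this watchdog.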

Separate issue (same upgrade): The Codex runtime sandbox sets HOME to the agent sandbox path (~/.openclaw/agents/main/agent/codex-home/home). Scripts that shell out to openclaw message send --channel discord fail with Channel is unavailable: discord because the CLI reads config from the sandboxed HOME. Will file separately.
