openclaw - 💡(How to fix) Fix Cron isolated-agent: fallback chain silently delivers incomplete-turn warning, and status=error rows mark delivered=true [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#72985Fetched 2026-04-28 06:29:09
View on GitHub
Comments
0
Participants
1
Timeline
0
Reactions
0
Participants

Error Message

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

  1. (P1) The fallback chain has no second-stage retry / no proactive alert. The error text is silently delivered to the user's chat as if it were a normal cron output.
  2. (P2) The cron run is recorded in jobs-state.json with lastRunStatus: "error" and lastDelivered: true simultaneously, which corrupts downstream monitoring (Observer agent reading jobs-state.json cannot tell whether the user actually got useful content). "status": "error", "error": "⚠️ Agent couldn't generate a response. Please try again.", | 04-27 | minimax | MiniMax-M2.7-highspeed | error | 180s | "lastRunStatus": "error", lastRunStatus: "error" AND lastDelivered: true co-existing is the contradiction described in P2 below. The downstream impact: a monitoring agent (Observer) reading jobs-state.json and runs/<jobId>.jsonl cannot tell from lastRunStatus: error alone what category of failure this was. We had to dig into the per-run jsonl to spot delivery.fallbackUsed: true and the provider switch from anthropic-proxy to minimax.

P2 — status: "error" + delivered: true simultaneously is a monitoring contradiction

  • The fallback's failure text ("Agent couldn't generate a response") is treated as a successful "delivery" because it was sent to the configured channel without the channel itself raising an error.
  • jobs-state.json records both "lastRunStatus": "error" and "lastDelivered": true. A monitoring agent that reads jobs-state.json to compute system health sees rows where delivered: true (✅ "user got something") and status: error (🔴 "the run failed") at the same time. Without reading per-run trajectories, it cannot reconcile the two. In our deployment, an LLM monitoring agent reading these contradictory signals fabricated a "3 tasks reported errors" narrative when ground truth was "1 task delivered the fallback's failure-text"; the other 2 tasks the LLM hallucinated as errors were actually status: ok.
  • (a) When lastRunStatus === "error", force lastDelivered: false. The "delivery" should only count if the run produced real assistant content; an error fallback text doesn't count.
  • (b) Add a separate lastErrorDelivered: true field to distinguish "we sent an error notification" from "we sent the actual cron output". Update monitoring docs accordingly. The error text comes from resolveIncompleteTurnPayloadText in src/agents/pi-embedded-runner/run/incomplete-turn.ts, which fires when:
  • stopReason === "error" In the 04-27 case, model: "MiniMax-M2.7-highspeed" likely returned stopReason: "error" or empty payload after fallback, which triggered the warning text. The cron delivery layer (src/cron/delivery.ts and src/cron/isolated-agent/delivery-dispatch.ts) then shipped that warning text as if it were the cron output, marking delivered: true. The simplest single-line fix for P2 is in src/cron/delivery.ts (or wherever the lastDelivered field is set): when the run's terminal status is error AND the delivered text matches the incomplete-turn fallback regex, set lastDelivered: false. OpenClaw users running multiple daily cron jobs use jobs-state.json as the authoritative source of truth for cron health. Any bug that makes a status:error row look "delivered" causes downstream monitoring loops (e.g., Observer-style health-check agents) to either silently miss real errors, or fabricate fake errors to compensate for the missing signal. Both directions damage trust.

Root Cause

OpenClaw users running multiple daily cron jobs use jobs-state.json as the authoritative source of truth for cron health. Any bug that makes a status:error row look "delivered" causes downstream monitoring loops (e.g., Observer-style health-check agents) to either silently miss real errors, or fabricate fake errors to compensate for the missing signal. Both directions damage trust.


Fix Action

Fix / Workaround

In the 04-27 case, model: "MiniMax-M2.7-highspeed" likely returned stopReason: "error" or empty payload after fallback, which triggered the warning text. The cron delivery layer (src/cron/delivery.ts and src/cron/isolated-agent/delivery-dispatch.ts) then shipped that warning text as if it were the cron output, marking delivered: true.

Code Example

{
  "ts": 1777251780303,
  "jobId": "926bfa97-b097-4be5-8a95-fef83c29353e",
  "action": "finished",
  "status": "error",
  "error": "⚠️ Agent couldn't generate a response. Please try again.",
  "summary": "⚠️ Agent couldn't generate a response. Please try again.",
  "delivered": true,
  "deliveryStatus": "delivered",
  "delivery": {
    "intended": { "channel": "feishu", "to": "user:ou_edc…" },
    "resolved": { "ok": true, "channel": "feishu", "to": "user:ou_edc…" },
    "fallbackUsed": true,
    "delivered": true
  },
  "durationMs": 180233,
  "model": "MiniMax-M2.7-highspeed",
  "provider": "minimax"
}

---

"926bfa97-b097-4be5-8a95-fef83c29353e": {
  "state": {
    "lastRunStatus": "error",
    "lastError": "⚠️ Agent couldn't generate a response. Please try again.",
    "lastDeliveryStatus": "delivered",
    "lastDelivered": true,
    "consecutiveErrors": 1,
    "lastDurationMs": 180233
  }
}
RAW_BUFFERClick to expand / collapse

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

TL;DR

When a cron job's primary provider (e.g. anthropic-proxy) is temporarily unavailable, OpenClaw's cron isolated-agent runner falls back to the configured backup provider (e.g. minimax). If the backup provider then also fails to produce a visible response, the run terminates with the pi-embedded-runner incomplete-turn fallback text "⚠️ Agent couldn't generate a response. Please try again.", but:

  1. (P1) The fallback chain has no second-stage retry / no proactive alert. The error text is silently delivered to the user's chat as if it were a normal cron output.
  2. (P2) The cron run is recorded in jobs-state.json with lastRunStatus: "error" and lastDelivered: true simultaneously, which corrupts downstream monitoring (Observer agent reading jobs-state.json cannot tell whether the user actually got useful content).

Both bugs were observed in production on 2026-04-27 and caused a downstream Observer agent to fabricate "3 cron tasks reported errors simultaneously" when in reality only 1 cron run had genuinely failed.


Reproduction (real production trace)

Job: 926bfa97-b097-4be5-8a95-fef83c29353e (ContentCrew 选题立项, scheduled 30 9 * * * Asia/Shanghai)

Run on 2026-04-27 09:30 CST — entry from ~/.openclaw/cron/runs/926bfa97-…jsonl:

{
  "ts": 1777251780303,
  "jobId": "926bfa97-b097-4be5-8a95-fef83c29353e",
  "action": "finished",
  "status": "error",
  "error": "⚠️ Agent couldn't generate a response. Please try again.",
  "summary": "⚠️ Agent couldn't generate a response. Please try again.",
  "delivered": true,
  "deliveryStatus": "delivered",
  "delivery": {
    "intended": { "channel": "feishu", "to": "user:ou_edc…" },
    "resolved": { "ok": true, "channel": "feishu", "to": "user:ou_edc…" },
    "fallbackUsed": true,
    "delivered": true
  },
  "durationMs": 180233,
  "model": "MiniMax-M2.7-highspeed",
  "provider": "minimax"
}

Same jobId historical context (model + provider + status):

dateprovidermodelstatusduration
04-22anthropic-proxyclaude-opus-4-6ok302s
04-25anthropic-proxyclaude-sonnet-4-6ok583s
04-26anthropic-proxyclaude-sonnet-4-6ok72s
04-27minimaxMiniMax-M2.7-highspeederror180s

The 04-27 run is the only one that swung to the fallback provider; that run failed with pi-embedded-runner's incomplete-turn fallback text (src/agents/pi-embedded-runner/run/incomplete-turn.ts:233-244).

~/.openclaw/cron/jobs-state.json after the run:

"926bfa97-b097-4be5-8a95-fef83c29353e": {
  "state": {
    "lastRunStatus": "error",
    "lastError": "⚠️ Agent couldn't generate a response. Please try again.",
    "lastDeliveryStatus": "delivered",
    "lastDelivered": true,
    "consecutiveErrors": 1,
    "lastDurationMs": 180233
  }
}

lastRunStatus: "error" AND lastDelivered: true co-existing is the contradiction described in P2 below.


P1 — fallback chain is single-shot and silent

Current behavior:

  • Primary provider fails → fallback provider invoked once → if fallback provider also produces an incomplete-turn situation, the run ends with the pi-embedded-runner fallback text shipped to the user.
  • No same-provider retry on the fallback path.
  • No additional fallback to a third provider.
  • No telemetry / alert distinguishing "this output is the fallback's failure text" from "this output is real assistant content".

Expected behavior (one of):

  • (a) When delivery.fallbackUsed === true AND pi-embedded-runner falls through to the incomplete-turn warning, retry the same provider once before giving up (180s here is far below typical timeouts; the failure smells transient).
  • (b) When fallback also fails, suppress the user-visible "Agent couldn't generate a response" delivery, and instead post a system-level alert to the configured alert channel ("ContentCrew 09:30 cron failed: primary anthropic-proxy unavailable, fallback minimax also returned empty after 180s").
  • (c) Surface the fallbackUsed flag prominently in the cron run record so monitoring agents can distinguish "a successful run via fallback" from "a failed run that fell off the fallback path".

The downstream impact: a monitoring agent (Observer) reading jobs-state.json and runs/<jobId>.jsonl cannot tell from lastRunStatus: error alone what category of failure this was. We had to dig into the per-run jsonl to spot delivery.fallbackUsed: true and the provider switch from anthropic-proxy to minimax.


P2 — status: "error" + delivered: true simultaneously is a monitoring contradiction

Current behavior:

  • The fallback's failure text ("Agent couldn't generate a response") is treated as a successful "delivery" because it was sent to the configured channel without the channel itself raising an error.
  • jobs-state.json records both "lastRunStatus": "error" and "lastDelivered": true.

Why this is wrong: A monitoring agent that reads jobs-state.json to compute system health sees rows where delivered: true (✅ "user got something") and status: error (🔴 "the run failed") at the same time. Without reading per-run trajectories, it cannot reconcile the two. In our deployment, an LLM monitoring agent reading these contradictory signals fabricated a "3 tasks reported errors" narrative when ground truth was "1 task delivered the fallback's failure-text"; the other 2 tasks the LLM hallucinated as errors were actually status: ok.

Expected behavior (one of):

  • (a) When lastRunStatus === "error", force lastDelivered: false. The "delivery" should only count if the run produced real assistant content; an error fallback text doesn't count.
  • (b) Add a separate lastErrorDelivered: true field to distinguish "we sent an error notification" from "we sent the actual cron output". Update monitoring docs accordingly.

Root cause hypothesis (please verify)

The error text comes from resolveIncompleteTurnPayloadText in src/agents/pi-embedded-runner/run/incomplete-turn.ts, which fires when:

  • stopReason === "toolUse" without visible text, OR
  • reasoning-only assistant turn, OR
  • empty response, OR
  • stopReason === "error"

In the 04-27 case, model: "MiniMax-M2.7-highspeed" likely returned stopReason: "error" or empty payload after fallback, which triggered the warning text. The cron delivery layer (src/cron/delivery.ts and src/cron/isolated-agent/delivery-dispatch.ts) then shipped that warning text as if it were the cron output, marking delivered: true.

The simplest single-line fix for P2 is in src/cron/delivery.ts (or wherever the lastDelivered field is set): when the run's terminal status is error AND the delivered text matches the incomplete-turn fallback regex, set lastDelivered: false.

P1 is more involved — it would require touching the fallback policy in src/cron/isolated-agent/run-fallback-policy.ts.


Why this matters

OpenClaw users running multiple daily cron jobs use jobs-state.json as the authoritative source of truth for cron health. Any bug that makes a status:error row look "delivered" causes downstream monitoring loops (e.g., Observer-style health-check agents) to either silently miss real errors, or fabricate fake errors to compensate for the missing signal. Both directions damage trust.


Repro environment

  • OpenClaw 2026.4.25 (c070509) (macOS, LaunchAgent gateway)
  • Cron job set up via openclaw cron add ... with sessionTarget: isolated, wakeMode: now
  • Provider chain: anthropic-proxy (primary) → minimax (fallback), via gateway-side provider router

Happy to provide the full per-run jsonl on request (omitted here to avoid leaking message content).

extent analysis

TL;DR

The cron isolated-agent's fallback chain should be modified to handle failures more robustly, and the lastDelivered field in jobs-state.json should be updated to accurately reflect delivery status.

Guidance

  • Review the fallback policy in src/cron/isolated-agent/run-fallback-policy.ts to consider implementing a retry mechanism or additional fallback providers.
  • Update the lastDelivered field in src/cron/delivery.ts to set lastDelivered: false when the run's terminal status is error and the delivered text matches the incomplete-turn fallback regex.
  • Consider adding a separate lastErrorDelivered field to distinguish between error notifications and actual cron output.
  • Verify the changes by testing the cron job with simulated failures and monitoring the jobs-state.json file for accurate updates.

Example

// src/cron/delivery.ts
if (runStatus === 'error' && deliveredText.match(incompleteTurnRegex)) {
  lastDelivered = false;
}

Notes

The provided information suggests that the issue is related to the fallback chain and the handling of error statuses in the cron job. However, without further details on the implementation, it's difficult to provide a comprehensive solution. The suggested changes aim to address the immediate issues, but additional testing and verification may be necessary to ensure the fixes are effective.

Recommendation

Apply the workaround by updating the lastDelivered field and considering changes to the fallback policy, as this will help to accurately reflect delivery status and improve the robustness of the cron job.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Cron isolated-agent: fallback chain silently delivers incomplete-turn warning, and status=error rows mark delivered=true [1 participants]