openclaw - 💡(How to fix) Fix Cron isolated-agent: fallback chain silently delivers incomplete-turn warning, and status=error rows mark delivered=true [1 participants]

openclaw2026-04-27 18:53:51

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#72985•Fetched 2026-04-28 06:29:09

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Allenbluff

Participants

Allenbluff

Error Message

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

(P1) The fallback chain has no second-stage retry / no proactive alert. The error text is silently delivered to the user's chat as if it were a normal cron output.
(P2) The cron run is recorded in jobs-state.json with lastRunStatus: "error" and lastDelivered: true simultaneously, which corrupts downstream monitoring (Observer agent reading jobs-state.json cannot tell whether the user actually got useful content). "status": "error", "error": "⚠️ Agent couldn't generate a response. Please try again.", | 04-27 | minimax | MiniMax-M2.7-highspeed | error | 180s | "lastRunStatus": "error", lastRunStatus: "error" AND lastDelivered: true co-existing is the contradiction described in P2 below. The downstream impact: a monitoring agent (Observer) reading jobs-state.json and runs/<jobId>.jsonl cannot tell from lastRunStatus: error alone what category of failure this was. We had to dig into the per-run jsonl to spot delivery.fallbackUsed: true and the provider switch from anthropic-proxy to minimax.

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction

The fallback's failure text ("Agent couldn't generate a response") is treated as a successful "delivery" because it was sent to the configured channel without the channel itself raising an error.
jobs-state.json records both "lastRunStatus": "error" and "lastDelivered": true. A monitoring agent that reads jobs-state.json to compute system health sees rows where delivered: true (✅ "user got something") and status: error (🔴 "the run failed") at the same time. Without reading per-run trajectories, it cannot reconcile the two. In our deployment, an LLM monitoring agent reading these contradictory signals fabricated a "3 tasks reported errors" narrative when ground truth was "1 task delivered the fallback's failure-text"; the other 2 tasks the LLM hallucinated as errors were actually status: ok.
(a) When lastRunStatus === "error", force lastDelivered: false. The "delivery" should only count if the run produced real assistant content; an error fallback text doesn't count.
(b) Add a separate lastErrorDelivered: true field to distinguish "we sent an error notification" from "we sent the actual cron output". Update monitoring docs accordingly. The error text comes from resolveIncompleteTurnPayloadText in src/agents/pi-embedded-runner/run/incomplete-turn.ts, which fires when:
stopReason === "error" In the 04-27 case, model: "MiniMax-M2.7-highspeed" likely returned stopReason: "error" or empty payload after fallback, which triggered the warning text. The cron delivery layer (src/cron/delivery.ts and src/cron/isolated-agent/delivery-dispatch.ts) then shipped that warning text as if it were the cron output, marking delivered: true. The simplest single-line fix for P2 is in src/cron/delivery.ts (or wherever the lastDelivered field is set): when the run's terminal status is error AND the delivered text matches the incomplete-turn fallback regex, set lastDelivered: false. OpenClaw users running multiple daily cron jobs use jobs-state.json as the authoritative source of truth for cron health. Any bug that makes a status:error row look "delivered" causes downstream monitoring loops (e.g., Observer-style health-check agents) to either silently miss real errors, or fabricate fake errors to compensate for the missing signal. Both directions damage trust.

Root Cause

OpenClaw users running multiple daily cron jobs use jobs-state.json as the authoritative source of truth for cron health. Any bug that makes a status:error row look "delivered" causes downstream monitoring loops (e.g., Observer-style health-check agents) to either silently miss real errors, or fabricate fake errors to compensate for the missing signal. Both directions damage trust.

Fix Action

Fix / Workaround

In the 04-27 case, model: "MiniMax-M2.7-highspeed" likely returned stopReason: "error" or empty payload after fallback, which triggered the warning text. The cron delivery layer (src/cron/delivery.ts and src/cron/isolated-agent/delivery-dispatch.ts) then shipped that warning text as if it were the cron output, marking delivered: true.

Code Example

{
  "ts": 1777251780303,
  "jobId": "926bfa97-b097-4be5-8a95-fef83c29353e",
  "action": "finished",
  "status": "error",
  "error": "⚠️ Agent couldn't generate a response. Please try again.",
  "summary": "⚠️ Agent couldn't generate a response. Please try again.",
  "delivered": true,
  "deliveryStatus": "delivered",
  "delivery": {
    "intended": { "channel": "feishu", "to": "user:ou_edc…" },
    "resolved": { "ok": true, "channel": "feishu", "to": "user:ou_edc…" },
    "fallbackUsed": true,
    "delivered": true
  },
  "durationMs": 180233,
  "model": "MiniMax-M2.7-highspeed",
  "provider": "minimax"
}

---

"926bfa97-b097-4be5-8a95-fef83c29353e": {
  "state": {
    "lastRunStatus": "error",
    "lastError": "⚠️ Agent couldn't generate a response. Please try again.",
    "lastDeliveryStatus": "delivered",
    "lastDelivered": true,
    "consecutiveErrors": 1,
    "lastDurationMs": 180233
  }
}

RAW_BUFFERClick to expand / collapse

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

TL;DR

When a cron job's primary provider (e.g. anthropic-proxy) is temporarily unavailable, OpenClaw's cron isolated-agent runner falls back to the configured backup provider (e.g. minimax). If the backup provider then also fails to produce a visible response, the run terminates with the pi-embedded-runner incomplete-turn fallback text "⚠️ Agent couldn't generate a response. Please try again.", but:

(P1) The fallback chain has no second-stage retry / no proactive alert. The error text is silently delivered to the user's chat as if it were a normal cron output.
(P2) The cron run is recorded in jobs-state.json with lastRunStatus: "error" and lastDelivered: true simultaneously, which corrupts downstream monitoring (Observer agent reading jobs-state.json cannot tell whether the user actually got useful content).

Both bugs were observed in production on 2026-04-27 and caused a downstream Observer agent to fabricate "3 cron tasks reported errors simultaneously" when in reality only 1 cron run had genuinely failed.

Reproduction (real production trace)

Job: 926bfa97-b097-4be5-8a95-fef83c29353e (ContentCrew 选题立项, scheduled 30 9 * * * Asia/Shanghai)

Run on 2026-04-27 09:30 CST — entry from ~/.openclaw/cron/runs/926bfa97-…jsonl:

{
  "ts": 1777251780303,
  "jobId": "926bfa97-b097-4be5-8a95-fef83c29353e",
  "action": "finished",
  "status": "error",
  "error": "⚠️ Agent couldn't generate a response. Please try again.",
  "summary": "⚠️ Agent couldn't generate a response. Please try again.",
  "delivered": true,
  "deliveryStatus": "delivered",
  "delivery": {
    "intended": { "channel": "feishu", "to": "user:ou_edc…" },
    "resolved": { "ok": true, "channel": "feishu", "to": "user:ou_edc…" },
    "fallbackUsed": true,
    "delivered": true
  },
  "durationMs": 180233,
  "model": "MiniMax-M2.7-highspeed",
  "provider": "minimax"
}

Same jobId historical context (model + provider + status):

date	provider	model	status	duration
04-22	anthropic-proxy	claude-opus-4-6	ok	302s
04-25	anthropic-proxy	claude-sonnet-4-6	ok	583s
04-26	anthropic-proxy	claude-sonnet-4-6	ok	72s
04-27	minimax	MiniMax-M2.7-highspeed	error	180s

The 04-27 run is the only one that swung to the fallback provider; that run failed with pi-embedded-runner's incomplete-turn fallback text (src/agents/pi-embedded-runner/run/incomplete-turn.ts:233-244).

~/.openclaw/cron/jobs-state.json after the run:

"926bfa97-b097-4be5-8a95-fef83c29353e": {
  "state": {
    "lastRunStatus": "error",
    "lastError": "⚠️ Agent couldn't generate a response. Please try again.",
    "lastDeliveryStatus": "delivered",
    "lastDelivered": true,
    "consecutiveErrors": 1,
    "lastDurationMs": 180233
  }
}

lastRunStatus: "error" AND lastDelivered: true co-existing is the contradiction described in P2 below.

P1 — fallback chain is single-shot and silent

Current behavior:

Primary provider fails → fallback provider invoked once → if fallback provider also produces an incomplete-turn situation, the run ends with the pi-embedded-runner fallback text shipped to the user.
No same-provider retry on the fallback path.
No additional fallback to a third provider.
No telemetry / alert distinguishing "this output is the fallback's failure text" from "this output is real assistant content".

Expected behavior (one of):

(a) When delivery.fallbackUsed === true AND pi-embedded-runner falls through to the incomplete-turn warning, retry the same provider once before giving up (180s here is far below typical timeouts; the failure smells transient).
(b) When fallback also fails, suppress the user-visible "Agent couldn't generate a response" delivery, and instead post a system-level alert to the configured alert channel ("ContentCrew 09:30 cron failed: primary anthropic-proxy unavailable, fallback minimax also returned empty after 180s").
(c) Surface the fallbackUsed flag prominently in the cron run record so monitoring agents can distinguish "a successful run via fallback" from "a failed run that fell off the fallback path".

The downstream impact: a monitoring agent (Observer) reading jobs-state.json and runs/<jobId>.jsonl cannot tell from lastRunStatus: error alone what category of failure this was. We had to dig into the per-run jsonl to spot delivery.fallbackUsed: true and the provider switch from anthropic-proxy to minimax.

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction

Current behavior:

The fallback's failure text ("Agent couldn't generate a response") is treated as a successful "delivery" because it was sent to the configured channel without the channel itself raising an error.
jobs-state.json records both "lastRunStatus": "error" and "lastDelivered": true.

Why this is wrong: A monitoring agent that reads jobs-state.json to compute system health sees rows where delivered: true (✅ "user got something") and status: error (🔴 "the run failed") at the same time. Without reading per-run trajectories, it cannot reconcile the two. In our deployment, an LLM monitoring agent reading these contradictory signals fabricated a "3 tasks reported errors" narrative when ground truth was "1 task delivered the fallback's failure-text"; the other 2 tasks the LLM hallucinated as errors were actually status: ok.

Expected behavior (one of):

(a) When lastRunStatus === "error", force lastDelivered: false. The "delivery" should only count if the run produced real assistant content; an error fallback text doesn't count.
(b) Add a separate lastErrorDelivered: true field to distinguish "we sent an error notification" from "we sent the actual cron output". Update monitoring docs accordingly.

Root cause hypothesis (please verify)

The error text comes from resolveIncompleteTurnPayloadText in src/agents/pi-embedded-runner/run/incomplete-turn.ts, which fires when:

stopReason === "toolUse" without visible text, OR
reasoning-only assistant turn, OR
empty response, OR
stopReason === "error"

The simplest single-line fix for P2 is in src/cron/delivery.ts (or wherever the lastDelivered field is set): when the run's terminal status is error AND the delivered text matches the incomplete-turn fallback regex, set lastDelivered: false.

P1 is more involved — it would require touching the fallback policy in src/cron/isolated-agent/run-fallback-policy.ts.

Why this matters

Repro environment

OpenClaw 2026.4.25 (c070509) (macOS, LaunchAgent gateway)
Cron job set up via openclaw cron add ... with sessionTarget: isolated, wakeMode: now
Provider chain: anthropic-proxy (primary) → minimax (fallback), via gateway-side provider router

Happy to provide the full per-run jsonl on request (omitted here to avoid leaking message content).

extent analysis

TL;DR

The cron isolated-agent's fallback chain should be modified to handle failures more robustly, and the lastDelivered field in jobs-state.json should be updated to accurately reflect delivery status.

Guidance

Review the fallback policy in src/cron/isolated-agent/run-fallback-policy.ts to consider implementing a retry mechanism or additional fallback providers.
Update the lastDelivered field in src/cron/delivery.ts to set lastDelivered: false when the run's terminal status is error and the delivered text matches the incomplete-turn fallback regex.
Consider adding a separate lastErrorDelivered field to distinguish between error notifications and actual cron output.
Verify the changes by testing the cron job with simulated failures and monitoring the jobs-state.json file for accurate updates.

Example

// src/cron/delivery.ts
if (runStatus === 'error' && deliveredText.match(incompleteTurnRegex)) {
  lastDelivered = false;
}

Notes

The provided information suggests that the issue is related to the fallback chain and the handling of error statuses in the cron job. However, without further details on the implementation, it's difficult to provide a comprehensive solution. The suggested changes aim to address the immediate issues, but additional testing and verification may be necessary to ensure the fixes are effective.

Recommendation

Apply the workaround by updating the lastDelivered field and considering changes to the fallback policy, as this will help to accurately reflect delivery status and improve the robustness of the cron job.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#docker error #permission error #memory optimization #batch processing #GPU compatibility

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix Cron isolated-agent: fallback chain silently delivers incomplete-turn warning, and status=error rows mark delivered=true [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction

Root Cause

Fix Action

Fix / Workaround

Code Example

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

TL;DR

Reproduction (real production trace)

P1 — fallback chain is single-shot and silent

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction

Root cause hypothesis (please verify)

Why this matters

Repro environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix Cron isolated-agent: fallback chain silently delivers incomplete-turn warning, and status=error rows mark delivered=true [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

P2 — status: "error" + delivered: true simultaneously is a monitoring contradiction

Root Cause

Fix Action

Fix / Workaround

Code Example

Cron isolated-agent: provider-fallback failures silently deliver "Agent couldn't generate a response" + status=error rows mark delivered=true

TL;DR

Reproduction (real production trace)

P1 — fallback chain is single-shot and silent

P2 — status: "error" + delivered: true simultaneously is a monitoring contradiction

Root cause hypothesis (please verify)

Why this matters

Repro environment

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction

P2 — `status: "error"` + `delivered: true` simultaneously is a monitoring contradiction