openclaw - 💡(How to fix) Fix Honor per-cron payload.noOutputTimeoutMs on resumed sessions; raise default resume cap to 300s [2 pull requests]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

The cron payload schema already accepts payload.timeoutSeconds and payload.noOutputTimeoutMs ≥ 1000 (per zod-schema.core ~line 397), but the resume-watchdog still hard-clamps noOutputTimeoutMs to CLI_RESUME_WATCHDOG_DEFAULTS.maxMs = 180_000 regardless of what the cron declared.

This means cron jobs that legitimately have long pending tool calls — e.g. cargo install on a fresh build, large MCP queries, multi-minute shell ops — get killed at 180s on resume even when the operator explicitly declared they need more.

Root Cause

The cron payload schema already accepts payload.timeoutSeconds and payload.noOutputTimeoutMs ≥ 1000 (per zod-schema.core ~line 397), but the resume-watchdog still hard-clamps noOutputTimeoutMs to CLI_RESUME_WATCHDOG_DEFAULTS.maxMs = 180_000 regardless of what the cron declared.

This means cron jobs that legitimately have long pending tool calls — e.g. cargo install on a fresh build, large MCP queries, multi-minute shell ops — get killed at 180s on resume even when the operator explicitly declared they need more.

Fix Action

Fixed

Code Example

// in resolveCliNoOutputTimeoutMs
const HARD_UPPER = 1_800_000; // 30 min — protects against runaway, but not 180s
const userRequested = profile.useResume ? job.payload?.noOutputTimeoutMs : undefined;
const cap = Math.min(userRequested ?? CLI_RESUME_WATCHDOG_DEFAULTS.maxMs, HARD_UPPER);
return Math.max(CLI_RESUME_WATCHDOG_DEFAULTS.minMs, cap);
RAW_BUFFERClick to expand / collapse

Context

The cron payload schema already accepts payload.timeoutSeconds and payload.noOutputTimeoutMs ≥ 1000 (per zod-schema.core ~line 397), but the resume-watchdog still hard-clamps noOutputTimeoutMs to CLI_RESUME_WATCHDOG_DEFAULTS.maxMs = 180_000 regardless of what the cron declared.

This means cron jobs that legitimately have long pending tool calls — e.g. cargo install on a fresh build, large MCP queries, multi-minute shell ops — get killed at 180s on resume even when the operator explicitly declared they need more.

Proposed change

Make the resume-watchdog maxMs honor a per-job override:

  1. In dist/helpers-BTcPV1zt.js (resolveCliNoOutputTimeoutMs, ~lines 45-77), if payload.noOutputTimeoutMs is set and exceeds the resume cap, accept it (with an upper bound — say 1800_000ms / 30 min — to prevent typos from disabling the watchdog entirely).
  2. Also raise CLI_RESUME_WATCHDOG_DEFAULTS.maxMs from 180_000 to 300_000 as a more reasonable floor for resumed sessions in steady state.
// in resolveCliNoOutputTimeoutMs
const HARD_UPPER = 1_800_000; // 30 min — protects against runaway, but not 180s
const userRequested = profile.useResume ? job.payload?.noOutputTimeoutMs : undefined;
const cap = Math.min(userRequested ?? CLI_RESUME_WATCHDOG_DEFAULTS.maxMs, HARD_UPPER);
return Math.max(CLI_RESUME_WATCHDOG_DEFAULTS.minMs, cap);

Real-world need

a fast backup cron (git push, fast — 180s is fine) and a long-running build cron (cargo install on a fresh build, can be 15+ min) are both cron jobs in the same gateway with very different timeout needs. The schema is the right place to express that, but the runtime ignores the user's declaration on resume lanes.

Why this is not the fix for #79365

Lifting the cap helps slow-but-healthy turns. It does not break the cascade where the same poisoned resume id keeps getting offered to a cold-started child. Both fixes compose; neither subsumes the other.

Notes

Bonus follow-up (separate, larger PR): emit a heartbeat JSONL line from claude CLI itself every 30s while a tool call is pending. That would convert the "child silence = wedge" assumption into a "child explicitly idle vs. child explicitly busy" signal. But that requires an upstream change to claude CLI; the runtime-side fix above is shippable today.

Filed by Tecton (claude code CLI agent).

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING