openclaw - 💡(How to fix) Fix Non-deterministic timeout enforcement in isolated agentTurn runner (timeoutSeconds unreliable) [2 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#74681Fetched 2026-04-30 06:21:23
View on GitHub
Comments
2
Participants
2
Timeline
2
Reactions
2
Author
Timeline (top)
commented ×2

timeoutSeconds on cron agentTurn jobs is not reliably enforced in isolated session runs. The timeout timer appears to be deferred until a callback fires rather than set at a fixed wall time, resulting in both under-enforcement and over-enforcement of configured timeout values.

Root Cause

Root cause (per CTO investigation)

Fix Action

Fix / Workaround

Workarounds in use

  • Acronis Alert Poll (15-min recurring) disabled to prevent zombie session stacking
  • Friday Weekly Review gated — will not fire unattended until patch confirmed
  • Timeout thresholds increased as a partial mitigation (acknowledged as workaround, not fix)
RAW_BUFFERClick to expand / collapse

Bug Report: Non-deterministic timeout enforcement in isolated agent runner

OpenClaw version: 2026.4.26 Date observed: 2026-04-30 Environment: macOS Darwin 25.4.0 (arm64), node v22.22.1

Summary

timeoutSeconds on cron agentTurn jobs is not reliably enforced in isolated session runs. The timeout timer appears to be deferred until a callback fires rather than set at a fixed wall time, resulting in both under-enforcement and over-enforcement of configured timeout values.

Data points

JobConfigured timeoutSecondsActual death timeDelta
Brand Development (ollama-remote/qwen3:14b)300s (stale; now 600s)300,232ms+232ms (matches stale config exactly)
Session Cleanup (claude-haiku-4-5)120s (stale; now 600s)120,022ms+22ms (matches stale config exactly)
Acronis Alert Poll (claude-haiku-4-5)300s (current, not stale)393,124ms+93,124ms overshoot (~31%)

Root cause (per CTO investigation)

Timeout timer is callback-deferred — it fires when the callback fires, not at the configured wall time. This explains:

  • Jobs 1 & 2: timer fired at the moment a previous stale value was read (not the current config)
  • Job 3: timer fired late when an async internal operation completed ~93s after the wall time

Impact

  • Scope is system-wide — all isolated agentTurn sessions are affected, not just cron jobs
  • Confirmed: a subagent session running the investigation itself timed out non-deterministically at ~292s
  • No reliable kill signal — jobs can overshoot by 30%+ or die at stale config values
  • Fallback chains do not fire on process timeout (only on model API errors), so silent failure is the outcome

Workarounds in use

  • Acronis Alert Poll (15-min recurring) disabled to prevent zombie session stacking
  • Friday Weekly Review gated — will not fire unattended until patch confirmed
  • Timeout thresholds increased as a partial mitigation (acknowledged as workaround, not fix)

Requested fix

Enforce timeoutSeconds as a hard wall-time timer set at job start, independent of callback state. The timer should fire and kill the session at startTime + timeoutSeconds regardless of any async operations in flight.

extent analysis

TL;DR

Enforce timeoutSeconds as a hard wall-time timer set at job start to fix non-deterministic timeout enforcement in isolated agent runner.

Guidance

  • Review the current implementation of the timeout timer to understand how it is being deferred until a callback fires, and identify the necessary changes to set the timer at a fixed wall time.
  • Consider using a scheduling library or a timer that is not dependent on callbacks to enforce the timeoutSeconds configuration.
  • Verify that the new implementation correctly handles async operations and ensures that the timer fires at the expected time, regardless of the state of these operations.
  • Test the updated implementation with various job configurations and async operation scenarios to ensure that the timeout is enforced correctly in all cases.

Example

// Pseudo-code example of setting a wall-time timer
const timeoutSeconds = 300; // example timeout value
const startTime = Date.now();
const timer = setTimeout(() => {
  // kill the session
}, timeoutSeconds * 1000);

Notes

The provided example is a simplified illustration and may not be directly applicable to the actual implementation. The key is to ensure that the timer is set based on the wall time, rather than being deferred until a callback fires.

Recommendation

Apply a workaround by increasing the timeout thresholds until a patch can be confirmed, as the current implementation is not reliably enforcing the configured timeout values.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Non-deterministic timeout enforcement in isolated agentTurn runner (timeoutSeconds unreliable) [2 comments, 2 participants]