openclaw - 💡(How to fix) Fix Stall detector aborts long tool-call chains that are actively making progress (2026.5.26)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Running OpenClaw 2026.5.26 (npm, Windows 10). During an agent task involving a long uninterrupted sequence of Write tool calls (~21 files written across 3 new workspace directories under ~/.openclaw/workspace-<name>/), the diagnostic watcher classified the session as active_work_without_progress and force-aborted the embedded run at the default 360s abort threshold (max(5min, stuckSessionWarnMs × 3)), even though tool calls were firing continuously and committing to disk the entire time.

Error Message

00:57:00 [agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=111 trigger=user useResume=true session=present resumeSession=a683dcc44eeb reuse=reusable historyPrompt=present
00:57:00 [agent/cli-backend] claude live session reuse: provider=claude-cli model=claude-opus-4-7
00:59:07 [diagnostic] stalled session: sessionId=unknown sessionKey=agent:main:main state=processing age=128s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=embedded_run:started lastProgressAge=128s recovery=none
00:59:37 [diagnostic] stalled session: ... age=158s ... lastProgress=embedded_run:started lastProgressAge=158s recovery=none
00:00:07 [diagnostic] stalled session: ... age=188s ... recovery=none
... (repeats every 30s, ages 218/248/278/308/338s, all lastProgress=embedded_run:started) ...
01:03:07 [diagnostic] stalled session: ... age=368s ... lastProgress=embedded_run:started lastProgressAge=368s recovery=checking
01:03:07 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=abort
01:03:07 [agent/cli-backend] claude live session turn failed: provider=claude-cli model=claude-opus-4-7 durationMs=367432 error=AbortError
01:03:07 [diagnostic] stuck session recovery: sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main age=368s action=abort_embedded_run aborted=true drained=true released=0
01:03:07 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main activeSessionId=37f0eb00-d091-4693-84df-40f242152bd7 activeWorkKind=embedded_run lane=session:agent:main:main aborted=true drained=true forceCleared=false released=0

Note lastProgress stays pinned at embedded_run:started for the entire 368-second window — the diagnostic watcher never sees any progress event, even though Write tool calls were executing throughout.

Root Cause

Running OpenClaw 2026.5.26 (npm, Windows 10). During an agent task involving a long uninterrupted sequence of Write tool calls (~21 files written across 3 new workspace directories under ~/.openclaw/workspace-<name>/), the diagnostic watcher classified the session as active_work_without_progress and force-aborted the embedded run at the default 360s abort threshold (max(5min, stuckSessionWarnMs × 3)), even though tool calls were firing continuously and committing to disk the entire time.

Fix Action

Workaround

Raise diagnostics.stuckSessionAbortMs in openclaw.json. Setting it to 1800000 (30 min) eliminates the kill in normal long-tool-call runs. But the underlying heuristic gap remains: tool-call activity does not register as progress to the stall detector, so the workaround just moves the cliff.

Code Example

00:57:00 [agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=111 trigger=user useResume=true session=present resumeSession=a683dcc44eeb reuse=reusable historyPrompt=present
00:57:00 [agent/cli-backend] claude live session reuse: provider=claude-cli model=claude-opus-4-7
00:59:07 [diagnostic] stalled session: sessionId=unknown sessionKey=agent:main:main state=processing age=128s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=embedded_run:started lastProgressAge=128s recovery=none
00:59:37 [diagnostic] stalled session: ... age=158s ... lastProgress=embedded_run:started lastProgressAge=158s recovery=none
00:00:07 [diagnostic] stalled session: ... age=188s ... recovery=none
... (repeats every 30s, ages 218/248/278/308/338s, all lastProgress=embedded_run:started) ...
01:03:07 [diagnostic] stalled session: ... age=368s ... lastProgress=embedded_run:started lastProgressAge=368s recovery=checking
01:03:07 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=abort
01:03:07 [agent/cli-backend] claude live session turn failed: provider=claude-cli model=claude-opus-4-7 durationMs=367432 error=AbortError
01:03:07 [diagnostic] stuck session recovery: sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main age=368s action=abort_embedded_run aborted=true drained=true released=0
01:03:07 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main activeSessionId=37f0eb00-d091-4693-84df-40f242152bd7 activeWorkKind=embedded_run lane=session:agent:main:main aborted=true drained=true forceCleared=false released=0
RAW_BUFFERClick to expand / collapse

Summary

Running OpenClaw 2026.5.26 (npm, Windows 10). During an agent task involving a long uninterrupted sequence of Write tool calls (~21 files written across 3 new workspace directories under ~/.openclaw/workspace-<name>/), the diagnostic watcher classified the session as active_work_without_progress and force-aborted the embedded run at the default 360s abort threshold (max(5min, stuckSessionWarnMs × 3)), even though tool calls were firing continuously and committing to disk the entire time.

Logs

00:57:00 [agent/cli-backend] cli exec: provider=claude-cli model=opus promptChars=111 trigger=user useResume=true session=present resumeSession=a683dcc44eeb reuse=reusable historyPrompt=present
00:57:00 [agent/cli-backend] claude live session reuse: provider=claude-cli model=claude-opus-4-7
00:59:07 [diagnostic] stalled session: sessionId=unknown sessionKey=agent:main:main state=processing age=128s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=embedded_run:started lastProgressAge=128s recovery=none
00:59:37 [diagnostic] stalled session: ... age=158s ... lastProgress=embedded_run:started lastProgressAge=158s recovery=none
00:00:07 [diagnostic] stalled session: ... age=188s ... recovery=none
... (repeats every 30s, ages 218/248/278/308/338s, all lastProgress=embedded_run:started) ...
01:03:07 [diagnostic] stalled session: ... age=368s ... lastProgress=embedded_run:started lastProgressAge=368s recovery=checking
01:03:07 [agent/cli-backend] claude live session close: provider=claude-cli model=claude-opus-4-7 reason=abort
01:03:07 [agent/cli-backend] claude live session turn failed: provider=claude-cli model=claude-opus-4-7 durationMs=367432 error=AbortError
01:03:07 [diagnostic] stuck session recovery: sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main age=368s action=abort_embedded_run aborted=true drained=true released=0
01:03:07 [diagnostic] stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=37f0eb00-d091-4693-84df-40f242152bd7 sessionKey=agent:main:main activeSessionId=37f0eb00-d091-4693-84df-40f242152bd7 activeWorkKind=embedded_run lane=session:agent:main:main aborted=true drained=true forceCleared=false released=0

Note lastProgress stays pinned at embedded_run:started for the entire 368-second window — the diagnostic watcher never sees any progress event, even though Write tool calls were executing throughout.

Expected

Active tool-call execution should count as forward progress. The embedded run is not stalled — it is doing exactly what it was asked to do — but the watcher's progress signal only updates on coarser events (e.g. embedded_run:started/turn boundaries), missing all the work in between.

Actual

lastProgress remains at embedded_run:started for the full duration. At 360s the recovery path fires abortAndDrainEmbeddedPiRun and the live Claude CLI session is terminated mid-flow with error=AbortError. The user-facing turn appears to end abruptly; the next user message is required to resume.

Reproduction shape

Any agent turn that performs a long uninterrupted sequence of tool calls (file writes, multi-step refactors, large restoration tasks) without intervening model turns will hit this. In this case: writing IDENTITY/USER/SOUL/TOOLS files across three new sibling-agent workspace directories.

Workaround

Raise diagnostics.stuckSessionAbortMs in openclaw.json. Setting it to 1800000 (30 min) eliminates the kill in normal long-tool-call runs. But the underlying heuristic gap remains: tool-call activity does not register as progress to the stall detector, so the workaround just moves the cliff.

Related

  • #85826 — same heuristic, different surface (long model calls on local vLLM)
  • #85532 — recovery aborting active embedded runs after terminal-looking app-server progress
  • #85251 — Codex app-server emits notification:turn/started then goes silent

Suggested directions (for maintainer judgment)

  1. Emit a progress event when a tool call starts and/or completes, so lastProgress advances during tool-heavy turns.
  2. Distinguish embedded-run idle (no tool calls in flight) from embedded-run busy (tool call active) in the classifier, and only treat the idle case as active_work_without_progress.
  3. Document diagnostics.stuckSessionAbortMs more prominently in onboarding/upgrade notes, since the default 360s is short enough to bite users on routine long sessions on first encounter.

Environment

  • OpenClaw: 2026.5.26 (npm)
  • OS: Windows 10 (10.0.19045, x64)
  • Node: 24.14.1
  • Provider: claude-cli (Claude Opus 4.7)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Stall detector aborts long tool-call chains that are actively making progress (2026.5.26)