openclaw - 💡(How to fix) Fix Agent run stuck in 'in progress' after agent completes — state machine race condition

StepCodex · 2026-05-30T14:24:31Z

[openclaw] Agent runs remain stuck in "In progress" / "processing" state after the assistant has finished generating its complete response. The final reply is… Agent runs remain stuck in "In progress" / "processing" state after the assistant has finished generating its complete response. The final reply is visible in the conversation, but the run status never transitions to "completed" and the input remains disabled. This cascades: subsequent messages queue behind the stuck run (`reason=queued_behind_active_work`), freezing the session until a gateway restart. ## Description Agent runs remain stuck in "In progress" / "processing" state after the assistant has finished generating its complete response. The final reply is visible in the conversation, but the run status never transitions to "completed" and the input remains disabled. This cascades: subsequent messages queue behind the stuck run (`reason=queued_behind_active_work`), freezing the session until a gateway restart. ## Reproduction 1. Send a message in the OpenClaw dashboard (Main Session) 2. 2. Assistant completes work — tool calls and final reply are visible 3. 3. Run status stays "In progress" at the bottom 4. 4. Input box disabled. Sending another message is blocked Restarting the gateway resets stale runs and restores normal operation. ## Log evidence ``` WARN long-running session: state=processing age=140s queueDepth=1 reason=queued_behind_active_work WARN embedded abort settle timed out: runId=... timeoutMs=2000 WARN LLM timed out with high prompt token usage (77%); attempting compaction before retry WARN contextEngine.compact() threw during timeout recovery: AbortError: This operation was aborted WARN compaction did not reduce context; falling through to normal handling WARN embedded run failover decision: decision=surface_error reason=timeout INFO context-overflow-precheck early recovery route=truncate_tool_results_only completed; retrying prompt ``` ## Root cause The timeout → compaction → abort → failover → surface_error → context-overflow-recovery → retry path successfully recovers the LLM interaction, but the run state machine does not transition to `completed` after the retry. The final response is delivered to the UI, but the internal run state remains at `processing`. ## Environment - OpenClaw version: 2026.5.27 (27ae826) — latest npm as of 2026-05-30 - - Node: 22.22.2 - - - OS: Ubuntu 26.04 - - - - Provider: DeepSeek (deepseek-chat) - - - - - Browser: OpenClaw dashboard WebUI

openclaw2026-05-30 14:24:31

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

Agent runs remain stuck in "In progress" / "processing" state after the assistant has finished generating its complete response. The final reply is visible in the conversation, but the run status never transitions to "completed" and the input remains disabled.

This cascades: subsequent messages queue behind the stuck run (reason=queued_behind_active_work), freezing the session until a gateway restart.

Error Message

WARN long-running session: state=processing age=140s queueDepth=1 reason=queued_behind_active_work WARN embedded abort settle timed out: runId=... timeoutMs=2000 WARN LLM timed out with high prompt token usage (77%); attempting compaction before retry WARN contextEngine.compact() threw during timeout recovery: AbortError: This operation was aborted WARN compaction did not reduce context; falling through to normal handling WARN embedded run failover decision: decision=surface_error reason=timeout INFO context-overflow-precheck early recovery route=truncate_tool_results_only completed; retrying prompt

Root Cause

The timeout → compaction → abort → failover → surface_error → context-overflow-recovery → retry path successfully recovers the LLM interaction, but the run state machine does not transition to completed after the retry. The final response is delivered to the UI, but the internal run state remains at processing.

Code Example

WARN  long-running session: state=processing age=140s queueDepth=1 reason=queued_behind_active_work
WARN  embedded abort settle timed out: runId=... timeoutMs=2000
WARN  LLM timed out with high prompt token usage (77%); attempting compaction before retry
WARN  contextEngine.compact() threw during timeout recovery: AbortError: This operation was aborted
WARN  compaction did not reduce context; falling through to normal handling
WARN  embedded run failover decision: decision=surface_error reason=timeout
INFO  context-overflow-precheck early recovery route=truncate_tool_results_only completed; retrying prompt

RAW_BUFFERClick to expand / collapse

Description

This cascades: subsequent messages queue behind the stuck run (reason=queued_behind_active_work), freezing the session until a gateway restart.

Reproduction

Send a message in the OpenClaw dashboard (Main Session)
1. Assistant completes work — tool calls and final reply are visible
1. Run status stays "In progress" at the bottom
1. Input box disabled. Sending another message is blocked Restarting the gateway resets stale runs and restores normal operation.

Log evidence

WARN  long-running session: state=processing age=140s queueDepth=1 reason=queued_behind_active_work
WARN  embedded abort settle timed out: runId=... timeoutMs=2000
WARN  LLM timed out with high prompt token usage (77%); attempting compaction before retry
WARN  contextEngine.compact() threw during timeout recovery: AbortError: This operation was aborted
WARN  compaction did not reduce context; falling through to normal handling
WARN  embedded run failover decision: decision=surface_error reason=timeout
INFO  context-overflow-precheck early recovery route=truncate_tool_results_only completed; retrying prompt

Root cause

Environment

OpenClaw version: 2026.5.27 (27ae826) — latest npm as of 2026-05-30
- Node: 22.22.2
- - OS: Ubuntu 26.04
- - - Provider: DeepSeek (deepseek-chat)
- - - - Browser: OpenClaw dashboard WebUI

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering