openclaw - ✅(Solved) Fix Recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state [1 pull requests, 3 comments, 1 participants]

X-Ai-Art · 2026-04-07T05:51:31Z

[openclaw] When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source: - abortedLastRun is persisted on abort,… When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source: - `abortedLastRun` is persisted on abort, but later overwritten by subsequent turn results (`result.meta.aborted ?? false`) - `reply` injects an aborted hint (`The previous agent run was aborted by the user ...`) - `sanitizeThinkingForRecovery()` treats `incomplete-text` as `prefill: true` This combination appears to let the next turn see **partial assistant text + a transient aborted marker**, which can lead to inconsistent self-reporting and unstable follow-up behavior. # PR #62346: Avoid prefilling incomplete assistant text during recovery - Repository: openclaw/openclaw - Author: X-Ai-Art - State: open | merged: False - Link: https://github.com/openclaw/openclaw/pull/62346 ## Description (problem / solution / changelog) ## Summary When the last assistant turn is classified as `incomplete-text`, the current recovery behavior keeps that partial assistant body and sets `prefill: true`. This PR changes that path to: - drop the incomplete assistant message from recovery input - return `prefill: false` The goal is to avoid feeding the next turn a partial assistant body in aborted/incomplete recovery situations, which can otherwise contribute to unstable next-turn state semantics. ## Why Local controlled experiments reproduced a recovery inconsistency after aborted runs: - sometimes the next turn claims the previous aborted output completed - sometimes it cannot reliably identify the previous aborted output as its own last turn - different models expose the issue differently, but the runtime recovery chain appears to be the common factor Relevant issue: #62322 ## Tests Targeted test updated and passing: ```bash node node_modules/vitest/vitest.mjs run src/agents/pi-embedded-runner/thinking.test.ts ``` Observed result: 15/15 passing. ## Notes This is intentionally a small behavior change focused on the `incomplete-text` recovery path only. It does not attempt to redesign `abortedLastRun` persistence or broader reply/session recovery semantics in this PR. ## Changed files - `src/agents/pi-embedded-runner/thinking.test.ts` (modified, +4/-3) - `src/agents/pi-embedded-runner/thinking.ts` (modified, +7/-1) ## Fixed - Fixed by PR: Avoid prefilling incomplete assistant text during recovery (https://github.com/openclaw/openclaw/pull/62346) ## Summary When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source: - `abortedLastRun` is persisted on abort, but later overwritten by subsequent turn results (`result.meta.aborted ?? false`) - `reply` injects an aborted hint (`The previous agent run was aborted by the user ...`) - `sanitizeThinkingForRecovery()` treats `incomplete-text` as `prefill: true` This combination appears to let the next turn see **partial assistant text + a transient aborted marker**, which can lead to inconsistent self-reporting and unstable follow-up behavior. ## Observed behavior Using controlled sacrificial-session experiments: 1. A long assistant output is aborted 2. A follow-up is sent in the same session 3. The follow-up sometimes: - claims the previous response completed successfully when it did not - cannot reliably tell whether the previous aborted output was its own last turn - becomes aborted itself before giving a stable answer This reproduces as a **soft recovery-state inconsistency**, even when hard stuck-lane behavior does not reproduce. ## Why this looks like a runtime issue rather than just model behavior We tested multiple models with the same pattern: - MiniMax-M2.7 tends to "fill narrative gaps" - gemma-4-31b-it is less agentic but still unstable under aborted follow-ups - a lite model is more direct, but can still make incorrect state judgments So model style changes the symptom shape, but the root cause still appears to be in the recovery/runtime chain. ## Relevant implementation points (current main also appears affected) - `src/agents/pi-embedded-runner/thinking.ts` - `RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text"` - `sanitizeThinkingForRecovery(...)` - `incomplete-text` path uses `prefill: true` - `src/auto-reply/reply/body.ts` - `abortedLastRun` is converted into an aborted hint for the next reply - `src/agents/command/session-store.ts` - `next.abortedLastRun = result.meta.aborted ?? false` - `src/process/command-queue.ts` - `lane wait exceeded` appears diagnostic, not self-healing ## Suggested direction At minimum, please consider whether one of these should change: 1. When `abortedLastRun=true`, do **not** allow `incomplete-text -> prefill: true` 2. Preserve abort state as a stable recovery fact instead of a one-turn transient flag 3. Prefer structured recovery facts over partial assistant-body continuation when the previous run was abor

openclaw2026-04-07 05:51:31

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#62322•Fetched 2026-04-08 03:06:01

View on GitHub

Comments

Participants

Timeline

Reactions

Author

X-Ai-Art

Participants

X-Ai-Art

Timeline (top)

commented ×3cross-referenced ×1

When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source:

abortedLastRun is persisted on abort, but later overwritten by subsequent turn results (result.meta.aborted ?? false)
reply injects an aborted hint (The previous agent run was aborted by the user ...)
sanitizeThinkingForRecovery() treats incomplete-text as prefill: true

This combination appears to let the next turn see partial assistant text + a transient aborted marker, which can lead to inconsistent self-reporting and unstable follow-up behavior.

Error Message

Using controlled sacrificial-session experiments:

Root Cause

So model style changes the symptom shape, but the root cause still appears to be in the recovery/runtime chain.

Fix Action

Fixed

Fixed by PR: Avoid prefilling incomplete assistant text during recovery (https://github.com/openclaw/openclaw/pull/62346)

PR fix notes

PR #62346: Avoid prefilling incomplete assistant text during recovery

Repository: openclaw/openclaw
Author: X-Ai-Art
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/62346

Description (problem / solution / changelog)

Summary

When the last assistant turn is classified as incomplete-text, the current recovery behavior keeps that partial assistant body and sets prefill: true.

This PR changes that path to:

drop the incomplete assistant message from recovery input
return prefill: false

The goal is to avoid feeding the next turn a partial assistant body in aborted/incomplete recovery situations, which can otherwise contribute to unstable next-turn state semantics.

Why

Local controlled experiments reproduced a recovery inconsistency after aborted runs:

sometimes the next turn claims the previous aborted output completed
sometimes it cannot reliably identify the previous aborted output as its own last turn
different models expose the issue differently, but the runtime recovery chain appears to be the common factor

Relevant issue: #62322

Tests

Targeted test updated and passing:

node node_modules/vitest/vitest.mjs run src/agents/pi-embedded-runner/thinking.test.ts

Observed result: 15/15 passing.

Notes

This is intentionally a small behavior change focused on the incomplete-text recovery path only. It does not attempt to redesign abortedLastRun persistence or broader reply/session recovery semantics in this PR.

Changed files

src/agents/pi-embedded-runner/thinking.test.ts (modified, +4/-3)
src/agents/pi-embedded-runner/thinking.ts (modified, +7/-1)

RAW_BUFFERClick to expand / collapse

Summary

When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source:

abortedLastRun is persisted on abort, but later overwritten by subsequent turn results (result.meta.aborted ?? false)
reply injects an aborted hint (The previous agent run was aborted by the user ...)
sanitizeThinkingForRecovery() treats incomplete-text as prefill: true

This combination appears to let the next turn see partial assistant text + a transient aborted marker, which can lead to inconsistent self-reporting and unstable follow-up behavior.

Observed behavior

Using controlled sacrificial-session experiments:

A long assistant output is aborted
A follow-up is sent in the same session
The follow-up sometimes:
- claims the previous response completed successfully when it did not
- cannot reliably tell whether the previous aborted output was its own last turn
- becomes aborted itself before giving a stable answer

This reproduces as a soft recovery-state inconsistency, even when hard stuck-lane behavior does not reproduce.

Why this looks like a runtime issue rather than just model behavior

We tested multiple models with the same pattern:

MiniMax-M2.7 tends to "fill narrative gaps"
gemma-4-31b-it is less agentic but still unstable under aborted follow-ups
a lite model is more direct, but can still make incorrect state judgments

So model style changes the symptom shape, but the root cause still appears to be in the recovery/runtime chain.

Relevant implementation points (current main also appears affected)

src/agents/pi-embedded-runner/thinking.ts
- RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text"
- sanitizeThinkingForRecovery(...)
- incomplete-text path uses prefill: true
src/auto-reply/reply/body.ts
- abortedLastRun is converted into an aborted hint for the next reply
src/agents/command/session-store.ts
- next.abortedLastRun = result.meta.aborted ?? false
src/process/command-queue.ts
- lane wait exceeded appears diagnostic, not self-healing

Suggested direction

At minimum, please consider whether one of these should change:

When abortedLastRun=true, do not allow incomplete-text -> prefill: true
Preserve abort state as a stable recovery fact instead of a one-turn transient flag
Prefer structured recovery facts over partial assistant-body continuation when the previous run was aborted

Environment

Local installed version: 2026.4.5
Latest release checked: v2026.4.5
Main branch logic still appears consistent with the above paths

Related but not identical

I saw #54964 (zombie session after embedded init failure), which seems adjacent, but this report is specifically about aborted-run recovery semantics and inconsistent next-turn state.

extent analysis

TL;DR

The most likely fix involves modifying the recovery chain to preserve abort state as a stable recovery fact and handle incomplete-text paths differently when abortedLastRun is true.

Guidance

Review the sanitizeThinkingForRecovery function in src/agents/pi-embedded-runner/thinking.ts to ensure it correctly handles incomplete-text when abortedLastRun is true, potentially by not setting prefill: true in such cases.
Consider preserving the abort state as a stable recovery fact instead of overwriting it with subsequent turn results, as indicated by the line next.abortedLastRun = result.meta.aborted ?? false in src/agents/command/session-store.ts.
Evaluate the logic in src/auto-reply/reply/body.ts where abortedLastRun is converted into an aborted hint for the next reply, to ensure it does not contribute to the inconsistent state.
Investigate the lane wait exceeded diagnostic in src/process/command-queue.ts to understand if it provides any insights into the recovery chain's behavior during aborted runs.

Example

// Example modification to sanitizeThinkingForRecovery function
function sanitizeThinkingForRecovery(thinking: any, abortedLastRun: boolean) {
  if (abortedLastRun && thinking.type === 'incomplete-text') {
    // Do not set prefill: true when the last run was aborted
    return { ...thinking, prefill: false };
  }
  // Existing logic for other cases
}

Notes

The provided suggestions aim to address the inconsistent next-turn state after an aborted run. However, the exact implementation details may vary based on the specific requirements and constraints of the system. It's crucial to test these modifications thoroughly to ensure they resolve the issue without introducing new problems.

Recommendation

Apply a workaround by modifying the sanitizeThinkingForRecovery function and preserving the abort state as a stable recovery fact, as these changes directly address the identified inconsistencies in the recovery chain's behavior.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#ssr #cache issue #memory leak #API versioning #request timeout

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

openclaw - ✅(Solved) Fix Recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state [1 pull requests, 3 comments, 1 participants]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Root Cause

Fix Action

Fixed

PR fix notes

PR #62346: Avoid prefilling incomplete assistant text during recovery

Description (problem / solution / changelog)

Summary

Why

Tests

Notes

Changed files

Summary

Observed behavior

Why this looks like a runtime issue rather than just model behavior

Relevant implementation points (current main also appears affected)

Suggested direction

Environment

Related but not identical

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING