openclaw - ✅ (Solved) Fix recovery chain inconsistency after aborted runs: transient abortedLastRun + incomplete-text prefill cause unstable next-turn state [1 pull request, 3 comments, 1 participant]

GitHub stats
openclaw/openclaw#62322 (fetched 2026-04-08 03:06:01)
Comments: 3
Participants: 1
Timeline events: 4
Reactions: 0
Timeline (top): commented ×3, cross-referenced ×1

When an embedded run is aborted, the recovery chain can provide the next turn with an unstable state source:

  • abortedLastRun is persisted on abort, but later overwritten by subsequent turn results (result.meta.aborted ?? false)
  • reply injects an aborted hint (The previous agent run was aborted by the user ...)
  • sanitizeThinkingForRecovery() treats incomplete-text as prefill: true

This combination appears to let the next turn see partial assistant text + a transient aborted marker, which can lead to inconsistent self-reporting and unstable follow-up behavior.
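
To make the combination concrete, here is a minimal sketch of what the next turn might receive. The `NextTurnInput` shape and the `buildNextTurnInput` helper are hypothetical and exist only for illustration; how the runtime actually assembles the input is an assumption. Only the `abortedLastRun` flag, the `prefill` flag, and the (truncated) hint text come from the issue.

```typescript
// Hypothetical shape of the input handed to the next turn.
interface NextTurnInput {
  userMessage: string;
  hint?: string;             // aborted hint injected by the reply layer
  assistantPrefill?: string; // partial assistant text carried over
}

// Hypothetical helper: combines the two independent signals.
function buildNextTurnInput(opts: {
  abortedLastRun: boolean;
  prefill: boolean;
  partialAssistantText: string;
  userMessage: string;
}): NextTurnInput {
  const input: NextTurnInput = { userMessage: opts.userMessage };
  if (opts.abortedLastRun) {
    // Hint text as quoted (and truncated) in the issue.
    input.hint = "The previous agent run was aborted by the user ...";
  }
  if (opts.prefill) {
    // With prefill: true, the partial body is offered for continuation.
    input.assistantPrefill = opts.partialAssistantText;
  }
  return input;
}

// Both signals present at once: "you were aborted" plus text to continue.
const mixed = buildNextTurnInput({
  abortedLastRun: true,
  prefill: true,
  partialAssistantText: "Step 1 of 10: ...",
  userMessage: "Did your last answer finish?",
});
```

In this sketch the model is told the run was aborted and is simultaneously handed partial text as if it were a natural starting point, which is the mixed signal the issue describes.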

Root Cause

Model style changes the shape of the symptom, but the root cause appears to be in the recovery/runtime chain.

Fix Action

Fixed

PR fix notes

PR #62346: Avoid prefilling incomplete assistant text during recovery

Description (problem / solution / changelog)

Summary

When the last assistant turn is classified as incomplete-text, the current recovery behavior keeps that partial assistant body and sets prefill: true.

This PR changes that path to:

  • drop the incomplete assistant message from recovery input
  • return prefill: false

The goal is to avoid feeding the next turn a partial assistant body in aborted/incomplete recovery situations, which can otherwise contribute to unstable next-turn state semantics.
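
The change can be sketched as follows, assuming a simplified message shape. The `Message` and `RecoveryResult` types and the `recoverMessages` helper are hypothetical and are not taken from thinking.ts; only the `RecoveryAssessment` values and the `prefill` flag appear in the issue.

```typescript
// Assessment values as listed in the issue.
type RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text";

// Hypothetical, simplified shapes; the real ones may differ.
interface Message {
  role: "user" | "assistant";
  text: string;
}

interface RecoveryResult {
  messages: Message[];
  prefill: boolean;
}

// Sketch of the new incomplete-text path: drop the trailing partial
// assistant message instead of keeping it as a prefill.
function recoverMessages(
  messages: Message[],
  assessment: RecoveryAssessment,
): RecoveryResult {
  const last = messages[messages.length - 1];
  if (assessment === "incomplete-text" && last?.role === "assistant") {
    return { messages: messages.slice(0, -1), prefill: false };
  }
  // Other assessments are out of scope for this sketch; pass through.
  return { messages, prefill: false };
}
```

With this shape, the next turn starts from the last user message rather than from a half-written assistant body.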

Why

Local controlled experiments reproduced a recovery inconsistency after aborted runs:

  • sometimes the next turn claims the previous aborted output completed
  • sometimes it cannot reliably identify the previous aborted output as its own last turn
  • different models expose the issue differently, but the runtime recovery chain appears to be the common factor

Relevant issue: #62322

Tests

Targeted test updated and passing:

node node_modules/vitest/vitest.mjs run src/agents/pi-embedded-runner/thinking.test.ts

Observed result: 15/15 passing.

Notes

This is intentionally a small behavior change focused on the incomplete-text recovery path only. It does not attempt to redesign abortedLastRun persistence or broader reply/session recovery semantics in this PR.

Changed files

  • src/agents/pi-embedded-runner/thinking.test.ts (modified, +4/-3)
  • src/agents/pi-embedded-runner/thinking.ts (modified, +7/-1)
Raw issue body

Observed behavior

Using controlled sacrificial-session experiments:

  1. A long assistant output is aborted
  2. A follow-up is sent in the same session
  3. The follow-up sometimes:
    • claims the previous response completed successfully when it did not
    • cannot reliably tell whether the previous aborted output was its own last turn
    • becomes aborted itself before giving a stable answer

This reproduces as a soft recovery-state inconsistency, even when hard stuck-lane behavior does not reproduce.

Why this looks like a runtime issue rather than just model behavior

We tested multiple models with the same pattern:

  • MiniMax-M2.7 tends to "fill narrative gaps"
  • gemma-4-31b-it is less agentic but still unstable under aborted follow-ups
  • a lite model is more direct, but can still make incorrect state judgments

So model style changes the symptom shape, but the root cause still appears to be in the recovery/runtime chain.

Relevant implementation points (current main also appears affected)

  • src/agents/pi-embedded-runner/thinking.ts
    • RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text"
    • sanitizeThinkingForRecovery(...)
    • incomplete-text path uses prefill: true
  • src/auto-reply/reply/body.ts
    • abortedLastRun is converted into an aborted hint for the next reply
  • src/agents/command/session-store.ts
    • next.abortedLastRun = result.meta.aborted ?? false
  • src/process/command-queue.ts
    • lane wait exceeded appears diagnostic, not self-healing
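
The transient nature of the flag follows directly from the quoted assignment. The sketch below reproduces it with hypothetical, simplified `RunResult` and `SessionEntry` shapes and a hypothetical `applyResult` helper; only the assignment itself is taken from the issue.

```typescript
// Hypothetical, simplified shapes for illustration only.
interface RunResult {
  meta: { aborted?: boolean };
}

interface SessionEntry {
  abortedLastRun: boolean;
}

// Mirrors the assignment quoted from session-store.ts: every turn
// overwrites the flag with that turn's own result.
function applyResult(prev: SessionEntry, result: RunResult): SessionEntry {
  const next: SessionEntry = { ...prev };
  next.abortedLastRun = result.meta.aborted ?? false;
  return next;
}

// Turn 1 is aborted, turn 2 completes normally.
const afterAbort = applyResult({ abortedLastRun: false }, { meta: { aborted: true } });
const afterFollowUp = applyResult(afterAbort, { meta: {} });
// afterAbort.abortedLastRun is true; afterFollowUp.abortedLastRun is false,
// so the abort is visible for exactly one turn.
```

Any follow-up that completes normally erases the only record that the earlier run was aborted.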

Suggested direction

At minimum, please consider whether one of these should change:

  1. When abortedLastRun=true, do not allow incomplete-text -> prefill: true
  2. Preserve abort state as a stable recovery fact instead of a one-turn transient flag
  3. Prefer structured recovery facts over partial assistant-body continuation when the previous run was aborted
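
One way to read options 2 and 3 is to keep an append-only record of aborts instead of a single overwritable flag. This is a sketch under stated assumptions: `AbortFact`, `SessionState`, `recordTurn`, and `lastTurnAborted` are hypothetical names invented here and do not correspond to anything in the codebase.

```typescript
// Hypothetical structured recovery fact.
interface AbortFact {
  turnIndex: number;    // which turn was aborted
  partialChars: number; // how much assistant text had been emitted
}

// Hypothetical session state with a stable abort history.
interface SessionState {
  turn: number;
  abortHistory: AbortFact[];
}

function recordTurn(
  state: SessionState,
  aborted: boolean,
  emittedText: string,
): SessionState {
  const turn = state.turn + 1;
  const abortHistory = aborted
    ? [...state.abortHistory, { turnIndex: turn, partialChars: emittedText.length }]
    : state.abortHistory; // a later successful turn does not erase earlier aborts
  return { turn, abortHistory };
}

// True when the most recently recorded turn was aborted.
function lastTurnAborted(state: SessionState): boolean {
  return state.abortHistory.some((fact) => fact.turnIndex === state.turn);
}
```

The design choice is that recovery reads facts ("turn N was aborted after M characters") rather than continuing a partial assistant body, and a successful follow-up cannot silently rewrite history.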

Environment

  • Local installed version: 2026.4.5
  • Latest release checked: v2026.4.5
  • Main branch logic still appears consistent with the above paths

Related but not identical

I saw #54964 (zombie session after embedded init failure), which seems adjacent, but this report is specifically about aborted-run recovery semantics and inconsistent next-turn state.

extent analysis

TL;DR

The most likely fix is to change the recovery chain so that abort state is preserved as a stable recovery fact and the incomplete-text path does not set prefill: true when abortedLastRun is true.

Guidance

  • Review the sanitizeThinkingForRecovery function in src/agents/pi-embedded-runner/thinking.ts to ensure it correctly handles incomplete-text when abortedLastRun is true, potentially by not setting prefill: true in such cases.
  • Consider preserving the abort state as a stable recovery fact instead of overwriting it with subsequent turn results, as indicated by the line next.abortedLastRun = result.meta.aborted ?? false in src/agents/command/session-store.ts.
  • Evaluate the logic in src/auto-reply/reply/body.ts where abortedLastRun is converted into an aborted hint for the next reply, to ensure it does not contribute to the inconsistent state.
  • Investigate the lane wait exceeded diagnostic in src/process/command-queue.ts to understand if it provides any insights into the recovery chain's behavior during aborted runs.

Example

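// Example modification to sanitizeThinkingForRecovery (illustrative shape, not the actual signature)
type RecoveryAssessment = "valid" | "incomplete-thinking" | "incomplete-text";
type Thinking = { type: RecoveryAssessment; prefill?: boolean };

function sanitizeThinkingForRecovery(thinking: Thinking, abortedLastRun: boolean): Thinking {
  if (abortedLastRun && thinking.type === "incomplete-text") {
    // Do not set prefill: true when the last run was aborted
    return { ...thinking, prefill: false };
  }
  // Existing logic for other cases would go here; this sketch passes the value through
  return thinking;
}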

Notes

These suggestions target the inconsistent next-turn state after an aborted run. The exact implementation depends on the actual signatures and constraints in the codebase, so any change should be covered by targeted tests (for example src/agents/pi-embedded-runner/thinking.test.ts) to confirm it resolves the issue without introducing regressions.

Recommendation

Modify sanitizeThinkingForRecovery so that the incomplete-text path no longer prefills partial assistant text, and preserve the abort state as a stable recovery fact. Both changes directly address the inconsistencies identified in the recovery chain.
