openclaw - 💡(How to fix) Fix Feature: diagnostics stuck-session recovery ladder (steer → nudge → abort)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Fix Action

Fix / Workaround

Workaround in use

Code Example

{
  diagnostics: {
    stuckSessionWarnMs: 120000,
    stuckSessionAbortMs: 720000,
  stuckSessionRecovery: {
      mode: "steer-then-nudge-then-abort", // | "abort" (current behavior)
      steerMessage: "…",  // optional template
      nudgeMessage: "…",
      recheckMs: 30000,
    }
  }
}
RAW_BUFFERClick to expand / collapse

Problem

Built-in diagnostics stuck-session recovery only supports abort-drain (abort_embedded_run) and release_lane after stuckSessionAbortMs. Before that threshold, stalled embedded runs log recovery=none / observe_only.

For production assistants, aborting a long tool-heavy turn is often the wrong first move. Operators want a configurable ladder:

  1. Steersessions.steer / interrupt active run with a recovery prompt
  2. Nudgesessions.send if still no user-visible progress
  3. Abortsessions.abort / abortAndDrainEmbeddedPiRun only as last resort

Today this requires a separate cron/heartbeat agent, which is fragile (model may ignore instructions, spam channels if misconfigured, etc.).

Proposed config

{
  diagnostics: {
    stuckSessionWarnMs: 120000,
    stuckSessionAbortMs: 720000,
  stuckSessionRecovery: {
      mode: "steer-then-nudge-then-abort", // | "abort" (current behavior)
      steerMessage: "…",  // optional template
      nudgeMessage: "…",
      recheckMs: 30000,
    }
  }
}

Expected behavior

  • On session.stalled / eligible stalled_agent_run, run steer first (same path as sessions.steer / chat abort+inject).
  • Re-check progress after recheckMs; if still stalled, sessions.send nudge.
  • Re-check again; if still stalled and age ≥ stuckSessionAbortMs, fall back to current abort-drain.
  • Emit structured diagnostic events: recovery.requested / recovery.completed with action: steer|nudge|abort.
  • Never deliver recovery messages to external channels unless the session itself would reply there.

Workaround in use

Cron heartbeat every 4m executing the ladder via openclaw gateway call sessions.steer|abort + sessions_send, with gateway abort at 12m as final fallback.

Environment

OpenClaw 2026.5.22, multi-agent gateway (Discord/Telegram), embedded Pi runs, lossless-claw context engine.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

  • On session.stalled / eligible stalled_agent_run, run steer first (same path as sessions.steer / chat abort+inject).
  • Re-check progress after recheckMs; if still stalled, sessions.send nudge.
  • Re-check again; if still stalled and age ≥ stuckSessionAbortMs, fall back to current abort-drain.
  • Emit structured diagnostic events: recovery.requested / recovery.completed with action: steer|nudge|abort.
  • Never deliver recovery messages to external channels unless the session itself would reply there.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Feature: diagnostics stuck-session recovery ladder (steer → nudge → abort)