openclaw - 💡(How to fix) Fix MissingAgentHarnessError after stall recovery on long-running sessions

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

The claude-cli harness becomes permanently deregistered after OpenClaw's stall detector fires on a long-running session. Every subsequent inbound message fails instantly with MissingAgentHarnessError until the gateway is fully restarted.

Error Message

09:17:33 stalled session: age=148s classification=stalled_agent_run reason=active_work_without_progress 09:18:03 stalled session: age=178s ... 09:19:03 stalled session: age=238s ... 09:20:03 stalled session: age=298s ... (7 stall detections over ~3 min) 09:20:48 claude live session turn: durationMs=361772 rawLines=196 ← completed successfully 09:20:52 lane task error: lane=main durationMs=8 error="MissingAgentHarnessError: Requested agent harness 'claude-cli' is not registered." 09:20:54 lane task error: lane=session:agent:main:main durationMs=11 ...

Root Cause

Root Cause Hypothesis

Code Example

09:17:33  stalled session: age=148s classification=stalled_agent_run reason=active_work_without_progress
09:18:03  stalled session: age=178s ...
09:19:03  stalled session: age=238s ...
09:20:03  stalled session: age=298s ...  (7 stall detections over ~3 min)
09:20:48  claude live session turn: durationMs=361772 rawLines=196  ← completed successfully
09:20:52  lane task error: lane=main durationMs=8 error="MissingAgentHarnessError: Requested agent harness 'claude-cli' is not registered."
09:20:54  lane task error: lane=session:agent:main:main durationMs=11 ...
RAW_BUFFERClick to expand / collapse

Bug Report

Version: 2026.5.20 Platform: macOS, claude-cli harness, OAuth mode


Summary

The claude-cli harness becomes permanently deregistered after OpenClaw's stall detector fires on a long-running session. Every subsequent inbound message fails instantly with MissingAgentHarnessError until the gateway is fully restarted.

Reproducible Pattern

  1. Agent starts processing a large payload (multiple PDFs, zip files, complex multi-step tasks) — takes 5–6+ minutes
  2. Stall detector fires repeatedly at ~2.5 min intervals (classification=stalled_agent_run, reason=active_work_without_progress)
  3. Session self-completes successfully (durationMs=361772, rawLines=196)
  4. Next inbound message → instant MissingAgentHarnessError (durationMs=8ms) — harness is gone
  5. Gateway requires a full restart to recover

This has happened 4+ times in one day under normal usage (sending PDFs, zip files, and long multi-step tasks via Telegram).

Log Sequence

09:17:33  stalled session: age=148s classification=stalled_agent_run reason=active_work_without_progress
09:18:03  stalled session: age=178s ...
09:19:03  stalled session: age=238s ...
09:20:03  stalled session: age=298s ...  (7 stall detections over ~3 min)
09:20:48  claude live session turn: durationMs=361772 rawLines=196  ← completed successfully
09:20:52  lane task error: lane=main durationMs=8 error="MissingAgentHarnessError: Requested agent harness 'claude-cli' is not registered."
09:20:54  lane task error: lane=session:agent:main:main durationMs=11 ...

After restart, a secondary symptom: sessions.json becomes bloated (100KB+) and blocks the harness from re-registering on the next startup. Deleting it restores normal operation.

Root Cause Hypothesis

The stall recovery logic partially tears down the harness registration during its abort attempt, even when the session eventually self-resolves. The harness is left in an inconsistent state that persists until gateway restart.

Additional Context

  • The activeSessions=2 condition (heartbeat + user session overlapping) consistently precedes these failures
  • durationMs=8 on the error confirms the harness isn't even attempted — it's a registration check failure, not a runtime failure
  • A secondary bug: bloated sessions.json (100KB+) after a stall prevents harness registration on the next gateway startup

Request

  1. Primary: Fix stall recovery to not corrupt harness state when the session self-completes
  2. Secondary: Expose a configurable stall timeout (current ~2.5 min is too short for large file processing tasks)
  3. Secondary: Prevent sessions.json bloat from blocking harness registration on startup

Happy to share full logs if helpful.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix MissingAgentHarnessError after stall recovery on long-running sessions