openclaw - 💡(How to fix) Fix MissingAgentHarnessError race: non-atomic harness registry clear+restore during plugin cache cycle



Summary

Symptoms: Lane-spawn for Discord channel events fails with the error MissingAgentHarnessError: Requested agent harness "claude-cli" is not registered. This happens even though claude-cli sessions are actively running before and after the failure window.

Frequency: 28 distinct incident windows in 72h on our production instance (~9.3/day). The error message is surfaced in-channel, so every incident is user-visible in Discord.

Root cause (from reading the dist)

The error is thrown in selection.js → selectAgentHarness() when the registry Map is empty. Two code paths in the plugin loader clear this Map:

  1. restoreRegisteredAgentHarnesses(entries) is non-atomic: it calls map.clear() and then repopulates via for (const entry of entries) map.set(...). The window between the clear and full repopulation is the race.
  2. clearActivatedPluginRuntimeState() → clearAgentHarnesses() → map.clear(), with no immediate repopulation. Called from clearPluginLoaderCache().

Both trigger on the plugin loader cache cycle, which fires on:

  • First agent dispatch after gateway restart (cold-start: plugin cache lazy-loads)
  • Every reason=restart session close+start cycle in a live gateway
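
To make the failure window concrete, here is a minimal sketch of the second path (clear now, repopulate later). The registry Map, the error class, and tryDispatch() are hypothetical stand-ins written for illustration; only the function names clearAgentHarnesses, restoreRegisteredAgentHarnesses, and selectAgentHarness come from the report, and their real bodies may differ.

```javascript
// Hypothetical stand-in for the harness registry described in the report.
const registry = new Map([["claude-cli", { harness: { id: "claude-cli" } }]]);

class MissingAgentHarnessError extends Error {
  constructor(id) {
    super(`Requested agent harness "${id}" is not registered.`);
    this.name = "MissingAgentHarnessError";
  }
}

// Assumed shape of the lookup: throw when the requested harness is absent.
function selectAgentHarness(id) {
  const entry = registry.get(id);
  if (!entry) throw new MissingAgentHarnessError(id);
  return entry;
}

// Path 2 from the report: clear with no immediate repopulation.
function clearAgentHarnesses() {
  registry.clear();
}

// Path 1 from the report: clear, then repopulate entry by entry.
function restoreRegisteredAgentHarnesses(entries) {
  registry.clear();
  for (const entry of entries) registry.set(entry.harness.id, entry);
}

// Hypothetical helper: report either the harness id or the error name.
function tryDispatch(id) {
  try {
    return selectAgentHarness(id).harness.id;
  } catch (err) {
    return err.name;
  }
}

const snapshot = [...registry.values()];
const before = tryDispatch("claude-cli");
clearAgentHarnesses();                     // cache cycle begins
const during = tryDispatch("claude-cli");  // a channel event lands in the gap
restoreRegisteredAgentHarnesses(snapshot); // cache cycle ends
const after = tryDispatch("claude-cli");

console.log({ before, during, after });
```

Only the lookup that lands between the clear and the restore fails, which matches the observed pattern of healthy sessions on both sides of each incident.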

Observed log pattern

05:21:13 [agent/cli-backend] claude live session close: reason=restart
05:21:13 [agent/cli-backend] claude live session start: activeSessions=5
    ← clearActivatedPluginRuntimeState() → clear → restore (gap) →
05:22:12 [diagnostic] lane task error: lane=session:agent:main:discord:channel:... durationMs=19 error="MissingAgentHarnessError: Requested agent harness \"claude-cli\" is not registered."
05:22:49 [agent/cli-backend] cli exec ... reuse=reusable  ← succeeds normally

The cold-start window can last 5-6 minutes. The session-restart window is only 19-133 ms, but it recurs on every heartbeat cycle.

Model-fallback amplification

Each incident generates 4-6 error lines because the model-fallback also tries claude-cli and fails:

[model-fallback/decision] candidate_failed ... detail=Requested agent harness "claude-cli" is not registered.

Requested fixes

Option A (preferred): Add retry-on-MissingAgentHarnessError in the lane-spawn path with bounded backoff (3 retries × 200ms = 600ms total). If the harness comes back within 600ms, the original Discord message succeeds without a user-visible error.
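
A rough sketch of what Option A could look like. withHarnessRetry is a hypothetical helper, not an existing openclaw function, and it assumes the thrown error carries the name MissingAgentHarnessError; the real lane-spawn path may need to detect the error differently.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: re-run `task` only when it fails with
// MissingAgentHarnessError, waiting a fixed delay between attempts.
// Defaults mirror the proposal: 3 retries x 200 ms = 600 ms of waiting at most.
async function withHarnessRetry(task, { retries = 3, delayMs = 200 } = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await task();
    } catch (err) {
      const retryable =
        err instanceof Error && err.name === "MissingAgentHarnessError";
      // Give up on unrelated errors, or once the retry budget is spent.
      if (!retryable || attempt >= retries) throw err;
      await sleep(delayMs);
    }
  }
}

// Usage: a stand-in task that fails twice while the registry is empty,
// then succeeds once the harness is back.
let attempts = 0;
withHarnessRetry(async () => {
  attempts += 1;
  if (attempts < 3) {
    const err = new Error('Requested agent harness "claude-cli" is not registered.');
    err.name = "MissingAgentHarnessError";
    throw err;
  }
  return "dispatched";
}).then((result) => console.log(result, "after", attempts, "attempts"));
```

A 600ms budget would cover the 19-133 ms session-restart window reported above, but not a cold-start window lasting minutes, so the cold-start case would still need one of the other options.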

Option B: Make restoreRegisteredAgentHarnesses atomic — build a new Map and swap the reference instead of clear-then-populate:

// Instead of:
map.clear();
for (const entry of entries) map.set(entry.harness.id, entry);

// Use atomic swap:
const newMap = new Map(entries.map(e => [e.harness.id, e]));
getAgentHarnessRegistryState().harnesses = newMap;
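
One caveat with the swap, sketched below under assumptions: it only helps if every reader fetches the Map through the state object on each lookup. The state object, lookup(), and the "other-harness" id are hypothetical, made up for this illustration; only getAgentHarnessRegistryState and restoreRegisteredAgentHarnesses are names from the report.

```javascript
// Hypothetical registry state; the report reaches it via getAgentHarnessRegistryState().
const state = { harnesses: new Map() };
const getAgentHarnessRegistryState = () => state;

function restoreRegisteredAgentHarnesses(entries) {
  // Build the replacement completely, then publish it with one assignment.
  const next = new Map(entries.map((e) => [e.harness.id, e]));
  getAgentHarnessRegistryState().harnesses = next;
}

// Readers must go through the state object on every lookup.
function lookup(id) {
  return getAgentHarnessRegistryState().harnesses.get(id);
}

restoreRegisteredAgentHarnesses([{ harness: { id: "claude-cli" } }]);

// A reference cached before the swap keeps pointing at the old Map.
const staleRef = getAgentHarnessRegistryState().harnesses;

restoreRegisteredAgentHarnesses([
  { harness: { id: "claude-cli" } },
  { harness: { id: "other-harness" } }, // hypothetical second harness
]);

console.log(lookup("other-harness") !== undefined); // true: fresh lookup sees the new Map
console.log(staleRef.has("other-harness"));         // false: cached reference is stale
```

The swap also leaves the second path untouched: clearAgentHarnesses() would still empty the registry until something repopulates it.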

Option C: Gate channel event dispatch on a harness-readiness signal — a GET /gateway/ready endpoint that 503s until harnesses are registered, allowing the Discord plugin to queue or retry.
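
A minimal sketch of the readiness check behind Option C. REQUIRED_HARNESSES, readiness(), and handleReady() are hypothetical names, and the response body shape is an assumption; only the GET /gateway/ready path and the 503-until-registered behaviour come from the proposal.

```javascript
// Hypothetical: harness ids that must be registered before the gateway is ready.
const REQUIRED_HARNESSES = ["claude-cli"];

// Pure check: 200 when every required harness is present, 503 otherwise.
function readiness(harnesses, required = REQUIRED_HARNESSES) {
  const missing = required.filter((id) => !harnesses.has(id));
  return missing.length === 0
    ? { status: 200, body: { ready: true } }
    : { status: 503, body: { ready: false, missing } };
}

// Wiring for a Node-style (req, res) handler serving GET /gateway/ready.
// `harnesses` is the registry Map, passed in to keep the sketch self-contained.
function handleReady(req, res, harnesses) {
  const { status, body } = readiness(harnesses);
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify(body));
}

console.log(readiness(new Map()).status);                     // 503
console.log(readiness(new Map([["claude-cli", {}]])).status); // 200
```

The Discord plugin could poll this endpoint and queue or retry events while it returns 503, as the option describes.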

Version

openclaw gateway v2026.5.12, claude-cli harness, Ubuntu 22.04.
