openclaw - 💡(How to fix) Fix Gateway startup race: channels fire before anthropic plugin registers claude-cli harness [4 comments, 5 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#71957Fetched 2026-04-27 05:36:53
View on GitHub
Comments
4
Participants
5
Timeline
8
Reactions
1
Author
Timeline (top)
commented ×4subscribed ×2closed ×1cross-referenced ×1

The gateway declares itself "ready" and fires channel startup before the anthropic plugin (which provides the claude-cli agent harness) has finished initializing. This causes immediate channel startup failure on every boot:

[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
[gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" 
  is not registered and PI fallback is disabled.

The anthropic plugin loads later (triggered lazily by models.list), but by then the channel has already failed. This leaves the gateway in a degraded state where followup/embedded agent dispatches can fail with the same harness-not-registered error.

Error Message

[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s) [gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" is not registered and PI fallback is disabled.

Root Cause

The gateway's plugin loading has two phases:

  1. Startup plugins (memory-core, memory-wiki) — loaded before "ready"
  2. Lazy plugins (anthropic, brave, google, openai, xai) — loaded on-demand, triggered by models.list

Channel startup fires immediately after phase 1, but the primary model's harness (claude-cli) is provided by the anthropic plugin in phase 2. There's no mechanism to defer channel startup until the primary model's harness is available.

Fix Action

Fix / Workaround

The anthropic plugin loads later (triggered lazily by models.list), but by then the channel has already failed. This leaves the gateway in a degraded state where followup/embedded agent dispatches can fail with the same harness-not-registered error.

  1. Followup agent dispatch fails — after a CLI turn completes, the gateway tries to dispatch followup work but the harness is deregistered. All 17 fallback models fail instantly with the same error.
  2. Cascading timeout loop — if the CLI session happens to go quiet (e.g., running a tool that produces no streaming output for 180s), the gateway kills it. The abort deregisters the harness, and every subsequent fallback attempt fails instantly. The gateway cycles through opus-4-7 → gpt-5.5 → gpt-5.4 → gemini → opus-4-6 → ... → all 17 candidates fail in under 1 second.
  3. User sees duplicate/lost messages — the Control UI retries sends, messages appear to vanish, and the session becomes unresponsive until a manual WS disconnect/reconnect.

Code Example

[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
[gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" 
  is not registered and PI fallback is disabled.

---

21:23:30 [plugins] memory-core installed bundled runtime deps in 890ms
21:23:31 [plugins] memory-wiki installed bundled runtime deps in 347ms
21:23:32 [gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
21:23:32 [gateway/channels] channel startup failed: "claude-cli" is not registered
  ... 90+ seconds gap ...
  [models.list triggers remaining plugin init]
  [plugins] anthropic installed bundled runtime deps
  [plugins] brave installed bundled runtime deps
  ... etc ...
RAW_BUFFERClick to expand / collapse

Summary

The gateway declares itself "ready" and fires channel startup before the anthropic plugin (which provides the claude-cli agent harness) has finished initializing. This causes immediate channel startup failure on every boot:

[gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
[gateway/channels] channel startup failed: Error: Requested agent harness "claude-cli" 
  is not registered and PI fallback is disabled.

The anthropic plugin loads later (triggered lazily by models.list), but by then the channel has already failed. This leaves the gateway in a degraded state where followup/embedded agent dispatches can fail with the same harness-not-registered error.

Reproduction

  • Environment: WSL2 (Linux 6.6.x), OpenClaw v2026.4.24, gateway as systemd user service
  • Config: claude-cli/claude-opus-4-7 as primary model (requires anthropic plugin for harness)
  • Steps: Restart gateway (systemctl --user restart openclaw-gateway), observe logs
  • Result: Channel startup fails every time. The anthropic plugin loads 60-90+ seconds later when the Control UI triggers models.list.

Gateway log timeline (typical boot)

21:23:30 [plugins] memory-core installed bundled runtime deps in 890ms
21:23:31 [plugins] memory-wiki installed bundled runtime deps in 347ms
21:23:32 [gateway] ready (2 plugins: memory-core, memory-wiki; 3.6s)
21:23:32 [gateway/channels] channel startup failed: "claude-cli" is not registered
  ... 90+ seconds gap ...
  [models.list triggers remaining plugin init]
  [plugins] anthropic installed bundled runtime deps
  [plugins] brave installed bundled runtime deps
  ... etc ...

Impact

When the harness isn't registered at channel startup:

  1. Followup agent dispatch fails — after a CLI turn completes, the gateway tries to dispatch followup work but the harness is deregistered. All 17 fallback models fail instantly with the same error.
  2. Cascading timeout loop — if the CLI session happens to go quiet (e.g., running a tool that produces no streaming output for 180s), the gateway kills it. The abort deregisters the harness, and every subsequent fallback attempt fails instantly. The gateway cycles through opus-4-7 → gpt-5.5 → gpt-5.4 → gemini → opus-4-6 → ... → all 17 candidates fail in under 1 second.
  3. User sees duplicate/lost messages — the Control UI retries sends, messages appear to vanish, and the session becomes unresponsive until a manual WS disconnect/reconnect.

Root cause

The gateway's plugin loading has two phases:

  1. Startup plugins (memory-core, memory-wiki) — loaded before "ready"
  2. Lazy plugins (anthropic, brave, google, openai, xai) — loaded on-demand, triggered by models.list

Channel startup fires immediately after phase 1, but the primary model's harness (claude-cli) is provided by the anthropic plugin in phase 2. There's no mechanism to defer channel startup until the primary model's harness is available.

Suggested fix

Either:

  • Load the primary model's plugin in phase 1 — if the configured primary model requires a specific plugin (e.g., claude-cli/* requires anthropic), ensure that plugin is loaded before declaring "ready"
  • Defer channel startup — don't fire channel startup until the primary model's harness is registered, with a reasonable timeout
  • Retry channel startup — if channel startup fails due to a missing harness, retry after plugin lazy-load completes

Related

The harness deregistration on CLI abort is a separate issue that amplifies this one. When a CLI session is aborted (180s no-output timeout), the claude-cli harness deregisters, and all fallback candidates fail instantly instead of being able to spawn fresh CLI processes.

Environment

OpenClawv2026.4.24
OSWSL2 on Windows (Linux 6.6.87.2-microsoft-standard-WSL2)
Gatewayloopback-only, systemd user service
Primary modelclaude-cli/claude-opus-4-7

extent analysis

TL;DR

The most likely fix is to defer channel startup until the primary model's harness is registered, ensuring the anthropic plugin is loaded before declaring the gateway "ready".

Guidance

  • Load the anthropic plugin in phase 1 (startup plugins) if the primary model requires it, to ensure the claude-cli harness is available before channel startup.
  • Defer channel startup until the primary model's harness is registered, with a reasonable timeout to prevent indefinite waiting.
  • Consider implementing a retry mechanism for channel startup if it fails due to a missing harness, to handle cases where the plugin lazy-load completes after the initial startup attempt.

Example

No explicit code example is provided, but the suggested fix implies modifying the plugin loading sequence or introducing a delay in channel startup to ensure the anthropic plugin is loaded and the claude-cli harness is registered.

Notes

The issue is specific to the interaction between the gateway's plugin loading phases and the channel startup process. The suggested fixes aim to address the root cause by ensuring the primary model's harness is available before channel startup.

Recommendation

Apply a workaround by deferring channel startup until the primary model's harness is registered, as this approach directly addresses the identified root cause and provides a clear path to resolving the issue.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Gateway startup race: channels fire before anthropic plugin registers claude-cli harness [4 comments, 5 participants]