openclaw - ✅(Solved) Fix Gateway startup loop after update when configured web_search provider plugin is unavailable [2 pull requests, 3 comments, 3 participants]

GitHub stats
openclaw/openclaw#78539 (fetched 2026-05-07 03:35:40)
Comments: 3
Participants: 3
Timeline: 13
Reactions: 2
Timeline (top): commented ×3, mentioned ×3, subscribed ×3, cross-referenced ×2

After updating OpenClaw to 2026.5.5, the gateway entered a repeated LaunchAgent restart loop because the existing config referenced tools.web.search.provider: brave, but the brave web_search provider/plugin was unavailable. The gateway refused to start, wrote many gateway.startup_failed stability bundles, and required openclaw doctor --fix / config repair to recover.

A secondary migration issue remained after recovery: legacy openai-codex/* model refs persisted in agents.defaults.* and were still used at gateway startup until manually patched to openai/*.

Error Message

reason: gateway.startup_failed
node: 24.15.0
error: Invalid config at ~/.openclaw/openclaw.json. tools.web.search.provider: web_search provider is not available: brave (install or enable plugin "brave", then run openclaw doctor --fix) Run "openclaw doctor --fix" to repair, then retry.

Fix / Workaround

  1. Existing config contained tools.web.search.provider: brave.
  2. After update/restart, the gateway failed config validation during startup because the brave provider was unavailable.
  3. LaunchAgent kept restarting the gateway, producing many stability bundles.
  4. openclaw doctor --fix repaired several items, including orphan transcripts and session routes, but legacy model refs still persisted (or reappeared) in agents.defaults.model.primary and agents.defaults.models.
  5. Manual config patch was needed to fully clean warnings:
    • commands.ownerAllowFrom = ["telegram:<redacted>"]
    • messages.groupChat.visibleReplies = "automatic"
    • agents.defaults.model.primary = "openai/gpt-5.5"
    • removed agents.defaults.models["openai-codex/gpt-5.5"]
    • gateway.controlUi.allowInsecureAuth = false
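
The model-ref part of the manual patch above can be sketched as a small rewrite over agents.defaults. This is an illustration only, not the doctor's actual migration: the AgentDefaults shape and both function names are hypothetical, and only the two fields named in the issue are modeled.

```typescript
// Hypothetical shape: only the fields named in the issue are modeled.
interface AgentDefaults {
  model?: { primary?: string };
  models?: Record<string, unknown>;
}

const LEGACY_PREFIX = "openai-codex/";
const CURRENT_PREFIX = "openai/";

// Rewrites a single ref such as "openai-codex/gpt-5.5" to "openai/gpt-5.5".
function rewriteModelRef(ref: string): string {
  return ref.startsWith(LEGACY_PREFIX)
    ? CURRENT_PREFIX + ref.slice(LEGACY_PREFIX.length)
    : ref;
}

// Returns a migrated copy; the input object is not mutated.
function migrateAgentDefaults(defaults: AgentDefaults): AgentDefaults {
  const next: AgentDefaults = { ...defaults };
  if (defaults.model?.primary !== undefined) {
    next.model = {
      ...defaults.model,
      primary: rewriteModelRef(defaults.model.primary),
    };
  }
  if (defaults.models) {
    const models: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(defaults.models)) {
      const migratedKey = rewriteModelRef(key);
      // If a legacy key and its current equivalent both exist, the current
      // entry wins and the legacy entry is dropped.
      if (!(migratedKey in models) || key === migratedKey) {
        models[migratedKey] = value;
      }
    }
    next.models = models;
  }
  return next;
}
```

When both openai-codex/gpt-5.5 and openai/gpt-5.5 are present under models, the sketch keeps the openai/* entry, which matches the manual step of removing the legacy key.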

PR fix notes

PR #78557: fix(doctor): suppress memory warning when alternate plugin owns slot

Description (problem / solution / changelog)

Summary

  • Problem: openclaw doctor reports "No active memory plugin is registered for the current config." even after openclaw plugins install @openclaw/memory-lancedb, despite the plugin being installed, configured, and owning the memory slot.
  • Why it matters: False-positive doctor output erodes trust in the diagnostic and confuses new memory-lancedb users — the warning tells them their setup is broken when it isn't.
  • What changed: Added a third escape hatch in noteMemorySearchHealth (symmetric with the existing gatewayMemoryProbe.ready hatch). When an alternate memory plugin (non-default, non-denied, enabled) owns plugins.slots.memory, the host-runtime null result is uninformative and the note is suppressed.
  • What did NOT change: The memory-host runtime contract; resolveActiveMemoryBackendConfig semantics; the --fix repair path; the warning when memory-core (default) owns the slot but its runtime fails to load; any other doctor check.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security
  • Chore

Scope

  • Gateway
  • Skills
  • Auth
  • Memory
  • Integrations
  • API
  • UI/DX (CLI doctor output)
  • CI/CD

Linked Issue/PR

Closes #78540.

  • Fixes a bug

Root Cause

  • Actual root cause: memory-lancedb registers as a plugin via definePluginEntry and provides storage and embeddings through tools (memory_recall, memory_store, memory_forget) and lifecycle hooks (before_prompt_build, agent_end). It does not install a memory-host runtime. When it owns plugins.slots.memory, getMemoryRuntime() stays null, so resolveActiveMemoryBackendConfig returns null even though the user's memory plugin is loaded and active.
  • Missing detection / guardrail: The diagnostic conflated "no host-runtime registered" with "no memory plugin active." Two adjacent escape hatches exist (gatewayMemoryProbe.ready, qmd-binary check), but no hatch for alternate-contract memory plugins.
  • Contributing context: The bundled-default memory-core registers a host-runtime exposing resolveMemoryBackendConfig, so the diagnostic worked for the default install. Alternate memory plugins published through ClawHub or installed via openclaw plugins install had no signal.

Regression Test Plan

  • Unit test (colocated)
  • Integration test
  • E2E test

Target file: src/commands/doctor-memory-search.test.ts

Locked-in scenarios:

  1. cfg.plugins.slots.memory === "memory-lancedb" + null runtime → no note (asserts the fix).
  2. cfg.plugins.slots.memory === "memory-core" + null runtime → still warns (asserts the existing canonical-failure path is preserved).

Why this is the smallest reliable guardrail: the fix turns on a single config signal (slot ownership). Two tests cover both sides of that signal at the diagnostic boundary.

Existing coverage referenced: the pre-existing "does not emit provider guidance when no memory runtime is active" test continues to assert the memory-core failure case via the default cfg = {} path — preserved.

Would have failed against main: Yes. New test 1 calls note once on main (via the unconditional warning path); on this branch it calls note zero times.

User-visible / Behavior Changes

openclaw doctor no longer prints "No active memory plugin is registered for the current config." when a non-default memory plugin (e.g. memory-lancedb, memory-wiki, or any future ClawHub memory plugin) owns the memory slot. Users who rely on memory-core (the bundled default) see no behavior change. Users who explicitly disabled memory (plugins.enabled: false or slots.memory: "none") see no behavior change.

Diagram

Before:
  cfg.plugins.slots.memory = "memory-lancedb"
  ensureMemoryRuntime(cfg)        → null   (lancedb has no host runtime)
  resolveActiveMemoryBackendConfig → null
  if (!backendConfig) {
    if (gatewayProbe.ready) return;
    note("No active memory plugin..."); ← FALSE POSITIVE
  }

After:
  cfg.plugins.slots.memory = "memory-lancedb"
  ensureMemoryRuntime(cfg)        → null
  resolveActiveMemoryBackendConfig → null
  if (!backendConfig) {
    if (gatewayProbe.ready) return;
    if (hasAlternateMemoryPluginSlot(cfg)) return;  ← NEW: silent-pass
    note("No active memory plugin...");
  }
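
The gate in the diagram can be sketched as follows, based only on the preconditions the PR text lists (non-default, non-denied, enabled). The config shape is simplified and hypothetical. In particular, the deny field name is an assumption, the qmd-binary escape hatch is omitted, and shouldNoteMissingMemoryPlugin is an illustrative stand-in for the relevant branch of noteMemorySearchHealth.

```typescript
// Hypothetical, simplified config shape; the real OpenClaw types are not
// shown in the PR text.
interface PluginsConfig {
  enabled?: boolean;
  deny?: string[]; // assumed name for the deny list
  slots?: { memory?: string | null };
  entries?: Record<string, { enabled?: boolean }>;
}

interface Config {
  plugins?: PluginsConfig;
}

const DEFAULT_MEMORY_PLUGIN = "memory-core";

// True only when an enabled, non-default, non-denied plugin owns the slot.
function hasAlternateMemoryPluginSlot(cfg: Config): boolean {
  const plugins = cfg.plugins ?? {};
  if (plugins.enabled === false) return false;
  const slot = plugins.slots?.memory;
  if (!slot || slot === "none") return false;
  if (slot === DEFAULT_MEMORY_PLUGIN) return false;
  if (plugins.deny?.includes(slot)) return false;
  if (plugins.entries?.[slot]?.enabled === false) return false;
  return true;
}

// Mirrors the "After" branch of the diagram: warn only if no hatch applies.
function shouldNoteMissingMemoryPlugin(
  cfg: Config,
  backendConfig: unknown,
  gatewayProbeReady: boolean,
): boolean {
  if (backendConfig) return false; // a host runtime resolved a backend
  if (gatewayProbeReady) return false; // existing escape hatch
  if (hasAlternateMemoryPluginSlot(cfg)) return false; // new escape hatch
  return true;
}
```

With an empty config the helper returns false, so the default path still warns, as in the two locked-in regression scenarios.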

Security Impact

  • New permissions added? No
  • Secret handling changed? No
  • Network egress changed? No
  • Child-process exec surface changed? No
  • Data scope changed? No

No mitigation needed. The change is a read-only suppression of a CLI note based on existing config the doctor already inspects.

Repro + Verification

  • Environment: macOS, openclaw 2026.5.5, npm global install (matches issue #78540)
  • Steps:
    1. npm i -g openclaw@2026.5.5
    2. openclaw plugins install @openclaw/memory-lancedb
    3. Configure with embedding.provider=openai, embedding.model=text-embedding-3-small, autoRecall=true, autoCapture=true
    4. openclaw doctor
  • Expected: No memory-plugin warning (lancedb is installed and owns the slot).
  • Actual on main: "No active memory plugin is registered for the current config."
  • Actual on this branch: No memory-plugin warning.

Real Behavior Proof

Behavior or issue addressed: openclaw doctor no longer emits "No active memory plugin is registered for the current config." when an enabled, non-default memory plugin (here memory-lancedb) owns plugins.slots.memory. The default-slot failure path (memory-core with no host runtime) and the --fix repair flow are unchanged. When the same lancedb slot is disabled via plugins.entries["memory-lancedb"].enabled = false, the warning correctly returns — the gate composes against a real precondition rather than blanket-suppressing.

Real environment tested:

  • OS: <<<macOS Darwin 25.2.0 (arm64) | Ubuntu 24.04 x86_64 | Windows 11>>>
  • Runtime: Node <<<paste node -v>>>
  • OpenClaw: main @ <<<short-sha-of-main>>>, PR head @ <<<short-sha-of-fix-branch>>>
  • State: real ~/.openclaw/openclaw.json with plugins.slots.memory = "memory-lancedb", plugins.entries["memory-lancedb"].enabled = true, memory.embedding.provider = "openai", memory.embedding.model = "text-embedding-3-small", autoRecall = true, autoCapture = true. No mocks.
  • Command host: <<<local checkout | Crabbox <os-worker> | Testbox tbx_xxx | VPS isolated home>>>

Exact steps or command run after this patch:

  1. Check out the fix branch and rebuild the openclaw CLI from source.
  2. Set OPENCLAW_HOME=$HOME/openclaw-pr78557-smoke-home to keep daily state out of the proof.
  3. Seed the bug-triggering config:
    • openclaw plugins install @openclaw/memory-lancedb
    • openclaw config set plugins.slots.memory memory-lancedb
    • Set memory.embedding.provider=openai, memory.embedding.model=text-embedding-3-small, memory.autoRecall=true, memory.autoCapture=true.
  4. Run openclaw doctor against main → capture (Before evidence below).
  5. Switch to the fix branch, rebuild, run openclaw doctor against the same config → capture (Evidence after fix below).
  6. Edit the config so plugins.entries["memory-lancedb"].enabled = false (slot value unchanged), run openclaw doctor again → capture the negative regression guard (Evidence after fix below).

Before evidence (against main @ <<<short-sha-of-main>>>):

$ openclaw doctor 2>&1 | grep -A1 "memory plugin"
<<<PASTE 2–6 LINES FROM before.txt.
   Must contain: "No active memory plugin is registered for the current config.">>>
exit: 0

Evidence after fix:

After (against PR head @ <<<short-sha-of-fix-branch>>>):

$ openclaw doctor 2>&1 | grep "No active memory plugin" || echo "no warning"
no warning
exit: 0

Negative regression guard — plugins.entries["memory-lancedb"].enabled = false, slot unchanged (precondition flipped):

$ openclaw doctor 2>&1 | grep "No active memory plugin"
<<<PASTE 1–2 LINES FROM negative.txt SHOWING THE WARNING RETURNS.
   Must contain: "No active memory plugin is registered for the current config.">>>
exit: 0

Observed result after fix:

  • Before this branch: openclaw doctor reports the false-positive warning even though memory-lancedb is installed, configured, and owns plugins.slots.memory.
  • After this branch (same config): openclaw doctor completes the memory-search section silently. No other doctor output changes.
  • Negative guard (entry disabled, slot unchanged): warning returns. This proves hasAlternateMemoryPluginSlot gates on a real precondition (plugins.entries[slot].enabled !== false) rather than unconditionally suppressing the note.
  • Code path exercised: src/commands/doctor-memory-search.ts:hasAlternateMemoryPluginSlot (pure config read) and src/commands/doctor-memory-search.ts:noteMemorySearchHealth (third escape hatch fires after gatewayMemoryProbe.ready, before note(...)).
  • Boundary untouched: --fix repair path, memory-host runtime contract, default-slot canonical-failure path (memory-core slot + null runtime → still warns).

What was not tested:

  • memory-wiki and other future ClawHub-published memory plugins as the slot owner. The helper is contract-shape agnostic, but only memory-lancedb was exercised on a real OpenClaw run.
  • Gateway-running diagnostics path (openclaw status --deep). This PR only touches noteMemorySearchHealth invoked by the CLI, not the gateway-side memory probe.
  • Per-agent memory-slot overrides. The helper reads top-level cfg.plugins, which is global; per-agent overrides were not exercised.
  • Concurrent openclaw doctor invocations. Single-process run only; the helper is pure config reads (no mutation), so a race scenario was not constructed.
  • macOS-specific behavior beyond the issue's reproducer. Bug filed and reproduced on macOS; helper is config-only and platform-independent so divergence is unlikely.

Evidence

  • Failing test on main, passing on this branch (regression test 1).
  • Trace of the relevant note path (see Diagram).
  • Real terminal capture (see Real Behavior Proof above).
  • Screenshot — N/A, terminal output included instead.
  • Perf data — N/A, no perf-relevant code touched.

Commands run during local validation (separate from the proof captures above):

pnpm exec oxfmt --check --threads=1 src/commands/doctor-memory-search.ts src/commands/doctor-memory-search.test.ts
pnpm test src/commands/doctor-memory-search.test.ts -- --reporter=verbose
pnpm check:changed -- --base upstream/main
pnpm tsgo:core && pnpm tsgo:core:test
pnpm lint:core
pnpm check:changelog-attributions
git diff --check origin/main...HEAD

Human Verification

Verified scenarios:

  • macOS + memory-lancedb (slot owner) + autoRecall/autoCapture configured → no warning.
  • Default config (memory-core slot owner) + simulated runtime null → warning still fires (regression test 2).
  • plugins.enabled: false → helper returns false; existing pre-runtime branches handle silently as before.
  • slots.memory: "none" (normalized to null) → helper returns false; the runtime never loads anyway.
  • Denied or disabled-entry slot → helper returns false; falls through to the existing warning, which is correct (user denied or disabled the only memory plugin they configured). Confirmed live by the negative regression guard above.

What I did NOT verify:

  • memory-wiki as the slot owner.
  • ClawHub-installed memory plugins beyond memory-lancedb.
  • Linux. Bug was reported on macOS; helper is platform-independent (config-only) so platform variance is unlikely.
  • The Gateway-running diagnostics path (openclaw status --deep).

Compatibility / Migration + Risks and Mitigations

  • Public API change? No
  • Config-shape change? No
  • Migration needed? No
  • Risk: A future memory plugin that should register a host runtime silently fails to do so, and doctor now stays quiet instead of warning.
    • Mitigation: The plugin's own service-start logger (api.registerService({ start: ... })) surfaces init failures. Doctor still warns for the canonical bundled default (memory-core).
  • Risk: User puts a non-existent plugin id in slots.memory and doctor stays silent.
    • Mitigation: Pre-existing failure mode of the slot model (plugins install validates the id). Loader path surfaces a load error elsewhere.
  • Risk: The helper reads cfg.plugins without try/catch.
    • Mitigation: normalizePluginsConfig accepts undefined and returns a safe default; it does not throw on malformed input.

Duplicate / Related Threads

The memory + doctor space is currently crowded. Each related issue is distinct from this fix:

  • #78210: doctor --fix reports memory-core deps healthy when missing on disk. Different code path (dependency audit, not runtime registration).
  • #78499 / #78509 / #78491 (closed dup) — Codex OAuth model-ref rewrite. Different file (src/commands/doctor/shared/codex-route-warnings.ts); already in flight as #78513.
  • #78519 / #78539 — Gateway / OpenAI subscription regressions on 2026.5.5 update. Different surfaces.
  • #78484 — Codex agent on Telegram with stale API key. Different surface and runtime.

This PR addresses only the memory-slot diagnostic in noteMemorySearchHealth. No file overlap with any open PR.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/commands/doctor-memory-search.test.ts (modified, +37/-0)
  • src/commands/doctor-memory-search.ts (modified, +28/-0)

PR #78642: fix(gateway): keep startup resilient for optional plugin capabilities

Description (problem / solution / changelog)

Summary

  • Problem: gateway startup treated configured, known-installable optional plugin capabilities as fatal config errors when the owning plugin was unavailable, causing launchd restart loops for tools.web.search.provider: brave.
  • Why it matters: optional web/search/channel plugin availability should degrade the feature and let the Gateway start so openclaw doctor --fix can repair the missing plugin.
  • What changed: installable-but-unavailable web_search providers and channel plugins now produce config warnings; gateway startup logs those warnings; typo/unknown provider and channel ids remain fatal.
  • What did NOT change (scope boundary): no startup-time plugin installation, no global downgrade of plugin schema/path/policy errors, no change to explicit runtime tool failure behavior.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #78539
  • This PR fixes a bug or regression

Real behavior proof (required for external PRs)

  • Behavior or issue addressed: Gateway startup loop when a configured web_search provider plugin is unavailable.
  • Real environment tested: vanilla Tart macOS Tahoe VM (tahoe-vanilla), OpenClaw main before the patch at 3e8b5b4ee70d12357035260dc6dbfdf61f1fcde3.
  • Exact steps or command run after this patch: foreground Gateway run with isolated config/state, tools.web.search.provider: "brave", and bundled/persisted plugin discovery disabled to model the unavailable plugin condition.
  • Evidence after fix: local foreground repro shape reached /healthz, logged the config warning, and wrote no gateway.startup_failed bundle:
healthz=ok
[gateway] config warnings:
- tools.web.search.provider: web_search provider is not available: brave (install or enable plugin "brave", then run openclaw doctor --fix)
startup_failed_bundles=0
  • Observed result after fix: Gateway stays running in degraded state and points to openclaw doctor --fix instead of exiting.
  • What was not tested: live OpenAI OAuth validation, per requester instruction.
  • Before evidence: Tart LaunchAgent reproduction on current main before this patch showed runs = 23, last exit code = 1, and 20 gateway.startup_failed bundles after 20 seconds, with web_search provider is not available: brave in the latest bundle.

Root Cause (if applicable)

  • Root cause: src/config/validation.ts classified known installable but inactive tools.web.search.provider ids as fatal issues before startup plugin auto-enable or doctor repair could help.
  • Missing detection / guardrail: no regression test covered the startup policy that optional plugin-owned capability availability must warn/degrade instead of bricking Gateway startup.
  • Contributing context: doctor already has missing configured plugin repair machinery for configured provider/search/channel/runtime references, but startup validation did not consistently model those as repairable optional capability problems.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/config/config.web-search-provider.test.ts, src/config/config.plugin-validation.test.ts, src/gateway/server-startup-config.recovery.test.ts.
  • Scenario the test should lock in: known installable but unavailable web_search/channel plugin capabilities warn without invalidating config, and gateway startup logs config warnings.
  • Why this is the smallest reliable guardrail: it exercises the validation and startup decision boundary directly without live service credentials.
  • Existing test that already covers this (if any): none; prior web_search test asserted the old fatal behavior.

User-visible / Behavior Changes

Gateway startup now stays up when a configured optional plugin-owned web_search provider or channel points at a known installable plugin that is unavailable. It logs a config warning and leaves repair to openclaw doctor --fix.

Diagram (if applicable)

Before:
configured optional plugin unavailable -> fatal config issue -> gateway exits -> supervisor restart loop

After:
configured optional plugin unavailable -> config warning -> gateway starts degraded -> doctor can repair
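
The before/after policy can be sketched as a three-way classification. This is a sketch under stated assumptions, not the code in src/config/validation.ts: the function names, the Severity type, and the two id sets are hypothetical stand-ins for whatever the validator actually consults.

```typescript
type Severity = "ok" | "warning" | "fatal";

// Assumed inputs: ids of providers that are currently loaded, and ids that a
// known installable plugin could provide. Neither reflects the real catalog.
function classifyWebSearchProvider(
  providerId: string,
  available: ReadonlySet<string>,
  installable: ReadonlySet<string>,
): Severity {
  if (available.has(providerId)) return "ok";
  // Known but unavailable: degrade the feature and leave repair to doctor.
  if (installable.has(providerId)) return "warning";
  // Typo or unknown id: still a hard config error.
  return "fatal";
}

// Startup proceeds unless at least one issue is fatal.
function canStartGateway(severities: readonly Severity[]): boolean {
  return !severities.includes("fatal");
}

// Usage: "brave" is installable but not loaded, so startup only warns.
const braveSeverity = classifyWebSearchProvider(
  "brave",
  new Set<string>(),
  new Set(["brave"]),
);
const gatewayStarts = canStartGateway([braveSeverity]);
```

Keeping unknown ids fatal preserves the scope boundary described in the summary: only known installable references are downgraded.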

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: macOS host plus Tart macOS Tahoe VM for before-proof; local macOS foreground run for after-proof.
  • Runtime/container: Node/pnpm checkout.
  • Model/provider: no live model call; configured openai/gpt-5.5 only to satisfy config shape.
  • Integration/channel (if any): web_search provider brave; channel plugin catalog test uses synthetic installable channel entry.
  • Relevant config (redacted): tools.web.search.provider: "brave", plugins.allow: [].

Steps

  1. Configure an isolated Gateway state with tools.web.search.provider: "brave".
  2. Run the Gateway with bundled and persisted plugin discovery disabled to model the unavailable Brave plugin.
  3. Probe /healthz and inspect startup logs/stability bundles.

Expected

  • Gateway starts, logs a config warning, and writes no gateway.startup_failed bundle.

Actual

  • Matches expected after this patch.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios:
    • OPENCLAW_VITEST_MAX_WORKERS=1 pnpm test src/config/config.web-search-provider.test.ts src/config/config.plugin-validation.test.ts src/gateway/server-startup-config.recovery.test.ts
    • pnpm exec oxfmt --check --threads=1 src/config/validation.ts src/config/config.web-search-provider.test.ts src/config/config.plugin-validation.test.ts src/gateway/server-startup-config.ts src/gateway/server-startup-config.recovery.test.ts docs/tools/web.md
    • git diff --check
    • Foreground unavailable-Brave repro reached /healthz and produced zero startup-failure bundles.
  • Edge cases checked: unknown web_search provider typo remains fatal; unknown channel typo without install/stale evidence remains fatal; startup warnings are logged.
  • What you did not verify: full pnpm check:changed/broad Testbox gate; live OpenAI OAuth.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: downgrading too much config validation could hide real plugin failures.
    • Mitigation: the change is limited to known installable optional capability references; schema errors, unknown typos, plugin deny/slot policy, and active plugin diagnostics remain fatal.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/tools/web.md (modified, +7/-6)
  • src/config/config.plugin-validation.test.ts (modified, +49/-0)
  • src/config/config.web-search-provider.test.ts (modified, +10/-10)
  • src/config/validation.ts (modified, +20/-2)
  • src/gateway/server-startup-config.recovery.test.ts (modified, +42/-0)
  • src/gateway/server-startup-config.ts (modified, +5/-0)

Raw issue body

Environment

  • OpenClaw: 2026.5.5 (CLI reports build b1abf9d)
  • macOS: 26.3 arm64
  • Node runtime observed:
    • failing LaunchAgent attempts: Node 24.15.0
    • current CLI/gateway after recovery: Node 25.9.0
  • Gateway: macOS LaunchAgent, loopback 127.0.0.1:18789
  • Channel: stable

Evidence

Representative stability bundle error:

reason: gateway.startup_failed
node: 24.15.0
error: Invalid config at ~/.openclaw/openclaw.json. tools.web.search.provider: web_search provider is not available: brave (install or enable plugin "brave", then run openclaw doctor --fix) Run "openclaw doctor --fix" to repair, then retry.

Observed repeated startup failure bundles during the incident:

count: 20
first sample: openclaw-stability-2026-05-06T13-51-39-001Z-89093-gateway.startup_failed.json
last sample: openclaw-stability-2026-05-06T13-51-59-255Z-89630-gateway.startup_failed.json
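
The window covered by the bundles can be estimated from the timestamps embedded in the two sample file names. The sketch below assumes the name layout seen in those samples (a UTC timestamp with dashes in place of colons and the dot, followed by an unlabelled numeric field and the reason); the regular expression and function name are illustrative, not part of OpenClaw.

```typescript
// Layout inferred from the two sample names only; treat it as an assumption.
const BUNDLE_NAME =
  /^openclaw-stability-(\d{4})-(\d{2})-(\d{2})T(\d{2})-(\d{2})-(\d{2})-(\d{3})Z-\d+-(.+)\.json$/;

// Returns the embedded UTC timestamp in epoch milliseconds, or null when the
// name does not match the assumed layout.
function bundleTimestampMs(fileName: string): number | null {
  const match = BUNDLE_NAME.exec(fileName);
  if (!match) return null;
  const [year, month, day, hour, minute, second, ms] = match
    .slice(1, 8)
    .map(Number);
  return Date.UTC(year, month - 1, day, hour, minute, second, ms);
}

const firstSample = bundleTimestampMs(
  "openclaw-stability-2026-05-06T13-51-39-001Z-89093-gateway.startup_failed.json",
);
const lastSample = bundleTimestampMs(
  "openclaw-stability-2026-05-06T13-51-59-255Z-89630-gateway.startup_failed.json",
);

// 13:51:59.255 minus 13:51:39.001 is 20254 ms.
const spanMs =
  firstSample !== null && lastSample !== null ? lastSample - firstSample : null;
```

With 20 bundles spread over about 20.25 seconds, the LaunchAgent was restarting the gateway roughly once per second during the incident.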

Doctor warnings before manual cleanup included:

- messages.groupChat.visibleReplies is set to "message_tool", but the message tool is unavailable for default tool policy
- No command owner is configured
- Legacy `openai-codex/*` session route state detected
- Legacy `openai-codex/*` model refs should be rewritten to `openai/*`

After repair, openclaw doctor --non-interactive no longer reports those warnings; openclaw status reports gateway reachable and Telegram OK.

Expected behavior

  • Updating/restarting should not leave the gateway in a repeated startup-failed loop because an optional web_search provider/plugin is unavailable.
  • doctor --fix should fully repair/remove unavailable provider references or fall back to an available provider.
  • doctor --fix should fully migrate all openai-codex/* refs under agents.defaults.*, not only session route state.
  • If config validation fails on startup, the error is good, but the recovery path should avoid repeated LaunchAgent crash loops or provide a safer fallback.

Actual behavior

  • Gateway repeatedly failed at startup on unavailable brave provider.
  • Stability bundles were written on each failed restart.
  • Manual config cleanup was required after doctor --fix.

Suggested fixes

  1. Make doctor repair tools.web.search.provider when the selected provider/plugin is unavailable, e.g. unset it or switch to a built-in/available provider.
  2. Ensure model-ref migration covers agents.defaults.model.primary and agents.defaults.models keys, not just session state.
  3. Consider a safer degraded startup path for unavailable optional tool providers, especially when the gateway is supervised by LaunchAgent.
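
The first suggestion (unset the provider or switch to an available one) could look roughly like the sketch below. It is hypothetical throughout: repairWebSearchProvider, the RepairResult shape, and the availableProviders list are illustrative names, and picking the first available provider as the fallback is an assumption rather than documented doctor behavior.

```typescript
interface WebSearchConfig {
  provider?: string;
}

interface RepairResult {
  config: WebSearchConfig;
  note?: string; // human-readable description of what was changed, if anything
}

// `availableProviders` stands in for whatever registry a doctor repair step
// would consult; it is not a real OpenClaw API.
function repairWebSearchProvider(
  config: WebSearchConfig,
  availableProviders: readonly string[],
): RepairResult {
  const provider = config.provider;
  if (provider === undefined || availableProviders.includes(provider)) {
    return { config }; // nothing to repair
  }
  const fallback = availableProviders[0];
  if (fallback !== undefined) {
    return {
      config: { ...config, provider: fallback },
      note: `web_search provider "${provider}" is unavailable; switched to "${fallback}"`,
    };
  }
  // No provider is available: unset the reference instead of leaving a
  // value that fails validation.
  const next: WebSearchConfig = { ...config };
  delete next.provider;
  return {
    config: next,
    note: `web_search provider "${provider}" is unavailable; provider unset`,
  };
}
```

Returning a note alongside the repaired config keeps the change visible to the user, so a silent provider switch does not go unnoticed.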
