openclaw - ✅ (Solved) Fix [Bug]: doctor --fix rewrites Codex runtime model refs to openai/* and breaks Codex auth profile selection [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#78491 · Fetched 2026-05-07 03:36:14
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 2
Timeline (top): cross-referenced ×2, closed ×1, renamed ×1

After openclaw doctor --fix on OpenClaw 2026.5.5, agents using agentRuntime.id: "codex" were configured with openai/gpt-5.5, and Telegram turns failed because the Codex app-server tried to use openai:default instead of an openai-codex auth profile.

Error Message

2026-05-06T10:22:27.131-03:00 [agents/harness] Codex agent harness failed; not falling back to embedded PI backend
2026-05-06T10:22:27.132-03:00 [diagnostic] lane task error: lane=main durationMs=1175 error="Error: Codex app-server auth profile "openai:default" must belong to provider "openai-codex" or a supported alias."
2026-05-06T10:22:27.132-03:00 [diagnostic] lane task error: lane=session:agent:main:telegram:direct:<redacted> durationMs=1176 error="Error: Codex app-server auth profile "openai:default" must belong to provider "openai-codex" or a supported alias."

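Root Cause

The report traces the failure to a mismatch between the doctor migration and the Codex plugin. The installed migration code maps the legacy provider codex to provider openai with runtime codex, so doctor --fix turns codex/gpt-5.5 into openai/gpt-5.5 while leaving agentRuntime.id: "codex" in place. The installed Codex plugin harness only supports provider codex by default (DEFAULT_CODEX_HARNESS_PROVIDER_IDS = new Set(["codex"])), and with the openai/* ref the Codex app-server selected the openai:default API-key profile, which it rejects because that profile does not belong to provider openai-codex.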

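Fix Action

Fix / Workaround

Working workaround path: codex/gpt-5.5 with agentRuntime.id: "codex".

Manual workaround applied: set agents.defaults.model.primary, agents.list[*].model.primary, and the plugins.entries.memory-core.* execution model refs back to codex/gpt-5.5, then restart the gateway. Afterwards, openclaw models list showed codex/gpt-5.5 as the default, Telegram delivery succeeded, and the observed execution trace reported winnerProvider codex, winnerModel gpt-5.5, and fallbackUsed false. Note that openclaw doctor still proposes migrating these working refs again after the workaround.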

PR fix notes

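PR #78557: fix(doctor): suppress memory warning when alternate plugin owns slot (cross-referenced from this issue; the PR itself addresses a separate memory-slot diagnostic and lists the Codex model-ref rewrite as already in flight in #78513)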

Description (problem / solution / changelog)

Summary

  • Problem: openclaw doctor reports "No active memory plugin is registered for the current config." even after openclaw plugins install @openclaw/memory-lancedb, despite the plugin being installed, configured, and owning the memory slot.
  • Why it matters: False-positive doctor output erodes trust in the diagnostic and confuses new memory-lancedb users — the warning tells them their setup is broken when it isn't.
  • What changed: Added a third escape hatch in noteMemorySearchHealth (symmetric with the existing gatewayMemoryProbe.ready hatch). When an alternate memory plugin (non-default, non-denied, enabled) owns plugins.slots.memory, the host-runtime null result is uninformative and the note is suppressed.
  • What did NOT change: The memory-host runtime contract; resolveActiveMemoryBackendConfig semantics; the --fix repair path; the warning when memory-core (default) owns the slot but its runtime fails to load; any other doctor check.

Change Type

  • Bug fix
  • Feature
  • Refactor
  • Docs
  • Security
  • Chore

Scope

  • Gateway
  • Skills
  • Auth
  • Memory
  • Integrations
  • API
  • UI/DX (CLI doctor output)
  • CI/CD

Linked Issue/PR

Closes #78540.

  • Fixes a bug

Root Cause

  • Actual root cause: memory-lancedb registers as a plugin via definePluginEntry and provides storage and embeddings through tools (memory_recall, memory_store, memory_forget) and lifecycle hooks (before_prompt_build, agent_end). It does not install a memory-host runtime. When it owns plugins.slots.memory, getMemoryRuntime() stays null, so resolveActiveMemoryBackendConfig returns null even though the user's memory plugin is loaded and active.
  • Missing detection / guardrail: The diagnostic conflated "no host-runtime registered" with "no memory plugin active." Two adjacent escape hatches exist (gatewayMemoryProbe.ready, qmd-binary check), but no hatch for alternate-contract memory plugins.
  • Contributing context: The bundled-default memory-core registers a host-runtime exposing resolveMemoryBackendConfig, so the diagnostic worked for the default install. Alternate memory plugins published through ClawHub or installed via openclaw plugins install had no signal.

Regression Test Plan

  • Unit test (colocated)
  • Integration test
  • E2E test

Target file: src/commands/doctor-memory-search.test.ts

Locked-in scenarios:

  1. cfg.plugins.slots.memory === "memory-lancedb" + null runtime → no note (asserts the fix).
  2. cfg.plugins.slots.memory === "memory-core" + null runtime → still warns (asserts the existing canonical-failure path is preserved).

Why this is the smallest reliable guardrail: the fix turns on a single config signal (slot ownership). Two tests cover both sides of that signal at the diagnostic boundary.

Existing coverage referenced: the pre-existing "does not emit provider guidance when no memory runtime is active" test continues to assert the memory-core failure case via the default cfg = {} path — preserved.

Would have failed against main: Yes. New test 1 calls note once on main (via the unconditional warning path); on this branch it calls note zero times.

User-visible / Behavior Changes

openclaw doctor no longer prints "No active memory plugin is registered for the current config." when a non-default memory plugin (e.g. memory-lancedb, memory-wiki, or any future ClawHub memory plugin) owns the memory slot. Users who rely on memory-core (the bundled default) see no behavior change. Users who explicitly disabled memory (plugins.enabled: false or slots.memory: "none") see no behavior change.

Diagram

Before:
  cfg.plugins.slots.memory = "memory-lancedb"
  ensureMemoryRuntime(cfg)        → null   (lancedb has no host runtime)
  resolveActiveMemoryBackendConfig → null
  if (!backendConfig) {
    if (gatewayProbe.ready) return;
    note("No active memory plugin..."); ← FALSE POSITIVE
  }

After:
  cfg.plugins.slots.memory = "memory-lancedb"
  ensureMemoryRuntime(cfg)        → null
  resolveActiveMemoryBackendConfig → null
  if (!backendConfig) {
    if (gatewayProbe.ready) return;
    if (hasAlternateMemoryPluginSlot(cfg)) return;  ← NEW: silent-pass
    note("No active memory plugin...");
  }
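The escape-hatch order in the diagram can be sketched as runnable TypeScript. This is a minimal sketch, not the PR's code: the DoctorConfig and PluginsConfig shapes, the deny field name, and the memorySearchNote function (which returns the warning text instead of calling note) are assumptions modeled on the fields this PR names (plugins.enabled, plugins.slots.memory, plugins.entries[id].enabled). The real hasAlternateMemoryPluginSlot appears to read config through normalizePluginsConfig and may differ in detail.

```typescript
// Sketch only: config shapes and helper bodies are assumptions, not OpenClaw source.
interface PluginsConfig {
  enabled?: boolean; // plugins.enabled
  deny?: string[]; // assumed name for the plugin deny list
  slots?: { memory?: string | null }; // plugins.slots.memory
  entries?: Record<string, { enabled?: boolean }>; // plugins.entries[id]
}

interface DoctorConfig {
  plugins?: PluginsConfig;
}

const DEFAULT_MEMORY_PLUGIN_ID = "memory-core";
const WARNING = "No active memory plugin is registered for the current config.";

// True when an enabled, non-denied, non-default plugin owns the memory slot.
function hasAlternateMemoryPluginSlot(cfg: DoctorConfig): boolean {
  const plugins: PluginsConfig = cfg.plugins ?? {};
  if (plugins.enabled === false) return false;
  const raw = plugins.slots?.memory;
  // "none" is treated the same as having no slot owner.
  const slot = raw && raw !== "none" ? raw : null;
  if (slot === null || slot === DEFAULT_MEMORY_PLUGIN_ID) return false;
  if (plugins.deny?.includes(slot)) return false;
  return plugins.entries?.[slot]?.enabled !== false;
}

// Decision after the memory-host runtime lookup. Earlier branches (not modeled here)
// already return for plugins.enabled: false and slots.memory: "none".
function memorySearchNote(
  cfg: DoctorConfig,
  backendConfig: object | null,
  gatewayProbeReady: boolean,
): string | null {
  if (backendConfig !== null) return null; // backend resolved; other checks not modeled
  if (gatewayProbeReady) return null; // existing escape hatch
  if (hasAlternateMemoryPluginSlot(cfg)) return null; // new escape hatch from this PR
  return WARNING;
}

// Example: memory-lancedb owns the slot and the host runtime is null.
const exampleCfg: DoctorConfig = {
  plugins: {
    slots: { memory: "memory-lancedb" },
    entries: { "memory-lancedb": { enabled: true } },
  },
};
console.log(memorySearchNote(exampleCfg, null, false)); // null
```

Flipping plugins.entries["memory-lancedb"].enabled to false in exampleCfg makes the sketch return the warning again, which mirrors the PR's negative regression guard.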

Security Impact

  • New permissions added? No
  • Secret handling changed? No
  • Network egress changed? No
  • Child-process exec surface changed? No
  • Data scope changed? No

No mitigation needed. The change is a read-only suppression of a CLI note based on existing config the doctor already inspects.

Repro + Verification

  • Environment: macOS, openclaw 2026.5.5, npm global install (matches issue #78540)
  • Steps:
    1. npm i -g openclaw@2026.5.5
    2. openclaw plugins install @openclaw/memory-lancedb
    3. Configure with embedding.provider=openai, embedding.model=text-embedding-3-small, autoRecall=true, autoCapture=true
    4. openclaw doctor
  • Expected: No memory-plugin warning (lancedb is installed and owns the slot).
  • Actual on main: "No active memory plugin is registered for the current config."
  • Actual on this branch: No memory-plugin warning.

Real Behavior Proof

Behavior or issue addressed: openclaw doctor no longer emits "No active memory plugin is registered for the current config." when an enabled, non-default memory plugin (here memory-lancedb) owns plugins.slots.memory. The default-slot failure path (memory-core with no host runtime) and the --fix repair flow are unchanged. When the same lancedb slot is disabled via plugins.entries["memory-lancedb"].enabled = false, the warning correctly returns — the gate composes against a real precondition rather than blanket-suppressing.

Real environment tested:

  • OS: <<<macOS Darwin 25.2.0 (arm64) | Ubuntu 24.04 x86_64 | Windows 11>>>
  • Runtime: Node <<<paste node -v>>>
  • OpenClaw: main @ <<<short-sha-of-main>>>, PR head @ <<<short-sha-of-fix-branch>>>
  • State: real ~/.openclaw/openclaw.json with plugins.slots.memory = "memory-lancedb", plugins.entries["memory-lancedb"].enabled = true, memory.embedding.provider = "openai", memory.embedding.model = "text-embedding-3-small", autoRecall = true, autoCapture = true. No mocks.
  • Command host: <<<local checkout | Crabbox <os-worker> | Testbox tbx_xxx | VPS isolated home>>>

Exact steps or command run after this patch:

  1. Check out the fix branch and rebuild the openclaw CLI from source.
  2. Set OPENCLAW_HOME=$HOME/openclaw-pr78557-smoke-home to keep daily state out of the proof.
  3. Seed the bug-triggering config:
    • openclaw plugins install @openclaw/memory-lancedb
    • openclaw config set plugins.slots.memory memory-lancedb
    • Set memory.embedding.provider=openai, memory.embedding.model=text-embedding-3-small, memory.autoRecall=true, memory.autoCapture=true.
  4. Run openclaw doctor against main → capture (Before evidence below).
  5. Switch to the fix branch, rebuild, run openclaw doctor against the same config → capture (Evidence after fix below).
  6. Edit the config so plugins.entries["memory-lancedb"].enabled = false (slot value unchanged), run openclaw doctor again → capture the negative regression guard (Evidence after fix below).

Before evidence (against main @ <<<short-sha-of-main>>>):

$ openclaw doctor 2>&1 | grep -A1 "memory plugin"
<<<PASTE 2–6 LINES FROM before.txt.
   Must contain: "No active memory plugin is registered for the current config.">>>
exit: 0

Evidence after fix:

After (against PR head @ <<<short-sha-of-fix-branch>>>):

$ openclaw doctor 2>&1 | grep "No active memory plugin" || echo "no warning"
no warning
exit: 0

Negative regression guard — plugins.entries["memory-lancedb"].enabled = false, slot unchanged (precondition flipped):

$ openclaw doctor 2>&1 | grep "No active memory plugin"
<<<PASTE 1–2 LINES FROM negative.txt SHOWING THE WARNING RETURNS.
   Must contain: "No active memory plugin is registered for the current config.">>>
exit: 0

Observed result after fix:

  • Before this branch: openclaw doctor reports the false-positive warning even though memory-lancedb is installed, configured, and owns plugins.slots.memory.
  • After this branch (same config): openclaw doctor completes the memory-search section silently. No other doctor output changes.
  • Negative guard (entry disabled, slot unchanged): warning returns. This proves hasAlternateMemoryPluginSlot gates on a real precondition (plugins.entries[slot].enabled !== false) rather than unconditionally suppressing the note.
  • Code path exercised: src/commands/doctor-memory-search.ts:hasAlternateMemoryPluginSlot (pure config read) and src/commands/doctor-memory-search.ts:noteMemorySearchHealth (third escape hatch fires after gatewayMemoryProbe.ready, before note(...)).
  • Boundary untouched: --fix repair path, memory-host runtime contract, default-slot canonical-failure path (memory-core slot + null runtime → still warns).

What was not tested:

  • memory-wiki and other future ClawHub-published memory plugins as the slot owner. The helper is contract-shape agnostic, but only memory-lancedb was exercised on a real OpenClaw run.
  • Gateway-running diagnostics path (openclaw status --deep). This PR only touches noteMemorySearchHealth invoked by the CLI, not the gateway-side memory probe.
  • Per-agent memory-slot overrides. The helper reads top-level cfg.plugins, which is global; per-agent overrides were not exercised.
  • Concurrent openclaw doctor invocations. Single-process run only; the helper is pure config reads (no mutation), so a race scenario was not constructed.
  • macOS-specific behavior beyond the issue's reproducer. Bug filed and reproduced on macOS; helper is config-only and platform-independent so divergence is unlikely.

Evidence

  • Failing test on main, passing on this branch (regression test 1).
  • Trace of the relevant note path (see Diagram).
  • Real terminal capture (see Real Behavior Proof above).
  • Screenshot — N/A, terminal output included instead.
  • Perf data — N/A, no perf-relevant code touched.

Commands run during local validation (separate from the proof captures above):

pnpm exec oxfmt --check --threads=1 src/commands/doctor-memory-search.ts src/commands/doctor-memory-search.test.ts
pnpm test src/commands/doctor-memory-search.test.ts -- --reporter=verbose
pnpm check:changed -- --base upstream/main
pnpm tsgo:core && pnpm tsgo:core:test
pnpm lint:core
pnpm check:changelog-attributions
git diff --check origin/main...HEAD

Human Verification

Verified scenarios:

  • macOS + memory-lancedb (slot owner) + autoRecall/autoCapture configured → no warning.
  • Default config (memory-core slot owner) + simulated runtime null → warning still fires (regression test 2).
  • plugins.enabled: false → helper returns false; existing pre-runtime branches handle silently as before.
  • slots.memory: "none" (normalized to null) → helper returns false; the runtime never loads anyway.
  • Denied or disabled-entry slot → helper returns false; falls through to the existing warning, which is correct (user denied or disabled the only memory plugin they configured). Confirmed live by the negative regression guard above.

What I did NOT verify:

  • memory-wiki as the slot owner.
  • ClawHub-installed memory plugins beyond memory-lancedb.
  • Linux. Bug was reported on macOS; helper is platform-independent (config-only) so platform variance is unlikely.
  • The Gateway-running diagnostics path (openclaw status --deep).

Compatibility / Migration + Risks and Mitigations

  • Public API change? No
  • Config-shape change? No
  • Migration needed? No
Risks and mitigations:

  • Risk: A future memory plugin that should register a host runtime fails to do so silently, so doctor now stays quiet instead of warning.
    Mitigation: The plugin's own service-start logger (api.registerService({ start: ... })) surfaces init failures. Doctor still warns for the canonical bundled default (memory-core).
  • Risk: A user puts a non-existent plugin id in slots.memory and doctor stays silent.
    Mitigation: This is a pre-existing failure mode of the slot model (plugins install validates the id), and the loader path surfaces a load error elsewhere.
  • Risk: The helper reads cfg.plugins without try/catch.
    Mitigation: normalizePluginsConfig accepts undefined and returns a safe default; it does not throw on malformed input.

Duplicate / Related Threads

The memory + doctor space is currently crowded. Each related issue is distinct from this fix:

  • #78210: doctor --fix reports memory-core deps healthy when missing on disk. Different code path (dependency audit, not runtime registration).
  • #78499 / #78509 / #78491 (closed dup): Codex OAuth model-ref rewrite. Different file (src/commands/doctor/shared/codex-route-warnings.ts); already in flight as #78513.
  • #78519 / #78539: Gateway / OpenAI subscription regressions on 2026.5.5 update. Different surfaces.
  • #78484: Codex agent on Telegram with stale API key. Different surface and runtime.

This PR addresses only the memory-slot diagnostic in noteMemorySearchHealth. No file overlap with any open PR.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/commands/doctor-memory-search.test.ts (modified, +37/-0)
  • src/commands/doctor-memory-search.ts (modified, +28/-0)

Original issue report

Bug type

CLI/config migration (openclaw doctor --fix)

Beta release blocker

No

Summary

After openclaw doctor --fix on OpenClaw 2026.5.5, agents using agentRuntime.id: "codex" were configured with openai/gpt-5.5, and Telegram turns failed because the Codex app-server tried to use openai:default instead of an openai-codex auth profile.

Steps to reproduce

  1. Use OpenClaw 2026.5.5 with the Codex plugin installed/enabled and agents using agentRuntime.id: "codex".
  2. Run openclaw doctor --fix.
  3. Restart the gateway.
  4. Send /new to a Telegram bot account.
  5. Send a normal Telegram message to the same bot.

Expected behavior

A normal Telegram message after /new should complete an agent turn. This was observed after manually setting the agents back to codex/gpt-5.5: the Claw Telegram session returned OpenClaw repair OK with deliverySucceeded: true, and a Murphy probe returned OK through provider: codex, model: gpt-5.5, agentHarnessId: codex.

Actual behavior

/new succeeded, but normal Telegram messages failed with the user-visible reply:

Something went wrong while processing your request. Please try again, or use /new to start a fresh session.

Gateway logs showed:

2026-05-06T10:22:27.131-03:00 [agents/harness] Codex agent harness failed; not falling back to embedded PI backend
2026-05-06T10:22:27.132-03:00 [diagnostic] lane task error: lane=main durationMs=1175 error="Error: Codex app-server auth profile "openai:default" must belong to provider "openai-codex" or a supported alias."
2026-05-06T10:22:27.132-03:00 [diagnostic] lane task error: lane=session:agent:main:telegram:direct:<redacted> durationMs=1176 error="Error: Codex app-server auth profile "openai:default" must belong to provider "openai-codex" or a supported alias."

The same error was observed for another Telegram agent session:

2026-05-06T10:22:28.400-03:00 [diagnostic] lane task error: lane=session:agent:murphy:telegram:direct:<redacted> durationMs=885 error="Error: Codex app-server auth profile "openai:default" must belong to provider "openai-codex" or a supported alias."
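To see which lanes hit this error, a gateway log can be scanned for the lane task error lines shown above. A minimal sketch, assuming the exact line format in these excerpts; parseAuthProfileErrors and AuthProfileLaneError are hypothetical names, not an OpenClaw API or logging spec.

```typescript
// Hypothetical log scanner; the line format is copied from the excerpts in this report.
interface AuthProfileLaneError {
  lane: string;
  durationMs: number;
  profile: string; // auth profile the Codex app-server tried to use
  requiredProvider: string; // provider the app-server expected
}

const LANE_ERROR_RE =
  /lane=(\S+) durationMs=(\d+) error="Error: Codex app-server auth profile "([^"]+)" must belong to provider "([^"]+)"/;

function parseAuthProfileErrors(log: string): AuthProfileLaneError[] {
  const errors: AuthProfileLaneError[] = [];
  for (const line of log.split("\n")) {
    const match = LANE_ERROR_RE.exec(line);
    if (match === null) continue; // e.g. the [agents/harness] line has no lane field
    const [, lane = "", durationMs = "0", profile = "", requiredProvider = ""] = match;
    errors.push({ lane, durationMs: Number(durationMs), profile, requiredProvider });
  }
  return errors;
}
```

Run over the four log lines in this report, the sketch would return three entries (lane main, the main Telegram session, and the murphy Telegram session), each with profile openai:default and required provider openai-codex.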

OpenClaw version

OpenClaw 2026.5.5 (b1abf9d)

Codex plugin inspected after repair: @openclaw/codex 2026.5.5.

Operating system

macOS 26.4.1 arm64, Node 25.9.0.

Install method

Gateway launched as a LaunchAgent:

/opt/homebrew/opt/node/bin/node /opt/homebrew/lib/node_modules/openclaw/dist/index.js gateway --port 18789

openclaw status --deep reported channel stable, gateway reachable at ws://127.0.0.1:18789, and update status pnpm · up to date · npm latest 2026.5.5.

Model

Failing path after doctor/config migration: openai/gpt-5.5 with agentRuntime.id: "codex".

Working workaround path: codex/gpt-5.5 with agentRuntime.id: "codex".

Provider / routing chain

Telegram DM -> local OpenClaw gateway 127.0.0.1:18789 -> configured agent -> agentRuntime.id: "codex" -> Codex app-server harness.

Observed failing auth selection: openai:default.

Observed successful workaround execution trace:

{
  "winnerProvider": "codex",
  "winnerModel": "gpt-5.5",
  "attempts": [
    {
      "provider": "codex",
      "model": "gpt-5.5",
      "result": "success",
      "stage": "assistant"
    }
  ],
  "fallbackUsed": false,
  "runner": "embedded"
}
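When verifying the workaround, the execution trace can be checked mechanically rather than by eye. A small sketch: the ExecutionTrace type is inferred from the JSON above (only the fields shown there), and ranOnCodexWithoutFallback is a hypothetical helper.

```typescript
// Types inferred from the trace JSON in this report; the real trace may carry more fields.
interface TraceAttempt {
  provider: string;
  model: string;
  result: string;
  stage: string;
}

interface ExecutionTrace {
  winnerProvider: string;
  winnerModel: string;
  attempts: TraceAttempt[];
  fallbackUsed: boolean;
  runner: string;
}

// Hypothetical check: the turn was won by the codex provider and no fallback was used.
function ranOnCodexWithoutFallback(trace: ExecutionTrace, model: string): boolean {
  return (
    trace.winnerProvider === "codex" &&
    trace.winnerModel === model &&
    !trace.fallbackUsed &&
    trace.attempts.some((a) => a.provider === "codex" && a.result === "success")
  );
}

const observedTrace: ExecutionTrace = {
  winnerProvider: "codex",
  winnerModel: "gpt-5.5",
  attempts: [{ provider: "codex", model: "gpt-5.5", result: "success", stage: "assistant" }],
  fallbackUsed: false,
  runner: "embedded",
};
console.log(ranOnCodexWithoutFallback(observedTrace, "gpt-5.5")); // true
```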

Additional provider/model setup details

Auth profiles present, redacted:

anthropic:default -> provider anthropic
openai-codex:<redacted-email> -> provider openai-codex, mode oauth
openai:default -> provider openai, mode api_key
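The error message implies a simple acceptance rule: the Codex app-server only takes a profile whose provider is openai-codex or a supported alias. A minimal sketch of that rule applied to the three profiles above; acceptedByCodexAppServer is hypothetical, and the alias set is deliberately empty because the report does not say which aliases are supported.

```typescript
// Hypothetical model of the rule stated in the error message; not OpenClaw source.
interface AuthProfile {
  id: string;
  provider: string;
  mode?: string;
}

const CODEX_APP_SERVER_PROVIDER = "openai-codex";
// Assumption: the supported alias providers are not listed in the report, so none are modeled.
const SUPPORTED_PROVIDER_ALIASES: ReadonlySet<string> = new Set<string>();

function acceptedByCodexAppServer(profile: AuthProfile): boolean {
  return (
    profile.provider === CODEX_APP_SERVER_PROVIDER ||
    SUPPORTED_PROVIDER_ALIASES.has(profile.provider)
  );
}

// The three profiles from this report, redacted as in the original.
const profiles: AuthProfile[] = [
  { id: "anthropic:default", provider: "anthropic" },
  { id: "openai-codex:<redacted-email>", provider: "openai-codex", mode: "oauth" },
  { id: "openai:default", provider: "openai", mode: "api_key" },
];

for (const p of profiles) {
  console.log(`${p.id}: ${acceptedByCodexAppServer(p) ? "accepted" : "rejected"}`);
}
```

Under this rule only openai-codex:<redacted-email> is accepted; openai:default, the profile the harness picked after the migration, is rejected, matching the logged error.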

After the failure, the config showed all ten agents with:

{
  "model": { "primary": "openai/gpt-5.5" },
  "agentRuntime": { "id": "codex" }
}

Manual workaround applied:

agents.defaults.model.primary = codex/gpt-5.5
agents.list[*].model.primary = codex/gpt-5.5
plugins.entries.memory-core.* execution model refs = codex/gpt-5.5

After gateway restart, openclaw models list showed codex/gpt-5.5 as default,configured, and Telegram delivery succeeded.
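For configs with many agents, the agent part of this workaround can be expressed as a pure config transform. A sketch under stated assumptions: the AgentsConfig shape is inferred from the dotted paths above (agents.defaults.model.primary, agents.list[*].model.primary), the inheritance of agentRuntime from defaults is an assumption, the memory-core execution refs are omitted because their exact paths are not given, and the function works on an in-memory object rather than any OpenClaw config API.

```typescript
// Config shape inferred from this report's dotted paths; an assumption, not OpenClaw's schema.
interface AgentModel {
  primary?: string;
}

interface AgentEntry {
  id?: string;
  model?: AgentModel;
  agentRuntime?: { id?: string };
}

interface AgentsConfig {
  defaults?: AgentEntry;
  list?: AgentEntry[];
}

const MIGRATED_PREFIX = "openai/";

// Rewrite "openai/<model>" back to "codex/<model>" when the entry runs on the codex runtime.
function restoreCodexRef(entry: AgentEntry, inheritedRuntime?: string): AgentEntry {
  const runtime = entry.agentRuntime?.id ?? inheritedRuntime;
  const primary = entry.model?.primary;
  if (runtime !== "codex" || primary === undefined || !primary.startsWith(MIGRATED_PREFIX)) {
    return entry;
  }
  const modelId = primary.slice(MIGRATED_PREFIX.length);
  return { ...entry, model: { ...entry.model, primary: `codex/${modelId}` } };
}

// Assumption: list entries without their own agentRuntime inherit the defaults' runtime.
// Returns a new object; the input config is not mutated.
function applyCodexWorkaround(agents: AgentsConfig): AgentsConfig {
  const defaults = agents.defaults && restoreCodexRef(agents.defaults);
  const inheritedRuntime = agents.defaults?.agentRuntime?.id;
  return {
    ...agents,
    defaults,
    list: agents.list?.map((entry) => restoreCodexRef(entry, inheritedRuntime)),
  };
}
```

Entries on other runtimes or providers pass through unchanged, so the transform only touches the openai/* plus codex-runtime pairing the report observed on all ten agents.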

Logs, screenshots, and evidence

openclaw doctor after the manual workaround still proposed migrating the working codex/gpt-5.5 refs:

Moved agents.defaults.model legacy runtime primary refs to canonical provider refs and selected codex runtime.
Moved agents.defaults.models legacy runtime keys to canonical provider keys.
Moved agents.list.main.model legacy runtime primary refs to canonical provider refs and selected codex runtime.
...
Run "openclaw doctor --fix" to apply these changes.

The installed OpenClaw migration code maps legacy provider codex to provider openai with runtime codex:

{
  legacyProvider: "codex",
  provider: "openai",
  runtime: "codex",
  cli: false
}

The installed Codex plugin harness only supports provider codex by default:

const DEFAULT_CODEX_HARNESS_PROVIDER_IDS = new Set(["codex"]);

The installed Codex plugin provider catalog uses CODEX_PROVIDER_ID = "codex".
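Putting the migration entry and the harness default side by side shows the mismatch. A minimal sketch: LegacyRuntimeMapping mirrors the object quoted above and DEFAULT_CODEX_HARNESS_PROVIDER_IDS is copied from the plugin, while migrateRef and harnessSupports are hypothetical stand-ins for the real doctor and harness code, which may select providers differently.

```typescript
// Mirrors the migration entry and harness default quoted in this report;
// migrateRef and harnessSupports are hypothetical stand-ins, not OpenClaw source.
interface LegacyRuntimeMapping {
  legacyProvider: string;
  provider: string;
  runtime: string;
  cli: boolean;
}

const CODEX_LEGACY_MAPPING: LegacyRuntimeMapping = {
  legacyProvider: "codex",
  provider: "openai",
  runtime: "codex",
  cli: false,
};

const DEFAULT_CODEX_HARNESS_PROVIDER_IDS = new Set(["codex"]);

// Apply the mapping to a "provider/model" ref, as doctor --fix does per the report.
function migrateRef(ref: string, mapping: LegacyRuntimeMapping): { ref: string; runtime?: string } {
  const slash = ref.indexOf("/");
  if (slash < 0 || ref.slice(0, slash) !== mapping.legacyProvider) return { ref };
  return { ref: `${mapping.provider}/${ref.slice(slash + 1)}`, runtime: mapping.runtime };
}

// Does the harness's default provider set include the ref's provider?
function harnessSupports(ref: string): boolean {
  const [provider = ""] = ref.split("/", 1);
  return DEFAULT_CODEX_HARNESS_PROVIDER_IDS.has(provider);
}

const legacyRef = "codex/gpt-5.5";
const migrated = migrateRef(legacyRef, CODEX_LEGACY_MAPPING);
console.log(legacyRef, harnessSupports(legacyRef)); // codex/gpt-5.5 true
console.log(migrated.ref, migrated.runtime, harnessSupports(migrated.ref)); // openai/gpt-5.5 codex false
```

The migrated ref keeps runtime codex but names a provider outside the harness's default set, which is consistent with the harness falling back to an openai profile instead of openai-codex.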

Impact and severity

Affected users/systems/channels: observed on Telegram DM sessions for the default Claw agent and Murphy; the config showed the same openai/gpt-5.5 + agentRuntime.id: "codex" pairing for all ten configured agents.

Severity: blocks Telegram agent replies.

Frequency: observed on normal Telegram messages after /new at 10:22 and 10:23 on 2026-05-06.

Consequence: Telegram users receive the generic failure reply instead of an agent response.

Additional information

Before this issue was isolated, a stale Codex plugin install was also observed: host OpenClaw was 2026.5.5, installed @openclaw/codex was 2026.5.3, and openclaw plugins update codex reported it was up to date. For this report, the auth-profile failure above was reproduced/verified after installing @openclaw/codex@2026.5.5 and disabling a separate active-memory timeout, so the remaining grounded failure was the model/provider migration mismatch.
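Because the update command reported the stale plugin as current, a host-versus-plugin version check can be a useful extra diagnostic. A small sketch, assuming plain YYYY.M.D version strings without prerelease suffixes; parseCalVer, compareCalVer, and isPluginBehindHost are hypothetical helpers. Components are compared numerically, since a string comparison would wrongly rank 2026.5.9 above 2026.5.10.

```typescript
// Hypothetical helpers; assumes versions look like "2026.5.5" with no prerelease tags.
function parseCalVer(version: string): number[] {
  const parts = version.split(".").map(Number);
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n))) {
    throw new Error(`unexpected version: ${version}`);
  }
  return parts;
}

// Compare numerically, component by component; negative means a < b.
function compareCalVer(a: string, b: string): number {
  const pa = parseCalVer(a);
  const pb = parseCalVer(b);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

function isPluginBehindHost(hostVersion: string, pluginVersion: string): boolean {
  return compareCalVer(pluginVersion, hostVersion) < 0;
}

console.log(isPluginBehindHost("2026.5.5", "2026.5.3")); // true
```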

Do not include API keys, tokens, or passwords in follow-up diagnostics.
