Bug: message-channel runs can still route to stale `codex` / `openai-codex` paths after Codex is disabled

openclaw • 2026-04-18 15:28:55

openclaw/openclaw#68615 • Fetched 2026-04-19 15:09:31

Author

JoeshpCheung

In a setup where the intended working model route is a proxy-backed model such as codex-proxy/gpt-5.4, some message-channel runs can still resolve to stale Codex-related paths even after Codex is disabled.

Observed stale paths include:

codex/gpt-5.4
provider=codex
openai-codex
https://chatgpt.com/backend-api

This can surface user-visible failures such as:

LLM request failed: DNS lookup for the provider endpoint failed
⚠️ API rate limit reached. Please try again later.
raw HTML / Cloudflare-like error pages

At the same time, the channel send layer itself may still be healthy (for example, successful sendMessage ok logs on Telegram). This makes the issue look like a channel problem when it is actually a runtime provider/model resolution problem.

Environment

OpenClaw package: [email protected]
Install mode: pnpm global install
Channel under test: Telegram
Intended working model route: codex-proxy/gpt-5.4
plugins.entries.codex.enabled = false

Expected behavior

When Codex is disabled and the effective configured working model is codex-proxy/gpt-5.4, message-channel runs should resolve directly to that configured route.

Expected path:

message-channel inbound event → effective configured agent/session → codex-proxy/gpt-5.4

Actual behavior

Some message-channel runs still attempt stale Codex-related routes first, producing failures such as:

provider=codex
chatgpt.com/backend-api
HTML / Cloudflare response pages
DNS lookup failures
apparent rate-limit errors

After local mitigation, behavior improved to:

request still starts as codex/gpt-5.4
runtime reports Unknown model: codex/gpt-5.4
runtime falls back to codex-proxy/gpt-5.4
channel reply succeeds

So the hard failure path can be suppressed, but stale model/provider resolution still exists.
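The post-mitigation sequence above (unknown model, then fallback, then success) can be sketched roughly as follows. This is a minimal illustration, not OpenClaw's actual resolver: `resolveWithFallback`, the `ResolveResult` shape, and the registry represented as a plain `Set` are all hypothetical.

```typescript
// Hypothetical result shape; not taken from the OpenClaw codebase.
type ResolveResult = {
  requested: string;
  resolved: string;
  fellBack: boolean;
};

// Resolve a requested model reference against a set of known models,
// walking an ordered fallback list when the request is unknown.
function resolveWithFallback(
  requested: string,
  knownModels: ReadonlySet<string>,
  fallbacks: readonly string[],
): ResolveResult {
  if (knownModels.has(requested)) {
    return { requested, resolved: requested, fellBack: false };
  }
  // The requested model is unknown: try each fallback candidate in order.
  for (const candidate of fallbacks) {
    if (knownModels.has(candidate)) {
      return { requested, resolved: candidate, fellBack: true };
    }
  }
  throw new Error(`Unknown model: ${requested} (no usable fallback)`);
}

// After cleanup, only the proxy route is known, so the stale request falls back.
const knownAfterCleanup = new Set(["codex-proxy/gpt-5.4"]);
const staleRequest = resolveWithFallback(
  "codex/gpt-5.4",
  knownAfterCleanup,
  ["codex-proxy/gpt-5.4"],
);
console.log(staleRequest.resolved, staleRequest.fellBack);
```

The `fellBack` flag is the point of the sketch: the reply succeeds, but every run still pays for a failed first lookup, which is why the report treats fallback as a mitigation rather than a fix.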

Representative evidence

Representative failing runtime evidence:

provider=codex error=LLM request failed: DNS lookup for the provider endpoint failed
provider=codex error=⚠️ API rate limit reached. Please try again later.
rawError=<html> ...
embedded run failover decision ... reason=rate_limit from=codex/gpt-5.4

Representative post-mitigation evidence:

FailoverError: Unknown model: codex/gpt-5.4
model fallback decision: ... requested=codex/gpt-5.4 ... next=codex-proxy/gpt-5.4
candidate_succeeded ... candidate=codex-proxy/gpt-5.4
Model "codex/gpt-5.4" not found. Fell back to "codex-proxy/gpt-5.4".
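Because channel delivery can look healthy while the provider route is stale, it may help to triage log lines by which layer they implicate. The sketch below is an assumption-laden helper: the markers are copied from the excerpts above, but the overall log line format, the `sendMessage ok` wording, and the names `triageLogLine` and `Triage` are hypothetical.

```typescript
type Triage = "stale-provider-route" | "channel-delivery-ok" | "unclassified";

// Patterns derived from the excerpts in this report. They deliberately avoid
// matching the healthy "codex-proxy" provider or "codex-proxy/..." model refs.
const STALE_ROUTE_PATTERNS: readonly RegExp[] = [
  /\bprovider=codex(?![\w-])/,   // provider=codex, but not provider=codex-proxy
  /(?:^|[\s="])codex\//,         // codex/gpt-5.4 as a model reference
  /openai-codex/,
  /chatgpt\.com\/backend-api/,
];

function triageLogLine(line: string): Triage {
  if (STALE_ROUTE_PATTERNS.some((pattern) => pattern.test(line))) {
    return "stale-provider-route";
  }
  // Assumed wording for a successful Telegram send.
  if (line.includes("sendMessage ok")) {
    return "channel-delivery-ok";
  }
  return "unclassified";
}

console.log(triageLogLine("provider=codex error=LLM request failed: DNS lookup for the provider endpoint failed"));
```

If a failing run shows both `channel-delivery-ok` and `stale-provider-route` lines, that points at provider/model resolution rather than the channel itself.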

Root-cause findings

1. Disabled plugin side effects can still happen

The bundled loader still appears to call register(api) before fully honoring shouldActivate === false.

That means a disabled plugin can still create side effects first and only then attempt rollback.

This is especially problematic for provider / harness / command registration.
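A minimal sketch of the safer ordering, in which a disabled plugin is skipped before it can register anything. The `Plugin` and `PluginApi` interfaces, the `loadPlugins` function, and the modelling of `shouldActivate` as a boolean field are assumptions made for illustration; the report does not show the real loader's types.

```typescript
// Hypothetical, simplified plugin API surface.
interface PluginApi {
  registerProvider(id: string): void;
}

// Hypothetical plugin shape; in the real loader, shouldActivate may be
// computed from config rather than stored on the plugin.
interface Plugin {
  id: string;
  shouldActivate: boolean;
  register(api: PluginApi): void;
}

function loadPlugins(plugins: readonly Plugin[], api: PluginApi): string[] {
  const activated: string[] = [];
  for (const plugin of plugins) {
    // Short-circuit BEFORE register(api): nothing to roll back afterwards.
    if (!plugin.shouldActivate) {
      continue;
    }
    plugin.register(api);
    activated.push(plugin.id);
  }
  return activated;
}

// Usage: only the enabled plugin leaves a provider registration behind.
const registeredProviders: string[] = [];
const demoApi: PluginApi = {
  registerProvider: (id) => {
    registeredProviders.push(id);
  },
};
const activatedIds = loadPlugins(
  [
    { id: "codex", shouldActivate: false, register: (api) => api.registerProvider("codex") },
    { id: "codex-proxy", shouldActivate: true, register: (api) => api.registerProvider("codex-proxy") },
  ],
  demoApi,
);
console.log(activatedIds, registeredProviders);
```

Skipping the call entirely avoids relying on rollback being complete, which matters most for registrations that other runtime state may already have observed.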

2. The bundled `codex` plugin still has a direct provider registration path

Relevant bundled code path:

dist/extensions/codex/index.js

Observed behavior includes registration of:

agent harness
provider via buildCodexProvider(...)
command

3. There appears to be a second Codex-related route beyond the old plugin toggle

Additional bundled paths include:

dist/extensions/openai/index.js
dist/openai-codex-provider-*.js
dist/openai-codex-catalog-*.js

These still surface openai-codex / ChatGPT backend related routing behavior.

4. Derived/runtime state can keep stale Codex routes alive

Derived/runtime state can still contain entries such as:

codex
openai-codex
https://chatgpt.com/backend-api/v1

Auth/runtime state can also retain stale defaults such as:

openai-codex:default
codex:default

So even after disabling Codex, runtime can still retain stale candidates that may be selected.
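One way to keep such candidates from being selected is to filter derived model rows by provider and by transport URL. The following is only a sketch: the `DerivedModelEntry` shape, its field names, the `pruneStaleEntries` helper, and the `http://localhost:8080/v1` proxy URL are hypothetical and do not reflect OpenClaw's actual state format.

```typescript
// Hypothetical shape of a derived model row.
interface DerivedModelEntry {
  provider: string;
  model: string;
  baseUrl?: string;
}

// Drop rows that belong to a disabled provider or that point at a blocked
// transport URL prefix; keep everything else unchanged.
function pruneStaleEntries(
  entries: readonly DerivedModelEntry[],
  disabledProviders: ReadonlySet<string>,
  blockedBaseUrlPrefixes: readonly string[],
): DerivedModelEntry[] {
  return entries.filter((entry) => {
    if (disabledProviders.has(entry.provider)) {
      return false;
    }
    const baseUrl = entry.baseUrl;
    if (
      baseUrl !== undefined &&
      blockedBaseUrlPrefixes.some((prefix) => baseUrl.startsWith(prefix))
    ) {
      return false;
    }
    return true;
  });
}

const prunedEntries = pruneStaleEntries(
  [
    { provider: "codex", model: "gpt-5.4" },
    { provider: "openai-codex", model: "gpt-5.4", baseUrl: "https://chatgpt.com/backend-api/v1" },
    { provider: "codex-proxy", model: "gpt-5.4", baseUrl: "http://localhost:8080/v1" },
  ],
  new Set(["codex", "openai-codex"]),
  ["https://chatgpt.com/backend-api"],
);
console.log(prunedEntries.map((entry) => `${entry.provider}/${entry.model}`));
```

Checking the base URL as well as the provider id is a defensive choice: it also catches a row that was re-registered under a different provider name but still targets the ChatGPT backend.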

Local mitigation that restored user-facing functionality

Two classes of local mitigation were needed:

A. Loader hotfix

Short-circuit disabled plugins before register(api).

B. Runtime state cleanup

Remove stale codex / openai-codex derived model entries and stale auth ordering so runtime can no longer use the broken remote Codex path.

After this, channel replies started working again, but stale model requests for codex/gpt-5.4 still needed runtime fallback to codex-proxy/gpt-5.4.

So the user-facing outage was resolved, but stale route normalization still appears incomplete.

Suggested upstream fixes

Fix 1: loader correctness

If shouldActivate === false, do not call register(api) at all.

Fix 2: derived/runtime model cleanup

When Codex is disabled (or effectively unused), do not keep surfacing stale derived model rows such as:

codex
openai-codex
ChatGPT backend transport routes

Fix 3: stale model alias normalization

If the configured working route is codex-proxy/gpt-5.4, stale requests like codex/gpt-5.4 should be normalized before runtime execution, not only recovered by fallback.
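Normalization before execution could look roughly like the sketch below, which rewrites the provider prefix of a `provider/model` reference. The `normalizeModelRef` function and the alias map are hypothetical; how OpenClaw would actually store or derive such aliases is not stated in the report.

```typescript
// Rewrite the provider prefix of a "provider/model" reference when the
// provider is listed as a stale alias; otherwise return the input unchanged.
function normalizeModelRef(
  requested: string,
  providerAliases: ReadonlyMap<string, string>,
): string {
  const slash = requested.indexOf("/");
  if (slash === -1) {
    return requested; // No provider prefix to rewrite.
  }
  const provider = requested.slice(0, slash);
  const model = requested.slice(slash + 1);
  const replacement = providerAliases.get(provider);
  return replacement === undefined ? requested : `${replacement}/${model}`;
}

// Assumed alias table: both stale provider ids map to the configured proxy.
const staleProviderAliases = new Map([
  ["codex", "codex-proxy"],
  ["openai-codex", "codex-proxy"],
]);

console.log(normalizeModelRef("codex/gpt-5.4", staleProviderAliases));
console.log(normalizeModelRef("codex-proxy/gpt-5.4", staleProviderAliases));
```

Applied at the point where a request's model reference is first read, this would let the run start on the configured route instead of recovering through a failover error.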

Fix 4: auth-state migration / cleanup

Stale auth defaults like codex:default / openai-codex:default should not silently remain preferred after route changes.
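A cleanup pass over an auth preference order might be sketched as follows. It assumes profile ids follow a `<provider>:<profile>` pattern, as the examples `codex:default` and `openai-codex:default` suggest; the `cleanAuthOrder` helper and the `codex-proxy:default` id are hypothetical.

```typescript
// Remove auth profile ids whose provider part is in the stale set,
// preserving the relative order of the remaining ids.
function cleanAuthOrder(
  order: readonly string[],
  staleProviders: ReadonlySet<string>,
): string[] {
  return order.filter((profileId) => {
    const separator = profileId.indexOf(":");
    const provider = separator === -1 ? profileId : profileId.slice(0, separator);
    return !staleProviders.has(provider);
  });
}

const cleanedOrder = cleanAuthOrder(
  ["openai-codex:default", "codex:default", "codex-proxy:default"],
  new Set(["codex", "openai-codex"]),
);
console.log(cleanedOrder);
```

Matching on the exact provider part, rather than on a substring, keeps `codex-proxy:default` in place even though its id begins with `codex`.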

Why this matters

This can produce confusing message-channel failures where:

the configured model path is healthy
channel delivery is healthy
but runtime still selects a stale Codex path and surfaces HTML/DNS/rate-limit errors to users

That makes the issue appear channel-specific when it is actually a provider/model resolution problem.

Current status

As of final local validation:

message replies are functioning again
the previous blocking provider=codex HTML/DNS/rate-limit path is no longer the active failure mode
but there is still a stale request source asking for codex/gpt-5.4, which then falls back to codex-proxy/gpt-5.4

So the outage can be mitigated locally, but upstream normalization / cleanup would still be valuable.

extent analysis

TL;DR

The most likely fix has two parts: make the loader skip register(api) for disabled plugins, and clean up derived/runtime state so that stale Codex routes can no longer be selected.

Guidance

Loader correction: Ensure that when a plugin is disabled (shouldActivate === false), the register(api) call is skipped to prevent side effects.
Derived/runtime state cleanup: Remove stale codex and openai-codex derived model entries and auth ordering to prevent the runtime from using broken remote Codex paths.
Stale model alias normalization: Implement a mechanism to normalize stale requests like codex/gpt-5.4 to the configured working route codex-proxy/gpt-5.4 before runtime execution.
Auth-state migration/cleanup: Remove stale auth defaults like codex:default and openai-codex:default to ensure they do not remain preferred after route changes.

Example

The issue itself includes no code snippet. The suggested fixes involve changes to the loader and to runtime state management logic.

Notes

This guidance follows the issue's own root-cause findings. Each of the four steps (loader correctness, derived/runtime state cleanup, stale model alias normalization, and auth-state migration/cleanup) targets a separate source of stale Codex routes.

Recommendation

Apply the loader correction and the derived/runtime state cleanup first, since those two steps restored channel replies in the reporter's local validation. Stale model alias normalization and auth-state migration/cleanup then address what remained afterwards: requests that still start as codex/gpt-5.4 and only succeed through fallback to codex-proxy/gpt-5.4.
