openclaw - 💡(How to fix) Fix Bug: message-channel runs can still route to stale codex/openai-codex paths after Codex is disabled [1 comments, 1 participants]

openclaw/openclaw#68615 · Fetched 2026-04-19 15:09:31
Bug: message-channel runs can still route to stale codex / openai-codex paths after Codex is disabled

Summary

In a setup where the intended working model route is a proxy-backed model such as codex-proxy/gpt-5.4, some message-channel runs can still resolve to stale Codex-related paths even after Codex is disabled.

Observed stale paths include:

  • codex/gpt-5.4
  • provider=codex
  • openai-codex
  • https://chatgpt.com/backend-api

This can surface user-visible failures such as:

  • LLM request failed: DNS lookup for the provider endpoint failed
  • ⚠️ API rate limit reached. Please try again later.
  • raw HTML / Cloudflare-like error pages

At the same time, the channel send layer itself may still be healthy (for example, successful sendMessage ok logs on Telegram). This makes the issue look like a channel problem when it is actually a runtime provider/model resolution problem.


Environment

  • OpenClaw package: (package@version string obscured in the source page)
  • Install mode: pnpm global install
  • Channel under test: Telegram
  • Intended working model route: codex-proxy/gpt-5.4
  • plugins.entries.codex.enabled = false

Expected behavior

When Codex is disabled and the effective configured working model is codex-proxy/gpt-5.4, message-channel runs should resolve directly to that configured route.

Expected path:

  • message-channel inbound event
  • effective configured agent/session
  • codex-proxy/gpt-5.4

Actual behavior

Some message-channel runs still attempt stale Codex-related routes first, producing failures such as:

  • provider=codex
  • chatgpt.com/backend-api
  • HTML / Cloudflare response pages
  • DNS lookup failures
  • apparent rate-limit errors

After local mitigation, behavior improved to:

  • request still starts as codex/gpt-5.4
  • runtime reports Unknown model: codex/gpt-5.4
  • runtime falls back to codex-proxy/gpt-5.4
  • channel reply succeeds

So the hard failure path can be suppressed, but stale model/provider resolution still exists.


Representative evidence

Representative failing runtime evidence:

  • provider=codex error=LLM request failed: DNS lookup for the provider endpoint failed
  • provider=codex error=⚠️ API rate limit reached. Please try again later.
  • rawError=<html> ...
  • embedded run failover decision ... reason=rate_limit from=codex/gpt-5.4

Representative post-mitigation evidence:

  • FailoverError: Unknown model: codex/gpt-5.4
  • model fallback decision: ... requested=codex/gpt-5.4 ... next=codex-proxy/gpt-5.4
  • candidate_succeeded ... candidate=codex-proxy/gpt-5.4
  • Model "codex/gpt-5.4" not found. Fell back to "codex-proxy/gpt-5.4".
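
To track down where stale requests originate, the fallback notice quoted above can be scanned for in logs. The following is a minimal sketch that assumes the notice always has the exact wording of the last bullet; `FallbackEvent` and `parseFallbackNotice` are hypothetical helper names, not part of OpenClaw.

```typescript
// Hypothetical helper: extracts the requested and substituted model refs
// from a fallback notice of the form quoted in the issue.
interface FallbackEvent {
  requested: string;
  used: string;
}

function parseFallbackNotice(line: string): FallbackEvent | undefined {
  // Assumes the exact wording: Model "<a>" not found. Fell back to "<b>".
  const match = /Model "([^"]+)" not found\. Fell back to "([^"]+)"\./.exec(line);
  if (match === null) {
    return undefined;
  }
  const requested = match[1];
  const used = match[2];
  if (requested === undefined || used === undefined) {
    return undefined;
  }
  return { requested: requested, used: used };
}

const sampleNotice = 'Model "codex/gpt-5.4" not found. Fell back to "codex-proxy/gpt-5.4".';
const parsedNotice = parseFallbackNotice(sampleNotice);
const unrelatedLine = parseFallbackNotice("sendMessage ok");
```

Any line that parses successfully indicates a caller that is still asking for the stale model ref, even though the reply itself succeeds.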

Root-cause findings

1. Disabled plugin side effects can still happen

The bundled loader still appears to call register(api) before fully honoring shouldActivate === false.

That means a disabled plugin can still create side effects first and only then attempt rollback.

This is especially problematic for provider / harness / command registration.
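
The ordering problem described above can be illustrated with a small sketch. The real OpenClaw loader is not shown in the issue, so `PluginApi`, `PluginEntry`, and `loadPlugins` below are hypothetical shapes; only `register(api)` and `shouldActivate` come from the report. The point is that the activation check runs before `register(api)`, so a disabled plugin never gets the chance to register anything.

```typescript
// Hypothetical shapes: the actual loader types are not shown in the issue.
interface PluginApi {
  registerProvider(id: string): void;
}

interface PluginEntry {
  id: string;
  shouldActivate: boolean;
  register(api: PluginApi): void;
}

// Loads plugins and returns the provider ids that ended up registered.
// Disabled plugins are skipped before register(api) runs, so there is
// nothing to roll back afterwards.
function loadPlugins(entries: PluginEntry[]): string[] {
  const providers: string[] = [];
  const api: PluginApi = {
    registerProvider(id: string): void {
      providers.push(id);
    },
  };
  for (const entry of entries) {
    if (!entry.shouldActivate) {
      continue; // guard first: no side effects for disabled plugins
    }
    entry.register(api);
  }
  return providers;
}

const demoPlugins: PluginEntry[] = [
  { id: "codex", shouldActivate: false, register: (api) => api.registerProvider("codex") },
  { id: "codex-proxy", shouldActivate: true, register: (api) => api.registerProvider("codex-proxy") },
];

const registeredProviders = loadPlugins(demoPlugins);
```

With the guard in place, only the enabled plugin's provider appears in the registry; a register-then-rollback design would instead depend on every side effect being reversible.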

2. The bundled codex plugin still has a direct provider registration path

Relevant bundled code path:

  • dist/extensions/codex/index.js

Observed behavior includes registration of:

  • agent harness
  • provider via buildCodexProvider(...)
  • command

3. There appears to be a second Codex-related route beyond the old plugin toggle

Additional bundled paths include:

  • dist/extensions/openai/index.js
  • dist/openai-codex-provider-*.js
  • dist/openai-codex-catalog-*.js

These still surface openai-codex / ChatGPT backend related routing behavior.

4. Derived/runtime state can keep stale Codex routes alive

Derived/runtime state can still contain entries such as:

  • codex
  • openai-codex
  • https://chatgpt.com/backend-api/v1

Auth/runtime state can also retain stale defaults such as:

  • openai-codex:default
  • codex:default

So even after disabling Codex, runtime can still retain stale candidates that may be selected.


Local mitigation that restored user-facing functionality

Two classes of local mitigation were needed:

A. Loader hotfix

Short-circuit disabled plugins before register(api).

B. Runtime state cleanup

Remove stale codex / openai-codex derived model entries and stale auth ordering so runtime can no longer use the broken remote Codex path.
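
As a rough sketch of what such a cleanup could look like, the function below filters a derived-state object. The `DerivedState` shape, its field names, the `codex-proxy:default` key, and the local proxy URL are assumptions made for illustration; the issue does not describe the on-disk format. The stale provider ids and the ChatGPT backend URL are taken from the report.

```typescript
// Hypothetical shape of derived runtime state; field names are illustrative.
interface DerivedState {
  models: { provider: string; model: string; baseUrl?: string }[];
  authOrder: string[]; // keys of the form "<provider>:<profile>"
}

const STALE_PROVIDERS = ["codex", "openai-codex"];
const STALE_BASE_URL = "https://chatgpt.com/backend-api";

function isStaleProvider(provider: string): boolean {
  // Exact match only, so "codex-proxy" is not treated as stale.
  return STALE_PROVIDERS.indexOf(provider) !== -1;
}

// Returns a copy of the state without stale Codex model rows or auth entries.
function pruneStaleCodexState(state: DerivedState): DerivedState {
  return {
    models: state.models.filter((m) => {
      const usesStaleUrl = m.baseUrl !== undefined && m.baseUrl.indexOf(STALE_BASE_URL) === 0;
      return !isStaleProvider(m.provider) && !usesStaleUrl;
    }),
    authOrder: state.authOrder.filter((key) => {
      const provider = key.split(":")[0] || "";
      return !isStaleProvider(provider);
    }),
  };
}

const staleState: DerivedState = {
  models: [
    { provider: "codex", model: "gpt-5.4", baseUrl: "https://chatgpt.com/backend-api/v1" },
    { provider: "openai-codex", model: "gpt-5.4" },
    // Hypothetical local proxy endpoint, for illustration only.
    { provider: "codex-proxy", model: "gpt-5.4", baseUrl: "http://localhost:8080/v1" },
  ],
  authOrder: ["openai-codex:default", "codex:default", "codex-proxy:default"],
};

const prunedState = pruneStaleCodexState(staleState);
```

Returning a new object rather than mutating in place makes it easy to diff the state before and after the cleanup.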

After this, channel replies started working again, but stale model requests for codex/gpt-5.4 still needed runtime fallback to codex-proxy/gpt-5.4.

So the user-facing outage was resolved, but stale route normalization still appears incomplete.


Suggested upstream fixes

Fix 1: loader correctness

If shouldActivate === false, do not call register(api) at all.

Fix 2: derived/runtime model cleanup

When Codex is disabled (or effectively unused), do not keep surfacing stale derived model rows such as:

  • codex
  • openai-codex
  • ChatGPT backend transport routes

Fix 3: stale model alias normalization

If the configured working route is codex-proxy/gpt-5.4, stale requests like codex/gpt-5.4 should be normalized before runtime execution, not only recovered by fallback.
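
One way to picture this normalization is an explicit alias table consulted before the model is resolved. This is only a sketch: `normalizeModelRef` and the alias table are hypothetical, and where such a step would hook into the runtime is not specified in the issue.

```typescript
// Hypothetical alias table: maps stale model refs to the configured working route.
const staleModelAliases: Record<string, string> = {
  "codex/gpt-5.4": "codex-proxy/gpt-5.4",
};

// Rewrites a requested model ref if it is a known stale alias;
// any other ref is returned unchanged (apart from trimming whitespace).
function normalizeModelRef(requested: string, aliases: Record<string, string>): string {
  const key = requested.trim();
  if (Object.prototype.hasOwnProperty.call(aliases, key)) {
    const target = aliases[key];
    if (target !== undefined) {
      return target;
    }
  }
  return key;
}

const normalizedStale = normalizeModelRef("codex/gpt-5.4", staleModelAliases);
const normalizedCurrent = normalizeModelRef("codex-proxy/gpt-5.4", staleModelAliases);
```

Because the rewrite happens before execution, the run would start as codex-proxy/gpt-5.4 instead of raising Unknown model and recovering through fallback.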

Fix 4: auth-state migration / cleanup

Stale auth defaults like codex:default / openai-codex:default should not silently remain preferred after route changes.
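
A migration along these lines could, for example, skip auth profiles whose provider is disabled when choosing the preferred one. The sketch below assumes profile keys follow the `<provider>:<profile>` pattern seen in the issue; `pickPreferredAuthProfile` and the `codex-proxy:default` key are hypothetical.

```typescript
// Picks the first auth profile whose provider is not disabled.
// Keys are assumed to follow the "<provider>:<profile>" pattern.
function pickPreferredAuthProfile(order: string[], disabledProviders: string[]): string | undefined {
  for (const key of order) {
    const provider = key.split(":")[0] || "";
    if (disabledProviders.indexOf(provider) === -1) {
      return key;
    }
  }
  return undefined; // no usable profile left
}

const disabledAuthProviders = ["codex", "openai-codex"];
const preferredProfile = pickPreferredAuthProfile(
  ["openai-codex:default", "codex:default", "codex-proxy:default"],
  disabledAuthProviders,
);
const noProfile = pickPreferredAuthProfile(["codex:default"], disabledAuthProviders);
```

Unlike deleting the stale entries outright, this keeps the stored order intact and only changes which entry is treated as preferred.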


Why this matters

This can produce confusing message-channel failures where:

  • the configured model path is healthy
  • channel delivery is healthy
  • but runtime still selects a stale Codex path and surfaces HTML/DNS/rate-limit errors to users

That makes the issue appear channel-specific when it is actually a provider/model resolution problem.


Current status

As of final local validation:

  • message replies are functioning again
  • the previous blocking provider=codex HTML/DNS/rate-limit path is no longer the active failure mode
  • but there is still a stale request source asking for codex/gpt-5.4, which then falls back to codex-proxy/gpt-5.4

So the outage can be mitigated locally, but upstream normalization / cleanup would still be valuable.

extent analysis

TL;DR

The most likely fix involves modifying the loader to correctly handle disabled plugins and cleaning up derived/runtime state to prevent stale Codex routes from being used.

Guidance

  1. Loader correction: Ensure that when a plugin is disabled (shouldActivate === false), the register(api) call is skipped to prevent side effects.
  2. Derived/runtime state cleanup: Remove stale codex and openai-codex derived model entries and auth ordering to prevent the runtime from using broken remote Codex paths.
  3. Stale model alias normalization: Implement a mechanism to normalize stale requests like codex/gpt-5.4 to the configured working route codex-proxy/gpt-5.4 before runtime execution.
  4. Auth-state migration/cleanup: Remove stale auth defaults like codex:default and openai-codex:default to ensure they do not remain preferred after route changes.

Example

The issue itself does not include a patch. The suggested fixes involve changes to the plugin loader and to the runtime state management logic.

Notes

The guidance mirrors the four root-cause findings in the issue. Only the loader hotfix and the runtime state cleanup (including stale auth ordering) were validated locally. After them, requests for codex/gpt-5.4 still rely on runtime fallback, so stale alias normalization remains open.

Recommendation

Apply the loader correction and the derived/runtime state cleanup first, since together they restored channel replies in the reporter's setup. Then add stale model alias normalization and auth-state migration upstream, so that requests for codex/gpt-5.4 resolve to codex-proxy/gpt-5.4 directly instead of depending on fallback.
