openclaw: [Bug] openai-codex OAuth runtime fails on 2026.4.9 with 403 HTML; 2026.3.28 works (solved; 1 pull request, 2 comments, 2 participants)

GitHub stats
openclaw/openclaw#64174 · Fetched 2026-04-11 06:16:01
Comments: 2
Participants: 2
Timeline events: 5 (commented ×2, labeled ×2, cross-referenced ×1)
Reactions: 0

On the same Ubuntu VPS and openai-codex OAuth setup, OpenClaw 2026.4.9 fails to return runtime chat replies in Control UI, WhatsApp, and WeCom, while 2026.3.28 works.

Error Message

  • Control UI gets stuck on "writing", or the channel returns an error instead of a normal reply.
  • Gateway logs show "error":"403 <html>...". Control UI becomes unusable for normal chat replies, and channel users receive misleading error messages instead of agent responses.
  • The repeated raw runtime symptom in the logs was an upstream 403 <html>..., while the surfaced user-facing error text varied.

Root Cause

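The issue report does not isolate a root cause beyond the regression window: the same Ubuntu VPS and openai-codex OAuth setup works on 2026.3.28 and fails on 2026.4.9 with upstream 403 HTML responses. The linked PR #64286, which references this issue, fixes an OAuth scope gap in OpenClaw's OpenAI Codex login wrapper, which suggests missing scopes as a likely contributor.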

Fix Action

Fixed

PR fix notes

PR #64286: openai-codex: fix auth scope handling and classify provider/runtime failures

Description (problem / solution / changelog)

Summary

This is PR 2 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64229.

It fixes the maintained-source OpenAI Codex OAuth scope gap in OpenClaw's login wrapper and adds a separate provider/runtime failure taxonomy that makes auth-scope, refresh, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid failures observable in logs and easier to explain to users.

What changed

  • normalize OpenAI Codex authorize URLs so the required scopes are always present:
    • openid
    • profile
    • email
    • offline_access
    • model.request
    • api.responses.write
  • add classifyProviderRuntimeFailureKind(...) as a typed provider/runtime failure classifier
  • keep the older failover-reason contract intact instead of widening it in this slice
  • thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
  • surface more truthful user-facing copy for:
    • OAuth refresh failures
    • missing OpenAI Codex scopes
    • HTML 403 auth failures
    • proxy/tunnel misroutes
    • replay-invalid failures
  • add focused regressions for scope failures, refresh failures, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid paths
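
The scope normalization described in the list above can be illustrated with a minimal sketch. It assumes the scopes travel in a space-separated `scope` query parameter on the authorize URL, as is conventional for OAuth 2.0. The helper name `ensureRequiredScopes` and the `auth.example.com` endpoint are hypothetical and are not taken from the PR.

```typescript
// Scopes listed in the PR description as required for OpenAI Codex OAuth.
const REQUIRED_CODEX_SCOPES: readonly string[] = [
  "openid",
  "profile",
  "email",
  "offline_access",
  "model.request",
  "api.responses.write",
];

// Hypothetical helper (not the PR's actual function): returns the authorize
// URL with every required scope present, keeping scopes that were already set.
function ensureRequiredScopes(authorizeUrl: string): string {
  const url = new URL(authorizeUrl);
  const existing = (url.searchParams.get("scope") ?? "")
    .split(/\s+/)
    .filter((s) => s.length > 0);
  const merged = [...existing];
  for (const scope of REQUIRED_CODEX_SCOPES) {
    if (!merged.includes(scope)) merged.push(scope);
  }
  url.searchParams.set("scope", merged.join(" "));
  return url.toString();
}

// Usage with a placeholder authorize endpoint that only requests two scopes.
const normalized = ensureRequiredScopes(
  "https://auth.example.com/oauth/authorize?client_id=demo&scope=openid%20profile",
);
const scopes = new URL(normalized).searchParams.get("scope")?.split(" ") ?? [];
console.log(scopes);
```

Appending missing scopes rather than replacing the parameter keeps any extra scopes a caller already requested, and running the helper twice leaves the URL unchanged.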

Why

GPT-5.4 / Codex failures in OpenClaw are still too easy to misdiagnose as generic model stops. This slice makes the auth/runtime layer tell the truth before we move on to tool-contract and parity-harness work.

Non-goals

  • does not implement tool compatibility work from #64230
  • does not implement permission truthfulness work from #64231
  • does not implement replay/liveness hardening from #64232
  • does not implement the benchmark harness from #64233
  • does not widen the generic failover-reason enum for every caller in this slice

Builds on prior groundwork

  • #45176
  • #48592
  • #53702
  • #55206
  • #44019

Validation

Focused checks run:

  • CI=1 pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
  • repo hook gate during commit:
    • pnpm check:no-conflict-markers
    • pnpm tool-display:check
    • pnpm check:host-env-policy:swift
    • pnpm tsgo
    • node scripts/prepare-extension-package-boundary-artifacts.mjs
    • pnpm lint
    • pnpm lint:webhook:no-low-level-body-read
    • pnpm lint:auth:no-pairing-store-group
    • pnpm lint:auth:pairing-account-scope

Linked issues

  • Closes #64229
  • Refs #64227
  • Refs #64133
  • Refs #64174
  • Refs #64092
  • Refs #57399
  • Refs #62672

Changed files

  • src/agents/failover-error.test.ts (modified, +10/-0)
  • src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
  • src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
  • src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +67/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +79/-0)
  • src/agents/pi-embedded-helpers.ts (modified, +2/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +219/-4)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +16/-3)
  • src/commands/openai-codex-oauth.test.ts (modified, +28/-3)
  • src/plugins/provider-openai-codex-oauth.ts (modified, +40/-1)


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

On the same Ubuntu VPS and openai-codex OAuth setup, OpenClaw 2026.4.9 fails to return runtime chat replies in Control UI, WhatsApp, and WeCom, while 2026.3.28 works.

Steps to reproduce

  1. Start the same self-hosted Ubuntu VPS setup on OpenClaw 2026.4.9 with openai-codex OAuth.
  2. Open a fresh Control UI chat, or send /new and then a simple text message in WhatsApp or WeCom.
  3. Observe that Control UI gets stuck on "writing" or the channel returns an error instead of a normal reply.
  4. Check gateway logs and observe repeated 403 <html>... for "provider":"openai-codex" and "model":"gpt-5.4".
  5. Downgrade the same setup to 2026.3.28 and repeat the same tests.
  6. Observe that replies work again on 2026.3.28.

Expected behavior

On the same VPS and same openai-codex OAuth setup, Control UI, WhatsApp, and WeCom should return normal replies, as observed on OpenClaw 2026.3.28.

Actual behavior

On 2026.4.9, Control UI main gets stuck on "writing", and WhatsApp/WeCom return misleading user-facing errors such as "LLM request failed: DNS lookup for the provider endpoint failed." or "⚠️ API rate limit reached. Please try again later." The gateway logs repeatedly show upstream 403 <html>... with "failoverReason":"auth", "provider":"openai-codex", and "model":"gpt-5.4".

OpenClaw version

2026.4.9 (0512059)

Operating system

Ubuntu 22.04.5 LTS

Install method

npm global

Model

openai-codex/gpt-5.4

Provider / routing chain

openclaw -> openai-codex OAuth -> https://chatgpt.com/backend-api

Additional provider/model setup details

  • Node version: v24.13.1
  • Main LLM path is openai-codex OAuth only
  • No non-Codex fallback is configured
  • openclaw models status --agent main --probe succeeds for the intended profile while runtime chats still fail
  • Per-agent probes for live handler agents also showed the intended Codex profile probing OK
  • One malformed main/agent/models.json entry (https:/chatgpt.com/backend-api) was found and fixed to https://chatgpt.com/backend-api, but the broader 2026.4.9 runtime failure persisted
  • SSE transport was forced in config and did not resolve the issue
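
The malformed `https:/chatgpt.com/backend-api` entry mentioned above is easy to miss by eye. The following is a minimal sketch of a string-level check for provider base URLs. The function name `hasWellFormedHttpScheme` is hypothetical, and nothing here reflects how OpenClaw itself validates `models.json`.

```typescript
// Hypothetical check: true only when the value starts with http:// or https://
// followed directly by a host character (exactly two slashes, no whitespace).
// A plain string test is used on purpose: URL parsers that follow the WHATWG
// standard may silently repair a single-slash https URL instead of rejecting it.
function hasWellFormedHttpScheme(value: string): boolean {
  return /^https?:\/\/[^/\s]/i.test(value);
}

// The two values quoted in the issue: the malformed entry and its correction.
const candidates = [
  "https:/chatgpt.com/backend-api",
  "https://chatgpt.com/backend-api",
];
for (const candidate of candidates) {
  const verdict = hasWellFormedHttpScheme(candidate) ? "ok" : "malformed scheme";
  console.log(`${candidate}: ${verdict}`);
}
```

As the issue notes, correcting this entry did not resolve the broader 2026.4.9 failure, so a check like this only rules out one configuration mistake.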

Logs, screenshots, and evidence

Observed log patterns on 2026.4.9 include:

- `embedded_run_agent_end`
  - `"error":"403 <html>..."`
  - `"failoverReason":"auth"`
  - `"provider":"openai-codex"`
  - `"model":"gpt-5.4"`

- Followed by surfaced user-facing messages such as:
  - `LLM request failed: DNS lookup for the provider endpoint failed.`
  - `⚠️ API rate limit reached. Please try again later.`

Other observed evidence:
- VPS resolver checks for `api.openai.com` succeeded
- `openclaw models status --agent main --probe` succeeded on the intended Codex OAuth profile
- Downgrading from 2026.4.9 to 2026.3.28 restored normal behavior on the same VPS and same setup
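
The mismatch between the raw `403 <html>...` error and the DNS or rate-limit messages shown to users is a classification problem. The sketch below shows one way to classify raw error text so that an HTML 403 is recognized before any generic fallback. It is purely illustrative: the `FailureKind` values and `classifyRawError` are hypothetical and do not reproduce the `classifyProviderRuntimeFailureKind(...)` function named in PR #64286.

```typescript
// Hypothetical failure kinds; the taxonomy in the PR may differ.
type FailureKind = "auth_html_403" | "rate_limit" | "dns" | "timeout" | "unknown";

// Illustrative classifier over raw provider error text.
function classifyRawError(raw: string): FailureKind {
  const text = raw.trim().toLowerCase();
  // Check the HTML 403 shape first so it is never reported as DNS or rate limiting.
  if (/^403\b/.test(text) && /<html|<!doctype html/.test(text)) return "auth_html_403";
  if (/\b429\b|rate limit/.test(text)) return "rate_limit";
  if (/enotfound|eai_again|dns lookup/.test(text)) return "dns";
  if (/etimedout|timed out|timeout/.test(text)) return "timeout";
  return "unknown";
}

// The raw error string from the logs in this issue.
console.log(classifyRawError("403 <html>..."));
```

Ordering matters in a classifier like this: the most specific pattern runs first, so a later, broader rule cannot relabel the failure.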

Impact and severity

Affected users/systems/channels:

  • Control UI main
  • WhatsApp channel
  • WeCom channel
  • Same self-hosted Ubuntu VPS setup

Severity: High; blocks normal agent replies across the main UI and both tested channels

Frequency: Repeated across observed attempts on 2026.4.9

Consequence: Control UI becomes unusable for normal chat replies and channel users receive misleading error messages instead of agent responses

Additional information

  • Last known good version: 2026.3.28
  • First known bad version observed: 2026.4.9
  • Downgrading back to 2026.3.28 restored normal behavior
  • This issue reproduced across Control UI, WhatsApp, and WeCom on the same VPS
  • The repeated raw runtime symptom observed in logs was upstream 403 <html>..., while the surfaced user-facing error text varied

extent analysis

TL;DR

Downgrading OpenClaw to 2026.3.28, the last known good version, restores normal replies and is the immediate workaround. The linked PR #64286 addresses OpenAI Codex OAuth scope handling and failure classification on the code side.

Guidance

  • Verify that the openai-codex OAuth setup and configuration are correct and unchanged between versions 2026.3.28 and 2026.4.9.
  • Check the gateway logs for any other error patterns or clues that might indicate a specific issue with the openai-codex integration in version 2026.4.9.
  • Do not rely on openclaw models status --agent main --probe alone: in this report the probe succeeded on 2026.4.9 while runtime chats still failed, so a passing probe does not rule out this failure.
  • Consider reaching out to the OpenClaw community or support team for further assistance, as this issue may be related to a regression or bug in version 2026.4.9.
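
To make the log check in the guidance above more concrete, here is a minimal sketch that counts HTML 403 errors per provider. It assumes the gateway log can be read as one JSON object per line with `error` and `provider` fields, as the fragments quoted in the issue suggest. The actual log format is not confirmed by the source, and the helper name `countHtml403ByProvider` is hypothetical.

```typescript
// Assumed record shape, inferred from the log fragments quoted in the issue.
interface RunEndRecord {
  error?: unknown;
  failoverReason?: unknown;
  provider?: unknown;
  model?: unknown;
}

// Hypothetical helper: counts records whose error starts with "403 <html" per provider.
function countHtml403ByProvider(lines: string[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const line of lines) {
    let record: RunEndRecord | null;
    try {
      record = JSON.parse(line) as RunEndRecord | null;
    } catch {
      continue; // Skip lines that are not valid JSON.
    }
    if (typeof record !== "object" || record === null) continue;
    if (typeof record.error !== "string") continue;
    if (!/^403\s*<html/i.test(record.error)) continue;
    const provider = typeof record.provider === "string" ? record.provider : "unknown";
    counts.set(provider, (counts.get(provider) ?? 0) + 1);
  }
  return counts;
}

// Sample lines shaped like the fragments in the issue (illustrative data only).
const sampleLines = [
  '{"error":"403 <html>...","failoverReason":"auth","provider":"openai-codex","model":"gpt-5.4"}',
  "plain text line that is not JSON",
  '{"error":"403 <html>...","failoverReason":"auth","provider":"openai-codex","model":"gpt-5.4"}',
];
console.log(countHtml403ByProvider(sampleLines));
```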

Example

No specific code snippet is provided, as the issue seems to be related to a version-specific bug or regression.

Notes

Downgrading to 2026.3.28 resolves the issue, which points to a change introduced between 2026.3.28 and 2026.4.9. The issue report alone does not pinpoint the exact cause, but the linked PR #64286 identifies a gap in OpenAI Codex OAuth scope handling and adds explicit classification for HTML 403 failures.

Recommendation

Downgrade to version 2026.3.28, the last known good version, so that agent replies work across the main UI and channels until a release that includes the fix (see PR #64286) is available.
