openclaw - ✅(Solved) Fix [Bug]: openai-codex error classification: actual 403/rate-limit shown as 'DNS lookup failed' [1 pull request, 2 comments, 3 participants]

openclaw · 2026-04-10 05:03:42


GitHub stats

openclaw/openclaw#64092•Fetched 2026-04-11 06:16:23


Author

itsclawdettebot-debug

Participants

GHesericsu

hellonewvisions-lang

itsclawdettebot-debug

Timeline (top)

commented ×2, cross-referenced ×1, subscribed ×1

Error Message

timestamp: 2026-04-10T05:04:38.436Z
user-facing error: "LLM request failed: DNS lookup for the provider endpoint failed."
rawErrorPreview: "403 <html>\n <head>\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason: "auth"
status: (not present in this entry)

Root Cause

The real issue was an invalid OAuth session. Users cannot self-diagnose because the error message does not reflect the actual failure.

Fix Action

Fixed

Fixed by PR: openai-codex: fix auth scope handling and classify provider/runtime failures (https://github.com/openclaw/openclaw/pull/64286)

PR fix notes

PR #64286: openai-codex: fix auth scope handling and classify provider/runtime failures

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/64286

Description (problem / solution / changelog)

Summary

This is PR 2 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64229.

It fixes the maintained-source OpenAI Codex OAuth scope gap in OpenClaw's login wrapper and adds a separate provider/runtime failure taxonomy that makes auth-scope, refresh, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid failures observable in logs and easier to explain to users.
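One way to picture this taxonomy is as a small enum carried next to the existing failover reason. The sketch below is a hypothetical Python analogue: the PR's real classifier, classifyProviderRuntimeFailureKind(...), is TypeScript (in src/agents/pi-embedded-helpers/errors.ts), and the enum member names, the RunObservation class, and its field names here are assumptions for illustration, not the PR's actual identifiers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ProviderRuntimeFailureKind(Enum):
    """Hypothetical kinds, one per failure category named in the PR summary."""
    AUTH_SCOPE = "auth_scope"
    AUTH_REFRESH = "auth_refresh"
    AUTH_HTML_403 = "auth_html_403"
    PROXY = "proxy"
    DNS = "dns"
    TIMEOUT = "timeout"
    SCHEMA = "schema"
    SANDBOX_BLOCKED = "sandbox_blocked"
    REPLAY_INVALID = "replay_invalid"


@dataclass(frozen=True)
class RunObservation:
    """Hypothetical observation record for one failed embedded run.

    failover_reason keeps the older, coarse contract (e.g. "auth"), while
    runtime_failure_kind adds a finer-grained kind next to it instead of
    widening the set of failover-reason values.
    """
    failover_reason: Optional[str]
    runtime_failure_kind: Optional[ProviderRuntimeFailureKind]


# The issue's 403 HTML login page keeps failover_reason "auth" but gains a
# more specific kind that logs and user-facing copy can key off.
obs = RunObservation(
    failover_reason="auth",
    runtime_failure_kind=ProviderRuntimeFailureKind.AUTH_HTML_403,
)
print(obs.failover_reason, obs.runtime_failure_kind.value)
```

Keeping the two fields separate mirrors the PR's stated choice not to widen the generic failover-reason enum for every caller.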

What changed

- normalize OpenAI Codex authorize URLs so the required scopes are always present:
  - openid
  - profile
  - email
  - offline_access
  - model.request
  - api.responses.write
- add classifyProviderRuntimeFailureKind(...) as a typed provider/runtime failure classifier
- keep the older failover-reason contract intact instead of widening it in this slice
- thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
- surface more truthful user-facing copy for:
  - OAuth refresh failures
  - missing OpenAI Codex scopes
  - HTML 403 auth failures
  - proxy/tunnel misroutes
  - replay-invalid failures
- add focused regressions for scope failures, refresh failures, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid paths
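The scope normalization in the first item can be sketched with Python's standard urllib.parse module. This is a hypothetical analogue of the TypeScript change in src/plugins/provider-openai-codex-oauth.ts, not its actual code: it assumes the scopes travel space-separated in a single scope query parameter (the usual OAuth 2.0 convention), and the function name and example URL are illustrative only.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Scopes listed in the PR description as required for OpenAI Codex OAuth.
REQUIRED_SCOPES = [
    "openid",
    "profile",
    "email",
    "offline_access",
    "model.request",
    "api.responses.write",
]


def normalize_authorize_url(url: str) -> str:
    """Return url with every required scope present in its `scope` parameter.

    Assumption: scopes are space-separated in one `scope` query parameter.
    Existing scopes keep their order; missing required scopes are appended.
    Other query parameters are preserved.
    """
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    existing = " ".join(query.get("scope", [])).split()
    merged = list(existing)
    for scope in REQUIRED_SCOPES:
        if scope not in merged:
            merged.append(scope)
    query["scope"] = [" ".join(merged)]
    # doseq=True turns each list value back into key=value pairs.
    new_query = urlencode(query, doseq=True)
    return urlunsplit(parts._replace(query=new_query))
```

Because missing scopes are only appended, running the function on its own output leaves the URL unchanged, so it is safe to apply on every login attempt.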

Why

GPT-5.4 / Codex failures in OpenClaw are still too easy to misdiagnose as generic model stops. This slice makes the auth/runtime layer tell the truth before we move on to tool-contract and parity-harness work.

Non-goals

does not implement tool compatibility work from #64230
does not implement permission truthfulness work from #64231
does not implement replay/liveness hardening from #64232
does not implement the benchmark harness from #64233
does not widen the generic failover-reason enum for every caller in this slice

Builds on prior groundwork

#45176
#48592
#53702
#55206
#44019

Validation

Focused checks run:

- CI=1 pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
- repo hook gate during commit:
  - pnpm check:no-conflict-markers
  - pnpm tool-display:check
  - pnpm check:host-env-policy:swift
  - pnpm tsgo
  - node scripts/prepare-extension-package-boundary-artifacts.mjs
  - pnpm lint
  - pnpm lint:webhook:no-low-level-body-read
  - pnpm lint:auth:no-pairing-store-group
  - pnpm lint:auth:pairing-account-scope

Linked issues

Closes #64229
Refs #64227
Refs #64133
Refs #64174
Refs #64092
Refs #57399
Refs #62672

Changed files

src/agents/failover-error.test.ts (modified, +10/-0)
src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +67/-0)
src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +79/-0)
src/agents/pi-embedded-helpers.ts (modified, +2/-0)
src/agents/pi-embedded-helpers/errors.ts (modified, +219/-4)
src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +16/-3)
src/commands/openai-codex-oauth.test.ts (modified, +28/-3)
src/plugins/provider-openai-codex-oauth.ts (modified, +40/-1)


Raised on behalf of Chad J.

Bug Description

When openai-codex requests fail with an HTTP 403 response (HTML body), OpenClaw displays one of several misleading user-facing errors instead of correctly reporting a 403 auth failure. The raw response is visible in the logs as rawErrorPreview, but the user-facing message is wrong.

Critically, rawErrorPreview always contains an HTTP 403 HTML response, which proves the endpoint was reached and responded. That definitively rules out a DNS failure, yet users are told "DNS lookup failed."

Environment

OpenClaw: 2026.4.9
Provider: openai-codex
Auth: ChatGPT OAuth (expires in 9d per status check)
Models: gpt-5.4 and gpt-5.3-codex
Endpoint configured: https://chatgpt.com/backend-api/codex

Observed Evidence

Cases 1 to 3 below come from the same run, the same session, and the same ChatGPT endpoint, yet they show completely different user-facing errors despite an identical rawErrorPreview:

Case 1 — "DNS lookup failed"

timestamp:   2026-04-10T05:04:38.436Z
user-facing error:  "LLM request failed: DNS lookup for the provider endpoint failed."
rawErrorPreview:    "403 <html>\n  <head>\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason:     "auth"
status:             (not present in this entry)

Case 2 — "API rate limit reached"

timestamp:   2026-04-10T05:05:53.844Z
user-facing error:  "⚠️ API rate limit reached. Please try again later."
rawErrorPreview:    "403 <html>\n  <head>\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason:     "auth"
status:             (not present in this entry)

Case 3 — raw 403 HTML shown

timestamp:   2026-04-10T05:04:59.634Z
user-facing error:  "403 <html>\n  <head>..."
rawErrorPreview:     same 403 HTML as above
failoverReason:     "auth"

Case 4 — gpt-5.4 variant

timestamp:       2026-04-10T04:28:49.596Z
Embedded agent failed: "All models failed (2): openai-codex/gpt-5.4: LLM request failed: DNS lookup for the provider endpoint failed. (auth) | openai-codex/gpt-5.3-codex: LLM request failed: DNS lookup for the provider endpoint failed. (auth)"
failoverReason:  "auth"

Sanitized Raw Snippet

The recurring rawErrorPreview:

403 <html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex;flex-direction:column;gap:2rem;height:100%;justify-content:center;width:100%}@keyframes enlarge-appear{0%{opacity:0;transform:scale(75%) rotate(-90deg)}to{opacity:1;transform:scale(100%) rotate(0deg)}…

This appears to be a ChatGPT login/access page (the CSS classes .container, .scale-appear, and .logo are visible), possibly a login or challenge HTML response.

Key Distinction: This Was NOT DNS

The rawErrorPreview contains an HTTP 403 response body, namely an HTML page. A true DNS failure produces no HTTP response at all. The fact that rawErrorPreview contains a 403 HTML body means:

DNS resolution succeeded
A TCP connection was established
TLS handshake succeeded
An HTTP response was received (403 status)
The response body was an HTML login/challenge page
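This chain of reasoning can be made mechanical: if the logged preview starts with an HTTP status code, a response was received and DNS cannot be the cause. Below is a minimal Python sketch, assuming the preview format seen in this issue ("403 <html>..."); inspect_preview and PreviewFacts are hypothetical names, not OpenClaw functions.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed preview format: a 3-digit HTTP status, whitespace, then the body.
_STATUS_PREFIX = re.compile(r"^\s*([1-5]\d\d)\s")


@dataclass(frozen=True)
class PreviewFacts:
    status: Optional[int]  # HTTP status if the preview carries one
    is_html: bool          # body looks like an HTML document

    @property
    def response_received(self) -> bool:
        # Any HTTP status implies DNS, TCP, and TLS all succeeded.
        return self.status is not None


def inspect_preview(preview: str) -> PreviewFacts:
    """Extract the facts the list above relies on from a raw error preview."""
    match = _STATUS_PREFIX.match(preview)
    status = int(match.group(1)) if match else None
    body = preview[match.end():] if match else preview
    is_html = body.lstrip().lower().startswith(("<html", "<!doctype html"))
    return PreviewFacts(status=status, is_html=is_html)
```

A Node-style resolver error string such as "getaddrinfo ENOTFOUND chatgpt.com" carries no status prefix, so only that kind of preview would leave response_received false and justify a DNS message.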

Failure Classification

Actual Condition	Correct User Message	Observed User Messages
True DNS/network failure	"Cannot reach provider"	—
HTTP 403 auth/session invalid	"ChatGPT session invalid — please re-authenticate"	"DNS lookup failed" (wrong), "API rate limit reached" (wrong)
HTTP 429 rate limit	"Rate limited — try again later"	"API rate limit reached" (sometimes correct)
Possible Cloudflare/challenge HTML	"Access challenge — try again"	(mixed with auth failures above)

Expected Behavior

Classify errors from the actual HTTP status/body first
Surface DNS failure only when no HTTP response was received
When rawErrorPreview contains HTML (login page, challenge), show the appropriate auth-related message
Rate-limit errors should be shown when HTTP 429 is actually returned, not from a 403 HTML body
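The expected behavior above amounts to a priority order: look at the HTTP status and body first, and fall back to a network/DNS message only when no response exists. The following is a hypothetical Python sketch; the message keys, the inclusion of 401 alongside 403, and the "challenge" keyword heuristic are assumptions, and the messages paraphrase the "Correct User Message" column of the table above.

```python
from typing import Optional

# Messages paraphrased from the table's "Correct User Message" column;
# the dictionary keys are hypothetical labels.
MESSAGES = {
    "network": "Cannot reach provider",
    "auth": "ChatGPT session invalid. Please re-authenticate.",
    "rate_limit": "Rate limited. Try again later.",
    "challenge": "Access challenge. Try again.",
}


def classify_failure(status: Optional[int], body: str) -> str:
    """Return a message key, deciding from the HTTP response first.

    status is None only when no HTTP response was received at all.
    """
    if status is not None:
        # The endpoint answered, so DNS/network is never the explanation.
        if status == 429:
            return "rate_limit"  # only when the server actually sent 429
        if status in (401, 403):
            lowered = body.lower()
            # Hypothetical heuristic: an HTML page mentioning "challenge"
            # is treated as an access challenge, not an expired session.
            if "<html" in lowered and "challenge" in lowered:
                return "challenge"
            return "auth"
        return f"http_{status}"  # statuses outside the table
    # No response at all: now a network/DNS message is appropriate.
    return "network"


def user_message(status: Optional[int], body: str) -> str:
    key = classify_failure(status, body)
    return MESSAGES.get(key, f"Provider returned HTTP {status}")
```

The point of this ordering is that the network branch is unreachable whenever a status exists, which is exactly the condition the logs in this issue violated. Real challenge pages may not contain the word "challenge", so that branch would need a better signal in practice.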

Actual Impact

Users are sent down the wrong debugging path for hours:

Checking DNS configuration
Toggling VPN/Tailscale
Reconfiguring network settings
Verifying firewall rules

The real issue was an invalid OAuth session. Users cannot self-diagnose because the error message does not reflect the actual failure.

Relevant Existing Issues

#41282 (similar openai-codex timeout/auth issues)
#42311 (proxy/env handling for backend-api)

Extended analysis

TL;DR

Update the error handling logic in OpenClaw to correctly parse and display HTTP 403 responses from the openai-codex provider, rather than misinterpreting them as DNS lookup failures.

Guidance

Review the OpenClaw code to identify where the rawErrorPreview is being parsed and the user-facing error message is being generated.
Modify the error handling logic to check for HTTP 403 responses and display a relevant auth-related error message, such as "ChatGPT session invalid — please re-authenticate".
Ensure that DNS failure errors are only surfaced when no HTTP response is received, rather than when a 403 HTML body is returned.
Consider adding additional logging or debugging statements to help identify the root cause of errors and improve the accuracy of user-facing error messages.

Example

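def user_facing_error(raw_error_preview: str, existing_message: str) -> str:
    # Match a leading "403" status plus an HTML body, not any "403" substring.
    head = raw_error_preview.lstrip()
    if head.startswith("403") and "<html" in head.lower():
        return "ChatGPT session invalid. Please re-authenticate."
    # Otherwise fall back to the result of the existing error handling.
    return existing_message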

Notes

The provided code snippet is a simplified example and may require modification to fit the actual OpenClaw codebase. Additionally, this fix may not address all possible error scenarios, and further testing and refinement may be necessary.

Recommendation

Apply the workaround by updating the error handling logic in OpenClaw to correctly handle HTTP 403 responses from the openai-codex provider. This will help ensure that users receive accurate and relevant error messages, rather than being misdirected to investigate DNS configuration issues.
