openclaw - ✅ (Solved) Fix [Bug]: openai-codex error classification: actual 403/rate-limit shown as 'DNS lookup failed' [1 pull request, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#64092 (fetched 2026-04-11 06:16:23)
Comments: 2 · Participants: 3 · Timeline events: 4 · Reactions: 0
Timeline (top): commented ×2, cross-referenced ×1, subscribed ×1

Error Message

timestamp:          2026-04-10T05:04:38.436Z
user-facing error:  "LLM request failed: DNS lookup for the provider endpoint failed."
rawErrorPreview:    "403 <html>\n <head>\n <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason:     "auth"
status:             (not present in this entry)

Root Cause

The real issue was an invalid OAuth session. Users cannot self-diagnose because the error message does not reflect the actual failure.

Fix Action

Fixed

PR fix notes

PR #64286: openai-codex: fix auth scope handling and classify provider/runtime failures

Description (problem / solution / changelog)

Summary

This is PR 2 of the GPT-5.4 / Codex agentic runtime parity program tracked in #64227 and scoped by #64229.

It fixes the maintained-source OpenAI Codex OAuth scope gap in OpenClaw's login wrapper and adds a separate provider/runtime failure taxonomy that makes auth-scope, refresh, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid failures observable in logs and easier to explain to users.

What changed

  • normalize OpenAI Codex authorize URLs so the required scopes are always present:
    • openid
    • profile
    • email
    • offline_access
    • model.request
    • api.responses.write
  • add classifyProviderRuntimeFailureKind(...) as a typed provider/runtime failure classifier
  • keep the older failover-reason contract intact instead of widening it in this slice
  • thread providerRuntimeFailureKind through embedded-run observation fields and lifecycle logging
  • surface more truthful user-facing copy for:
    • OAuth refresh failures
    • missing OpenAI Codex scopes
    • HTML 403 auth failures
    • proxy/tunnel misroutes
    • replay-invalid failures
  • add focused regressions for scope failures, refresh failures, HTML 403, proxy, DNS, timeout, schema, sandbox-blocked, and replay-invalid paths
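
The scope normalization in the first bullet can be illustrated with a minimal sketch. It is written in Python for brevity (OpenClaw itself is TypeScript), and the function name normalize_authorize_url and the example URL are hypothetical, not taken from the PR. It assumes the standard OAuth 2.0 convention that scopes travel as a space-delimited scope query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Scope list copied from the PR description above.
REQUIRED_SCOPES = (
    "openid",
    "profile",
    "email",
    "offline_access",
    "model.request",
    "api.responses.write",
)


def normalize_authorize_url(url: str) -> str:
    """Return url with every required scope present in its scope parameter.

    Hypothetical helper: existing scopes keep their order, missing required
    scopes are appended, and all other query parameters are left untouched.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)

    existing: list[str] = []
    others: list[tuple[str, str]] = []
    for key, value in query:
        if key == "scope":
            existing.extend(value.split())
        else:
            others.append((key, value))

    merged = list(dict.fromkeys(existing))  # de-duplicate, keep order
    merged += [s for s in REQUIRED_SCOPES if s not in merged]

    others.append(("scope", " ".join(merged)))
    return urlunsplit(parts._replace(query=urlencode(others)))
```

For example, normalize_authorize_url("https://auth.example.com/oauth/authorize?client_id=abc&scope=openid+email") keeps openid and email first and appends the four missing scopes.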

Why

GPT-5.4 / Codex failures in OpenClaw are still too easy to misdiagnose as generic model stops. This slice makes the auth/runtime layer tell the truth before we move on to tool-contract and parity-harness work.

Non-goals

  • does not implement tool compatibility work from #64230
  • does not implement permission truthfulness work from #64231
  • does not implement replay/liveness hardening from #64232
  • does not implement the benchmark harness from #64233
  • does not widen the generic failover-reason enum for every caller in this slice

Builds on prior groundwork

  • #45176
  • #48592
  • #53702
  • #55206
  • #44019

Validation

Focused checks run:

  • CI=1 pnpm exec vitest run src/commands/openai-codex-oauth.test.ts src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts src/agents/pi-embedded-error-observation.test.ts src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts
  • repo hook gate during commit:
    • pnpm check:no-conflict-markers
    • pnpm tool-display:check
    • pnpm check:host-env-policy:swift
    • pnpm tsgo
    • node scripts/prepare-extension-package-boundary-artifacts.mjs
    • pnpm lint
    • pnpm lint:webhook:no-low-level-body-read
    • pnpm lint:auth:no-pairing-store-group
    • pnpm lint:auth:pairing-account-scope

Linked issues

  • Closes #64229
  • Refs #64227
  • Refs #64133
  • Refs #64174
  • Refs #64092
  • Refs #57399
  • Refs #62672

Changed files

  • src/agents/failover-error.test.ts (modified, +10/-0)
  • src/agents/pi-embedded-error-observation.test.ts (modified, +14/-0)
  • src/agents/pi-embedded-error-observation.ts (modified, +23/-4)
  • src/agents/pi-embedded-helpers.formatassistanterrortext.test.ts (modified, +67/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +79/-0)
  • src/agents/pi-embedded-helpers.ts (modified, +2/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +219/-4)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.test.ts (modified, +22/-0)
  • src/agents/pi-embedded-subscribe.handlers.lifecycle.ts (modified, +16/-3)
  • src/commands/openai-codex-oauth.test.ts (modified, +28/-3)
  • src/plugins/provider-openai-codex-oauth.ts (modified, +40/-1)

Raised on behalf of Chad J.

Bug Description

When openai-codex requests fail with a 403 HTTP response (HTML body), OpenClaw displays one of several misleading user-facing errors instead of correctly reporting a 403 auth failure. The raw response is visible in logs as rawErrorPreview but the user-facing message is wrong.

Critically: rawErrorPreview always contains an HTTP 403 HTML response, proving the endpoint was reached and responded. This definitively rules out DNS failure — yet users are told "DNS lookup failed."

Environment

  • OpenClaw: 2026.4.9
  • Provider: openai-codex
  • Auth: ChatGPT OAuth (expires in 9d per status check)
  • Models: gpt-5.4 and gpt-5.3-codex
  • Endpoint configured: https://chatgpt.com/backend-api/codex

Observed Evidence

Cases 1 to 3 below are from the same run, same session, and same ChatGPT endpoint, yet they show completely different user-facing errors despite an identical rawErrorPreview:

Case 1 — "DNS lookup failed"

timestamp:   2026-04-10T05:04:38.436Z
user-facing error:  "LLM request failed: DNS lookup for the provider endpoint failed."
rawErrorPreview:    "403 <html>\n  <head>\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason:     "auth"
status:             (not present in this entry)

Case 2 — "API rate limit reached"

timestamp:   2026-04-10T05:05:53.844Z
user-facing error:  "⚠️ API rate limit reached. Please try again later."
rawErrorPreview:    "403 <html>\n  <head>\n    <meta name=\"viewport\" content=\"width=device-width, initial-scale=1\" />\n    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex..."
failoverReason:     "auth"
status:             (not present in this entry)

Case 3 — raw 403 HTML shown

timestamp:   2026-04-10T05:04:59.634Z
user-facing error:  "403 <html>\n  <head>..."
rawErrorPreview:     same 403 HTML as above
failoverReason:     "auth"

Case 4 — gpt-5.4 variant

timestamp:       2026-04-10T04:28:49.596Z
Embedded agent failed: "All models failed (2): openai-codex/gpt-5.4: LLM request failed: DNS lookup for the provider endpoint failed. (auth) | openai-codex/gpt-5.3-codex: LLM request failed: DNS lookup for the provider endpoint failed. (auth)"
failoverReason:  "auth"

Sanitized Raw Snippet

The recurring rawErrorPreview:

403 <html>
  <head>
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style global>body{font-family:Arial,Helvetica,sans-serif}.container{align-items:center;display:flex;flex-direction:column;gap:2rem;height:100%;justify-content:center;width:100%}@keyframes enlarge-appear{0%{opacity:0;transform:scale(75%) rotate(-90deg)}to{opacity:1;transform:scale(100%) rotate(0deg)}…

This appears to be a ChatGPT login/access page (CSS classes .container, .scale-appear, .logo visible), possibly a challenge or login HTML response.
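
A minimal sketch of how such a preview could be split into a status code and an HTML flag. It is written in Python for brevity (OpenClaw itself is TypeScript). parse_raw_error_preview and PreviewInfo are hypothetical names, and the sketch assumes that rawErrorPreview starts with the numeric HTTP status followed by the body, as in the snippet above.

```python
import re
from typing import NamedTuple, Optional


class PreviewInfo(NamedTuple):
    status: Optional[int]  # leading HTTP status code, if one is present
    is_html: bool          # True when the remaining body looks like HTML


# Three leading digits followed by whitespace or end of string, e.g. "403 <html>".
_STATUS_PREFIX = re.compile(r"\s*(\d{3})(?:\s+|$)")


def parse_raw_error_preview(preview: str) -> PreviewInfo:
    """Split a preview such as '403 <html>...' into status and body shape."""
    match = _STATUS_PREFIX.match(preview)
    status = int(match.group(1)) if match else None
    body = preview[match.end():] if match else preview
    head = body.lstrip().lower()
    is_html = head.startswith("<html") or head.startswith("<!doctype html")
    return PreviewInfo(status=status, is_html=is_html)
```

A preview with no leading status, such as a plain transport error string, yields status None and is_html False, so it cannot be mistaken for an HTTP response.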

Key Distinction: This Was NOT DNS

The rawErrorPreview contains an HTTP 403 response body — an HTML page. A true DNS failure produces no HTTP response at all. The fact that rawErrorPreview contains a 403 HTML body means:

  • DNS resolution succeeded
  • A TCP connection was established
  • TLS handshake succeeded
  • An HTTP response was received (403 status)
  • The response body was an HTML login/challenge page
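
The same distinction is visible at the exception level in most HTTP clients. As an illustration only, using Python's standard urllib rather than OpenClaw's actual TypeScript HTTP stack, a DNS failure surfaces as a URLError wrapping socket.gaierror, while a 403 surfaces as an HTTPError that carries a status code. describe_failure is a hypothetical helper.

```python
import socket
import urllib.error


def describe_failure(exc: BaseException) -> str:
    """Name the layer at which a urllib request failed."""
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # A status code exists, so DNS, TCP and TLS all succeeded.
        return f"http-{exc.code}"
    if isinstance(exc, urllib.error.URLError):
        if isinstance(exc.reason, socket.gaierror):
            return "dns"  # name resolution failed, no HTTP response exists
        return "network"  # refused connection, TLS error, and similar
    return "unknown"
```

Because a status code is only available once a response has arrived, a result of the form http-403 can never be reported as a DNS failure in this scheme.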

Failure Classification

Actual Condition | Correct User Message | Observed User Messages
True DNS/network failure | "Cannot reach provider" |
HTTP 403 auth/session invalid | "ChatGPT session invalid — please re-authenticate" | "DNS lookup failed" (wrong), "API rate limit reached" (wrong)
HTTP 429 rate limit | "Rate limited — try again later" | "API rate limit reached" (sometimes correct)
Possible Cloudflare/challenge HTML | "Access challenge — try again" | (mixed with auth failures above)

Expected Behavior

  1. Classify errors from the actual HTTP status/body first
  2. Surface DNS failure only when no HTTP response was received
  3. When rawErrorPreview contains HTML (login page, challenge), show the appropriate auth-related message
  4. Rate-limit errors should be shown when HTTP 429 is actually returned, not from a 403 HTML body
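
The four rules above can be sketched as a small classifier. This is an illustration in Python, not the classifyProviderRuntimeFailureKind(...) function from the PR, whose signature is not shown here. FailureKind, USER_MESSAGES, and classify_failure are hypothetical names, and the sketch assumes the caller already knows the HTTP status (if any), the response body, and whether the transport reported a DNS error.

```python
from enum import Enum
from typing import Optional


class FailureKind(Enum):
    DNS = "dns"
    AUTH = "auth"
    RATE_LIMIT = "rate_limit"
    OTHER = "other"


# Messages taken from the Failure Classification table; the OTHER text is a placeholder.
USER_MESSAGES = {
    FailureKind.DNS: "Cannot reach provider",
    FailureKind.AUTH: "ChatGPT session invalid — please re-authenticate",
    FailureKind.RATE_LIMIT: "Rate limited — try again later",
    FailureKind.OTHER: "LLM request failed",
}


def classify_failure(status: Optional[int], body: str, dns_error: bool) -> FailureKind:
    """Apply the four rules in order; an HTTP status always wins over transport hints."""
    if status is not None:
        if status == 429:
            return FailureKind.RATE_LIMIT  # rule 4: only a real 429
        if status == 403:
            return FailureKind.AUTH        # rules 1 and 3: includes HTML 403 pages
        return FailureKind.OTHER
    if body.lstrip().lower().startswith("<html"):
        return FailureKind.AUTH            # rule 3: HTML body without a parsed status
    if dns_error:
        return FailureKind.DNS             # rule 2: no HTTP response at all
    return FailureKind.OTHER
```

Checking the status before any transport hint is the design choice that matters: a 403 HTML response is classified as auth even if some lower layer also flagged a DNS problem.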

Actual Impact

Users are sent down the wrong debugging path for hours:

  • Checking DNS configuration
  • Toggling VPN/Tailscale
  • Reconfiguring network settings
  • Verifying firewall rules

The real issue was an invalid OAuth session. Users cannot self-diagnose because the error message does not reflect the actual failure.

Relevant Existing Issues

  • #41282 (similar openai-codex timeout/auth issues)
  • #42311 (proxy/env handling for backend-api)

extent analysis

TL;DR

Update the error handling logic in OpenClaw to correctly parse and display HTTP 403 responses from the openai-codex provider, rather than misinterpreting them as DNS lookup failures.

Guidance

  • Review the OpenClaw code to identify where the rawErrorPreview is being parsed and the user-facing error message is being generated.
  • Modify the error handling logic to check for HTTP 403 responses and display a relevant auth-related error message, such as "ChatGPT session invalid — please re-authenticate".
  • Ensure that DNS failure errors are only surfaced when no HTTP response is received, rather than when a 403 HTML body is returned.
  • Consider adding additional logging or debugging statements to help identify the root cause of errors and improve the accuracy of user-facing error messages.

Example

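def user_facing_error(raw_error_preview: str, existing_message: str) -> str:
    """Prefer an auth message when the raw preview is an HTTP 403 HTML page."""
    preview = raw_error_preview.lstrip().lower()
    if preview.startswith("403") and "<html" in preview:
        return "ChatGPT session invalid — please re-authenticate"
    # Otherwise keep the message produced by the existing error handling.
    return existing_message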

Notes

The provided code snippet is a simplified example and may require modification to fit the actual OpenClaw codebase. Additionally, this fix may not address all possible error scenarios, and further testing and refinement may be necessary.

Recommendation

Apply the workaround by updating the error handling logic in OpenClaw to correctly handle HTTP 403 responses from the openai-codex provider. This will help ensure that users receive accurate and relevant error messages, rather than being misdirected to investigate DNS configuration issues.
