openclaw - 💡(How to fix) Fix [Feature]: [4.29] Gateway startup hangs due to blocking provider auth (codex token exchange with no timeout) [1 comments, 2 participants]

GitHub stats
Issue: openclaw/openclaw#76029 (fetched 2026-05-03 04:43:10)
Comments: 1 · Participants: 2 · Timeline events: 8 · Reactions: 2
Timeline (top): mentioned ×2, subscribed ×2, closed ×1, commented ×1


Summary

Provider auth during startup blocks event loop with no timeout when endpoint is unreachable

Problem to solve

When the gateway starts, it probes every configured provider, including unused ones, and performs an auth/token exchange synchronously on the main thread. If a provider uses “auth: token” (e.g., codex pointing to chatgpt.com/backend-api) and the endpoint is slow or unreachable, the auth step hangs without any timeout.

This blocks the Node.js event loop for 40+ seconds, causing event loop delays over 150 seconds, stuck sessions, file lock contention (up to 160s), and WebSocket disconnections. Users must manually delete the offending provider from the config to recover, which is poor UX and hard to debug.

Environment: Windows Server 2022 Standard 21H2, Node v24.15.0, OpenClaw v2026.4.29, restricted access to chatgpt.com.
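
The hang described above can be approximated locally without touching the real endpoint. The sketch below is an assumption-laden reproduction aid, not OpenClaw code: `startSilentServer`, `stateAfter`, `demo`, and the `/token` path are hypothetical names, and the silent server only stands in for an endpoint that accepts a connection and never answers. It needs Node 18+ for the global `fetch`.

```javascript
const http = require('node:http');

// Hypothetical stand-in for an unreachable auth endpoint: it accepts the
// TCP connection but never writes a response.
function startSilentServer() {
  return new Promise((resolve, reject) => {
    const server = http.createServer(() => { /* intentionally never respond */ });
    server.once('error', reject);
    server.listen(0, '127.0.0.1', () => resolve(server));
  });
}

// Reports whether `promise` settles within `ms` ('settled') or not ('pending').
function stateAfter(promise, ms) {
  let timer;
  const pending = new Promise((resolve) => {
    timer = setTimeout(() => resolve('pending'), ms);
  });
  const settled = promise.then(() => 'settled', () => 'settled');
  return Promise.race([settled, pending]).finally(() => clearTimeout(timer));
}

async function demo() {
  const server = await startSilentServer();
  const { port } = server.address();
  const controller = new AbortController();
  // No timeout is set: only the manual abort below ends this request.
  const request = fetch(`http://127.0.0.1:${port}/token`, { signal: controller.signal });
  const before = await stateAfter(request, 300);
  controller.abort();
  const after = await stateAfter(request, 300);
  // Clean up so the process can exit.
  if (server.closeAllConnections) server.closeAllConnections();
  await new Promise((resolve) => server.close(resolve));
  return { before, after };
}
```

Calling `demo()` should show the request still pending after the wait and settled only once it is aborted, which is the behavior a startup path inherits if it waits on such a request with no deadline.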

Proposed solution

  1. Add a configurable timeout (default 10–15s) to all provider auth/health-check requests, so a single slow endpoint cannot freeze the gateway.
  2. Run provider initialization off the main thread (worker or async) to avoid blocking the event loop.
  3. Lazy‑initialize providers (only when actually used) or support an “enabled: false” flag to allow users to selectively skip providers without deleting them.
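
As a rough illustration of items 1 and 3, the sketch below authenticates a provider only on first use, skips providers marked `enabled: false`, and bounds each attempt with `AbortSignal.timeout()`. The `LazyProviders` class, the config shape, and the `authenticate(name, config, signal)` callback are assumptions made for this example, not OpenClaw's actual schema or API. The deadline only takes effect if `authenticate` honors the signal, for instance by passing it to `fetch`.

```javascript
// Hypothetical registry: class name, config fields, and the `enabled` flag
// are illustrative assumptions, not OpenClaw's real configuration schema.
class LazyProviders {
  constructor(configs, authenticate, timeoutMs = 15000) {
    this.configs = configs;           // { [name]: { enabled?: boolean, ... } }
    this.authenticate = authenticate; // (name, config, signal) => Promise<session>
    this.timeoutMs = timeoutMs;
    this.sessions = new Map();        // name -> Promise<session>
  }

  // Authenticates on first use only; disabled providers are never contacted.
  get(name) {
    const config = this.configs[name];
    if (!config) return Promise.reject(new Error(`Unknown provider: ${name}`));
    if (config.enabled === false) {
      return Promise.reject(new Error(`Provider disabled: ${name}`));
    }
    if (!this.sessions.has(name)) {
      // The signal aborts after timeoutMs; authenticate must pass it on to
      // fetch() (or otherwise honor it) for the deadline to apply.
      const signal = AbortSignal.timeout(this.timeoutMs);
      const session = Promise.resolve().then(() => this.authenticate(name, config, signal));
      // Forget failed attempts so a later call can retry.
      session.catch(() => this.sessions.delete(name));
      this.sessions.set(name, session);
    }
    return this.sessions.get(name);
  }
}
```

With this shape, a codex entry with an unreachable endpoint would cost nothing at startup and would fail only the requests that actually ask for it.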

Alternatives considered

No response

Impact

All users of the gateway (local and any connected clients) are affected. The gateway becomes completely unresponsive for several minutes after startup, blocking all chat, management, and health-check requests. The problem is consistent rather than intermittent: whenever a provider whose token-auth endpoint is unreachable (e.g., codex) exists in the configuration, the gateway freezes on every startup until that provider is manually removed from the config file.

The practical consequences are severe:

  • Workflow is blocked: no model interactions, API calls, or control UI access.
  • Requires manual intervention (config editing, provider removal, or version downgrade).
  • Debugging the root cause took hours of log analysis and testing.
  • Causes cascading failures (stuck sessions, file lock timeouts, WebSocket disconnects) that may also affect session persistence and data integrity.

[Image: screenshot attached to the issue, https://github.com/user-attachments/assets/710ea332-b32d-4a50-b668-39ca1dd7aedd]

Evidence/examples

No response

Additional information

No response

extent analysis

TL;DR

Implement a configurable timeout for provider auth/health-check requests to prevent the gateway from freezing on startup when a provider's endpoint is unreachable.

Guidance

  • Add a timeout (e.g., 10-15 seconds) to all provider auth/health-check requests to prevent indefinite blocking.
  • Consider running provider initialization off the main thread (using a worker or async approach) to avoid blocking the event loop.
  • Implement lazy initialization of providers or add an "enabled: false" flag to allow users to selectively skip providers without deleting them.
  • Verify the fix by testing the gateway startup with a provider having an unreachable endpoint and checking for event loop delays or freezes.
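
One way to carry out the verification step is to record event loop delay while startup runs, using Node's `perf_hooks.monitorEventLoopDelay`. In the sketch below, `measureStartup` and the `startup` callback are hypothetical names; the callback stands in for whatever function boots the gateway. The reported maximum is approximate and includes the sampling interval.

```javascript
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Runs `startup` (a hypothetical async function standing in for gateway
// startup) and reports the largest event loop delay sample seen meanwhile.
async function measureStartup(startup) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const startedAt = process.hrtime.bigint();
  try {
    await startup();
  } finally {
    histogram.disable();
  }
  return {
    elapsedMs: Number(process.hrtime.bigint() - startedAt) / 1e6,
    maxDelayMs: histogram.max / 1e6, // histogram values are in nanoseconds
  };
}
```

Running this once against a config with an unreachable provider and once without it gives two numbers to compare. A fix along the lines proposed here would be expected to keep `maxDelayMs` far below the multi-second stalls reported in the issue.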

Example

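// Example of adding a timeout to a provider auth request.
// `provider.authUrl` is illustrative; the real provider shape may differ.
const authRequest = async (provider, timeoutMs = 15000) => {
  const controller = new AbortController();
  // Abort the request if it has not settled within timeoutMs.
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const response = await fetch(provider.authUrl, { signal: controller.signal });
    if (!response.ok) {
      throw new Error(`Auth request failed with HTTP ${response.status}`);
    }
    return response;
  } catch (error) {
    if (error.name === 'AbortError') {
      // Surface the timeout to the caller instead of silently returning undefined.
      throw new Error(`Auth request timed out after ${timeoutMs} ms`);
    }
    throw error;
  } finally {
    // Always clear the timer so it cannot fire after the request has settled.
    clearTimeout(timer);
  }
};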

Notes

The proposed solution focuses on adding a timeout and running provider initialization off the main thread. However, the actual implementation may vary depending on the specific requirements and constraints of the gateway and its providers.
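
To make the "async" option concrete, here is a minimal sketch of initializing providers concurrently with a per-provider deadline, so that one hung provider cannot delay the others or the rest of startup. `withDeadline`, `initProviders`, and the map of provider init functions are hypothetical names chosen for this example. Note that `Promise.race` only stops waiting; it does not cancel the underlying request, so a real implementation would likely combine it with an abort signal.

```javascript
// Hypothetical helper: rejects if `promise` has not settled within `ms`.
function withDeadline(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// `providers` maps a provider name to an async init function (an assumed
// shape). All providers start at once; failures are collected, not thrown.
async function initProviders(providers, timeoutMs = 15000) {
  const names = Object.keys(providers);
  const results = await Promise.allSettled(
    names.map((name) =>
      withDeadline(Promise.resolve().then(() => providers[name]()), timeoutMs, name)),
  );
  const ready = [];
  const failed = {};
  results.forEach((result, index) => {
    if (result.status === 'fulfilled') {
      ready.push(names[index]);
    } else {
      failed[names[index]] = String(result.reason?.message ?? result.reason);
    }
  });
  return { ready, failed };
}
```

A gateway could start accepting connections first and call `initProviders` in the background, logging the `failed` entries instead of blocking on them.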

Recommendation

Add a configurable timeout to provider auth/health-check requests, as this directly addresses the gateway freezing on startup when a provider endpoint is unreachable. Until such a change is available, the only workaround reported in the issue is to remove the offending provider from the config.
