openclaw - ✅(Solved) Fix [Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s [1 pull requests, 5 comments, 5 participants]

openclaw2026-05-02 17:18:17

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

openclaw/openclaw#76174•Fetched 2026-05-03 04:41:17

View on GitHub

Comments

Participants

Timeline

Reactions

Author

Participants

Timeline (top)

cross-referenced ×7commented ×5assigned ×1closed ×1

On OpenClaw 2026.4.29, every openai/* model embedded run hangs for the full timeout window with zero tokens emitted and ends with FailoverError: LLM request timed out. Direct curl from the same host to api.openai.com using the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.

This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.

Root Cause

Fix Action

Fix / Workaround

Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving) — so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream

PR fix notes

PR #76324: fix: keep reading split SSE frames until a sanitized event is ready

Repository: openclaw/openclaw
Author: andhai
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/76324

Description (problem / solution / changelog)

Summary

keep reading the OpenAI SSE response stream until the sanitizer can emit at least one complete readable event
avoid returning early when the first chunk only contains a partial frame boundary
add a regression test covering a readable SSE frame split across chunk boundaries

Root cause

sanitizeOpenAISdkSseResponse() returned from pull() as soon as it failed to find a complete SSE boundary in the current buffer. When OpenAI delivered an event header in one chunk and the readable data: line in the next chunk, the sanitizer produced no output and stopped pulling, which could starve the embedded Responses path until it hit the LLM idle timeout.

Testing

verified the live Lab bundle end-to-end after patching the installed sanitizer: embedded openai/gpt-4o-mini runs returned normally with sanitization still enabled
ran node scripts/run-vitest.mjs run src/agents/provider-transport-fetch.test.ts

Fixes #76305 Related to #76174

Changed files

src/agents/provider-transport-fetch.test.ts (modified, +48/-0)
src/agents/provider-transport-fetch.ts (modified, +22/-16)

Code Example

$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
    https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

{ ... valid response with content ... }
HTTP: 200, time: 2.161660s

---

[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
  stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
  bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
  session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms

[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.

RAW_BUFFERClick to expand / collapse

Summary

Environment

OpenClaw 2026.4.29 (a448042)
macOS 26.4.1 arm64 (Mac Studio M3 Ultra)
Node v25.9.0
Auth: direct OpenAI Platform API key (OPENAI_API_KEY env), no Codex OAuth
Provider config: models.providers.openai.timeoutSeconds: 600, baseUrl: https://api.openai.com/v1

Reproduction

Configure agents.defaults.model.primary (or any sub-agent model) to openai/gpt-5.5 (also tested openai/gpt-5.4).
Trigger any embedded run (sub-agent spawn, /model gpt5.5 from chat, or active-memory pre-reply).
Run hangs through full timeout window. Zero input/output tokens. No HTTP response observed.

Tried with and without:

agents.defaults.models["openai/gpt-5.5"].streaming: false
agents.defaults.timeoutSeconds: 900
plugins.entries.active-memory.enabled: false

None of these affect the outcome.

Proof OpenAI itself is healthy from the same host

$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
    https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

{ ... valid response with content ... }
HTTP: 200, time: 2.161660s

DNS/network confirmed clean: api.openai.com resolves to 172.66.0.243 / 162.159.140.245, IPv4 + IPv6 reachable in < 100 ms, no proxy on user shell or gateway process, openclaw doctor clean (0 plugin errors, no warnings about OpenAI auth, runtime, or model config).

Gateway log evidence

Embedded run trace shows prep stages all complete normally and stream-ready fires, then nothing comes back from OpenAI until the embedded-run timeout aborts:

[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
  stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
  bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
  session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms

[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.

This pattern repeats across openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano.

The active-memory plugin also fails in the same shape (its own internal gpt-5.4-nano calls time out), but disabling active-memory does NOT fix the user-visible embedded runs — the assistant-stage OpenAI call hangs independently.

Stats reported by the runtime for hung sub-agent runs: runtime 1m10s–6m46s • tokens 0 (in 0 / out 0) — i.e. no bytes ever come back from the OpenAI request before the watchdog fires.

What I'm asking for

Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving) — so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream

Happy to grab additional logs with whatever flag you suggest.

extent analysis

TL;DR

The issue is likely related to the OpenAI SDK or the undici/fetch pool, causing the OpenAI requests to hang and timeout, and enabling debug logging or using a diagnostics flag may help localize the problem.

Guidance

Check the OpenClaw documentation for any known issues or fixes related to the OpenAI direct-API-key path on build 2026.4.29.
Enable debug logging for the OpenAI SDK to see if the HTTP request is being dispatched and what response, if any, the SDK is receiving.
Investigate the undici/fetch pool configuration to ensure it is properly set up and not causing the requests to hang.
Try increasing the timeout value for the OpenAI requests to see if it makes a difference.

Example

No code snippet is provided as the issue is more related to configuration and debugging.

Notes

The issue seems to be specific to the OpenAI direct-API-key path and does not affect other providers. The fact that the curl command works fine from the same host suggests that the issue is not with the OpenAI Platform itself, but rather with the OpenClaw gateway or the OpenAI SDK.

Recommendation

Apply a workaround by enabling debug logging for the OpenAI SDK to gather more information about the issue, as the root cause is still unknown. This will help to determine if the problem is with the SDK, the undici/fetch pool, or further upstream.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #tool integration #LLM response #prompt template #agent execution

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - ✅(Solved) Fix [Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s [1 pull requests, 5 comments, 5 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #76324: fix: keep reading split SSE frames until a sanitized event is ready

Description (problem / solution / changelog)

Summary

Root cause

Testing

Changed files

Code Example

Summary

Environment

Reproduction

Proof OpenAI itself is healthy from the same host

Gateway log evidence

What I'm asking for

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

openclaw - ✅(Solved) Fix [Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s [1 pull requests, 5 comments, 5 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

PR fix notes

PR #76324: fix: keep reading split SSE frames until a sanitized event is ready

Description (problem / solution / changelog)

Summary

Root cause

Testing

Changed files

Code Example

Summary

Environment

Reproduction

Proof OpenAI itself is healthy from the same host

Gateway log evidence

What I'm asking for

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING