openclaw - ✅(Solved) Fix [Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s [1 pull requests, 5 comments, 5 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#76174Fetched 2026-05-03 04:41:17
View on GitHub
Comments
5
Participants
5
Timeline
16
Reactions
2
Author
Timeline (top)
cross-referenced ×7commented ×5assigned ×1closed ×1

On OpenClaw 2026.4.29, every openai/* model embedded run hangs for the full timeout window with zero tokens emitted and ends with FailoverError: LLM request timed out. Direct curl from the same host to api.openai.com using the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.

This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.

Root Cause

On OpenClaw 2026.4.29, every openai/* model embedded run hangs for the full timeout window with zero tokens emitted and ends with FailoverError: LLM request timed out. Direct curl from the same host to api.openai.com using the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.

This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.

Fix Action

Fix / Workaround

  • Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
  • Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving) — so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream

PR fix notes

PR #76324: fix: keep reading split SSE frames until a sanitized event is ready

Description (problem / solution / changelog)

Summary

  • keep reading the OpenAI SSE response stream until the sanitizer can emit at least one complete readable event
  • avoid returning early when the first chunk only contains a partial frame boundary
  • add a regression test covering a readable SSE frame split across chunk boundaries

Root cause

sanitizeOpenAISdkSseResponse() returned from pull() as soon as it failed to find a complete SSE boundary in the current buffer. When OpenAI delivered an event header in one chunk and the readable data: line in the next chunk, the sanitizer produced no output and stopped pulling, which could starve the embedded Responses path until it hit the LLM idle timeout.

Testing

  • verified the live Lab bundle end-to-end after patching the installed sanitizer: embedded openai/gpt-4o-mini runs returned normally with sanitization still enabled
  • ran node scripts/run-vitest.mjs run src/agents/provider-transport-fetch.test.ts

Fixes #76305 Related to #76174

Changed files

  • src/agents/provider-transport-fetch.test.ts (modified, +48/-0)
  • src/agents/provider-transport-fetch.ts (modified, +22/-16)

Code Example

$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
    https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

{ ... valid response with content ... }
HTTP: 200, time: 2.161660s

---

[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
  stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
  bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
  session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms

[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.
RAW_BUFFERClick to expand / collapse

Summary

On OpenClaw 2026.4.29, every openai/* model embedded run hangs for the full timeout window with zero tokens emitted and ends with FailoverError: LLM request timed out. Direct curl from the same host to api.openai.com using the same API key returns a real response in ~1–2 seconds, so the OpenAI Platform itself is healthy and reachable. The gateway-internal OpenAI call never produces output.

This affects openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano (the last triggered by the active-memory plugin). Other providers (anthropic/claude-opus-4-7, google/gemini-3-flash) work fine in the same gateway. Only openai/* (direct API-key route, not codex OAuth) is broken.

Environment

  • OpenClaw 2026.4.29 (a448042)
  • macOS 26.4.1 arm64 (Mac Studio M3 Ultra)
  • Node v25.9.0
  • Auth: direct OpenAI Platform API key (OPENAI_API_KEY env), no Codex OAuth
  • Provider config: models.providers.openai.timeoutSeconds: 600, baseUrl: https://api.openai.com/v1

Reproduction

  1. Configure agents.defaults.model.primary (or any sub-agent model) to openai/gpt-5.5 (also tested openai/gpt-5.4).
  2. Trigger any embedded run (sub-agent spawn, /model gpt5.5 from chat, or active-memory pre-reply).
  3. Run hangs through full timeout window. Zero input/output tokens. No HTTP response observed.

Tried with and without:

  • agents.defaults.models["openai/gpt-5.5"].streaming: false
  • agents.defaults.timeoutSeconds: 900
  • plugins.entries.active-memory.enabled: false

None of these affect the outcome.

Proof OpenAI itself is healthy from the same host

$ curl -sS -w "HTTP: %{http_code}, time: %{time_total}s\n" \
    https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model":"gpt-5.5","messages":[{"role":"user","content":"hi"}]}'

{ ... valid response with content ... }
HTTP: 200, time: 2.161660s

DNS/network confirmed clean: api.openai.com resolves to 172.66.0.243 / 162.159.140.245, IPv4 + IPv6 reachable in < 100 ms, no proxy on user shell or gateway process, openclaw doctor clean (0 plugin errors, no warnings about OpenAI auth, runtime, or model config).

Gateway log evidence

Embedded run trace shows prep stages all complete normally and stream-ready fires, then nothing comes back from OpenAI until the embedded-run timeout aborts:

[trace:embedded-run] prep stages: runId=<id> phase=stream-ready totalMs=12892
  stages=workspace-sandbox:1ms, skills:0ms, core-plugin-tools:5216ms,
  bootstrap-context:21ms, bundle-tools:1452ms, system-prompt:2476ms,
  session-resource-loader:681ms, agent-session:1ms, stream-setup:3044ms

[agent/embedded] embedded run timeout: runId=<id> timeoutMs=120000
[agent/embedded] embedded run failover decision: stage=assistant decision=fallback_model reason=timeout from=openai/gpt-5.4 profile=-
[model-fallback/decision] decision=candidate_failed requested=openai/gpt-5.4 candidate=openai/gpt-5.4 reason=timeout next=none detail=LLM request timed out.

This pattern repeats across openai/gpt-5.5, openai/gpt-5.4, and openai/gpt-5.4-nano.

The active-memory plugin also fails in the same shape (its own internal gpt-5.4-nano calls time out), but disabling active-memory does NOT fix the user-visible embedded runs — the assistant-stage OpenAI call hangs independently.

Stats reported by the runtime for hung sub-agent runs: runtime 1m10s–6m46s • tokens 0 (in 0 / out 0) — i.e. no bytes ever come back from the OpenAI request before the watchdog fires.

What I'm asking for

  • Any pointer to a known issue / fix on this OpenClaw build for the OpenAI direct-API-key path
  • Or a diagnostics flag / DEBUG knob that would log whether the OpenAI HTTP request is actually being dispatched (and what response, if any, the SDK is receiving) — so we can localize whether this is in the OpenAI SDK, an undici/fetch pool issue, or further upstream

Happy to grab additional logs with whatever flag you suggest.

extent analysis

TL;DR

The issue is likely related to the OpenAI SDK or the undici/fetch pool, causing the OpenAI requests to hang and timeout, and enabling debug logging or using a diagnostics flag may help localize the problem.

Guidance

  • Check the OpenClaw documentation for any known issues or fixes related to the OpenAI direct-API-key path on build 2026.4.29.
  • Enable debug logging for the OpenAI SDK to see if the HTTP request is being dispatched and what response, if any, the SDK is receiving.
  • Investigate the undici/fetch pool configuration to ensure it is properly set up and not causing the requests to hang.
  • Try increasing the timeout value for the OpenAI requests to see if it makes a difference.

Example

No code snippet is provided as the issue is more related to configuration and debugging.

Notes

The issue seems to be specific to the OpenAI direct-API-key path and does not affect other providers. The fact that the curl command works fine from the same host suggests that the issue is not with the OpenAI Platform itself, but rather with the OpenClaw gateway or the OpenAI SDK.

Recommendation

Apply a workaround by enabling debug logging for the OpenAI SDK to gather more information about the issue, as the root cause is still unknown. This will help to determine if the problem is with the SDK, the undici/fetch pool, or further upstream.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - ✅(Solved) Fix [Bug]: 2026.4.29 — all openai/* embedded runs hang with zero tokens until timeout; direct curl to OpenAI works in 2s [1 pull requests, 5 comments, 5 participants]