`openai-codex` / `gpt-5.5` should either: - emit a first stream event normally, - return a structured error if the backend rejects the request, or - fail in a way that allows Hermes to cleanly trigger fallback/retry without repeated first-byte stalls.

hermes - 💡(How to fix) Fix openai-codex / gpt-5.5 repeatedly produces no first byte after #31967/#32016

Error Message

Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5) Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens API call failed after 3 retries. [Errno 32] Broken pipe Scheduled job failed: RuntimeError: [Errno 32] Broken pipe

Code Example

No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.9s (attempt 1/3)...
⏳ Still working... (2 min elapsed — iteration 8/9999, receiving stream response)
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 2.1s (attempt 1/3)...
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.

---

Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5)
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
API call failed after 3 retries. [Errno 32] Broken pipe
Scheduled job failed: RuntimeError: [Errno 32] Broken pipe

What happened

After the recent Codex timeout fixes, openai-codex / gpt-5.5 still frequently stalls before emitting the first stream event.

This is not the old context=~0 tokens / 300s stale-timeout behavior. My local checkout already includes the merged fixes from #31967 and #32016. Hermes now detects the stall earlier via the Codex TTFB watchdog, but the underlying request still often accepts the connection and emits no stream events.

Typical user-facing output:

No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.9s (attempt 1/3)...
⏳ Still working... (2 min elapsed — iteration 8/9999, receiving stream response)
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 2.1s (attempt 1/3)...
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.

In one scheduled gateway run, the same failure family progressed like this:

Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5)
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
API call failed after 3 retries. [Errno 32] Broken pipe
Scheduled job failed: RuntimeError: [Errno 32] Broken pipe

Expected behavior

openai-codex / gpt-5.5 should either:

emit a first stream event normally,
return a structured error if the backend rejects the request, or
fail in a way that allows Hermes to cleanly trigger fallback/retry without repeated first-byte stalls.

Actual behavior

The backend frequently accepts the request but emits no stream events for 45s. Hermes kills the request and retries, but the same first-byte stall can repeat multiple times in a single turn.

When the retry eventually reaches the stale-call detector, the request can end with ReadError: [Errno 32] Broken pipe after Hermes closes the stalled connection.

Environment

Hermes checkout: local main
Local checkout includes:
- #31967 (fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults)
- #32016 (fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern)
Platform: macOS, launchd gateway
Provider: openai-codex
Model: gpt-5.5
Base URL: https://chatgpt.com/backend-api/codex
Affected surfaces: gateway sessions and scheduled agent runs
fallback_providers: none on the affected profile(s)

Why this seems distinct from the fixed issue

#31967 fixed the timeout accounting bug and lowered the default stale timeout from 300s to 90s. That fix appears to be active: logs now show realistic context estimates such as context=~45,779 tokens, not context=~0 tokens.

#32016 also appears active: the checkout contains the gpt-5.5 Codex silent-hang hint path.

The remaining issue is earlier in the request lifecycle: the Codex Responses stream often produces no first byte at all, so the 45s TTFB watchdog fires repeatedly.

Related issues / PRs

Related to #21444
Related to #32370
#31967 is already present locally
#32016 is already present locally
Possibly related to #29740, which proposes sanitizing reasoning, include, and store for gpt-5.5 on the Codex backend

Notes

This may still be backend-side, but it is severe enough in gateway/scheduled usage that the current retry behavior produces repeated stalls and occasional job failure. A clean structured backend rejection, a payload sanitizer, or a more reliable fallback trigger would make this much easier to operate.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix openai-codex / gpt-5.5 repeatedly produces no first byte after #31967/#32016

Recommended Tools

GitHub issue graph ai analysis

Error Message

Code Example

What happened

Expected behavior

Actual behavior

Environment

Why this seems distinct from the fixed issue

Related issues / PRs

Notes

FAQ

Expected behavior

Still need to ship something?

TRENDING