hermes: openai-codex / gpt-5.5 repeatedly produces no first byte after #31967/#32016

What happened

After the recent Codex timeout fixes, openai-codex / gpt-5.5 still frequently stalls before emitting the first stream event.

This is not the old context=~0 tokens / 300s stale-timeout behavior. My local checkout already includes the merged fixes from #31967 and #32016. Hermes now detects the stall earlier via the Codex TTFB watchdog, but the backend still often accepts the connection and then emits no stream events.

Typical user-facing output:

No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.9s (attempt 1/3)...
⏳ Still working... (2 min elapsed — iteration 8/9999, receiving stream response)
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.
⏳ Retrying in 2.3s (attempt 1/3)...
⏳ Retrying in 2.1s (attempt 1/3)...
⚠️ No first byte from provider in 45s (codex stream, model: gpt-5.5). Reconnecting.

In one scheduled gateway run, the same failure family progressed like this:

Codex stream produced no bytes within TTFB cutoff (45s > 45s, model=gpt-5.5)
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
Non-streaming API call stale for 90s (threshold 90s). model=gpt-5.5 context=~45,779 tokens
API call failed after 3 retries. [Errno 32] Broken pipe
Scheduled job failed: RuntimeError: [Errno 32] Broken pipe
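
When triaging logs like the ones above, it can help to count how often each failure stage appears in a run. The following is a minimal sketch in Python, assuming the log lines keep the wording quoted in this report. The category names and the `tally_failures` helper are invented for illustration and are not part of Hermes.

```python
import re
from collections import Counter

# Patterns are taken from the log lines quoted in this report. The category
# names are labels invented for this sketch, not Hermes identifiers.
PATTERNS = {
    "ttfb_stall": re.compile(r"no bytes within TTFB cutoff|No first byte from provider"),
    "stale_call": re.compile(r"API call stale for \d+s"),
    "broken_pipe": re.compile(r"\[Errno 32\] Broken pipe"),
}


def tally_failures(lines):
    """Count how many log lines match each failure category."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

Applied to the six lines of the scheduled run above, this would separate the single TTFB stall from the three stale-call detections and the two lines that mention the broken pipe.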

Expected behavior

openai-codex / gpt-5.5 should either:

  • emit a first stream event normally,
  • return a structured error if the backend rejects the request, or
  • fail in a way that allows Hermes to cleanly trigger fallback/retry without repeated first-byte stalls.

Actual behavior

The backend frequently accepts the request but emits no stream events for 45s. Hermes kills the request and retries, but the same first-byte stall can repeat multiple times in a single turn.

When the retry eventually reaches the stale-call detector, the request can end with ReadError: [Errno 32] Broken pipe after Hermes closes the stalled connection.

Environment

  • Hermes checkout: local main
  • Local checkout includes:
    • #31967 (fix(codex): size and propagate timeouts for Responses-API requests; lower stale defaults)
    • #32016 (fix(codex): surface actionable hint when stale-call detector fires on known silent-reject pattern)
  • Platform: macOS, launchd gateway
  • Provider: openai-codex
  • Model: gpt-5.5
  • Base URL: https://chatgpt.com/backend-api/codex
  • Affected surfaces: gateway sessions and scheduled agent runs
  • fallback_providers: none on the affected profile(s)

Why this seems distinct from the fixed issue

#31967 fixed the timeout accounting bug and lowered the default stale timeout from 300s to 90s. That fix appears to be active: logs now show realistic context estimates such as context=~45,779 tokens, not context=~0 tokens.

#32016 also appears active: the checkout contains the gpt-5.5 Codex silent-hang hint path.

The remaining issue is earlier in the request lifecycle: the Codex Responses stream often produces no first byte at all, so the 45s TTFB watchdog fires repeatedly.
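
For readers unfamiliar with the mechanism, a time-to-first-byte watchdog can be sketched generically as follows. This is not Hermes's implementation: `iter_with_ttfb` and `FirstByteTimeout` are hypothetical names, and the sketch only shows the general idea of bounding the wait for the first stream item while leaving later items unbounded.

```python
import queue
import threading


class FirstByteTimeout(Exception):
    """Raised when a stream yields nothing within the first-byte budget."""


_DONE = object()  # sentinel marking the end of the stream


def iter_with_ttfb(stream, ttfb_seconds):
    """Yield items from `stream`, raising FirstByteTimeout if the first
    item does not arrive within `ttfb_seconds`."""
    q = queue.Queue()

    def pump():
        # Read the stream on a helper thread so the consumer can time out.
        try:
            for item in stream:
                q.put(item)
        except Exception as exc:  # forward provider errors to the consumer
            q.put(exc)
        finally:
            q.put(_DONE)

    threading.Thread(target=pump, daemon=True).start()

    try:
        item = q.get(timeout=ttfb_seconds)
    except queue.Empty:
        # A real client would also close the underlying connection here;
        # this sketch simply abandons the daemon thread.
        raise FirstByteTimeout(f"no first byte in {ttfb_seconds}s") from None

    while item is not _DONE:
        if isinstance(item, Exception):
            raise item
        yield item
        item = q.get()  # only the first item is subject to the cutoff
```

With a 45s budget, a stream that accepts the connection but never emits an event would surface as `FirstByteTimeout` on the first `next()` call, which matches the stage at which the failures in this report occur.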

Related issues / PRs

  • Related to #21444
  • Related to #32370
  • #31967 is already present locally
  • #32016 is already present locally
  • Possibly related to #29740, which proposes sanitizing reasoning, include, and store for gpt-5.5 on the Codex backend
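
To make the sanitizing idea concrete, here is a hedged sketch of what such a step might look like. The details of #29740 are not described in this report, so dropping the three fields outright is only one possible reading; `sanitize_payload`, `SUSPECT_FIELDS`, and the prefix match on the model name are all assumptions made for illustration.

```python
# Field names come from the description of #29740 above. Removing them
# entirely is an assumption of this sketch, not the confirmed behavior.
SUSPECT_FIELDS = ("reasoning", "include", "store")


def sanitize_payload(payload, model_prefix="gpt-5.5"):
    """Return a copy of `payload` without the suspect fields when the
    model name starts with `model_prefix`; otherwise return a plain copy."""
    if not str(payload.get("model", "")).startswith(model_prefix):
        return dict(payload)
    return {k: v for k, v in payload.items() if k not in SUSPECT_FIELDS}
```

Returning a copy keeps the caller's original request intact, which makes it easier to compare the sanitized and unsanitized payloads when testing whether these fields are involved in the stall.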

Notes

This may still be a backend-side problem, but in gateway and scheduled usage it is severe: the current retry behavior produces repeated stalls and occasional job failures. A clean structured backend rejection, a payload sanitizer, or a more reliable fallback trigger would make this much easier to operate.
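
One way to picture a more reliable fallback trigger is a per-provider stall budget: after a fixed number of first-byte stalls, stop retrying that provider and either move to the next one or fail once with a single structured error. The sketch below assumes hypothetical names throughout (`FirstByteStall`, `AllProvidersStalled`, `run_with_stall_budget`, and the `send` callable) and does not reflect how Hermes currently schedules retries.

```python
class FirstByteStall(Exception):
    """Hypothetical marker for a request that produced no first byte."""


class AllProvidersStalled(RuntimeError):
    """Raised once every provider has used up its stall budget."""


def run_with_stall_budget(providers, send, max_stalls=2):
    """Try each provider in order, moving on after `max_stalls` stalls.

    `providers` is an ordered list of provider names and `send` is a
    callable performing one request; both are placeholders for this sketch.
    """
    stalled = []
    for provider in providers:
        for _ in range(max_stalls):
            try:
                return provider, send(provider)
            except FirstByteStall:
                continue  # same provider, next attempt within its budget
        stalled.append(provider)
    raise AllProvidersStalled(f"no first byte from: {', '.join(stalled)}")
```

On a profile with no fallback providers, as in the environment described above, this would end the turn with one `AllProvidersStalled` error after the budget is spent instead of continuing into the stale-call detector and the broken-pipe failure.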
