Codex feature request: add a provider/API error hook for rate limits, 5xx, timeouts, and stream disconnects


Summary

Codex hooks currently cover prompt/tool/permission/session/stop lifecycle events, but there is no hook for model/provider request failures such as 429, 503, timeout, EOF, connection reset, or stream disconnect.

This makes it hard to build reliable recovery, notification, backoff, or automation workflows around long-running Codex tasks, because Stop is not guaranteed to fire when the turn fails before normal completion.

Proposed event

Add a hook event such as ProviderError or TurnError.

It should fire when a model/provider request fails after Codex has enough context to identify the current thread/session/turn, including cases like:

  • HTTP 429 / rate limit
  • HTTP 5xx / provider unavailable
  • timeout
  • unexpected EOF
  • connection reset
  • stream disconnected before completion
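To make the failure cases above concrete, here is a minimal Python sketch of how they could be mapped to a coarse `error_kind` value. The function name `classify_provider_error` and every label except `rate_limit` are hypothetical; the proposal does not define a vocabulary, and this is not how Codex classifies errors internally.

```python
from typing import Optional


def classify_provider_error(
    status_code: Optional[int] = None,
    exc: Optional[BaseException] = None,
) -> str:
    """Map an HTTP status or transport exception to a coarse error kind.

    All labels other than "rate_limit" are assumptions made for illustration.
    """
    if status_code == 429:
        return "rate_limit"
    if status_code is not None and 500 <= status_code <= 599:
        return "provider_unavailable"
    if isinstance(exc, TimeoutError):
        return "timeout"
    if isinstance(exc, EOFError):
        return "unexpected_eof"
    # ConnectionResetError is a subclass of ConnectionError, so check it first.
    if isinstance(exc, ConnectionResetError):
        return "connection_reset"
    if isinstance(exc, ConnectionError):
        return "stream_disconnected"
    return "unknown"
```

Keeping the set of kinds small and stable would let integrations branch on `error_kind` without parsing the free-text `message`.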

Suggested payload

{
  "hook_event_name": "ProviderError",
  "session_id": "...",
  "thread_id": "...",
  "turn_id": "...",
  "cwd": "...",
  "model": "...",
  "provider": "...",
  "status_code": 429,
  "error_kind": "rate_limit",
  "retry_after_ms": 60000,
  "message": "redacted/safe error summary",
  "will_retry": false,
  "attempt": 1
}
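A hook consumer could parse this payload into a typed record before acting on it. The sketch below assumes the event arrives as a single JSON document and that `status_code` and `retry_after_ms` may be absent (for example on a timeout); neither assumption is confirmed by the proposal, and `ProviderErrorEvent` and `parse_provider_error` are hypothetical names.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderErrorEvent:
    session_id: str
    thread_id: str
    turn_id: str
    error_kind: str
    message: str
    will_retry: bool
    attempt: int
    # Assumed optional: transport failures have no HTTP status or Retry-After.
    status_code: Optional[int] = None
    retry_after_ms: Optional[int] = None


def parse_provider_error(raw: str) -> ProviderErrorEvent:
    """Parse the suggested payload; reject other hook events."""
    data = json.loads(raw)
    if data.get("hook_event_name") != "ProviderError":
        raise ValueError("not a ProviderError event")
    return ProviderErrorEvent(
        session_id=data["session_id"],
        thread_id=data["thread_id"],
        turn_id=data["turn_id"],
        error_kind=data["error_kind"],
        message=data["message"],
        will_retry=bool(data["will_retry"]),
        attempt=int(data["attempt"]),
        status_code=data.get("status_code"),
        retry_after_ms=data.get("retry_after_ms"),
    )
```

Fields such as `cwd`, `model`, and `provider` are left out only to keep the sketch short; a real consumer would likely keep them for logging.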

Why this matters

External workflow layers can then implement safe behavior without scraping transcripts:

  • notify the user or monitoring system
  • schedule backoff/retry
  • mark a long-running automation as provider-interrupted
  • safely resume the same session later
  • distinguish provider failure from user cancellation, task completion, or deterministic code/test failure
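As an illustration of the backoff/retry item, the following sketch shows how an external workflow layer might turn the `will_retry`, `attempt`, and `retry_after_ms` fields into a decision. The function name, the action labels, and the base and cap delays are all assumptions chosen for the example, not part of the proposal.

```python
from typing import Optional, Tuple


def next_action(
    will_retry: bool,
    attempt: int,
    retry_after_ms: Optional[int] = None,
    base_ms: int = 1_000,
    cap_ms: int = 300_000,
) -> Tuple[str, int]:
    """Return (action, delay_ms) for a ProviderError event.

    Action labels are hypothetical: "defer_to_codex" means Codex reported it
    will retry on its own; "schedule_retry" means the integration should
    resume the session after delay_ms.
    """
    if will_retry:
        return ("defer_to_codex", 0)
    if retry_after_ms is not None:
        # Prefer the provider's own hint when one is available.
        return ("schedule_retry", retry_after_ms)
    # Otherwise fall back to capped exponential backoff on the attempt number.
    delay = min(cap_ms, base_ms * 2 ** max(attempt - 1, 0))
    return ("schedule_retry", delay)
```

With the sample payload (`will_retry` false, `retry_after_ms` 60000), this would schedule a retry after 60 seconds rather than guessing a delay from transcript text.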

Important boundary

This hook should not expose raw credentials, full request bodies, or private provider payloads. A redacted error summary plus status/error kind is enough.
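One way to picture the redaction boundary is a small filter applied to the error text before it is placed in `message`. This is only a sketch: the patterns below are illustrative assumptions, they are far from exhaustive, and a real implementation would more likely build the summary from known-safe fields than scrub raw provider output.

```python
import re

# Illustrative patterns only; real credential formats vary by provider.
_REDACTIONS = [
    # "Bearer <token>" in an Authorization-style fragment.
    (re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]+"), r"\1 [REDACTED]"),
    # key=value or key: value pairs for common secret names.
    (re.compile(r"(?i)\b(api[_-]?key|token)(\s*[=:]\s*)\S+"), r"\1\2[REDACTED]"),
]


def redact_summary(message: str, max_len: int = 200) -> str:
    """Mask obvious secrets and cap the length of an error summary."""
    for pattern, replacement in _REDACTIONS:
        message = pattern.sub(replacement, message)
    return message[:max_len]
```

Truncating to a fixed length also limits how much of a provider response body could leak through the summary.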

I am not asking for this to replace Codex's internal retry behavior. The goal is to expose a reliable lifecycle surface for integrations that need to react when a turn ends because the provider request failed before normal Stop handling.
