hermes - ✅(Solved) Fix Bug: custom keepalive transport breaks chatgpt codex backend [1 pull requests]


openai-codex requests to https://chatgpt.com/backend-api/codex can fail with:

APIConnectionError: Connection error.

The current keepalive transport injection inside run_agent.py::_create_openai_client() is safe for regular OpenAI-compatible endpoints, but it is not compatible with the ChatGPT Codex backend. When Hermes injects a custom httpx.HTTPTransport(socket_options=...), the TLS handshake to chatgpt.com is reset by peer before the request reaches application-level validation.
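For context on what `socket_options` changes: as far as we understand httpx, each `(level, option, value)` tuple is handed down to httpcore, which applies it with `setsockopt` to the TCP socket it opens. The sketch below makes the same `SO_KEEPALIVE` call on an unconnected standard-library socket, so it needs no network access. It illustrates only this one option. The helper name `keepalive_enabled` is hypothetical, and Hermes' real `_build_keepalive_http_client()` may set further TCP options.

```python
import socket

# The same (level, option, value) tuple the keepalive transport passes as socket_options.
KEEPALIVE_OPTIONS = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]


def keepalive_enabled(options) -> bool:
    """Apply the options to a fresh, unconnected TCP socket and read SO_KEEPALIVE back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for option in options:
            sock.setsockopt(*option)  # roughly what httpcore does for each tuple
        # Non-zero means enabled; the exact value returned is platform-dependent.
        return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0


print("default:", keepalive_enabled([]))
print("keepalive:", keepalive_enabled(KEEPALIVE_OPTIONS))
```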

This is distinct from the earlier stale-client regression fixed in #11033 / #11070, and also distinct from the proxy-env regression addressed in #12008. The remaining bug is endpoint-specific: the Codex backend fails only on the custom keepalive transport path.

Error Message

APIConnectionError: Connection error.

Root Cause

_create_openai_client() currently injects a custom http_client whenever one is not already present:

if "http_client" not in client_kwargs:
    keepalive_http = self._build_keepalive_http_client()
    if keepalive_http is not None:
        client_kwargs["http_client"] = keepalive_http

That path works for most OpenAI-compatible endpoints, but chatgpt.com/backend-api/codex is stricter: with the keepalive socket options enabled, the TLS handshake is reset ([Errno 54] Connection reset by peer).
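A minimal model of the branch above makes the problem visible: nothing in it looks at the provider or the URL. `FakeAgent`, its constructor, and the placeholder string returned by the stand-in builder are hypothetical; only the `if "http_client" not in client_kwargs` logic mirrors the quoted snippet.

```python
class FakeAgent:
    """Hypothetical stand-in for the Hermes agent; models only the injection branch."""

    def __init__(self, provider: str, base_url: str):
        self.provider = provider
        self.base_url = base_url

    def _build_keepalive_http_client(self):
        # Placeholder for the real builder, which returns an httpx.Client
        # backed by HTTPTransport(socket_options=...).
        return "keepalive-client"

    def client_kwargs_for_current_code(self) -> dict:
        client_kwargs = {"base_url": self.base_url}
        # Current behavior: inject whenever no http_client was supplied,
        # regardless of which backend base_url points at.
        if "http_client" not in client_kwargs:
            keepalive_http = self._build_keepalive_http_client()
            if keepalive_http is not None:
                client_kwargs["http_client"] = keepalive_http
        return client_kwargs


codex = FakeAgent("openai-codex", "https://chatgpt.com/backend-api/codex")
print(codex.client_kwargs_for_current_code())
# {'base_url': 'https://chatgpt.com/backend-api/codex', 'http_client': 'keepalive-client'}
```

With this logic, an `openai-codex` configuration receives the keepalive client exactly like every other endpoint.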

Fix / Workaround

I have a local patch + regression tests ready and can open a draft PR that references this issue.

PR fix notes

PR #12953: fix(codex): avoid custom keepalive transport on chatgpt backend

Description (problem / solution / changelog)

Summary

This draft fixes a Codex-specific transport regression in _create_openai_client().

Hermes currently injects a custom httpx.HTTPTransport(socket_options=...) to enable TCP keepalives. That path is desirable for regular OpenAI-compatible endpoints, but it is not compatible with the ChatGPT Codex backend at https://chatgpt.com/backend-api/codex: the TLS handshake gets reset before the request reaches normal API validation.

This PR keeps the existing keepalive behavior for normal OpenAI-compatible endpoints, but skips custom transport injection for the Codex backend.

Closes #12952

Why this PR exists

We already fixed two nearby regressions in this area:

  • stale http_client reuse across requests (#11033 / #11070)
  • proxy env handling when a custom transport is injected (#12008)

What remained was a third, narrower issue: even with fresh clients and correct proxy behavior, the Codex backend itself rejects the injected keepalive transport.

Reproduction / evidence

A minimal live httpx comparison shows the split clearly:

default httpx client  -> 400 {"detail":"Input must be a list"}
custom keepalive path -> ConnectError('[Errno 54] Connection reset by peer')

That matters because the default client is reaching the backend successfully and receiving a normal application-level validation error, while the custom keepalive path fails earlier during connection setup.
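The argument rests on sorting each probe outcome by the layer that produced it. The hypothetical `classify` helper below states that rule with standard-library types so it runs offline: any HTTP status, even a 400, means connection setup and the TLS handshake succeeded, while a connection-level error means no HTTP response was ever produced. In the live repro the reset arrives wrapped in `httpx.ConnectError`; a bare `ConnectionResetError` stands in for it here.

```python
from typing import Union

Outcome = Union[int, BaseException]


def classify(outcome: Outcome) -> str:
    """Name the layer a probe outcome came from (illustrative rule, not Hermes code)."""
    if isinstance(outcome, int):
        # Any HTTP status, including 400 {"detail": "Input must be a list"},
        # means TCP + TLS succeeded and the backend validated the request.
        return "application"
    if isinstance(outcome, ConnectionError):
        # Reset/refused/aborted: the peer dropped the connection and no
        # HTTP response was produced.
        return "transport"
    return "unknown"


print(classify(400))  # application
print(classify(ConnectionResetError(54, "Connection reset by peer")))  # transport
```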

Root cause

_create_openai_client() currently injects a keepalive-enabled http_client whenever one is not already present.

That is fine for normal OpenAI-compatible APIs, but it is too aggressive for Codex. When base_url points at chatgpt.com/backend-api/codex (or the provider is openai-codex), we need to let the SDK use its default transport instead of forcing a custom HTTPTransport(socket_options=...).

Fix

Scope the bypass narrowly to Codex:

  • skip keepalive injection when provider == "openai-codex"
  • also skip when base_url starts with https://chatgpt.com/backend-api/codex

This keeps the current behavior unchanged for non-Codex OpenAI-compatible endpoints.
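The two bypass conditions can be sketched as plain functions. `CODEX_BASE_URL_PREFIX`, `should_inject_keepalive`, and `build_client_kwargs` are hypothetical names, and the keepalive builder is passed in as a callable so the sketch runs without httpx; the PR itself edits `_create_openai_client()` in `run_agent.py`. Treating a missing `base_url` as non-Codex is an assumption.

```python
from typing import Callable, Optional

CODEX_BASE_URL_PREFIX = "https://chatgpt.com/backend-api/codex"


def should_inject_keepalive(provider: Optional[str], base_url: Optional[str]) -> bool:
    """False for the ChatGPT Codex backend, True for every other endpoint."""
    if provider == "openai-codex":
        return False
    if (base_url or "").startswith(CODEX_BASE_URL_PREFIX):
        return False
    return True


def build_client_kwargs(provider, base_url, build_keepalive: Callable[[], object]) -> dict:
    client_kwargs = {"base_url": base_url}
    # Only non-Codex endpoints get the custom keepalive client; for Codex the
    # builder is never called and the OpenAI SDK creates its default transport.
    if "http_client" not in client_kwargs and should_inject_keepalive(provider, base_url):
        keepalive_http = build_keepalive()
        if keepalive_http is not None:
            client_kwargs["http_client"] = keepalive_http
    return client_kwargs


calls = []


def fake_builder():
    calls.append(1)
    return "keepalive-client"


print(build_client_kwargs("openrouter", "https://openrouter.ai/api/v1", fake_builder))
# {'base_url': 'https://openrouter.ai/api/v1', 'http_client': 'keepalive-client'}
print(build_client_kwargs("openai-codex", "https://chatgpt.com/backend-api/codex", fake_builder))
# {'base_url': 'https://chatgpt.com/backend-api/codex'}
print(len(calls))  # 1: the builder only ran for the non-Codex endpoint
```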

Tests

Focused regression coverage now pins both sides of the branch:

  1. non-Codex OpenAI-compatible endpoints still get a keepalive-enabled httpx.Client
  2. proxy env handling for non-Codex endpoints still works
  3. Codex provider path does not inject a custom http_client
  4. Codex base URL path does not inject a custom http_client
  5. Codex paths do not even call _build_keepalive_http_client()
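Checks 3 to 5 can be pinned with `unittest.mock`: patch the builder and assert it was never called. This is a sketch against a hypothetical `FakeAgent` that reproduces only the guarded branch; the PR's real tests are in `tests/run_agent/test_create_openai_client_proxy_env.py` and exercise Hermes' own agent class.

```python
import unittest
from unittest import mock


class FakeAgent:
    """Hypothetical agent exposing only the guarded injection branch."""

    def __init__(self, provider, base_url):
        self.provider = provider
        self.base_url = base_url

    def _build_keepalive_http_client(self):
        return object()  # stand-in for the real keepalive httpx.Client

    def _create_openai_client(self):
        client_kwargs = {"base_url": self.base_url}
        is_codex = self.provider == "openai-codex" or (self.base_url or "").startswith(
            "https://chatgpt.com/backend-api/codex"
        )
        if "http_client" not in client_kwargs and not is_codex:
            keepalive_http = self._build_keepalive_http_client()
            if keepalive_http is not None:
                client_kwargs["http_client"] = keepalive_http
        return client_kwargs  # the real method would pass these to OpenAI(...)


class CodexBypassTest(unittest.TestCase):
    def test_non_codex_gets_keepalive_client(self):
        kwargs = FakeAgent("custom", "https://api.example.com/v1")._create_openai_client()
        self.assertIn("http_client", kwargs)

    def test_codex_never_calls_builder(self):
        for provider, url in [
            ("openai-codex", "https://example.invalid"),
            ("custom", "https://chatgpt.com/backend-api/codex"),
        ]:
            with self.subTest(provider=provider, url=url):
                agent = FakeAgent(provider, url)
                with mock.patch.object(agent, "_build_keepalive_http_client") as builder:
                    kwargs = agent._create_openai_client()
                builder.assert_not_called()
                self.assertNotIn("http_client", kwargs)
```

Run it with `pytest` or `python -m unittest` like any other test module.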

Run locally with:

source venv/bin/activate
pytest -q \
  tests/run_agent/test_create_openai_client_proxy_env.py \
  tests/run_agent/test_create_openai_client_kwargs_isolation.py \
  tests/run_agent/test_create_openai_client_reuse.py

Local result:

9 passed in 5.57s

Scope / risk

Low risk:

  • behavior change is limited to the Codex backend
  • no change to Anthropic / Gemini / OpenRouter / generic OpenAI-compatible paths
  • no change to the existing keepalive builder itself
  • no change to the stale-client or proxy regression fixes

Follow-up ideas

If maintainers want, the Codex-backend detection could later be extracted into a small helper so transport selection is easier to reason about and test in one place.
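One possible shape for such a helper, offered as a sketch only: `is_codex_backend` is a hypothetical name, and matching on the parsed scheme, host, and leading path segments (instead of a raw `startswith`) is a design choice this PR does not make. The difference shows up with look-alike URLs: a prefix check also accepts a hypothetical `https://chatgpt.com/backend-api/codex-preview`, while segment matching rejects it.

```python
from typing import Optional
from urllib.parse import urlsplit

CODEX_HOST = "chatgpt.com"
CODEX_PATH = ("backend-api", "codex")


def is_codex_backend(provider: Optional[str], base_url: Optional[str]) -> bool:
    """True when the provider or the base URL identifies the ChatGPT Codex backend."""
    if provider == "openai-codex":
        return True
    if not base_url:
        return False
    parts = urlsplit(base_url)
    # Compare whole path segments so "codex-preview" does not match "codex".
    segments = tuple(s for s in parts.path.split("/") if s)
    return (
        parts.scheme == "https"
        and (parts.hostname or "") == CODEX_HOST
        and segments[: len(CODEX_PATH)] == CODEX_PATH
    )


for url in [
    "https://chatgpt.com/backend-api/codex",
    "https://chatgpt.com/backend-api/codex/",
    "https://chatgpt.com/backend-api/codex-preview",
    "https://api.openai.com/v1",
]:
    print(url, is_codex_backend("custom", url))
```

Keeping the check in one function also gives the transport-selection tests a single place to target.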

Changed files

  • run_agent.py (modified, +10/-1)
  • tests/run_agent/test_create_openai_client_proxy_env.py (modified, +92/-40)


Affected versions / baseline

  • Local runtime version: Hermes Agent v0.10.0 (2026.4.16)
  • Comparison baseline: official tag v2026.4.16
  • Keepalive was introduced after that tag in:
    • 12b109b6640a573abf685d3c881cab2a9fc5c3aa: "fix: enable TCP keepalives to detect dead provider connections (#10324) (#10933)"
    • reverted in e07dbde582e6c80f80eb0d3040add8331832a87b
    • re-landed in 8c478983ed0ec5609212950d5044398dd4d27a5a: "fix: enable TCP keepalives to detect dead provider connections (#10324) (#11277)"
    • later refactored into _build_keepalive_http_client() in d393104bad62700d8e33de003502b3f74854151a

Symptoms

  • provider=openai-codex
  • model=gpt-5.4
  • base_url=https://chatgpt.com/backend-api/codex
  • Hermes retries, and each attempt fails with APIConnectionError: Connection error.
  • Switching away from the injected keepalive transport restores reachability immediately

Reproduction

A minimal httpx comparison shows the issue without needing the full gateway stack:

import json
import socket
from pathlib import Path

import httpx

# Access token Hermes stored for the openai-codex provider.
obj = json.loads((Path.home() / ".hermes" / "auth.json").read_text())
tok = obj["providers"]["openai-codex"]["tokens"]["access_token"]
headers = {
    "Authorization": f"Bearer {tok}",
    "User-Agent": "codex_cli_rs/0.0.0 (Hermes Agent)",
    "originator": "codex_cli_rs",
    "Content-Type": "application/json",
}
url = "https://chatgpt.com/backend-api/codex/responses"
# A 400 validation error for this payload still proves the request reached the backend.
payload = {
    "model": "gpt-5.4",
    "instructions": "Say hello",
    "input": "hello",
    "stream": True,
    "store": False,
}


def probe(label, **client_kwargs):
    """POST once; print the HTTP status, or the connection error if setup fails."""
    try:
        with httpx.Client(timeout=20.0, http2=False, **client_kwargs) as c:
            r = c.post(url, headers=headers, json=payload)
            print(label, r.status_code, r.text[:120])
    except httpx.ConnectError as exc:
        print(label, repr(exc))


# 1. Plain httpx.Client with its default transport.
probe("default")

# 2. Custom transport with SO_KEEPALIVE, like the keepalive client Hermes injects.
transport = httpx.HTTPTransport(socket_options=[
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
])
probe("custom", transport=transport)

Observed result locally:

default 400 {"detail":"Input must be a list"}
custom ConnectError('[Errno 54] Connection reset by peer')

The key point is that the default client reaches the backend and gets a normal protocol-validation response, while the keepalive transport fails during connection setup.


Proposed fix

Skip keepalive http_client injection for the Codex backend and let the OpenAI SDK construct its default transport:

  • match provider == "openai-codex"
  • or match base_url.startswith("https://chatgpt.com/backend-api/codex")

This keeps the keepalive optimization for normal OpenAI-compatible endpoints while restoring compatibility for Codex.

Why this is safe

  • The change is narrowly scoped to the Codex backend.
  • Existing keepalive behavior remains unchanged for standard OpenAI-compatible APIs.
  • Proxy-related behavior for non-Codex endpoints remains covered by tests from #12008.
  • New regression tests can pin both behaviors:
    • non-Codex endpoints still inject a keepalive/proxy-aware client
    • Codex endpoints do not inject a custom client

Related

  • #10324
  • #11033
  • #11070
  • #12008


extent analysis

TL;DR

The proposed fix involves skipping the keepalive http_client injection for the Codex backend by checking if the provider is "openai-codex" or if the base URL starts with "https://chatgpt.com/backend-api/codex", allowing the OpenAI SDK to construct its default transport.

Guidance

  • Identify the Codex backend by checking the provider variable for "openai-codex" or the base_url for "https://chatgpt.com/backend-api/codex".
  • Skip injecting the custom keepalive http_client for the Codex backend.
  • Allow the OpenAI SDK to construct its default transport for the Codex backend.
  • Verify the fix by testing the connection with and without the keepalive transport injection.

Example

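# Codex backend: skip the custom keepalive client so the OpenAI SDK builds its default transport.
is_codex_backend = provider == "openai-codex" or (base_url or "").startswith(
    "https://chatgpt.com/backend-api/codex"
)
if "http_client" not in client_kwargs and not is_codex_backend:
    keepalive_http = self._build_keepalive_http_client()
    if keepalive_http is not None:
        client_kwargs["http_client"] = keepalive_http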

Notes

This fix is narrowly scoped to the Codex backend and does not affect the existing keepalive behavior for standard OpenAI-compatible APIs. New regression tests should be added to cover both behaviors.

Recommendation

Apply the proposed fix by skipping the keepalive http_client injection for the Codex backend, as it restores compatibility with the Codex backend while maintaining the keepalive optimization for normal OpenAI-compatible endpoints.
