[Bug] Codex APIConnectionError retry rate ~8x higher post-v0.13.0; persists with #12953 applied; suspected commit 5533ad764 strict stream-timeout enforcement

NousResearch/hermes-agent#22986 (fetched 2026-05-11 03:31:54)
Summary

After upgrading to v0.13.0 (v2026.5.7), the rate of "APIConnectionError: Connection error." retries against chatgpt.com/backend-api/codex increased roughly 8x under the same workload. The retries persist even with PR #12953 (custom keepalive transport bypass) cherry-picked locally. The suspected contributing cause is commit 5533ad764 (fix(auxiliary): enforce Codex Responses stream timeout), whose 120s default appears to be too tight.

Environment

  • Hermes Agent: v0.13.0 (v2026.5.7), commit eeef486 baseline + two local cherry-picks (aaa700c65 = PR #12953 keepalive bypass, 4ce6c96e2 = PR #19485 runtime TLS).
  • Provider: openai-codex (ChatGPT Codex OAuth).
  • Models: gpt-5.5 main, gpt-5.4-mini auxiliary compression.
  • Backend: chatgpt.com/backend-api/codex.
  • Platform: macOS 15.x arm64.

Empirical data

Logs counted across ~/.hermes/logs/agent.log, ~/.hermes/profiles/*/logs/agent.log, and ~/.hermes/kanban/logs/t_*.log:

  • 2026-05-08 (pre-upgrade, full day): 21 retries.
  • 2026-05-09 12:00–22:00 CDT (post-upgrade): 171+ retries.
  • Same user, same workload class (Telegram-driven agent interactions plus an internal multi-agent workload), same network/IP.

Hourly post-upgrade peaks reach 33–48 retries/hour. Comparable pre-upgrade hours saw 0–2 retries/hour.
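
For anyone who wants to reproduce the counts on their own logs, here is a minimal sketch of the bucketing. It assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which is a guess, since the agent.log layout is not shown in this report. The function names are illustrative and are not part of Hermes.

```python
import re
from collections import Counter
from typing import Iterable

# Assumption: lines begin with "YYYY-MM-DD HH:MM:SS"; adjust to the real log layout.
TS_PREFIX = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}")
RETRY_MARK = "API call failed (attempt"


def retries_per_hour(lines: Iterable[str]) -> Counter:
    """Count APIConnectionError retry lines per (date, hour) bucket."""
    buckets: Counter = Counter()
    for line in lines:
        if RETRY_MARK not in line or "APIConnectionError" not in line:
            continue
        match = TS_PREFIX.match(line)
        if match:
            buckets[(match.group(1), int(match.group(2)))] += 1
    return buckets


def amplification(before: int, after: int) -> float:
    """Ratio of post-upgrade to pre-upgrade retry counts."""
    if before <= 0:
        raise ValueError("pre-upgrade count must be positive")
    return after / before
```

With the figures above, amplification(21, 171) is about 8.1. The post-upgrade window covers only ten hours against a full pre-upgrade day, so the per-hour ratio is higher than that.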

Symptom

Every retry in the post-upgrade window has the same signature:

⚠️  API call failed (attempt 1/3): APIConnectionError
   🔌 Provider: openai-codex  Model: gpt-5.5
   🌐 Endpoint: https://chatgpt.com/backend-api/codex
   📝 Error: Connection error.
   ⏱️  Elapsed: 0.13–0.5s  Context: varies
⏳ Retrying in 2-5s (attempt 1/3)...

Sub-second elapsed time points to a connection reset (RST) during the TLS handshake. Roughly 95% or more of retries succeed within 1–3 attempts, and no max-retries-exhausted events were observed.
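
For context, the retry behaviour visible in the log (up to three attempts, 2-5s delay) can be approximated with the sketch below. This is not the Hermes retry loop. The built-in ConnectionError stands in for the client's APIConnectionError, and call_with_retries and its parameters are hypothetical names.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_attempts: int = 3,
    delay_range: Tuple[float, float] = (2.0, 5.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on ConnectionError with a random delay between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # attempts exhausted; surface the last error
            sleep(random.uniform(*delay_range))
    raise RuntimeError("max_attempts must be at least 1")
```

Under a loop like this, a handshake-time reset costs a few seconds per failed attempt, which matches the observation that almost all calls recover without exhausting retries.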

Hypothesis

Commit 5533ad764 enforces a hard total-elapsed timeout on the Codex Responses auxiliary stream. The default auxiliary.compression.timeout of 120s is too tight for compression workloads, where 200K+ token sessions on gpt-5.4-mini routinely take 60–180s. Before the commit, slow streams ran to completion. After it, they time out at 120s and raise TimeoutError, which is classified as retryable. The outer agent loop then fires a retry cycle, which often surfaces as a transport-level APIConnectionError because _close_client_on_timeout forcibly calls client.close().
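
To make the suspected mechanism concrete, here is a simplified sketch of a total-elapsed deadline wrapped around a chunk stream. It is not the code from 5533ad764. The wrapper name, the on_timeout hook (standing in for the forced client close), and the injectable clock are all assumptions for illustration. It checks the deadline only when a chunk arrives, whereas a real implementation may use a timer.

```python
import time
from typing import Callable, Iterable, Iterator


def enforce_total_deadline(
    chunks: Iterable[str],
    timeout_s: float,
    on_timeout: Callable[[], None],
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[str]:
    """Yield chunks until total elapsed time exceeds timeout_s, then abort."""
    start = clock()
    for chunk in chunks:
        if clock() - start > timeout_s:
            on_timeout()  # hypothetical hook, e.g. force-closing the HTTP client
            raise TimeoutError(f"stream exceeded {timeout_s}s total")
        yield chunk
```

A stream that is still making progress at 150s is cut off under a 120s limit, even though it would have finished by 180s. That is the case this hypothesis is about.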

Workaround

In ~/.hermes/config.yaml:

auxiliary:
  compression:
    timeout: 300   # was 120

load_config() is mtime-cached, so the change is picked up without a restart. This reduces the compression-driven retry class. Connection-class retries (covered by #12952) are a separate problem.
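
For readers unfamiliar with the pattern, an mtime-keyed cache works roughly as sketched below. This is not the Hermes load_config() implementation. load_cached and the parse callback are hypothetical, and the parser is passed in so the sketch does not depend on any particular YAML library.

```python
import os
from typing import Any, Callable, Dict, Tuple

# Maps path -> (modification time in ns, parsed value).
_cache: Dict[str, Tuple[int, Any]] = {}


def load_cached(path: str, parse: Callable[[str], Any]) -> Any:
    """Re-read and re-parse the file only when its modification time changes."""
    mtime = os.stat(path).st_mtime_ns
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, encoding="utf-8") as fh:
        value = parse(fh.read())
    _cache[path] = (mtime, value)
    return value
```

Because the cache key includes the file's modification time, saving an edited config.yaml invalidates the cached value on the next call. This is why a running process can see the new timeout.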

Related

  • #12952 / PR #12953 — custom keepalive transport breaks chatgpt codex backend (cherry-picked locally; reduces but does not eliminate).
  • #16670 / PR #16737 — compression fallback marker after incomplete chunked read.
  • PR #21761 — recover Codex stream drops (auxiliary path).

This issue is for the post-v0.13.0 amplification specifically, not the underlying transport instability that pre-existed.

Asks

  1. Confirm whether 5533ad764's 120s default is intended for production Codex-OAuth-on-ChatGPT-account workloads or whether it should be raised.
  2. Consider exposing a per-call override or auto-scaling the deadline based on observed stream throughput.
  3. Either way, document the workaround for users hitting the regression.
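
As one possible shape for the auto-scaling idea in ask 2, the sketch below derives a deadline from the expected amount of work and the observed throughput, clamped between a floor and a ceiling. The function name, parameters, and default values are placeholders for discussion and are not proposed constants.

```python
def scaled_timeout(
    expected_tokens: int,
    observed_tokens_per_s: float,
    floor_s: float = 120.0,
    ceiling_s: float = 600.0,
    safety_factor: float = 1.5,
) -> float:
    """Estimate a stream deadline from throughput, clamped to [floor_s, ceiling_s]."""
    if observed_tokens_per_s <= 0:
        # No usable throughput sample yet: fall back to the most permissive limit.
        return ceiling_s
    estimate = expected_tokens / observed_tokens_per_s * safety_factor
    return max(floor_s, min(ceiling_s, estimate))
```

For example, a 200K-token session at an observed 1,000 tokens/s gets a 300s deadline. Small sessions keep the 120s floor.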
