[Bug] Codex APIConnectionError retry rate ~8x higher post-v0.13.0; persists with #12953 applied; suspected commit 5533ad764 strict stream-timeout enforcement

hermes · 2026-05-10 05:20:40

GitHub stats

NousResearch/hermes-agent#22986•Fetched 2026-05-11 03:31:54

Author

QuarkAssistant

Participants

QuarkAssistant

Timeline (top)

labeled ×3

After upgrading to v0.13.0 (v2026.5.7), the retry rate for "APIConnectionError: Connection error." against chatgpt.com/backend-api/codex rose roughly 8x on the same workload. The retries persist even with PR #12953 (custom keepalive transport bypass) cherry-picked locally. The suspected contributing cause is commit 5533ad764 (fix(auxiliary): enforce Codex Responses stream timeout), whose 120s default appears too tight.

Environment

Hermes Agent: v0.13.0 (v2026.5.7), commit eeef486 baseline + two local cherry-picks (aaa700c65 = PR #12953 keepalive bypass, 4ce6c96e2 = PR #19485 runtime TLS).
Provider: openai-codex (ChatGPT Codex OAuth).
Models: gpt-5.5 main, gpt-5.4-mini auxiliary compression.
Backend: chatgpt.com/backend-api/codex.
Platform: macOS 15.x arm64.

Empirical data

Logs counted across ~/.hermes/logs/agent.log, ~/.hermes/profiles/*/logs/agent.log, and ~/.hermes/kanban/logs/t_*.log:

2026-05-08 (pre-upgrade, full day): 21 retries.
2026-05-09 12:00–22:00 CDT (post-upgrade): 171+ retries.
Same user, same workload class (Telegram-driven agent interactions plus an internal multi-agent workload), same network/IP.

Post-upgrade hourly peaks: 33–48 retries/hour. Comparable pre-upgrade hours: 0–2 retries/hour.
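For anyone who wants to reproduce the per-hour counts on their own logs, a minimal sketch follows. It assumes each log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp, which may not match the real agent.log format; TS_RE, RETRY_MARKER, and both function names are illustrative, not part of Hermes.

```python
import glob
import os
import re
from collections import Counter
from typing import Iterable, Iterator

# ASSUMPTION: lines begin with "YYYY-MM-DD HH:MM:SS"; adjust to the real format.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}")
# Substring taken from the retry banner shown in this issue.
RETRY_MARKER = "API call failed (attempt"


def iter_log_lines(patterns: Iterable[str]) -> Iterator[str]:
    """Yield lines from every file matching the given glob patterns."""
    for pattern in patterns:
        for path in sorted(glob.glob(os.path.expanduser(pattern))):
            with open(path, "r", encoding="utf-8", errors="replace") as fh:
                yield from fh


def retries_per_hour(lines: Iterable[str]) -> Counter:
    """Count retry banner lines per (date, hour) bucket."""
    buckets: Counter = Counter()
    for line in lines:
        if RETRY_MARKER not in line:
            continue
        match = TS_RE.match(line)
        if match:  # lines without a leading timestamp are skipped
            buckets[(match.group(1), int(match.group(2)))] += 1
    return buckets
```

Calling retries_per_hour(iter_log_lines(["~/.hermes/logs/agent.log", "~/.hermes/profiles/*/logs/agent.log", "~/.hermes/kanban/logs/t_*.log"])) would cover the three log locations listed above.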

Symptom

Every retry signature in the post-upgrade window:

⚠️  API call failed (attempt 1/3): APIConnectionError
   🔌 Provider: openai-codex  Model: gpt-5.5
   🌐 Endpoint: https://chatgpt.com/backend-api/codex
   📝 Error: Connection error.
   ⏱️  Elapsed: 0.13–0.5s  Context: varies
⏳ Retrying in 2-5s (attempt 1/3)...

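Sub-second elapsed times suggest the connection is reset at or near TLS-handshake time, before any request body is streamed. More than ~95% of retries succeed within 1–3 attempts, and no max-retries-exhausted events were logged.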

Hypothesis

Commit 5533ad764 enforces a hard total-elapsed timeout on the Codex Responses auxiliary stream. The default, auxiliary.compression.timeout: 120s, is too tight for compression workloads: 200K+ token sessions on gpt-5.4-mini routinely take 60–180s. Before the commit, slow streams ran to completion. After it, they time out at 120s, raise TimeoutError, and are classified as retryable. The outer agent loop then fires a retry cycle, which often surfaces as a transport-level APIConnectionError because _close_client_on_timeout forcibly calls client.close().
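To make the difference between a total-elapsed deadline and a per-chunk timeout concrete, here is a minimal sketch. It is not the code from 5533ad764; with_total_deadline and its clock parameter are hypothetical names used only to illustrate the behavior described above.

```python
import time
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def with_total_deadline(
    chunks: Iterable[T],
    timeout_s: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield chunks until the total elapsed time exceeds timeout_s.

    Hypothetical helper: the budget covers the whole stream, so a stream
    that is slow but still making progress is cut off once it runs long.
    """
    start = clock()
    for chunk in chunks:
        # Checked after each chunk arrives; a per-chunk timeout would
        # instead reset its budget here.
        if clock() - start > timeout_s:
            raise TimeoutError(f"stream exceeded {timeout_s}s total")
        yield chunk
```

Under this model a compression stream that delivers chunks steadily for 150s fails with timeout_s=120 and completes with timeout_s=300, which matches the effect of the workaround.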

Workaround

In ~/.hermes/config.yaml:

auxiliary:
  compression:
    timeout: 300   # was 120

load_config() caches by file mtime, so the edit is picked up without a restart. This reduces the compression-driven class of retries. Connection-class retries (tracked in #12952) are a separate problem.
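For readers unfamiliar with mtime caching, the sketch below shows the general pattern and why no restart is needed. It is not Hermes's load_config(): load_cached is a hypothetical name, and it parses JSON only to stay within the standard library, whereas the real config file is YAML.

```python
import json
import os
from typing import Any, Callable, Dict, Tuple

# path -> (mtime in nanoseconds, parsed value)
_cache: Dict[str, Tuple[int, Any]] = {}


def load_cached(path: str, parse: Callable[[str], Any] = json.loads) -> Any:
    """Re-parse the file only when its modification time has changed."""
    mtime = os.stat(path).st_mtime_ns
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]  # file untouched since last read: reuse parsed value
    with open(path, "r", encoding="utf-8") as fh:
        value = parse(fh.read())
    _cache[path] = (mtime, value)
    return value
```

Because every call stats the file first, saving a new timeout value changes the mtime and the next call sees the updated setting.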

Related

#12952 / PR #12953: custom keepalive transport breaks chatgpt codex backend (cherry-picked locally; reduces but does not eliminate the retries).
#16670 / PR #16737: compression fallback marker after incomplete chunked read.
PR #21761: recover Codex stream drops (auxiliary path).

This issue covers the post-v0.13.0 amplification specifically, not the pre-existing transport instability.

Asks

Confirm whether 5533ad764's 120s default is intended for production Codex-OAuth-on-ChatGPT-account workloads, or whether it should be raised.
Consider exposing a per-call override, or auto-scaling the deadline based on observed stream throughput.
Either way, document the workaround for users hitting the regression.
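As one possible shape for the auto-scaling ask, here is a rough sketch. Nothing in it exists in Hermes today: scaled_deadline, its parameters, and the 600s ceiling and 1.5x safety factor are placeholder assumptions, with only the 120s floor taken from the current default.

```python
def scaled_deadline(
    expected_tokens: int,
    observed_tokens_per_s: float,
    floor_s: float = 120.0,    # current default from this issue
    ceiling_s: float = 600.0,  # ASSUMPTION: arbitrary upper bound
    safety: float = 1.5,       # ASSUMPTION: arbitrary headroom multiplier
) -> float:
    """Estimate a stream deadline from recent throughput, clamped to a range."""
    if observed_tokens_per_s <= 0:
        # No usable throughput sample yet: fall back to the most lenient bound.
        return ceiling_s
    estimate = expected_tokens / observed_tokens_per_s * safety
    return min(ceiling_s, max(floor_s, estimate))
```

A per-call override could then simply take precedence over this estimate when the caller supplies one.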
