hermes - [Bug]: Direct Copilot gpt-5.5 large resumes are killed by 12s Codex TTFB watchdog

Error Message

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.
API call failed (attempt 1/3): APIConnectionError
Provider: copilot
Model: gpt-5.5
Endpoint: https://api.githubcopilot.com
Error: Connection error.
Elapsed: 12.14s
Context: 79 msgs, ~150,813 tokens

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting. API call failed (attempt 2/3): APIConnectionError

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting. API call failed (attempt 3/3): APIConnectionError

API call failed after 3 retries: Connection error.

Fix Action

Fix / Workaround

A local workaround fixed the failing resume immediately and confirmed the diagnosis: set HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0 and restart Hermes.

Not checking the issue-template checkbox for now, but I can test a patch and can submit a small PR if maintainers agree that direct Copilot should share the large-prefill behavior.

Code Example

HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0

Bug Description

Direct GitHub Copilot gpt-5.5 large-context resumes can fail because the Codex/Responses stream is killed by the default 12s no-first-byte watchdog before Copilot emits its first SSE event.

This looks similar to the recently fixed large-prefill behavior for openai-codex, but the exemption appears not to apply to the direct copilot provider / https://api.githubcopilot.com path. I searched existing issues and found related openai-codex/gpt-5.5 reports, but did not find a direct copilot provider issue for this specific TTFB watchdog behavior.

A local workaround fixed the failing resume immediately:

HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0

Steps to Reproduce

1. Configure Hermes to use the direct GitHub Copilot provider:
   - provider: copilot
   - model: gpt-5.5
   - base URL: https://api.githubcopilot.com
2. Use or resume a large existing session. The failing request in this case was approximately:
   - 79 messages
   - ~150,813 prompt tokens
   - Codex/Responses stream path
3. Start or resume the session.
4. Observe Hermes reconnecting after the no-first-byte watchdog fires repeatedly.

Expected Behavior

For large direct Copilot gpt-5.5 Responses requests, Hermes should allow backend prefill to take longer than 12 seconds before the first SSE event, or apply the same large-prefill exemption/scaling used for hosted openai-codex streams.

The request should not be aborted solely because Copilot has not emitted the first event within 12 seconds.
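
One way to picture the "scaling" option is a first-byte deadline that grows with prompt size. The sketch below is only an illustration: the function name, the 20,000-token allowance, the 0.5 s per 1,000 tokens rate, and the 300 s cap are all made-up assumptions, not values taken from Hermes or measured against Copilot.

```python
def scaled_ttfb_timeout(
    prompt_tokens: int,
    base_seconds: float = 12.0,          # current default cutoff from the report
    free_tokens: int = 20_000,           # hypothetical: prompts this small get no extra time
    seconds_per_1k_tokens: float = 0.5,  # hypothetical prefill allowance
    max_seconds: float = 300.0,          # hypothetical upper bound
) -> float:
    """Return a first-byte deadline that grows with the prompt size."""
    extra_tokens = max(0, prompt_tokens - free_tokens)
    scaled = base_seconds + (extra_tokens / 1000.0) * seconds_per_1k_tokens
    return min(scaled, max_seconds)


if __name__ == "__main__":
    # The failing request in this report had roughly 150,813 prompt tokens.
    print(scaled_ttfb_timeout(150_813))
```

With these illustrative constants, the failing request would get roughly 77 seconds before the watchdog fires, while small prompts keep the existing 12 second cutoff.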

Actual Behavior

Hermes aborts the direct Copilot gpt-5.5 stream after 12 seconds with no first byte, retries three times, and then reports a connection error.

Sanitized excerpt:

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.
API call failed (attempt 1/3): APIConnectionError
Provider: copilot
Model: gpt-5.5
Endpoint: https://api.githubcopilot.com
Error: Connection error.
Elapsed: 12.14s
Context: 79 msgs, ~150,813 tokens

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.
API call failed (attempt 2/3): APIConnectionError

No first byte from provider in 12s (codex stream, model: gpt-5.5). Reconnecting.
API call failed (attempt 3/3): APIConnectionError

API call failed after 3 retries: Connection error.

Affected Component

CLI (interactive chat)
Agent Core (conversation loop, context compression, memory)
GitHub Copilot provider

Messaging Platform (if gateway-related)

N/A (CLI only)

Debug Report

I am not attaching raw hermes debug share output because the local report/log bundle may include personal paths, local session identifiers, and environment-specific metadata. The report above includes sanitized version info, configuration shape, source anchors, and log excerpts with user-specific details removed.

If maintainers need more detail, I can provide an additional sanitized local debug excerpt.

Operating System

macOS 26.5

Python Version

Hermes runtime Python: 3.11.15

Hermes Version

Hermes Agent v0.15.0 (2026.5.28)

Additional version details:

OpenAI SDK: 2.24.0
Status: Up to date

Additional Logs / Traceback (optional)

The key log signal is that the stream is aborted by Hermes' TTFB watchdog, not by a Copilot auth failure or an upstream HTTP error:

Codex stream produced no bytes within TTFB cutoff (12s > 12s, model=gpt-5.5). Backend accepted the connection but sent no stream events. Killing connection so the retry loop can reconnect.
OpenAI client aborted (codex_ttfb_kill...) provider=copilot base_url=https://api.githubcopilot.com model=gpt-5.5
API call failed (attempt 1/3) error_type=APIConnectionError summary=Connection error.

A separate copilot/gpt-5.4 session on the same machine and subscription was able to complete requests, and the failing gpt-5.5 resume started working after setting HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0 and restarting Hermes.
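
To make the mechanism concrete, here is a minimal sketch of a no-first-byte watchdog driven by the same environment variable. It is not Hermes' implementation: the class and function names are hypothetical, and treating a value of 0 (or less) as "watchdog disabled" is an assumption inferred from the workaround above.

```python
import os
import threading
from typing import Callable, Optional


def ttfb_timeout_from_env(default: float = 12.0) -> Optional[float]:
    """Read the timeout; None means the watchdog is off (assumed for values <= 0)."""
    raw = os.environ.get("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", "")
    try:
        value = float(raw) if raw.strip() else default
    except ValueError:
        value = default
    return value if value > 0 else None


class FirstByteWatchdog:
    """Calls on_timeout once unless first_byte_seen() is called in time."""

    def __init__(self, timeout: Optional[float], on_timeout: Callable[[], None]):
        self._timer: Optional[threading.Timer] = None
        if timeout is not None:
            self._timer = threading.Timer(timeout, on_timeout)
            self._timer.daemon = True

    def start(self) -> None:
        if self._timer is not None:
            self._timer.start()

    def first_byte_seen(self) -> None:
        # Cancel the pending kill as soon as the first stream event arrives.
        if self._timer is not None:
            self._timer.cancel()
```

In this sketch the caller would pass a callback that closes the request client, start the watchdog when the request is sent, and call first_byte_seen() on the first SSE event. A long server-side prefill looks identical to a dead connection from the watchdog's point of view, which is why a fixed 12 second cutoff can kill a healthy request.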

Root Cause Analysis (optional)

In agent/chat_completion_helpers.py, the watchdog logic appears to enable the Codex TTFB guard for all codex_responses streams, but only applies the large-prefill exemption to _is_openai_codex_backend(agent).

Relevant source anchors from the current checkout:

132: def _is_openai_codex_backend(agent) -> bool:
293: _codex_watchdog_enabled = agent.api_mode == "codex_responses"
296: if _codex_watchdog_enabled and _openai_codex_backend:
313: _ttfb_enabled = _codex_watchdog_enabled
314: _ttfb_timeout = _env_float("HERMES_CODEX_TTFB_TIMEOUT_SECONDS", 12.0)
407/413: No first byte from provider ...
418: _close_request_client_once("codex_ttfb_kill")

The direct GitHub Copilot provider uses codex_responses for GPT-5+ models, but its backend is api.githubcopilot.com, so _openai_codex_backend is false. That means large Copilot requests still get the fixed 12s TTFB cutoff even when they may be doing long server-side prefill.
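
The gating described above can be modeled in a few lines. This is a simplified reconstruction from the source anchors, not the actual code in agent/chat_completion_helpers.py: the Agent dataclass is a hypothetical stand-in, and the body of is_openai_codex_backend is an assumption, since the report does not show how the real _is_openai_codex_backend decides.

```python
from dataclasses import dataclass


@dataclass
class Agent:
    """Hypothetical stand-in for the Hermes agent object."""
    api_mode: str
    provider: str
    base_url: str


def is_openai_codex_backend(agent: Agent) -> bool:
    # Assumption: only the hosted openai-codex provider counts here.
    return agent.provider == "openai-codex"


def ttfb_decision(agent: Agent) -> dict:
    # The watchdog is keyed on api_mode alone ...
    watchdog_enabled = agent.api_mode == "codex_responses"
    # ... while the large-prefill exemption also requires the openai-codex backend.
    exempt = watchdog_enabled and is_openai_codex_backend(agent)
    return {"ttfb_enabled": watchdog_enabled, "large_prefill_exempt": exempt}


if __name__ == "__main__":
    copilot = Agent("codex_responses", "copilot", "https://api.githubcopilot.com")
    print(ttfb_decision(copilot))
```

Under these assumptions a direct Copilot agent ends up with the watchdog enabled and no exemption, which matches the behavior seen in the logs.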

Proposed Fix (optional)

Consider generalizing the large-prefill TTFB exemption/scaling to direct GitHub Copilot Responses streams as well. For example:

- Treat provider == "copilot" or host api.githubcopilot.com as a hosted Codex/Responses backend for the purpose of large-prefill TTFB behavior.
- Or disable/scale the no-first-byte watchdog for large codex_responses requests independent of provider, while leaving the stale/read timeout responsible for truly stuck streams.
- Keep HERMES_CODEX_TTFB_TIMEOUT_SECONDS / strict mode as operator overrides for environments that prefer the current behavior.
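
The options above could be combined roughly as follows. This is a sketch to anchor discussion, not a patch: every name here is hypothetical, the 100,000-token threshold for "large" is an arbitrary placeholder, and the strict flag is only a guess at how a strict-mode override might be passed in.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allow-lists; the real provider identifiers may differ.
HOSTED_RESPONSES_PROVIDERS = {"openai-codex", "copilot"}
HOSTED_RESPONSES_HOSTS = {"api.githubcopilot.com"}


def is_hosted_responses_backend(provider: str, base_url: str) -> bool:
    """True if the provider or the base URL host is a known hosted backend."""
    host = (urlparse(base_url).hostname or "").lower()
    return provider in HOSTED_RESPONSES_PROVIDERS or host in HOSTED_RESPONSES_HOSTS


def effective_ttfb_timeout(
    provider: str,
    base_url: str,
    prompt_tokens: int,
    configured: float = 12.0,
    strict: bool = False,
    large_prompt_tokens: int = 100_000,  # placeholder threshold
) -> Optional[float]:
    """Return the first-byte timeout in seconds, or None to skip the watchdog."""
    if strict:
        # Operator override: keep the current fixed cutoff.
        return configured
    if (is_hosted_responses_backend(provider, base_url)
            and prompt_tokens >= large_prompt_tokens):
        # Large prefill on a hosted backend: leave it to the stale/read timeout.
        return None
    return configured
```

Matching on either the provider name or the host keeps the exemption working if the provider is configured under a different name but still points at api.githubcopilot.com.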

The local workaround that confirmed this diagnosis was:

HERMES_CODEX_TTFB_TIMEOUT_SECONDS=0

Are you willing to submit a PR for this?

Not checking the issue-template checkbox for now, but I can test a patch and can submit a small PR if maintainers agree that direct Copilot should share the large-prefill behavior.
