hermes - ✅(Solved) Fix Auxiliary compression timeout can poison cached sync client, causing later auxiliary calls to fail [1 pull request, 2 comments, 2 participants]

GitHub stats
NousResearch/hermes-agent#23432 (fetched 2026-05-11 03:29:29)
Comments: 2
Participants: 2
Timeline: 10
Reactions: 0
Timeline (top): cross-referenced ×3, labeled ×3, commented ×2, closed ×1

When preflight context compression uses the main openai-codex auxiliary route and the Responses stream exceeds the configured auxiliary timeout, the timeout path closes the underlying auxiliary client. The sync auxiliary client can remain in the cache afterward, so later auxiliary calls reuse a closed/poisoned client and fail quickly with Connection error.

This affected context compression and memory/background auxiliary tasks in a long-running Discord gateway session. The main model route continued to work, so this does not look like a global network/auth outage.

Error Message

Failed to generate context summary: Connection error.. Further summary attempts paused for 30 seconds.
Brainstack explicit capture validation extractor failed: Connection error.

Fix Action

Fixed

PR fix notes

PR #23482: fix(auxiliary): evict cached client on timeout/connection error

Description (problem / solution / changelog)

Closes #23432.

Summary

After an auxiliary Codex timeout, later provider: main aux calls (memory flush, background review, next compaction) no longer fail with a stale Connection error. Compression no longer drops the user into the static fallback marker after a single timeout in a long-running gateway session.

Root cause

_CodexCompletionsAdapter._close_client_on_timeout closes the inner OpenAI client to unstick a hung Responses stream. The cached wrapper still pointed at that now-dead transport. Sync _get_cached_client has no liveness check (async does, via loop identity), and call_llm's connection-error fallback only fires when is_auto, so an explicit provider (including auxiliary.compression.provider: main, which resolves to openai-codex) never evicts.
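The failure mode described above can be reproduced in miniature. The following is only a sketch with stand-in names: FakeTransportClient, client_cache, get_cached_client and close_client_on_timeout are hypothetical and do not reproduce the code in agent/auxiliary_client.py, and the cache key shape is an assumption.

```python
class FakeTransportClient:
    """Stand-in for an HTTP-backed SDK client (hypothetical, not the OpenAI SDK)."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def request(self, prompt):
        if self.closed:
            raise ConnectionError("Connection error.")
        return f"summary of: {prompt}"


# Hypothetical cache keyed by (provider, model); the real key shape may differ.
client_cache = {}


def get_cached_client(key):
    # No liveness check on a cache hit, mirroring the sync behaviour described above.
    if key not in client_cache:
        client_cache[key] = FakeTransportClient()
    return client_cache[key]


def close_client_on_timeout(client):
    # Closes the transport to unstick a hung stream but leaves the cache entry in place.
    client.close()


key = ("openai-codex", "gpt-5.5")
first = get_cached_client(key)
close_client_on_timeout(first)      # simulated auxiliary timeout

second = get_cached_client(key)     # cache hit returns the same closed object
try:
    second.request("hello")
    outcome = "ok"
except ConnectionError as exc:
    outcome = str(exc)
```

In this sketch the second lookup hands back the closed object, so the next request fails immediately instead of hitting the network, which matches the fast Connection error seen in the logs.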

Changes

  • agent/auxiliary_client.py: new _evict_cached_client_instance(target) walks _client_cache and drops entries whose stored client is target (or wraps it via _real_client for CodexAuxiliaryClient).
  • agent/auxiliary_client.py: _close_client_on_timeout evicts the wrapper after closing the inner client.
  • agent/auxiliary_client.py: call_llm + async_call_llm evict on _is_connection_error(first_err) before re-raising, independent of is_auto.

Validation

| Scenario | Before | After |
| --- | --- | --- |
| Codex aux timeout | inner client closed, cache entry retained | inner client closed AND wrapper evicted |
| Next call_llm (explicit provider) | reuses dead client → Connection error | rebuilds via resolve_provider_client → succeeds |
| Non-connection error (e.g. 400) | cache retained | cache retained (no thrash) |
  • 6 new tests in TestAuxiliaryClientPoisonedCacheEviction covering helper semantics, the timeout closer eviction, and the explicit-provider call_llm/async_call_llm paths. All pass.
  • Full tests/agent/test_auxiliary_client.py (147) green.
  • E2E with real imports: timeout fires → inner client close() called once → cache entry gone → next call_llm resolves a fresh client and returns the response. Non-connection error path verified to not evict.

Changed files

  • agent/auxiliary_client.py (modified, +60/-0)
  • tests/agent/test_auxiliary_client.py (modified, +185/-0)

Original issue text

Observed Behavior

Live log sequence from a long-running gateway session:

Preflight compression: ~234,406 tokens >= 217,600 threshold (model gpt-5.5, ctx 272,000)
context compression started: session=20260510_171239_804511 messages=255 tokens=~234,406 model=gpt-5.5 focus=None
Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Connection error.. Further summary attempts paused for 30 seconds.
context compression done: session=20260510_213719_f58dae messages=255->8 tokens=~27,120

Earlier in the same run there was a timeout on the same auxiliary path:

Auxiliary compression: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Failed to generate context summary: Codex auxiliary Responses stream exceeded 20.0s total timeout. Further summary attempts paused for 30 seconds.

After that timeout, repeated downstream auxiliary tasks using provider: main started failing with:

Auxiliary flush_memories: using main (gpt-5.5) at https://chatgpt.com/backend-api/codex/
Brainstack explicit capture validation extractor failed: Connection error.

Meanwhile normal main-agent calls to the same provider/model were succeeding in the same time window, including large requests around 168k-193k input tokens. That makes a stale/closed auxiliary client more likely than a global provider outage.

Config Shape

Relevant config:

model:
  default: gpt-5.5
  provider: openai-codex
  base_url: https://chatgpt.com/backend-api/codex
auxiliary:
  compression:
    provider: main
    model: ""
    timeout: 20

No stale stepfun/step-3.5-flash route was involved in this observed failure. The compression task inherited the main model route as intended.

Suspected Cause

The timeout handler in the Codex auxiliary Responses stream closes the real client on timeout. The sync cache path can still return the cached client later, because sync cache hits do not validate whether the cached client was previously closed/poisoned.

Relevant live-source areas inspected:

  • agent/auxiliary_client.py: _CodexCompletionsAdapter timeout path calls client close on timeout.
  • agent/auxiliary_client.py: _get_cached_client(...) returns sync cached clients without a liveness check.
  • agent/context_compressor.py: _generate_summary(...) calls call_llm(task="compression", main_runtime=...) and then records _last_summary_error when the auxiliary call fails.
  • run_agent.py: preflight compression emits the user-facing fallback marker when _last_summary_error is set.

Expected Behavior

After an auxiliary timeout or connection error:

  • the poisoned/closed cached client should be evicted;
  • the next auxiliary call should build a fresh client;
  • compression should optionally retry once with a fresh client before inserting a fallback context marker;
  • downstream auxiliary memory/background tasks should not inherit the broken cached client state.
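The expected behavior listed above (evict on a connection failure, then retry once with a fresh client) can be sketched as follows. All names here are hypothetical, and the classification of connection errors by exception type is an assumption; the real code uses its own _is_connection_error check.

```python
def is_connection_error(exc):
    # Assumption: connection failures surface as ConnectionError or TimeoutError here.
    return isinstance(exc, (ConnectionError, TimeoutError))


def call_with_fresh_client_retry(cache, key, build_client, send):
    """Call send(client); on a connection error, evict the entry and retry once."""
    for attempt in range(2):
        client = cache.get(key)
        if client is None:
            client = cache[key] = build_client()
        try:
            return send(client)
        except Exception as exc:
            if not is_connection_error(exc):
                raise                # e.g. a 400: keep the cache entry, no thrash
            cache.pop(key, None)     # evict the poisoned client
            if attempt == 1:
                raise                # second failure: give up, entry already evicted


# Usage: the cache starts out holding a "dead" client left behind by a timeout.
cache = {"aux": "dead"}


def send(client):
    if client == "dead":
        raise ConnectionError("Connection error.")
    return f"ok via {client}"


result = call_with_fresh_client_retry(cache, "aux", lambda: "fresh", send)
```

Limiting the retry to a single attempt keeps a genuinely unreachable provider from looping, while still recovering from a client that was closed locally.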

Suggested Fix

At minimum:

  1. On timeout/connection failure from a cached sync auxiliary client, evict that cache entry before raising.
  2. Add a liveness/closed-state guard for sync cached clients, similar in spirit to the existing async loop validation.
  3. Add a regression test where:
    • auxiliary compression times out and closes the wrapped client;
    • a later provider: main auxiliary call is made;
    • the later call must create/use a fresh client rather than reusing the closed one.
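A closed-state guard and the matching regression test could take roughly the following shape. This is a sketch under stated assumptions: StubClient and GuardedCache are hypothetical, and the guard assumes the cached client can report whether it has been closed through an is_closed() method.

```python
class StubClient:
    """Hypothetical client that records whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def is_closed(self):
        return self.closed


class GuardedCache:
    """Sync cache that rebuilds an entry when the cached client reports closed."""

    def __init__(self, factory):
        self._factory = factory
        self._entries = {}

    def get(self, key):
        client = self._entries.get(key)
        if client is None or client.is_closed():
            client = self._entries[key] = self._factory()
        return client


def test_later_call_uses_fresh_client():
    cache = GuardedCache(StubClient)
    first = cache.get("main")
    first.close()                        # simulate the timeout path closing the client
    second = cache.get("main")           # later provider: main auxiliary call
    assert second is not first           # a fresh client was built
    assert not second.is_closed()
    assert cache.get("main") is second   # a healthy client is still reused
```

The guard complements eviction rather than replacing it: eviction handles the paths that know a client was closed, and the guard catches any path that closes a client without touching the cache.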

Impact

This can cause context compression to drop middle turns into a static fallback marker even though the main model route is still healthy. It can also make memory/background auxiliary tasks appear broken after a single auxiliary timeout.

Notes

I am reporting this as a Hermes auxiliary-client/runtime issue, not as a memory-provider storage issue. The memory provider was only the downstream consumer that made the poisoned cached auxiliary route visible.
