hermes - ✅ (Solved) Fix: Compression fallback marker after incomplete chunked read loses useful context in long sessions [1 pull request, 1 comment, 2 participants]

GitHub stats
NousResearch/hermes-agent#16670 · Fetched 2026-04-28 06:51:46
Comments: 1 · Participants: 2 · Timeline: 6 · Reactions: 0
Timeline (top): labeled ×3, commented ×1, cross-referenced ×1, referenced ×1

Context compression can fail when the auxiliary compression API call is interrupted with an incomplete chunked read. Hermes inserts a fallback context marker instead of a real summary:

⚠️ Compression summary failed: peer closed connection without sending complete message body (incomplete chunked read). Inserted a fallback context marker.

This is especially visible in long Telegram sessions because context compaction is frequent.

Root Cause

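Before PR #16737, the generic except block of _generate_summary in agent/context_compressor.py treated a mid-stream RemoteProtocolError ("incomplete chunked read") from the auxiliary call_llm request like any other failure: it started a 60-second cooldown, dropped the selected turns, inserted a fallback context marker, and repeated this on the next compaction. Long Telegram sessions compact frequently, so they hit this often enough to lose real context.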

Fix Action

Fix / Workaround

PR fix notes

PR #16737: fix(compression): retry transient transport errors before fallback marker (#16670)

Description (problem / solution / changelog)

What does this PR do?

Auxiliary compression's call_llm request can hit RemoteProtocolError ("peer closed connection without sending complete message body (incomplete chunked read)") mid-stream when the auxiliary endpoint hiccups. _generate_summary's generic except block in agent/context_compressor.py treated this like any other failure: 60-second cooldown, drop the selected turns, insert a fallback context marker, repeat on the next compaction. Long Telegram sessions surface this often enough that real context is permanently lost.

This PR adds a bounded in-call retry only for fast-fail mid-stream disconnect/protocol classes. Timeout-class errors are deliberately excluded (see Changes Made for why) so a genuinely slow endpoint can't stall a compaction for multiple timeout windows in a row.
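
As an illustration of the bounded in-call retry described above, here is a minimal sketch. It is not the PR's code: the helper name call_with_transient_retry, its parameters, and the injected sleep function are hypothetical, and only the 1 + 2 attempt budget with 1 s / 3 s back-offs is taken from the PR description.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def call_with_transient_retry(
    call: Callable[[], T],
    is_transient: Callable[[BaseException], bool],
    backoffs: Sequence[float] = (1.0, 3.0),  # from the PR text: 1 s / 3 s
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call` once, then retry once per back-off entry while errors are transient.

    Hypothetical helper: non-transient errors are re-raised immediately so the
    caller's existing cooldown / fallback handling still sees them.
    """
    for delay in backoffs:
        try:
            return call()
        except Exception as exc:
            if not is_transient(exc):
                raise  # e.g. timeouts or HTTP status errors: no retry
            sleep(delay)  # wait before the next attempt
    # Final attempt: any error now propagates to the caller.
    return call()
```

With the default back-offs this makes at most three attempts. Passing the predicate and the sleep function as arguments keeps the sketch independent of any particular HTTP client and makes it easy to exercise without real delays.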

Related Issue

Fixes #16670

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 🔒 Security fix
  • 📝 Documentation update
  • ✅ Tests (adding or improving test coverage)
  • ♻️ Refactor (no behavior change)
  • 🎯 New skill (bundled or hub)

Changes Made

  • agent/error_classifier.py — new is_transient_transport_error(error) predicate plus a narrow _DISCONNECT_TRANSPORT_TYPES registry. Returns True only for fast-fail mid-stream disconnect/protocol classes (RemoteProtocolError, ConnectError, ServerDisconnectedError, ConnectionResetError, ConnectionAbortedError, BrokenPipeError, ReadError, APIConnectionError, the SSL transport types). Status-coded errors (4xx/5xx) and all timeout classes (TimeoutError, ReadTimeout, ConnectTimeout, PoolTimeout, APITimeoutError) are explicitly NOT transient — retrying a timeout pays the full timeout window again, and against the 120 s compression default that would turn one missed compaction into a ~6 minute stalled response. Timeouts continue to use the existing cooldown path.
  • agent/context_compressor.py — wraps the single call_llm invocation in _generate_summary with a 1 + 2 retry loop (1 s / 3 s back-offs) gated on is_transient_transport_error. Non-transient errors fall straight through to the existing cooldown / model-fallback logic — 401/404/timeout handling is unchanged.
  • tests/agent/test_error_classifier.py — added TestIsTransientTransportError covering disconnect strings (incomplete chunked read), RemoteProtocolError, ConnectionError, the HTTP-status rejection, and explicit non-retry coverage for TimeoutError, ReadTimeout, APITimeoutError, ConnectTimeout, PoolTimeout.
  • tests/agent/test_context_compressor.py — added TestSummaryTransientRetry covering retry-then-succeed for both incomplete chunked read strings and RemoteProtocolError-named classes, retries-exhausted falling through to cooldown, non-transient HTTP errors not retrying, and TimeoutError not entering the retry loop (single call, cooldown still set).
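
The classification rule in the first bullet can be sketched roughly as follows. This is an assumption-laden illustration, not the contents of agent/error_classifier.py: the function name is_transient_transport_error_sketch, the name-based matching, the status_code attribute lookup, and the message markers are all hypothetical, and the SSL transport types are omitted.

```python
# Class names taken from the PR description; matching by name is an assumption
# made here so the sketch does not need httpx, aiohttp, or openai installed.
_DISCONNECT_NAMES = frozenset({
    "RemoteProtocolError", "ConnectError", "ServerDisconnectedError",
    "ConnectionResetError", "ConnectionAbortedError", "BrokenPipeError",
    "ReadError", "APIConnectionError",
})
_TIMEOUT_NAMES = frozenset({
    "TimeoutError", "ReadTimeout", "ConnectTimeout", "PoolTimeout",
    "APITimeoutError",
})
# Substrings of the error message quoted in the issue.
_DISCONNECT_MARKERS = ("incomplete chunked read", "peer closed connection")


def is_transient_transport_error_sketch(error: BaseException) -> bool:
    """Return True only for fast-fail disconnect/protocol errors (hypothetical)."""
    names = {cls.__name__ for cls in type(error).__mro__}
    # Timeouts are rejected first, even if the class also inherits from a
    # connection-error base, because retrying pays the full timeout again.
    if names & _TIMEOUT_NAMES:
        return False
    # Status-coded errors (4xx/5xx) are not transport disconnects.
    status = getattr(error, "status_code", None)
    if isinstance(status, int) and status >= 400:
        return False
    if names & _DISCONNECT_NAMES:
        return True
    message = str(error).lower()
    return any(marker in message for marker in _DISCONNECT_MARKERS)
```

Checking the timeout names before the disconnect names is the design choice that matters here: it keeps a timeout class out of the retry loop even when one of its base classes is in the disconnect list.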

How to Test

Reproducing the original bug requires a flaky auxiliary endpoint, but the failure mode is identical to the issue's repro. The agent's compaction logs "Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.", followed by user-visible "⚠ Compression summary failed: ..." warnings. After this PR, transient mid-stream disconnects are retried up to two times before that warning fires.

Automated:

pytest tests/agent/test_context_compressor.py tests/agent/test_error_classifier.py -q

Result on macOS 15.6.1 / Python 3.14.2: 190 passed. Reviewer also ran the same command in a separate checkout and reported 190 passed.

Checklist

Code

  • I've read the Contributing Guide
  • My commit messages follow Conventional Commits (fix(scope):, feat(scope):, etc.)
  • I searched for existing PRs to make sure this isn't a duplicate
  • My PR contains only changes related to this fix/feature (no unrelated commits)
  • I've run pytest tests/ -q and all tests pass
  • I've added tests for my changes (required for bug fixes, strongly encouraged for features)
  • I've tested on my platform: macOS 15.6.1 (Python 3.14.2)

Documentation & Housekeeping

  • I've updated relevant documentation (README, docs/, docstrings) — or N/A (docstring on the new helper documents the timeout-exclusion rationale)
  • I've updated cli-config.yaml.example if I added/changed config keys — or N/A (N/A — no config keys touched; retry budget tuned via existing module-level constants)
  • I've updated CONTRIBUTING.md or AGENTS.md if I changed architecture or workflows — or N/A (N/A)
  • I've considered cross-platform impact (Windows, macOS) per the compatibility guide — or N/A (no platform-specific syscalls; relies only on time.sleep and existing transport-error registries)
  • I've updated tool descriptions/schemas if I changed tool behavior — or N/A (N/A — internal compressor, not a tool)

Screenshots / Logs

$ pytest tests/agent/test_context_compressor.py tests/agent/test_error_classifier.py -q
........................................................................ [ 37%]
........................................................................ [ 75%]
...............................................                          [100%]
190 passed, 190 warnings in 15.26s

Changed files

  • agent/context_compressor.py (modified, +40/-1)
  • agent/error_classifier.py (modified, +57/-0)
  • tests/agent/test_context_compressor.py (modified, +118/-0)
  • tests/agent/test_error_classifier.py (modified, +80/-0)

Observed log evidence

Local logs show repeated failures from auxiliary compression:

agent.auxiliary_client: Auxiliary compression: using auto (gpt-5.5) at https://chatgpt.com/backend-api/codex/
WARNING root: Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read). Further summary attempts paused for 60 seconds.

Recent examples occurred repeatedly in one long-running Telegram workflow, e.g.:

2026-04-28 01:52:55 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:14:57 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:20:45 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).
2026-04-28 02:23:20 WARNING Failed to generate context summary: peer closed connection without sending complete message body (incomplete chunked read).

User impact

Not usually data-destructive, but it is operationally serious for long sessions:

  • context is compacted without a useful generated summary;
  • the fallback marker preserves that something happened, but useful prior-turn details can be lost;
  • long Telegram sessions become less reliable exactly when compaction is needed most.

Local mitigation tried

I applied local config mitigations to reduce frequency/severity:

auxiliary:
  compression:
    timeout: 360

compression:
  threshold: 0.55

This should give the compression call more time and trigger compaction earlier with smaller context chunks. It does not address the underlying bug.
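
A short worked calculation shows how the raised timeout interacts with retries. It assumes that auxiliary.compression.timeout is expressed in seconds and borrows the 1 + 2 attempt budget and 1 s / 3 s back-offs from PR #16737. The function name worst_case_stall_seconds is hypothetical, and the scenario of retrying timeouts is purely illustrative, since the PR deliberately does not do that.

```python
def worst_case_stall_seconds(timeout_s, attempts=3, backoffs=(1.0, 3.0)):
    """Upper bound on wall-clock time if every attempt runs to the full timeout.

    Hypothetical helper: models what would happen if timeout-class errors were
    retried with the same budget as disconnect errors.
    """
    waits = sum(backoffs[: attempts - 1])  # back-off sleeps between attempts
    return attempts * timeout_s + waits


# 120 s default from the PR text: 3 * 120 + 4 = 364 s, roughly 6 minutes.
default_stall = worst_case_stall_seconds(120)

# 360 s from the local mitigation: 3 * 360 + 4 = 1084 s, roughly 18 minutes.
mitigated_stall = worst_case_stall_seconds(360)
```

Under these assumptions, a larger timeout makes each failed attempt more expensive, which is consistent with the PR's choice to keep timeouts on the existing cooldown path and retry only fast-failing disconnects.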

Suggested fix direction

Compression should handle incomplete chunked read/peer-closed transport failures more robustly:

  1. Treat incomplete chunked read as retryable for auxiliary compression, not as immediate fallback-marker finalization.
  2. Retry with backoff before inserting fallback marker.
  3. If the primary auxiliary provider fails, try configured fallback provider/model if available.
  4. Consider a smaller emergency compression prompt/chunked summarization fallback before giving up.
  5. Improve the fallback marker to include a minimal deterministic local summary such as message count, timestamp range, and last N user/assistant snippets, so continuity loss is less severe.
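
Item 5 could look something like the sketch below. Nothing here comes from the Hermes codebase: the Turn dataclass, the build_fallback_marker function, and the marker text format are hypothetical and only illustrate a deterministic summary built from message count, timestamp range, and the last N user/assistant snippets.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Sequence


@dataclass(frozen=True)
class Turn:
    """Hypothetical message shape; the real session model may differ."""
    role: str
    text: str
    timestamp: datetime


def build_fallback_marker(
    turns: Sequence[Turn], last_n: int = 3, snippet_chars: int = 80
) -> str:
    """Build a deterministic marker for turns dropped without an LLM summary."""
    if not turns:
        return "[context marker: 0 messages dropped]"
    start = min(t.timestamp for t in turns).isoformat(timespec="seconds")
    end = max(t.timestamp for t in turns).isoformat(timespec="seconds")
    lines = [f"[context marker: {len(turns)} messages dropped, {start} to {end}]"]
    # Keep only user/assistant turns for the snippet tail.
    dialog = [t for t in turns if t.role in ("user", "assistant")]
    recent = dialog[-last_n:] if last_n > 0 else []
    for turn in recent:
        # Collapse whitespace, then truncate to a fixed length.
        snippet = " ".join(turn.text.split())[:snippet_chars]
        lines.append(f"- {turn.role}: {snippet}")
    return "\n".join(lines)
```

Because the marker is computed locally from the dropped turns, it does not depend on the auxiliary endpoint being reachable, which is the property the suggestion is after.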

Environment notes

  • Gateway platform: Telegram
  • Auxiliary compression provider: auto, resolving to the main openai-codex provider against https://chatgpt.com/backend-api/codex/
  • Model observed: gpt-5.5

extent analysis

TL;DR

Implement retry logic with backoff for auxiliary compression to handle incomplete chunked reads and peer-closed transport failures.

Guidance

  • Review the suggested fix direction to handle incomplete chunked read failures more robustly, focusing on retrying auxiliary compression with backoff.
  • Consider implementing a fallback strategy, such as trying a configured fallback provider/model if the primary auxiliary provider fails.
  • Evaluate the current local mitigation configuration (auxiliary.compression.timeout and compression.threshold) to ensure it is optimal for the specific use case.
  • Investigate the feasibility of improving the fallback marker to include a minimal deterministic local summary.

Example

PR #16737 supplies the implementation details: it wraps the call_llm invocation in _generate_summary with a 1 + 2 retry loop (1 s / 3 s back-offs) gated on is_transient_transport_error.

Notes

The issue stems from the auxiliary compression API call being interrupted mid-stream, so transport failures need more robust handling. PR #16737 implements the retry-with-back-off part of the suggested fix direction; a smaller emergency compression prompt and a richer fallback marker are not among its listed changes.

Recommendation

Apply workaround: Implement retry logic with backoff for auxiliary compression to handle incomplete chunked reads and peer-closed transport failures, as this is a concrete step that can be taken to mitigate the issue.
