hermes - Context compression can be interrupted by gateway messages, causing fallback summary marker [1 pull request]


Fix Action

Fixed


Bug Description

Context compression can fail with "Codex auxiliary Responses stream interrupted" when a new gateway message or process watch-pattern notification arrives while the auxiliary compression summary is running.

The active conversation then continues with a fallback context marker instead of a useful compression summary, so the middle of the session history is effectively lost from the model context even though raw logs remain on disk.

Observed Logs

From a Telegram gateway session using provider: openai-codex, main model gpt-5.5, auxiliary compression openai-codex/gpt-5.4-mini:

2026-05-11 21:18:13,000 INFO gateway.run: inbound message: platform=telegram user=... msg='[IMPORTANT: Background process proc_11508d9d1e67 matched watch pattern "DevTools'
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: Preflight compression: ~136,264 tokens >= 136,000 threshold (model gpt-5.5, ctx 272,000)
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: context compression started: session=20260511_205207_8a7dc8 messages=169 tokens=~136,264 model=gpt-5.5 focus=None
2026-05-11 21:18:13,099 INFO [20260511_205207_8a7dc8] agent.auxiliary_client: Auxiliary compression: using openai-codex (gpt-5.4-mini) at https://chatgpt.com/backend-api/codex/
2026-05-11 21:18:43,092 WARNING [20260511_205207_8a7dc8] root: Failed to generate context summary: Codex auxiliary Responses stream interrupted. Further summary attempts paused for 60 seconds.
2026-05-11 21:18:43,131 INFO [20260511_205207_8a7dc8] run_agent: context compression done: session=20260511_211843_103fc8 messages=169->8 tokens=~22,523
2026-05-11 21:18:43,139 INFO [20260511_205207_8a7dc8] run_agent: Turn ended: reason=interrupted_by_user model=gpt-5.5 api_calls=0/90 budget=0/90 tool_turns=2 last_msg_role=user response_len=0 session=20260511_211843_103fc8
2026-05-11 21:18:43,246 INFO [20260511_211843_103fc8] run_agent: conversation turn: session=20260511_211843_103fc8 model=gpt-5.5 provider=openai-codex platform=telegram history=8 msg='...next user message...'

The user-facing marker was:

⚠ Compression summary failed: Codex auxiliary Responses stream interrupted. Inserted a fallback context marker.

Root Cause Hypothesis

agent/auxiliary_client.py checks the global/per-thread interrupt flag while streaming Codex auxiliary responses:

from tools.interrupt import is_interrupted
if is_interrupted():
    raise InterruptedError("Codex auxiliary Responses stream interrupted")

For normal model/tool turns this makes sense. For context compression it is brittle: compression is infrastructure needed to preserve continuity. If Telegram receives another user message or an injected watch-pattern notification while the summarizer is running, the interrupt aborts the summary and Hermes falls back to a generic context marker.

In this case the compression timeout was already set to 360s, and the failure happened after ~30s, so this was not a timeout. Auth was also healthy. It was an interrupt.
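The timing claim can be checked from the two log timestamps (auxiliary call started, summary failed). This small sketch parses them with the standard library; the helper name `elapsed_seconds` is only illustrative.

```python
from datetime import datetime

# Timestamp layout used in the logs above: date, time, comma, milliseconds.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    t0 = datetime.strptime(start, LOG_TS_FORMAT)
    t1 = datetime.strptime(end, LOG_TS_FORMAT)
    return (t1 - t0).total_seconds()


started = "2026-05-11 21:18:13,099"  # "Auxiliary compression: using openai-codex"
failed = "2026-05-11 21:18:43,092"   # "Failed to generate context summary"
configured_timeout = 360             # auxiliary.compression.timeout

elapsed = elapsed_seconds(started, failed)
timed_out = elapsed >= configured_timeout
print(f"elapsed={elapsed:.3f}s timeout={configured_timeout}s timed_out={timed_out}")
```

The stream ran for just under 30 seconds, far short of the 360-second limit.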

Expected Behavior

Context compression should be robust against user/gateway interrupts:

  • Once preflight compression starts, the summary generation should complete atomically, or
  • incoming gateway messages should be queued/deferred until compression finishes, or
  • compression auxiliary calls should ignore/defer interrupt checks specifically for the compression task.

The next user message should be processed after the compressed session has a real summary, not after a fallback marker.
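One way to picture the queue/defer option is a session runner that holds inbound messages while compression is in flight and releases them afterwards. This is a single-threaded sketch under stated assumptions: `SessionRunner` and its methods are hypothetical names, and a real gateway would also need locking around the shared state.

```python
from collections import deque


class SessionRunner:
    """Hypothetical runner that defers inbound messages during compression."""

    def __init__(self):
        self.compressing = False
        self.deferred = deque()
        self.processed = []

    def on_inbound(self, msg):
        """Queue the message if compression is running, else handle it now."""
        if self.compressing:
            self.deferred.append(msg)
            return
        self.processed.append(msg)

    def compress(self, summarize):
        """Run the summarizer as a critical section, then drain the queue."""
        self.compressing = True
        try:
            summary = summarize()
            self.processed.append(f"summary:{summary}")
            return summary
        finally:
            # Always leave the critical section and release deferred messages,
            # even if the summarizer raised.
            self.compressing = False
            while self.deferred:
                self.processed.append(self.deferred.popleft())


runner = SessionRunner()


def summarize():
    # A watch-pattern notification arrives while the summary is being built.
    runner.on_inbound("watch notification")
    return "real summary"


runner.compress(summarize)
print(runner.processed)
```

The deferred message is handled only after the summary is recorded, so the next turn sees a real summary.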

Actual Behavior

A message/watch notification arriving during compression interrupts the auxiliary Codex Responses stream. Hermes inserts a fallback context marker and proceeds with only a generic compaction reference.

Proposed Fix Direction

A few possible approaches:

  1. Treat compression as a critical section in the gateway/session runner: queue new messages until compression returns.
  2. Add an auxiliary-client option like allow_interrupt=False for task="compression" and keep interrupt behavior for other auxiliary tasks.
  3. Special-case watch-pattern/process notifications so they don't interrupt a preflight compression turn.
  4. If compression is interrupted, retry once after clearing/deferring the interrupt before falling back to the marker.

I lean toward (1) or (2): compression is not optional UX output; it protects conversation continuity.
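Approaches (2) and (4) could look roughly like the sketch below. Every name here is hypothetical (`stream_aux`, `call_auxiliary`, `summarize_with_retry`, and the `"title"` task label); only `allow_interrupt=False` for `task="compression"` comes from the proposal above, and nothing here reflects the actual signatures in `agent/auxiliary_client.py`.

```python
def stream_aux(chunks, is_interrupted, allow_interrupt=True):
    """Collect streamed chunks; honor the interrupt flag only when allowed."""
    parts = []
    for chunk in chunks:
        if allow_interrupt and is_interrupted():
            raise InterruptedError("Codex auxiliary Responses stream interrupted")
        parts.append(chunk)
    return "".join(parts)


def call_auxiliary(task, chunks, is_interrupted):
    """Approach (2): compression ignores interrupts, other tasks keep them."""
    return stream_aux(chunks, is_interrupted, allow_interrupt=(task != "compression"))


def summarize_with_retry(run_summary, defer_interrupt, retries=1):
    """Approach (4): on interrupt, defer it and retry before giving up."""
    for attempt in range(retries + 1):
        try:
            return run_summary()
        except InterruptedError:
            if attempt == retries:
                raise  # caller falls back to the marker
            defer_interrupt()


# Approach (2): the flag is set, but the compression stream still completes.
summary = call_auxiliary("compression", ["part one, ", "part two"], lambda: True)

# Approach (4): the first attempt is interrupted, the second succeeds.
state = {"calls": 0}
pending = []


def flaky_summary():
    state["calls"] += 1
    if state["calls"] == 1:
        raise InterruptedError("Codex auxiliary Responses stream interrupted")
    return "summary after retry"


retried = summarize_with_retry(flaky_summary, lambda: pending.append("interrupt"))
print(summary, "|", retried, "|", pending)
```

In the retry variant the interrupt is recorded in `pending` and not dropped, so the runner can still act on it once compression has finished.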

Environment

  • Platform: Telegram gateway
  • Provider: openai-codex
  • Main model: gpt-5.5
  • Auxiliary compression provider/model: openai-codex / gpt-5.4-mini
  • Compression config at the time:
compression:
  enabled: true
  threshold: 0.5
  target_ratio: 0.2
  protect_last_n: 20

auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.4-mini
    timeout: 360
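
The threshold in this config lines up with the preflight log line if `threshold` is read as a fraction of the model's context window. That reading is an assumption inferred from the numbers in the log, not confirmed behavior, and the variable names below are illustrative.

```python
# Values taken from the config and the preflight log line.
context_window = 272_000    # "ctx 272,000"
threshold_ratio = 0.5       # compression.threshold
estimated_tokens = 136_264  # "~136,264 tokens"

# Assumption: the trigger point is threshold_ratio * context_window.
threshold_tokens = int(context_window * threshold_ratio)
should_compress = estimated_tokens >= threshold_tokens

print(f"threshold={threshold_tokens:,} tokens, compress={should_compress}")
```

Under that assumption the session crossed the 136,000-token trigger by 264 tokens on the inbound message, which is why compression and the notification coincided.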
