hermes - 💡(How to fix) Fix Context compression can be interrupted by gateway messages, causing fallback summary marker [1 pull requests]

Bug Description

Context compression can fail with "Codex auxiliary Responses stream interrupted" when a new gateway message or process watch-pattern notification arrives while the auxiliary compression summary is still being generated.

The active conversation then continues with a fallback context marker instead of a useful compression summary, so the middle of the session history is effectively lost from the model context even though raw logs remain on disk.

Observed Logs

From a Telegram gateway session using provider: openai-codex, main model gpt-5.5, auxiliary compression openai-codex/gpt-5.4-mini:

2026-05-11 21:18:13,000 INFO gateway.run: inbound message: platform=telegram user=... msg='[IMPORTANT: Background process proc_11508d9d1e67 matched watch pattern "DevTools'
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: Preflight compression: ~136,264 tokens >= 136,000 threshold (model gpt-5.5, ctx 272,000)
2026-05-11 21:18:13,075 INFO [20260511_205207_8a7dc8] run_agent: context compression started: session=20260511_205207_8a7dc8 messages=169 tokens=~136,264 model=gpt-5.5 focus=None
2026-05-11 21:18:13,099 INFO [20260511_205207_8a7dc8] agent.auxiliary_client: Auxiliary compression: using openai-codex (gpt-5.4-mini) at https://chatgpt.com/backend-api/codex/
2026-05-11 21:18:43,092 WARNING [20260511_205207_8a7dc8] root: Failed to generate context summary: Codex auxiliary Responses stream interrupted. Further summary attempts paused for 60 seconds.
2026-05-11 21:18:43,131 INFO [20260511_205207_8a7dc8] run_agent: context compression done: session=20260511_211843_103fc8 messages=169->8 tokens=~22,523
2026-05-11 21:18:43,139 INFO [20260511_205207_8a7dc8] run_agent: Turn ended: reason=interrupted_by_user model=gpt-5.5 api_calls=0/90 budget=0/90 tool_turns=2 last_msg_role=user response_len=0 session=20260511_211843_103fc8
2026-05-11 21:18:43,246 INFO [20260511_211843_103fc8] run_agent: conversation turn: session=20260511_211843_103fc8 model=gpt-5.5 provider=openai-codex platform=telegram history=8 msg='...next user message...'

The user-facing marker was:

⚠ Compression summary failed: Codex auxiliary Responses stream interrupted. Inserted a fallback context marker.

Root Cause Hypothesis

agent/auxiliary_client.py checks the global/per-thread interrupt flag while streaming Codex auxiliary responses:

from tools.interrupt import is_interrupted
if is_interrupted():
    raise InterruptedError("Codex auxiliary Responses stream interrupted")

For normal model/tool turns this makes sense. For context compression it is brittle: compression is infrastructure needed to preserve continuity. If Telegram receives another user message or an injected watch-pattern notification while the summarizer is running, the interrupt aborts the summary and Hermes falls back to a generic context marker.
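The failure mode can be reproduced in miniature. This is only a sketch: the threading.Event stands in for whatever tools.interrupt uses internally, and stream_summary, summarize, and FALLBACK_MARKER are hypothetical names, not the real Hermes code.

```python
import threading

# Hypothetical stand-in for the interrupt flag exposed by tools.interrupt.
_interrupt = threading.Event()

FALLBACK_MARKER = "[fallback context marker]"  # placeholder text, not the real marker


def is_interrupted() -> bool:
    return _interrupt.is_set()


def stream_summary(chunks):
    """Yield summary chunks, aborting as soon as the interrupt flag is set."""
    for chunk in chunks:
        if is_interrupted():
            raise InterruptedError("Codex auxiliary Responses stream interrupted")
        yield chunk


def summarize(chunks) -> str:
    """Return the full summary, or the fallback marker if the stream was interrupted."""
    try:
        return "".join(stream_summary(chunks))
    except InterruptedError:
        return FALLBACK_MARKER


def chunks_with_inbound_message():
    """Simulate a gateway message arriving after the first summary chunk."""
    yield "User asked to debug the build. "
    _interrupt.set()  # inbound Telegram message or watch-pattern notification
    yield "Agent found the failing step."


# Without an interrupt the summary is complete.
uninterrupted = summarize(["User asked to debug the build. ", "Agent found the failing step."])
# With an interrupt mid-stream the partial summary is discarded entirely.
interrupted = summarize(chunks_with_inbound_message())

print(uninterrupted)
print(interrupted)
```

Note that the partially streamed summary is thrown away, which matches the observed jump from 169 messages to 8 with only a marker in place of the summary.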

In this case the compression timeout was already set to 360s, and the failure happened after ~30s, so this was not a timeout. Auth was also healthy. It was an interrupt.
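The ~30s figure can be checked directly from the log timestamps. A small sketch, using the timestamps of the "Auxiliary compression: using ..." and "Failed to generate context summary" lines; elapsed_seconds is just a helper name for this example:

```python
from datetime import datetime

# Format of the timestamps in the log lines above (comma before milliseconds).
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def elapsed_seconds(start: str, end: str) -> float:
    """Return the number of seconds between two log timestamps."""
    started = datetime.strptime(start, LOG_TIME_FORMAT)
    ended = datetime.strptime(end, LOG_TIME_FORMAT)
    return (ended - started).total_seconds()


# Auxiliary call started at 21:18:13,099 and failed at 21:18:43,092.
elapsed = elapsed_seconds("2026-05-11 21:18:13,099", "2026-05-11 21:18:43,092")
configured_timeout = 360  # auxiliary.compression.timeout from the config

print(f"elapsed={elapsed:.3f}s timeout={configured_timeout}s")
```

The elapsed time is well under the configured timeout, which is consistent with the interrupt explanation rather than a timeout.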

Expected Behavior

Context compression should be robust against user/gateway interrupts:

- Once preflight compression starts, the summary generation should complete atomically, or
- incoming gateway messages should be queued/deferred until compression finishes, or
- compression auxiliary calls should ignore/defer interrupt checks specifically for the compression task.

The next user message should be processed after the compressed session has a real summary, not after a fallback marker.
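The queue/defer behavior could look roughly like the following. This is a single-threaded sketch under assumptions: SessionInbox, receive, and run_compression are hypothetical names, and a real gateway would need proper locking around the flag and the queue.

```python
from collections import deque


class SessionInbox:
    """Hypothetical per-session inbox that defers messages during compression."""

    def __init__(self):
        self.compressing = False
        self.deferred = deque()
        self.delivered = []

    def receive(self, message: str) -> None:
        """Deliver immediately, or hold the message while compression is running."""
        if self.compressing:
            self.deferred.append(message)  # held back; does not set any interrupt
        else:
            self.delivered.append(message)

    def run_compression(self, summarize):
        """Run the summarizer as a critical section, then flush deferred messages."""
        self.compressing = True
        try:
            return summarize()
        finally:
            self.compressing = False
            while self.deferred:
                self.delivered.append(self.deferred.popleft())


inbox = SessionInbox()


def fake_summarize() -> str:
    # A watch-pattern notification arrives while the summary is being generated.
    inbox.receive("watch-pattern notification")
    return "real summary"


summary = inbox.run_compression(fake_summarize)
print(summary, inbox.delivered)
```

The deferred message is delivered only after the summary exists, so the next turn starts from a compressed session with a real summary.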

Actual Behavior

A message/watch notification arriving during compression interrupts the auxiliary Codex Responses stream. Hermes inserts a fallback context marker and proceeds with only a generic compaction reference.

Proposed Fix Direction

A few possible approaches:

1. Treat compression as a critical section in the gateway/session runner: queue new messages until compression returns.
2. Add an auxiliary-client option like allow_interrupt=False for task="compression" and keep interrupt behavior for other auxiliary tasks.
3. Special-case watch-pattern/process notifications so they don't interrupt a preflight compression turn.
4. If compression is interrupted, retry once after clearing/deferring the interrupt before falling back to the marker.

I lean toward (1) or (2): compression is not optional UX output; it protects conversation continuity.

Environment

Platform: Telegram gateway
Provider: openai-codex
Main model: gpt-5.5
Auxiliary compression provider/model: openai-codex / gpt-5.4-mini
Compression config at the time:

compression:
  enabled: true
  threshold: 0.5
  target_ratio: 0.2
  protect_last_n: 20

auxiliary:
  compression:
    provider: openai-codex
    model: gpt-5.4-mini
    timeout: 360
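
The threshold in the preflight log line appears to follow from this config, assuming compression.threshold is a fraction of the model's context window (an inference from the log, not confirmed from the source). A quick check of the arithmetic:

```python
context_window = 272_000    # "ctx 272,000" in the preflight log line
threshold_ratio = 0.5       # compression.threshold from the config
estimated_tokens = 136_264  # "~136,264 tokens" in the preflight log line

# Assumed relationship: threshold in tokens = ratio * context window.
threshold_tokens = int(context_window * threshold_ratio)
should_compress = estimated_tokens >= threshold_tokens

print(threshold_tokens, should_compress)
```

This matches the logged "136,000 threshold", so preflight compression was triggered as configured and the session had only just crossed the limit.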
