hermes - 💡(How to fix) Fix Context window changes to 256K after interrupted compaction and resume [1 pull requests]

StepCodex · 2026-05-26T05:42:25Z

[hermes] Bug Description After a Hermes CLI session is interrupted during context compaction and later resumed/continued, the effective context window shown in… ## Fixed - Fixed by PR: fix(agent): preserve config context_length when fallback/switch uses same model (https://github.com/NousResearch/hermes-agent/pull/32452) ## Bug Description After a Hermes CLI session is interrupted during context compaction and later resumed/continued, the effective context window shown in the status bar can change from the configured 1M tokens to `256K`. In the same failure path, compression summary generation can fail with HTTP 502 and Hermes inserts a fallback context marker, which may degrade the resumed session's accuracy. ## Steps to Reproduce 1. Configure a custom OpenAI-compatible provider with a 1M context length: ```yaml model: default: gpt-5.5 provider: custom base_url: https://redacted.example/openai/v1 context_length: 1000000 custom_providers: - name: private_codex base_url: https://redacted.example/openai/v1 model: gpt-5.5 models: gpt-5.5: context_length: 1000000 compression: enabled: true threshold: 0.8 target_ratio: 0.2 ``` 2. Start a long Hermes CLI session and let it approach/trigger context compaction. 3. Have the custom endpoint return HTTP 502 during compression summary generation. 4. Continue or resume the interrupted session. 5. Observe the CLI status bar and compaction warnings. ## Expected Behavior - The resumed/continued session should preserve the configured context length (`1000000`, displayed as roughly `1M`). - If summary generation fails, Hermes should not silently degrade context by inserting a fallback marker that loses important middle turns, or it should at least make the session state and recovery options explicit. - The configured context length should remain stable across interruption, compaction, resume, and continue flows. ## Actual Behavior - The status bar showed `97.9K/256K` even though the active config has `model.context_length: 1000000` and the custom provider entry also sets `context_length: 1000000`. - The CLI reported: ```text compression summary failed: Error code: 502. Inserted a fallback context marker. Session compressed 3 times — accuracy may degrade. Consider /new to start fresh. API call failed (attempt 1/3): InternalServerError [HTTP 502] Provider: custom Model: gpt-5.5 Endpoint: https://redacted.example/openai/v1 Error: HTTP 502: Error code: 502 ... Max retries (3) exhausted — trying fallback... API failed after 3 retries — HTTP 502: Error code: 502 Final error: HTTP 502: Error code: 502 ``` ## Additional Observations - Running `hermes config` after the incident confirms the config still contains `context_length: 1000000`. - Directly calling Hermes' model metadata resolver with `config_context_length=1000000` returns `1000000`, so a fresh initialization should resolve to 1M. - The `256K` value appears to come from the running/resumed agent's `ContextCompressor.context_length`, not from the current config file. - This suggests a resume/continue or compression failure path may restore or retain a stale/default context length instead of the configured value. ## Environment - OS: Windows 10 - Shell: Git Bash/MSYS via Hermes terminal backend - Hermes profile: default - Provider: custom OpenAI-compatible endpoint - Model: `gpt-5.5` - Configured context length: `1000000` - Observed status bar context length after interruption/continue: `256K` ## Possible Fix Direction - Ensure resumed/continued sessions rehydrate `ContextCompressor.context_length` from the active model/custom provider config, not from stale runtime state or fallback metadata. - Consider making `compression.abort_on_summary_failure` default to safer behavior for transient provider failures, or avoid inserting a fallback marker that drops middle context when the summary request fails. - Surface a clearer warning when the runtime context length differs from the configured `model.context_length` for the active provider/model.

Error Message

compression summary failed: Error code: 502. Inserted a fallback context marker. Session compressed 3 times — accuracy may degrade. Consider /new to start fresh. API call failed (attempt 1/3): InternalServerError [HTTP 502] Provider: custom Model: gpt-5.5 Endpoint: https://redacted.example/openai/v1 Error: HTTP 502: Error code: 502 ... Max retries (3) exhausted — trying fallback... API failed after 3 retries — HTTP 502: Error code: 502 Final error: HTTP 502: Error code: 502

Code Example

model:
  default: gpt-5.5
  provider: custom
  base_url: https://redacted.example/openai/v1
  context_length: 1000000

custom_providers:
  - name: private_codex
    base_url: https://redacted.example/openai/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1000000

compression:
  enabled: true
  threshold: 0.8
  target_ratio: 0.2

---

compression summary failed: Error code: 502. Inserted a fallback context marker.
Session compressed 3 times — accuracy may degrade. Consider /new to start fresh.
API call failed (attempt 1/3): InternalServerError [HTTP 502]
Provider: custom  Model: gpt-5.5
Endpoint: https://redacted.example/openai/v1
Error: HTTP 502: Error code: 502
...
Max retries (3) exhausted — trying fallback...
API failed after 3 retries — HTTP 502: Error code: 502
Final error: HTTP 502: Error code: 502

Bug Description

After a Hermes CLI session is interrupted during context compaction and later resumed/continued, the effective context window shown in the status bar can change from the configured 1M tokens to 256K. In the same failure path, compression summary generation can fail with HTTP 502 and Hermes inserts a fallback context marker, which may degrade the resumed session's accuracy.

Steps to Reproduce

Configure a custom OpenAI-compatible provider with a 1M context length:

model:
  default: gpt-5.5
  provider: custom
  base_url: https://redacted.example/openai/v1
  context_length: 1000000

custom_providers:
  - name: private_codex
    base_url: https://redacted.example/openai/v1
    model: gpt-5.5
    models:
      gpt-5.5:
        context_length: 1000000

compression:
  enabled: true
  threshold: 0.8
  target_ratio: 0.2

Start a long Hermes CLI session and let it approach/trigger context compaction.
Have the custom endpoint return HTTP 502 during compression summary generation.
Continue or resume the interrupted session.
Observe the CLI status bar and compaction warnings.

Expected Behavior

The resumed/continued session should preserve the configured context length (1000000, displayed as roughly 1M).
If summary generation fails, Hermes should not silently degrade context by inserting a fallback marker that loses important middle turns, or it should at least make the session state and recovery options explicit.
The configured context length should remain stable across interruption, compaction, resume, and continue flows.

Actual Behavior

The status bar showed 97.9K/256K even though the active config has model.context_length: 1000000 and the custom provider entry also sets context_length: 1000000.
The CLI reported:

compression summary failed: Error code: 502. Inserted a fallback context marker.
Session compressed 3 times — accuracy may degrade. Consider /new to start fresh.
API call failed (attempt 1/3): InternalServerError [HTTP 502]
Provider: custom  Model: gpt-5.5
Endpoint: https://redacted.example/openai/v1
Error: HTTP 502: Error code: 502
...
Max retries (3) exhausted — trying fallback...
API failed after 3 retries — HTTP 502: Error code: 502
Final error: HTTP 502: Error code: 502

Additional Observations

Running hermes config after the incident confirms the config still contains context_length: 1000000.
Directly calling Hermes' model metadata resolver with config_context_length=1000000 returns 1000000, so a fresh initialization should resolve to 1M.
The 256K value appears to come from the running/resumed agent's ContextCompressor.context_length, not from the current config file.
This suggests a resume/continue or compression failure path may restore or retain a stale/default context length instead of the configured value.

Environment

OS: Windows 10
Shell: Git Bash/MSYS via Hermes terminal backend
Hermes profile: default
Provider: custom OpenAI-compatible endpoint
Model: gpt-5.5
Configured context length: 1000000
Observed status bar context length after interruption/continue: 256K

Possible Fix Direction

Ensure resumed/continued sessions rehydrate ContextCompressor.context_length from the active model/custom provider config, not from stale runtime state or fallback metadata.
Consider making compression.abort_on_summary_failure default to safer behavior for transient provider failures, or avoid inserting a fallback marker that drops middle context when the summary request fails.
Surface a clearer warning when the runtime context length differs from the configured model.context_length for the active provider/model.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

hermes - 💡(How to fix) Fix Context window changes to 256K after interrupted compaction and resume [1 pull requests]

Recommended Tools

GitHub issue graph ai analysis

Error Message

Fix Action

Fixed