openclaw - 💡(How to fix) Fix Feature Request: Dual-Layer Context Compression [1 participants]

GitHub stats
openclaw/openclaw#56907 · Fetched 2026-04-08 01:46:15
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Feature Request: Dual-Layer Context Compression

Problem Statement

OpenClaw conversations degrade as the context window fills up. Users must manually run /compress or start new sessions, breaking flow and losing continuity. For the core use case — long-running persistent agents (the AGENTS.md + SOUL.md + MEMORY.md pattern) — this is a hard ceiling on usefulness.

A typical power-user session hits context limits within 2-4 hours of active use. At that point, the agent loses older context silently or errors out, forcing a manual restart.

Proposed Solution

Implement two independent compression layers:

Gateway Pre-Compression (~85% context usage)

  • Triggered automatically when context reaches ~85% of the model's window.
  • Summarizes older conversation turns while preserving:
    • Recent turns (last N messages, configurable)
    • Active tool call state and pending operations
    • Key facts and decisions from the session
  • Uses a configurable summarization model (can be cheaper than the active model).
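
The selection rule in the list above can be sketched in Python. This is a minimal sketch under stated assumptions, not OpenClaw's implementation: the `Turn` type, its `pending_tool_call` flag, and the `select_turns_to_summarize` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """Hypothetical conversation turn; not an OpenClaw type."""
    role: str
    text: str
    pending_tool_call: bool = False  # assumed marker for active tool state


def select_turns_to_summarize(
    turns: List[Turn], preserve_recent: int
) -> Tuple[List[Turn], List[Turn]]:
    """Split turns into (to_summarize, to_keep), preserving original order.

    A turn is kept verbatim if it is among the last `preserve_recent`
    turns or if it carries a pending tool call.
    """
    cutoff = max(len(turns) - preserve_recent, 0)
    to_summarize: List[Turn] = []
    to_keep: List[Turn] = []
    for index, turn in enumerate(turns):
        if index >= cutoff or turn.pending_tool_call:
            to_keep.append(turn)
        else:
            to_summarize.append(turn)
    return to_summarize, to_keep
```

Only the `to_summarize` list would be sent to the summarization model; pinning tool-call turns is one possible way to honor the "never lose active tool state" requirement.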

Agent-Level Compression (~50% context usage, configurable)

  • More aggressive summarization targeting ~50% reduction.
  • Preserves task-critical state: current goals, file paths, error context.
  • Generates a structured "session state" object rather than prose summary.
  • Can be triggered manually or automatically.
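
To make the structured "session state" idea concrete, here is one possible shape for such an object. The issue does not define a schema, so the `SessionState` class and its three fields are assumptions taken from the bullet above (current goals, file paths, error context).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SessionState:
    """Hypothetical structured state emitted by agent-level compression."""
    current_goals: List[str] = field(default_factory=list)
    file_paths: List[str] = field(default_factory=list)
    error_context: List[str] = field(default_factory=list)

    def to_json(self) -> str:
        # Sorted keys keep the output stable, which makes states easy to diff.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "SessionState":
        return cls(**json.loads(payload))
```

A structured object like this can be checked field by field after compression, which is harder to do with a prose summary.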

Configuration

compression:
  gateway:
    threshold: 0.85
    model: claude-sonnet-4  # cheaper model for summarization
    preserve_recent: 10     # keep last 10 turns verbatim
  agent:
    threshold: 0.50
    auto: true
    preserve_tool_state: true
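
A loader would probably want to validate these settings before use. The sketch below assumes the YAML has already been parsed into a plain dictionary rooted at the `compression` key; `validate_compression_config` is a hypothetical helper, and the accepted ranges are assumptions rather than anything the issue specifies.

```python
from typing import Any, Dict, List


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_compression_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the config is valid."""
    problems: List[str] = []
    gateway = config.get("gateway", {})
    agent = config.get("agent", {})

    # Both layers need a fractional threshold in (0, 1].
    for name, layer in (("gateway", gateway), ("agent", agent)):
        threshold = layer.get("threshold")
        if not _is_number(threshold) or not 0 < threshold <= 1:
            problems.append(f"{name}.threshold must be a number in (0, 1]")

    # preserve_recent counts turns, so it must be a non-negative integer.
    preserve_recent = gateway.get("preserve_recent", 0)
    if isinstance(preserve_recent, bool) or not isinstance(preserve_recent, int) \
            or preserve_recent < 0:
        problems.append("gateway.preserve_recent must be a non-negative integer")

    return problems
```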

Transparency

  • Log compression events with metadata (tokens before/after, what was compressed).
  • Optional user notification: "Context compressed — older messages summarized."
  • Never lose active tool state or pending operations.
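
The logging and notification points above could look roughly like this. It is a sketch only: `CompressionEvent`, `log_compression`, the `"compression"` logger name, and the default notice wording are all hypothetical, and the standard-library `logging` module stands in for whatever logging OpenClaw actually uses.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("compression")  # hypothetical logger name


@dataclass(frozen=True)
class CompressionEvent:
    layer: str             # "gateway" or "agent"
    tokens_before: int
    tokens_after: int
    turns_compressed: int  # how many older turns were summarized

    @property
    def tokens_saved(self) -> int:
        return self.tokens_before - self.tokens_after


def log_compression(
    event: CompressionEvent,
    notify: Optional[Callable[[str], None]] = None,
    notice: str = "Context compressed.",  # placeholder wording, configurable
) -> None:
    """Log the event metadata and, if a callback is given, notify the user."""
    logger.info(
        "compression layer=%s tokens_before=%d tokens_after=%d turns_compressed=%d",
        event.layer, event.tokens_before, event.tokens_after, event.turns_compressed,
    )
    if notify is not None:
        notify(notice)
```

Passing the notifier as an optional callback keeps the user-facing message opt-in, matching the "optional" wording in the request.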

User Impact

  • Power users: Indefinite conversation sessions without manual intervention.
  • Agent workflows: AGENTS.md patterns (memory, heartbeats, background tasks) work reliably over 8+ hour sessions.
  • All users: No more "context full" errors or mysterious quality degradation.

This is the single biggest UX gap between OpenClaw and Hermes Agent. Hermes ships dual-layer compression and it fundamentally changes the experience — conversations feel unlimited.

Technical Considerations

  • Summarization quality: Compression is lossy. Important details can be lost. The structured state approach (agent-level) mitigates this.
  • Tool state preservation: Active exec sessions, file edit contexts, and multi-step workflows must survive compression.
  • Model-aware thresholds: Different models have different context windows, so thresholds should be expressible either as a percentage of the window or as an absolute token count.
  • Cost: Summarization calls add cost, but less than restarting sessions or using larger context windows.
  • Testing: Need integration tests that verify state preservation across compression boundaries.
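
For the model-aware thresholds point, one way to support both forms is to normalize every threshold to a token count per model. The convention below is an assumption for illustration, not part of the issue: a float in (0, 1] is treated as a fraction of the window, and a positive integer as an absolute token count. Both function names are hypothetical.

```python
from typing import Union


def resolve_threshold_tokens(threshold: Union[int, float], context_window: int) -> int:
    """Convert a fractional or absolute threshold into a token count."""
    if isinstance(threshold, bool):
        raise TypeError("threshold must be an int or a float, not a bool")
    if isinstance(threshold, float):
        if not 0 < threshold <= 1:
            raise ValueError("fractional threshold must be in (0, 1]")
        # round() avoids off-by-one results from float representation error.
        return round(threshold * context_window)
    if threshold <= 0:
        raise ValueError("absolute threshold must be positive")
    # An absolute threshold can never exceed the model's window.
    return min(threshold, context_window)


def should_compress(
    used_tokens: int, threshold: Union[int, float], context_window: int
) -> bool:
    """True once usage has reached the resolved threshold."""
    return used_tokens >= resolve_threshold_tokens(threshold, context_window)
```

With this convention the same `threshold: 0.85` setting resolves to a different token count for each model's window.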

Priority

HIGHEST. This is table-stakes for any agent platform targeting persistent, long-running sessions. Without it, OpenClaw's most powerful patterns (AGENTS.md, heartbeats, memory systems) hit a hard wall every few hours. Hermes Agent has shipped this and users cite it as a primary reason for switching.

extent analysis

Fix Plan

To implement dual-layer context compression, follow these steps:

  • Gateway Pre-Compression:
    1. Introduce a new summarization model (claude-sonnet-4) for gateway pre-compression.
    2. Set up a threshold (0.85) to trigger automatic compression when context reaches 85% of the model's window.
    3. Implement a function to summarize older conversation turns while preserving recent turns, active tool call state, and key facts.
  • Agent-Level Compression:
    1. Develop a more aggressive summarization algorithm targeting 50% reduction.
    2. Preserve task-critical state, such as current goals, file paths, and error context.
    3. Generate a structured "session state" object instead of a prose summary.
  • Configuration: Update the configuration file (compression.yaml) to include the following settings:

compression:
  gateway:
    threshold: 0.85
    model: claude-sonnet-4
    preserve_recent: 10
  agent:
    threshold: 0.50
    auto: true
    preserve_tool_state: true

  • Transparency:
    1. Log compression events with metadata (tokens before/after, what was compressed).
    2. Implement optional user notification: "Context compressed — older messages summarized."

Example Code

Here's a sketch of how the gateway pre-compression function could be implemented in Python. The `summarize` callable stands in for a call to the configured summarization model (a hosted model such as claude-sonnet-4 cannot be loaded through `transformers`), and the caller supplies the token counts:
```python
from typing import Callable, List


def gateway_pre_compression(
    context: List[str],
    used_tokens: int,
    window_tokens: int,
    threshold: float,
    summarize: Callable[[str], str],
    preserve_recent: int,
) -> List[str]:
    """Summarize older turns once context usage crosses the threshold."""
    # Do nothing until usage reaches the threshold (e.g. 0.85 of the window).
    if used_tokens < threshold * window_tokens:
        return context

    # Split into older turns (to summarize) and recent turns (kept verbatim).
    # Using an explicit index avoids the context[-0:] pitfall when
    # preserve_recent is 0.
    split = max(len(context) - preserve_recent, 0)
    older, recent = context[:split], context[split:]
    if not older:
        return context

    # Replace the older turns with one summary turn, placed first so the
    # conversation stays in chronological order.
    summary = summarize("\n".join(older))
    return [f"[Summary of earlier conversation] {summary}"] + recent
```

Verification

To verify that the fix worked, test the following scenarios:

  • Compression is triggered automatically when context reaches 85% of the model's window.
  • Compression preserves recent turns, active tool call state, and key facts.
  • Agent-level compression reduces context by 50% while preserving task-critical state.
  • Compression events are logged with metadata, and the optional user notification appears when enabled.

Extra Tips

  • Monitor the performance of the summarization models and adjust the thresholds as needed.
  • Consider implementing a feedback mechanism to improve the
