openclaw - 💡(How to fix) Fix [Bug] Checkpoint/compaction creates exponential duplicate message growth in session .jsonl files [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#72780Fetched 2026-04-28 06:32:15
View on GitHub
Comments
1
Participants
2
Timeline
2
Reactions
0
Timeline (top)
closed ×1commented ×1

Session .jsonl files accumulate massive numbers of duplicate user messages through the checkpoint/compaction cycle. In the worst case, a single session file grew to 452 MB with 12,250 duplicate user messages (out of 12,401 total — 98.8% duplicates).

Error Message

  1. Monitor duplicate ratio: Warn when a session exceeds a threshold (e.g., >50% duplicate user messages)

Root Cause

This cleaned 47,885 duplicates and 187,527 metadata blocks across 77 files. However, the duplicates reaccumulate because the root cause (checkpoint preservation of duplicates) is not addressed.

Fix Action

Workaround

We built an external cleaner script (memory_poison_cleaner.py) that:

  • Parses OpenClaw event-based .jsonl format
  • Deduplicates by MD5 content hash with configurable timestamp window
  • Strips inline metadata blocks from user messages
  • Removes system-noise-only messages

This cleaned 47,885 duplicates and 187,527 metadata blocks across 77 files. However, the duplicates reaccumulate because the root cause (checkpoint preservation of duplicates) is not addressed.

RAW_BUFFERClick to expand / collapse

Summary

Session .jsonl files accumulate massive numbers of duplicate user messages through the checkpoint/compaction cycle. In the worst case, a single session file grew to 452 MB with 12,250 duplicate user messages (out of 12,401 total — 98.8% duplicates).

Evidence

Analysis across 7 agents on a single deployment:

MetricValue
Agents affected7 (cybera, cyberlogis, cylena, descartes, main, miku, sysauxilia)
Files with duplicates75
Total duplicate user messages50,474
Total size of affected files2,676 MB

Worst offenders (all from same session tree)

FileDuplicatesTotal User MsgsUnique Dup ContentSize
a6bd5515...jsonl (live)12,25012,401149452.6 MB
a6bd5515...checkpoint.cd1f572e10,90811,053143396.3 MB
a6bd5515...checkpoint.3341b05b3,3303,42973107.2 MB
034ba722...checkpoint.* (×8 files)~1,900 each~2,400 each~80 each~140 MB each

The checkpoint files appear to carry forward the full duplicate history, and each new checkpoint preserves all prior duplicates.

Pattern

The duplication follows this pattern:

  1. A user message is injected into the session (possibly via the already-reported duplicate delivery bug #72702)
  2. The session is checkpointed/compacted, preserving all messages including duplicates
  3. New messages arrive and are also duplicated
  4. The cycle repeats, causing linear-to-exponential growth in duplicate count
  5. Each checkpoint file is a snapshot of this growing duplicate tree

Impact

  • Context window waste: Duplicates consume token budget without adding information
  • Model confusion: Seeing the same message 3-4+ times degrades comprehension and response quality
  • Disk usage: 2.6 GB of duplicate-laden session files across 7 agents
  • Memory pressure: Large session files slow down session loading and compaction
  • Cascading failures: Bloated sessions contribute to timeout and model errors

Related Issues

  • #72702 — Telegram messages delivered multiple times (the initial duplicate injection)
  • #72703 — Background task updates injected as user-role instead of system-role
  • #72704 — Excessive inline JSON metadata in Telegram user messages

Suggested Fix

  1. Deduplicate during compaction: When creating a checkpoint, deduplicate user messages by content hash + timestamp proximity (within 60s)
  2. Deduplicate on session load: Strip duplicates when loading a session for inference
  3. Add a --deduplicate CLI command: Allow operators to clean existing session files without external tools
  4. Monitor duplicate ratio: Warn when a session exceeds a threshold (e.g., >50% duplicate user messages)

Workaround

We built an external cleaner script (memory_poison_cleaner.py) that:

  • Parses OpenClaw event-based .jsonl format
  • Deduplicates by MD5 content hash with configurable timestamp window
  • Strips inline metadata blocks from user messages
  • Removes system-noise-only messages

This cleaned 47,885 duplicates and 187,527 metadata blocks across 77 files. However, the duplicates reaccumulate because the root cause (checkpoint preservation of duplicates) is not addressed.

Environment

  • OpenClaw version: latest (as of 2026-04-27)
  • Agents: 7 (cybera, cyberlogis, cylena, descartes, main, miku, sysauxilia)
  • Primary channel: Telegram
  • Models: Gemini 3.1, GLM-5.1, MiniMax-M2.7

extent analysis

TL;DR

Implement deduplication during compaction and session load to prevent exponential growth of duplicate user messages in session files.

Guidance

  • Identify and address the root cause of duplicate message injection, potentially related to the already-reported issue #72702.
  • Implement deduplication during compaction by content hash and timestamp proximity (within 60s) to prevent preserving duplicates in checkpoint files.
  • Consider adding a --deduplicate CLI command to allow operators to clean existing session files without external tools.
  • Monitor duplicate ratios and warn when a session exceeds a threshold (e.g., >50% duplicate user messages) to prevent cascading failures.

Example

No code snippet is provided as the issue does not imply a specific code change, but rather a design or implementation adjustment.

Notes

The provided memory_poison_cleaner.py script can be used as a temporary workaround to clean existing session files, but it does not address the root cause of the issue.

Recommendation

Apply the suggested fix by implementing deduplication during compaction and session load, as this addresses the root cause of the issue and prevents exponential growth of duplicate user messages.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug] Checkpoint/compaction creates exponential duplicate message growth in session .jsonl files [1 comments, 2 participants]