openclaw - 💡(How to fix) Fix Memory dreaming: corpus pre-filtering, weighted scoring, and session scan stall [1 participants]

GitHub stats
openclaw/openclaw#71656, fetched 2026-04-26 05:10:14
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Summary

The memory-core dreaming pipeline runs nightly but has three scaling/quality issues that prevent meaningful promotion from light sleep to REM as the corpus grows.

Environment

  • OpenClaw 2026.4.22 (00bd2cf)
  • memory-core plugin, dreaming enabled
  • 19 successful nightly runs since deployment

Issues

1. No corpus pre-filtering — noise drowns signal (~60% of ingested content is noise)

Session corpus files (memory/.dreams/session-corpus/*.txt) ingest raw session lines without any filtering. This includes:

  • Repeated HEARTBEAT_OK ping/pong exchanges (zero signal)
  • Duplicate context blocks (same content refinement prompt re-dumped 8+ times)
  • MC diagnostic one-word test messages
  • Repeated full task-state dumps (same 50+ task list pasted across multiple chat turns)
  • Subagent boilerplate context blocks (build instructions repeated verbatim)

Expected: Pre-filter or deduplicate corpus entries before scoring. At minimum, strip heartbeat pings and identical repeated blocks.
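A minimal sketch of the kind of pre-filter described above, assuming the corpus can be split into text blocks before scoring. The function name `prefilter_blocks` and the heartbeat pattern are hypothetical and not part of memory-core; only the `HEARTBEAT_OK` marker comes from the issue.

```python
import hashlib
import re

# Hypothetical noise pattern; only the HEARTBEAT_OK marker is taken from the issue.
HEARTBEAT_RE = re.compile(r"^\s*HEARTBEAT_OK\b")


def prefilter_blocks(blocks):
    """Drop empty blocks, heartbeat pings, and exact repeats, preserving order."""
    seen = set()
    kept = []
    for block in blocks:
        text = block.strip()
        if not text or HEARTBEAT_RE.match(text):
            continue
        # Collapse whitespace so re-dumped blocks hash identically.
        normalized = " ".join(text.split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(text)
    return kept
```

Hashing a whitespace-normalized form catches verbatim re-dumps cheaply. Near-duplicates, such as a task list with one item changed, would need a fuzzier comparison that this sketch does not attempt.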

2. Flat scoring — all entries get the same score regardless of content quality

The scoring engine assigns uniform scores:

  • 0.58 to all session corpus entries (1,094 entries)
  • 0.62 to all daily log entries (222 entries)
  • Only entries that already reached REM have differentiated scores (2.48, 1.86, etc.)

There's no weighting by content type. A decision ("we're switching the auth middleware") scores identically to a heartbeat ping. Lowering minPatternStrength from 0.6 to 0.45 had no effect because the pattern detection itself isn't differentiating.

Expected: Content-type-aware scoring — decisions, architectural changes, user preferences, and time-sensitive context should score higher than status pings. Recency should also factor in.
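One way the expected behaviour could look, as a rough sketch only: a content-type weight multiplied by an exponential recency decay. The weights, the keyword classifier, and the 14-day half-life are all illustrative assumptions, not values from memory-core.

```python
from datetime import datetime, timezone

# Hypothetical content-type weights (decisions > preferences > status > pings).
TYPE_WEIGHTS = {"decision": 1.0, "preference": 0.9, "status": 0.4, "ping": 0.0}


def classify(text):
    """Very rough keyword classifier; a real one would need better signals."""
    lowered = text.lower()
    if lowered.startswith("heartbeat_ok"):
        return "ping"
    if any(kw in lowered for kw in ("we're switching", "decided", "decision")):
        return "decision"
    if any(kw in lowered for kw in ("prefer", "always use", "never use")):
        return "preference"
    return "status"


def score(text, last_seen, now, half_life_days=14.0):
    """Content-type weight multiplied by an exponential recency decay."""
    age_days = max((now - last_seen).total_seconds() / 86400.0, 0.0)
    recency = 0.5 ** (age_days / half_life_days)
    return TYPE_WEIGHTS[classify(text)] * recency


now = datetime(2026, 4, 26, tzinfo=timezone.utc)
fresh_decision = score("we're switching the auth middleware", now, now)
fresh_ping = score("HEARTBEAT_OK", now, now)
```

With this shape, a heartbeat ping can never outrank a decision regardless of how recent it is, which is the differentiation the flat 0.58 and 0.62 scores lack.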

3. Session corpus scan stalled after April 9th

All 377 session corpus entries have exactly 3 light hits, all last touched on 2026-04-09. After that date, the nightly cycle continues scanning memory/*.md daily logs but never re-visits the session corpus. This appears to be a scan budget or pagination issue — as the daily log pool grew, it consumed the entire scan window.

Expected: The scan should rotate across all corpus sources, or prioritize unscored/under-scored entries over re-scanning entries with 100+ light hits.
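The rotation described above could be sketched as a round-robin over sources, visiting the least-scanned entries in each source first. The `plan_scan` function and its input shape are hypothetical; the real scheduler and its budget units are not shown in the issue.

```python
def plan_scan(sources, budget):
    """Split a scan budget across sources, least-scanned entries first.

    sources: dict of source name -> list of (entry_id, light_hits) tuples.
    Returns a dict of source name -> list of entry ids to scan this cycle.
    """
    plan = {name: [] for name in sources}
    # Sort each source so entries with the fewest light hits come first.
    queues = {
        name: sorted(entries, key=lambda e: e[1])
        for name, entries in sources.items()
    }
    remaining = budget
    # Round-robin: take one entry per source per pass until the budget is spent.
    while remaining > 0 and any(queues.values()):
        for name in sources:
            if remaining == 0:
                break
            if queues[name]:
                entry_id, _hits = queues[name].pop(0)
                plan[name].append(entry_id)
                remaining -= 1
    return plan
```

Because each pass takes at most one entry per source, a large daily log pool cannot consume the whole budget before the session corpus gets a turn.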

Reproduction

# Check phase signals — session corpus entries frozen at 3 light hits
cat memory/.dreams/phase-signals.json | python3 -c "
import json, sys
d = json.load(sys.stdin)
session = [e for k, e in d['entries'].items() if 'session-corpus' in k]
print(f'Session corpus entries: {len(session)}')
print(f'Sample light hits: {session[0][\"lightHits\"] if session else \"N/A\"}')
print(f'Sample lastLightAt: {session[0][\"lastLightAt\"] if session else \"N/A\"}')
"
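To see the stall across the whole session corpus rather than in one sample entry, a small helper can count entries per last-touched date. This is a sketch that assumes `lastLightAt` is an ISO 8601 string beginning with `YYYY-MM-DD`; the issue does not state the field's format, and `last_light_histogram` is a hypothetical name.

```python
import json
from collections import Counter


def last_light_histogram(entries, key_fragment="session-corpus"):
    """Count matching entries per lastLightAt date (first 10 characters)."""
    dates = Counter()
    for key, entry in entries.items():
        if key_fragment not in key:
            continue
        # Assumption: lastLightAt starts with YYYY-MM-DD when present.
        stamp = str(entry.get("lastLightAt", ""))
        dates[stamp[:10] or "missing"] += 1
    return dict(dates)


def load_entries(path="memory/.dreams/phase-signals.json"):
    """Read the 'entries' mapping from the phase signals file."""
    with open(path) as f:
        return json.load(f)["entries"]
```

Calling `last_light_histogram(load_entries())` from the workspace root should show a single date bucket if every session corpus entry was last touched on the same day.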

Suggested improvements

  1. Corpus pre-filtering — strip noise before indexing (heartbeats, duplicate blocks, test messages)
  2. Weighted scoring — differentiate by content type (decisions > status > pings)
  3. Recency bias — recent entries should get priority scan budget over old high-hit entries
  4. Scan rotation — ensure all sources get scanned each cycle, not just whichever fills the budget first

extent analysis

TL;DR

Implement corpus pre-filtering and weighted scoring, and restore scan rotation across corpus sources, to improve the quality of the memory-core dreaming pipeline.

Guidance

  • Apply corpus pre-filtering to remove noise from session corpus entries, such as heartbeat pings, duplicate blocks, and test messages.
  • Introduce weighted scoring to differentiate between content types, assigning higher scores to decisions, architectural changes, and user preferences.
  • Consider implementing recency bias to prioritize recent entries in the scan budget.
  • Review the scan rotation mechanism to ensure all corpus sources are scanned each cycle.

Example

import json

# Load phase signals
with open('memory/.dreams/phase-signals.json') as f:
    data = json.load(f)

# Keep session corpus entries that are not heartbeat pings.
# NOTE: the 'content' field is an assumption for illustration; the issue only
# confirms 'lightHits' and 'lastLightAt' on these entries.
session_corpus = [
    e for k, e in data['entries'].items()
    if 'session-corpus' in k
    and not e.get('content', '').startswith('HEARTBEAT_OK')
]

# Assign illustrative weighted scores based on content type
for entry in session_corpus:
    content = entry.get('content', '').lower()
    if 'decision' in content:
        entry['score'] = 1.0
    elif 'architectural change' in content:
        entry['score'] = 0.8
    else:
        entry['score'] = 0.2

Notes

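The example above is a simplified illustration of corpus pre-filtering and weighted scoring. It assumes each entry exposes a content field, which the issue does not confirm, and it does not address the stalled session corpus scan. A real implementation would likely need more robust content classification and tuned scoring weights.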

Recommendation

Apply the suggested improvements, starting with corpus pre-filtering and weighted scoring, to address the scaling and quality issues in the memory-core dreaming pipeline.
