openclaw - ✅(Solved) Fix [Bug]: Memory-core session indexer skips .jsonl.reset and .jsonl.deleted files — most past sessions invisible to search [4 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#57334 · Fetched 2026-04-08 01:50:51
Comments: 0 · Participants: 1 · Timeline: 14 · Reactions: 0
Timeline (top): referenced ×7, cross-referenced ×5, labeled ×2

The listSessionFilesForAgent function in query-expansion-CeNhqo71.js filters session files with .filter((name) => name.endsWith(".jsonl")), which excludes .jsonl.reset.* and .jsonl.deleted.* files. After the daily 4 AM session reset, main session transcripts are renamed from <id>.jsonl to <id>.jsonl.reset.<timestamp> — making them invisible to the session indexer. Only small cron/subagent sessions (which aren't reset) and the current live session remain as .jsonl. Result: 211 .jsonl files indexed out of 485 total, with all large main conversation sessions excluded.
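A minimal sketch of the behavior described above, using only the filter expression quoted in the issue. The file names and timestamp suffixes are hypothetical, modelled on the patterns the report describes, and are not taken from a real deployment.

```javascript
// Hypothetical file names following the three patterns described in the issue.
const names = [
  "session-a.jsonl",                    // active session
  "session-b.jsonl.reset.1234567890",   // renamed by the daily reset
  "session-c.jsonl.deleted.1234567890", // archived after deletion
];

// The filter quoted in the issue: keeps only names ending exactly in ".jsonl".
const indexed = names.filter((name) => name.endsWith(".jsonl"));
const skipped = names.filter((name) => !name.endsWith(".jsonl"));

console.log("indexed:", indexed);
console.log("skipped:", skipped);
```

Only the active file passes the filter; both renamed transcripts fall into the skipped list even though they still contain JSONL data.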

Root Cause

Only 211 out of 485 session files are indexed. The 274 excluded files include all main conversation sessions (the largest and most valuable transcripts). A 13.7MB main session with 416 user/assistant messages (~214K chars of conversation text, estimated ~133 chunks) produces 0 indexed chunks because it was renamed to .jsonl.reset.* after the 4 AM session reset.

Fix / Workaround

Our workaround: a custom transcript-to-md.py script that converts ALL session files (including .reset and .deleted) to markdown, which memory-core then indexes as source="memory" files with proper chunking (~24 chunks per file). This works but should not be necessary — the native session indexer should handle renamed files.

PR fix notes

PR #57341: fix(memory): include .jsonl.reset and .jsonl.deleted files in session indexer

Description (problem / solution / changelog)

lobster-biscuit

Closes #57334

Problem

The memory session indexer skips .jsonl.reset.* and .jsonl.deleted.* files because listSessionFilesForAgent() filters with name.endsWith(".jsonl"). After the daily 4AM session reset, main transcripts are renamed to .jsonl.reset.<timestamp> — making them invisible to memory search.

Reporter measured: 211/485 files indexed, all large main sessions excluded.

Root cause

packages/memory-host-sdk/src/host/session-files.ts:28: endsWith(".jsonl") is too strict. Reset and deleted session files contain valid JSONL transcript data that should be indexed for memory search.

User impact

Memory search returns empty or sparse results because the majority of conversation history (main sessions, post-reset) is invisible to the indexer. Users think memory is broken when it's just not seeing their data.

Fix

Change endsWith(".jsonl") to includes(".jsonl") && !endsWith(".lock") — includes reset and deleted transcripts while excluding lock files.

1 file, 1 line changed.
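The predicate described in this PR can be sketched as follows. The sample names, including the .lock and .bak forms, are assumptions chosen for illustration. The sketch suggests one caveat: a substring match also admits .jsonl.bak.* names, which PR #57373 explicitly keeps excluded.

```javascript
// Predicate as described in PR #57341: substring match minus lock files.
const keep = (name) => name.includes(".jsonl") && !name.endsWith(".lock");

// Hypothetical sample names; suffix formats are assumptions.
const sample = [
  "s.jsonl",
  "s.jsonl.reset.1234567890",
  "s.jsonl.deleted.1234567890",
  "s.jsonl.lock",
  "s.jsonl.bak.1234567890",
];

const kept = sample.filter(keep);
const dropped = sample.filter((name) => !keep(name));

console.log("kept:", kept);
console.log("dropped:", dropped);
```

Here only the lock file is dropped; the backup file is kept alongside the reset and deleted transcripts.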

How to verify

  1. Run several sessions, let daily 4AM reset rename them to .jsonl.reset.*
  2. openclaw memory index --force
  3. Before: only small cron sessions indexed. After: all sessions including reset ones.

Tests

The filter change is in a leaf utility function. Existing memory indexer tests exercise listSessionFilesForAgent with .jsonl files — the fix broadens the filter to also match .jsonl.reset.* and .jsonl.deleted.* suffixes.

🤖 Generated with Claude Code

Changed files

  • CHANGELOG.md (modified, +1/-0)

PR #57373: fix(memory): index archived session transcripts

Description (problem / solution / changelog)

Summary

  • Problem: session indexing only kept files whose names ended with .jsonl, so archived transcripts like .jsonl.reset.* and .jsonl.deleted.* never entered the candidate set.
  • Why it matters: users with daily session resets lose most past session history from memory-core search even though those transcripts still exist on disk.
  • What changed: listSessionFilesForAgent now reuses the canonical usage-counted session artifact helper, and session-files.test.ts adds a regression test that includes reset/deleted transcripts while still excluding .bak artifacts.
  • What did NOT change (scope boundary): session content parsing, chunking, hash generation, and sync/indexing strategy all stay the same.

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #57334
  • Related #44028
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: packages/memory-host-sdk/src/host/session-files.ts filtered candidates with name.endsWith(".jsonl"), which silently excluded .jsonl.reset.* and .jsonl.deleted.* transcripts before the memory indexer could read them.
  • Missing detection / guardrail: there was no regression test covering archived transcript enumeration in listSessionFilesForAgent.
  • Prior context (git blame, prior PR, issue, or refactor if known): the repo already had canonical session artifact helpers in src/config/sessions/artifacts.ts, but this listing helper duplicated the rule with a narrower filename check.
  • Why this regressed now: the memory indexing path drifted from the canonical session artifact classification rules.
  • If unknown, what was ruled out: N/A

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: packages/memory-host-sdk/src/host/session-files.test.ts
  • Scenario the test should lock in: active .jsonl, archived .jsonl.reset.*, and archived .jsonl.deleted.* files are listed for session indexing, while .jsonl.bak.* remains excluded.
  • Why this is the smallest reliable guardrail: the bug is in the transcript file selection helper, so a focused unit test catches the regression at the exact seam where archived sessions were dropped.
  • Existing test that already covers this (if any): None.
  • If no new test is added, why not: N/A

User-visible / Behavior Changes

Users with session memory enabled can now have archived reset/deleted session transcripts included in memory-core indexing, so past sessions remain searchable after resets.

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation: N/A

Repro + Verification

Environment

  • OS: Windows 11
  • Runtime/container: local pnpm/vitest checkout
  • Model/provider: N/A
  • Integration/channel (if any): memory-core session indexing
  • Relevant config (redacted): sources: ["sessions"], experimental.sessionMemory: true

Steps

  1. Place transcripts in the agent sessions directory with names ending in .jsonl, .jsonl.reset.<timestamp>, and .jsonl.deleted.<timestamp>.
  2. Run the session listing/indexing path.
  3. Observe which files are eligible for indexing.

Expected

  • Primary and archived usage-counted transcripts are included.
  • .bak artifacts remain excluded.

Actual

  • Before this change, only names ending exactly in .jsonl were included.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: ran pnpm test -- packages/memory-host-sdk/src/host/session-files.test.ts after adding a regression case for reset/deleted transcript enumeration; also verified pnpm build on the working tree from a bash shell on Windows after fixing local shell PATH visibility for node.
  • Edge cases checked: .jsonl.bak.* remains excluded while .jsonl.reset.* and .jsonl.deleted.* are included.
  • What you did not verify: I did not run a full memory-core end-to-end indexing repro in the temp PR clone.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps: N/A

Risks and Mitigations

  • Risk: archived transcript inclusion could accidentally pull in unsupported session artifacts.
    • Mitigation: the new filter reuses isUsageCountedSessionTranscriptFileName, which already excludes .bak artifacts and only allows primary/reset/deleted transcript forms.
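To illustrate why an allow-list classifier addresses this risk, here is a small sketch. classifyTranscriptName is a hypothetical function written for this example; it is not the real isUsageCountedSessionTranscriptFileName, whose implementation is not shown in the PR description.

```javascript
// Hypothetical stand-in, NOT the project's helper.
// Returns "primary", "reset", "deleted", or null for any other artifact.
function classifyTranscriptName(name) {
  if (name.endsWith(".jsonl")) return "primary";
  const match = /\.jsonl\.(reset|deleted)\..+$/.exec(name);
  return match ? match[1] : null;
}

// Hypothetical sample names; suffix formats are assumptions.
const kinds = [
  "s.jsonl",
  "s.jsonl.reset.1234567890",
  "s.jsonl.deleted.1234567890",
  "s.jsonl.bak.1234567890",
  "s.jsonl.lock",
].map((name) => [name, classifyTranscriptName(name)]);

console.log(kinds);
```

Because only the three known transcript forms are named explicitly, any new artifact suffix is rejected by default instead of being indexed by accident.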

Changed files

  • packages/memory-host-sdk/src/host/session-files.test.ts (modified, +29/-31)

PR #57375: fix(session-indexer): include .jsonl.reset.* and .jsonl.deleted.* files

Description (problem / solution / changelog)

Summary

After daily session compaction, main session transcripts are renamed to <id>.jsonl.reset.<timestamp> or <id>.jsonl.deleted.<timestamp>. The listSessionFilesForAgent function only matched files ending in .jsonl, excluding all main conversation sessions after the first daily reset.

listSessionFilesForAgent now matches:

  • <id>.jsonl — active sessions
  • <id>.jsonl.reset.<timestamp> — compacted sessions
  • <id>.jsonl.deleted.<timestamp> — deleted sessions

Files Changed

  • packages/memory-host-sdk/src/host/session-files.ts: Extended filter
  • packages/memory-host-sdk/src/host/session-files.test.ts: Added 3 test cases

Linked Issue

Fixes #57334

Changed files

  • extensions/discord/src/monitor/provider.lifecycle.test.ts (modified, +39/-0)
  • extensions/discord/src/monitor/provider.lifecycle.ts (modified, +6/-0)
  • packages/memory-host-sdk/src/host/session-files.test.ts (modified, +36/-0)
  • packages/memory-host-sdk/src/host/session-files.ts (modified, +6/-1)
  • src/agents/pi-tools.ts (modified, +5/-1)
  • src/agents/tool-policy.test.ts (modified, +24/-0)
  • src/agents/tool-policy.ts (modified, +19/-2)

PR #57445: fix(memory): include reset and deleted session files in indexer

Description (problem / solution / changelog)

🐛 Problem

The memory-core session indexer was missing 85% of historical session files because the file filter used .endsWith('.jsonl') which excluded renamed files after daily session resets.

📋 Root Cause

After the daily 4 AM session reset, main session transcript files are renamed with suffixes:

  • .jsonl.reset.<timestamp>
  • .jsonl.deleted.<timestamp>

The current filter .endsWith('.jsonl') only matches files ending exactly with ".jsonl", missing all the renamed session files.

✅ Solution

Changed the filter from .endsWith('.jsonl') to .includes('.jsonl').

This ensures all session transcript variants are included in the indexer:

  • session.jsonl (active sessions)
  • session.jsonl.reset.1234567890 (reset sessions)
  • session.jsonl.deleted.1234567890 (deleted sessions)

🔍 Files Changed

  • packages/memory-host-sdk/src/host/session-files.ts: Updated listSessionFilesForAgent() function
  • extensions/memory-core/src/cli.runtime.ts: Updated scanSessionFiles() function

🧪 Impact

This fix makes historical sessions searchable again, improving the search indexing coverage from ~15% to 100% of session files.

Fixes #57334

Changed files

  • extensions/memory-core/src/cli.runtime.ts (modified, +1/-1)


Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

The listSessionFilesForAgent function in query-expansion-CeNhqo71.js filters session files with .filter((name) => name.endsWith(".jsonl")), which excludes .jsonl.reset.* and .jsonl.deleted.* files. After the daily 4 AM session reset, main session transcripts are renamed from <id>.jsonl to <id>.jsonl.reset.<timestamp> — making them invisible to the session indexer. Only small cron/subagent sessions (which aren't reset) and the current live session remain as .jsonl. Result: 211 .jsonl files indexed out of 485 total, with all large main conversation sessions excluded.

Steps to reproduce

  1. Configure an agent with sources: ["memory", "sessions"] and experimental.sessionMemory: true.
  2. Use the agent actively for several days with the daily 4 AM session reset enabled.
  3. Run ls ~/.openclaw/agents/main/sessions/ | grep -c '\.jsonl$' (active files) and compare with ls ~/.openclaw/agents/main/sessions/ | wc -l (total files).
  4. Run openclaw memory index --force --verbose and note the session chunk count.
  5. Run sqlite3 ~/.openclaw/memory/main.sqlite "SELECT path, COUNT(*) FROM chunks WHERE source='sessions' GROUP BY path ORDER BY COUNT(*) DESC LIMIT 10" — only small cron sessions appear.
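The file counts in step 3 can also be broken down by suffix family with a short Node script. This is a sketch under assumptions: the category names and the tallySessionFiles helper are invented for this example, and the directory path is the one quoted in the steps above.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Tally file names by suffix family. Category names are this sketch's own.
function tallySessionFiles(dir) {
  const tally = { active: 0, reset: 0, deleted: 0, other: 0 };
  for (const name of fs.readdirSync(dir)) {
    if (name.endsWith(".jsonl")) tally.active += 1;
    else if (name.includes(".jsonl.reset.")) tally.reset += 1;
    else if (name.includes(".jsonl.deleted.")) tally.deleted += 1;
    else tally.other += 1;
  }
  return tally;
}

// Path taken from the reproduction steps; adjust for other agents.
const sessionsDir = path.join(os.homedir(), ".openclaw", "agents", "main", "sessions");

if (fs.existsSync(sessionsDir) && fs.statSync(sessionsDir).isDirectory()) {
  console.log(tallySessionFiles(sessionsDir));
} else {
  console.log(`No sessions directory at ${sessionsDir}`);
}
```

With the original filter, only the files counted under active are candidates for indexing, so a large reset or deleted count indicates how much history is being skipped.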

Source code reference: query-expansion-CeNhqo71.js, line ~413:

.filter((name) => name.endsWith(".jsonl"))

Expected behavior

All session transcript files, including .jsonl.reset.* and .jsonl.deleted.*, should be included in session indexing. These are completed session transcripts that represent the bulk of conversation history. The file filter should match .jsonl anywhere in the filename, not just at the end, for example: .filter((name) => name.includes(".jsonl"))

Actual behavior

Only 211 out of 485 session files are indexed. The 274 excluded files include all main conversation sessions (the largest and most valuable transcripts). A 13.7MB main session with 416 user/assistant messages (~214K chars of conversation text, estimated ~133 chunks) produces 0 indexed chunks because it was renamed to .jsonl.reset.* after the 4 AM session reset.

The indexed sessions are almost entirely small cron/subagent sessions (1-10 chunks each), giving the false impression that the chunker is "sparse" when in reality the large sessions simply aren't being read.

# Our deployment:
Total files in sessions dir: 485
Files ending in .jsonl: 211 (small cron/subagent sessions)
Files ending in .jsonl.reset.*: 57 (main conversation sessions — EXCLUDED)
Files ending in .jsonl.deleted.*: 217 (archived sessions — EXCLUDED)

# Database shows:
Current live session: 173 chunks (correctly indexed)
Second largest: 10 chunks (small cron session)
All main past sessions: 0 chunks (invisible)
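The coverage implied by these counts can be worked out directly. The numbers below are the reporter's; only the variable names are this sketch's own. The result is about 56% of files excluded, which differs from the 85% figure quoted in PR #57445's description.

```javascript
// File counts as reported in the issue.
const counts = { active: 211, reset: 57, deleted: 217 };

const total = counts.active + counts.reset + counts.deleted; // 485
const excluded = counts.reset + counts.deleted;              // 274

// Shares of the sessions directory, rounded to one decimal place.
const indexedShare = ((counts.active / total) * 100).toFixed(1);
const excludedShare = ((excluded / total) * 100).toFixed(1);

console.log(`indexed: ${counts.active}/${total} (${indexedShare}%)`);
console.log(`excluded: ${excluded}/${total} (${excludedShare}%)`);
```

File counts understate the loss of content, since the report notes that the excluded files are the largest transcripts.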

OpenClaw version

2026.3.24 (cff6dc9) — also confirmed in 2026.3.28 source

Operating system

macOS 25.3.0 (Mac mini M4, arm64)

Install method

npm global

Model

N/A — indexing bug, not model-specific

Provider / routing chain

N/A — indexing bug

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Affected: Any user with daily session resets enabled (the default behavior). All main conversation sessions become invisible to memory-core search after the reset renames them.

Severity: High. Effectively disables cross-session memory recall via the native session indexer. Users believe session indexing is working (status shows sources: ["memory", "sessions"]) but past conversations are silently excluded.

Frequency: 100%. Every session reset renames the file, and every renamed file is excluded.

Consequence: memory_search cannot find content from any past main session. Only cron/subagent sessions and the current live session are searchable. This defeats the purpose of sources: ["sessions"].

Additional information

Related to #44028 (shouldSyncSessions bug) — we initially attributed the sparse session chunk count (380 chunks from 204 files) to aggressive chunking. The real cause is that the 204 .jsonl files are mostly tiny cron sessions. The large main sessions were never in the candidate pool because they'd been renamed to .reset or .deleted.

Our workaround: a custom transcript-to-md.py script that converts ALL session files (including .reset and .deleted) to markdown, which memory-core then indexes as source="memory" files with proper chunking (~24 chunks per file). This works but should not be necessary — the native session indexer should handle renamed files.

Suggested fix: Change the file filter from name.endsWith(".jsonl") to name.includes(".jsonl") (or a regex like /\.jsonl(\.|$)/), so .jsonl.reset.* and .jsonl.deleted.* files are included in session indexing.
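The two suggested filters are not equivalent, which a short comparison makes visible. The sample names are hypothetical and chosen to expose the difference; neither predicate is claimed to be what was finally merged.

```javascript
// The two predicates suggested in the issue.
const byIncludes = (name) => name.includes(".jsonl");
const byRegex = (name) => /\.jsonl(\.|$)/.test(name);

// Hypothetical names, including near-misses such as an editor backup.
const names = [
  "s.jsonl",
  "s.jsonl.reset.1234567890",
  "s.jsonlines",
  "s.jsonl~",
  "s.jsonl.bak.1234567890",
];

// Names on which the two predicates disagree.
const differing = names.filter((name) => byIncludes(name) !== byRegex(name));

console.log("disagree on:", differing);
console.log("regex accepts .bak:", byRegex("s.jsonl.bak.1234567890"));
```

The regular expression rejects names where .jsonl is followed by something other than a dot or the end of the string, but both forms still accept .jsonl.bak.* names, so an explicit allow-list of suffixes may be the safer choice.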

extent analysis

Fix Plan

To fix the issue, change the file filter in listSessionFilesForAgent so that it also accepts names where .jsonl is followed by an archive suffix, not only names that end in .jsonl.

  • In query-expansion-CeNhqo71.js, line ~413, replace the endsWith() check with includes():

.filter((name) => name.includes(".jsonl"))

    Alternatively, a regular expression gives a slightly tighter match:

.filter((name) => /\.jsonl(\.|$)/.test(name))

  • Save the changes and restart the OpenClaw service to apply the update.

The hashed file name suggests a bundled build artifact, so a manual edit is likely to be overwritten by the next upgrade. The linked PRs apply the source-level fix in packages/memory-host-sdk/src/host/session-files.ts.

Verification

To verify that the fix worked, follow these steps:

  1. Run openclaw memory index --force --verbose to re-index the session files.
  2. Check the session chunk count using sqlite3 ~/.openclaw/memory/main.sqlite "SELECT path, COUNT(*) FROM chunks WHERE source='sessions' GROUP BY path ORDER BY COUNT(*) DESC LIMIT 10".
  3. Verify that the main conversation sessions are now included in the indexing results.

Extra Tips

  • Make sure to test the fix thoroughly to ensure that it does not introduce any new issues.
  • Consider adding additional logging or monitoring to detect similar issues in the future.
  • If you encounter any problems during the update process, refer to the OpenClaw documentation or seek support from the community.
