openclaw - ✅(Solved) Fix memory-wiki: wiki ingest source slugs silently overwrite CJK titles (ASCII-only slug + no unique suffix) [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#65965 · Fetched 2026-04-14 05:39:31
Comments: 0 · Participants: 1 · Timeline: 1 · Reactions: 0
Timeline (top): cross-referenced ×1

openclaw wiki ingest can silently overwrite existing Wiki/sources/*.md pages when source titles contain CJK or other non-ASCII characters, because source identity is derived from an ASCII-only slug.

Root Cause

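ingestMemoryWikiSource() derives both the page id and the page file path from slugifyWikiSegment(title), and that function strips every character outside a-z and 0-9. Titles that differ only in non-ASCII characters therefore map to the same slug (or to the fallback slug page), so a later ingest writes over the page created by an earlier one.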

Fix Action

Fixed

PR fix notes

PR #65992: fix(memory-wiki): prevent ingested source page collisions

Description (problem / solution / changelog)

Summary

This PR fixes the source-page collision issue described in #65965.

Previously, ingestMemoryWikiSource(...) derived both pageId and pageRelativePath directly from the slugified title. As a result, different source files could silently overwrite each other when their titles normalized to the same slug, especially for non-ASCII titles such as CJK content.

This change keeps the readable slug, but adds a stable short suffix derived from the resolved source path when building source page identity.

What changed

  • keep the readable slug for ingested source pages
  • add a stable short hash suffix derived from the resolved source path
  • use the same identity consistently for both:
    • pageId
    • pageRelativePath
  • preserve stability for repeated ingest of the same source
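
The identity scheme described above can be sketched as follows. This is a minimal illustration, not the code from PR #65992: buildSourceIdentity and SourceIdentity are hypothetical names, the slugifier is a simplified stand-in for the ASCII-only one quoted in the issue, and the `<slug>-<hash8>` / `source.<slug>.<hash8>` shapes are borrowed from the issue's suggestion, so the shipped format may differ.

```typescript
import { createHash } from "node:crypto";
import * as path from "node:path";

// Simplified stand-in for the project's ASCII-only slugifier (assumption).
function slugifyWikiSegment(raw: string): string {
  return (
    raw
      .trim()
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "") || "page"
  );
}

// Hypothetical shape for the two values that must stay in sync.
interface SourceIdentity {
  pageId: string;
  pageRelativePath: string;
}

// Hypothetical helper: readable slug plus a short hash of the resolved path.
function buildSourceIdentity(title: string, sourcePath: string): SourceIdentity {
  // Resolve first so "./a.md" and "a.md" hash to the same value.
  const resolved = path.resolve(sourcePath);
  const hash8 = createHash("sha256").update(resolved).digest("hex").slice(0, 8);
  const slug = slugifyWikiSegment(title);
  // Both fields are built from the same slug and suffix.
  return {
    pageId: `source.${slug}.${hash8}`,
    pageRelativePath: path.join("sources", `${slug}-${hash8}.md`),
  };
}

// Two CJK-only titles still slugify to "page", but the suffix separates them.
const first = buildSourceIdentity("高效能人士的七个习惯", "notes/habits.md");
const second = buildSourceIdentity("先驱准备成先烈", "notes/pioneers.md");
console.log(first.pageId, second.pageId);
```

Because the suffix depends only on the resolved path, re-ingesting the same file yields the same identity, while moving or renaming the file yields a new one. An 8-character suffix carries 32 bits, which makes accidental collisions unlikely but not impossible.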

Regression coverage

Added coverage for:

  • two pure CJK titles no longer collapsing to the same source page
  • similar titles with the same slug no longer overwriting each other
  • repeated ingest of the same source remaining stable
  • plain ASCII titles remaining readable and stable
  • identical titles with different sourcePath values remaining distinct

Scope

This PR prevents new ingest identity collisions for source pages. It does not redesign the broader wiki identity system, and it does not attempt to migrate or repair historical collisions already present in existing vaults.

Version context

The issue was reported against OpenClaw 2026.4.11, but this fix was developed against current main.

Changed files

  • extensions/memory-wiki/src/ingest.test.ts (modified, +239/-4)
  • extensions/memory-wiki/src/ingest.ts (modified, +61/-5)

Original issue report

Current implementation

In the installed 2026.4.11 build (dist/cli-fw2V8Mig.js):

function slugifyWikiSegment(raw) {
  return normalizeLowercaseStringOrEmpty(raw)
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/-+/g, "-")
    .replace(/^-+|-+$/g, "") || "page";
}

ingestMemoryWikiSource() uses that slug directly for both page id and source path:

const title = resolveSourceTitle(sourcePath, params.title);
const slug = slugifyWikiSegment(title);
const pageId = `source.${slug}`;
const pageRelativePath = path.join("sources", `${slug}.md`);

Why this is dangerous

  • Pure CJK titles contain no a-z or 0-9 characters, so they collapse to the fallback slug page
  • Mixed titles can collapse to a short shared prefix (OpenClaw:先驱准备成先烈 -> openclaw)
  • Subsequent ingests then overwrite the previous source page at the same path
  • The overwrite is silent from the user's perspective because the path/id is treated as canonical
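
The collapse described in the list above can be reproduced with a short script. This is a sketch under assumptions: normalizeLowercaseStringOrEmpty is not shown in the issue, so a trim-and-lowercase stand-in is used here, and findSlugCollisions and the sample titles are hypothetical.

```typescript
// Stand-in for the project's helper, which is not shown in the issue (assumption).
function normalizeLowercaseStringOrEmpty(raw: unknown): string {
  return typeof raw === "string" ? raw.trim().toLowerCase() : "";
}

// ASCII-only slugifier, as quoted from the 2026.4.11 build.
function slugifyWikiSegment(raw: unknown): string {
  return (
    normalizeLowercaseStringOrEmpty(raw)
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/-+/g, "-")
      .replace(/^-+|-+$/g, "") || "page"
  );
}

// Hypothetical helper: group titles by target page path, keep groups of 2+.
function findSlugCollisions(titles: string[]): Record<string, string[]> {
  const byPath: Record<string, string[]> = {};
  for (const title of titles) {
    const pagePath = `sources/${slugifyWikiSegment(title)}.md`;
    (byPath[pagePath] = byPath[pagePath] || []).push(title);
  }
  const collisions: Record<string, string[]> = {};
  for (const pagePath of Object.keys(byPath)) {
    if (byPath[pagePath].length > 1) collisions[pagePath] = byPath[pagePath];
  }
  return collisions;
}

const sampleTitles = [
  "🌟 高效能人士的七个习惯",
  "先驱准备成先烈",
  "OpenClaw:先驱准备成先烈",
  "OpenClaw:高效能人士的七个习惯",
  "OpenClaw: release notes",
];

// Reports two colliding paths: sources/page.md and sources/openclaw.md.
console.log(findSlugCollisions(sampleTitles));
```

Only the last title gets a path of its own (sources/openclaw-release-notes.md); the other four share two paths, so each later ingest would replace an earlier page.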

Real example

Current vault already contains a source page created as:

  • sources/page.md
  • id: source.page
  • title: 🌟 高效能人士的七个习惯

This happened because the title contains no ASCII letters or digits, so the slug came out empty and the fallback page was used.

That means any later CJK-only title that also slugifies to empty would overwrite sources/page.md.

Expected behavior

Source identity for wiki ingest should be stable and collision-resistant.

Suggested fix

Minimum fix

Make slugification Unicode-aware, for example:

.replace(/[^\p{L}\p{N}]+/gu, "-")

This preserves CJK/Cyrillic/Arabic letters instead of stripping everything to empty.
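
To illustrate what the Unicode-aware pattern changes and what it leaves unsolved, here is a small sketch. slugifyWikiSegmentUnicode is a hypothetical name, and the trim-and-lowercase step stands in for the project's normalization helper, which is not shown in the issue.

```typescript
// Hypothetical Unicode-aware variant; trim + lowercase is an assumed stand-in
// for the project's normalization helper.
function slugifyWikiSegmentUnicode(raw: string): string {
  return (
    raw
      .trim()
      .toLowerCase()
      // Keep any Unicode letter or number; the `u` flag enables \p{...} classes.
      .replace(/[^\p{L}\p{N}]+/gu, "-")
      .replace(/^-+|-+$/g, "") || "page"
  );
}

console.log(slugifyWikiSegmentUnicode("🌟 高效能人士的七个习惯")); // 高效能人士的七个习惯
console.log(slugifyWikiSegmentUnicode("OpenClaw:先驱准备成先烈")); // openclaw-先驱准备成先烈
console.log(slugifyWikiSegmentUnicode("🌟🌟")); // page

// Titles that differ only in punctuation still share a slug.
console.log(
  slugifyWikiSegmentUnicode("读书笔记!") === slugifyWikiSegmentUnicode("读书笔记?"),
); // true
```

The last line shows why this is only the minimum fix: distinct sources with identical or near-identical titles still map to one path. Note also that this pattern treats combining marks (\p{M}) as separators, so scripts that depend on them may need that class added to the kept set.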

Better fix

Do not rely on slug alone for source identity. Add a stable suffix, e.g.:

  • sources/<slug>-<hash8>.md
  • id: source.<slug>.<hash8>

Possible hash inputs:

  • sourcePath
  • source URL
  • explicit source id
  • content hash

Unicode-aware slugging reduces damage; a unique suffix actually prevents silent overwrite for same/similar titles.

Impact area

This affects wiki source ingestion workflows, especially reading/article pipelines that ingest many human-titled documents in Chinese or other non-ASCII languages.

Extent analysis

TL;DR

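Make slugification Unicode-aware to reduce collisions, and add a stable suffix to the source identity to actually prevent silent overwrites of existing wiki source pages. PR #65992 takes the suffix approach.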

Guidance

  • Update the slugifyWikiSegment function to use a Unicode-aware regular expression, such as .replace(/[^\p{L}\p{N}]+/gu, "-"), to preserve non-ASCII characters.
  • Consider adding a stable suffix to the source identity, such as a hash of the source path or content, to prevent collisions.
  • Verify the fix by testing with titles containing CJK or other non-ASCII characters to ensure that they are correctly slugified and do not overwrite existing pages.
  • Review wiki source ingestion workflows for side effects: any change to the slug or identity rules changes the path of newly ingested pages, and PR #65992 does not migrate or repair pages already written under the old scheme.

Example

function slugifyWikiSegment(raw) {
  return normalizeLowercaseStringOrEmpty(raw)
    .replace(/[^\p{L}\p{N}]+/gu, "-")
    .replace(/-+/g, "-")
    .replace(/^-+|-+$/g, "") || "page";
}

Notes

The suggested fix assumes that the slugifyWikiSegment function is the only place where slugification is performed. If slugification is performed elsewhere, those instances may also need to be updated.

Recommendation

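Updating slugifyWikiSegment to use a Unicode-aware regular expression is a partial mitigation: it stops CJK-only titles from collapsing to page, but identical or similar titles can still overwrite each other. To prevent silent overwrites, also add a stable suffix to the source identity, as PR #65992 does.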
