openclaw - ✅(Solved) Fix [Bug]: normalizeHyphenSlug strips all CJK characters, breaking group display names for non-Latin languages [4 pull requests, 2 comments, 3 participants]

openclaw · 2026-04-01 10:09:02

GitHub stats

openclaw/openclaw#58932•Fetched 2026-04-08 02:31:03

Timeline (top)

cross-referenced ×4, commented ×2, referenced ×2, labeled ×1

normalizeHyphenSlug regex /[^a-z0-9#@._+-]+/g strips all non-ASCII characters, causing buildGroupDisplayName to return only the provider key for group names containing CJK/Cyrillic/Arabic characters.

Root Cause

The group session display name shows only the provider key (e.g. "telegram" or "whatsapp"), and the group name is lost entirely. The cause is normalizeHyphenSlug in src/shared/string-normalization.ts (line 15): its regex /[^a-z0-9#@._+-]+/g replaces every character outside that ASCII set, so a name written entirely in a non-Latin script is reduced to an empty string.
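The effect of the ASCII-only pattern can be reproduced in isolation. The sketch below is not the project's code: only the regex is quoted from the issue, while the function name legacyHyphenSlug and the trim, lowercase, and hyphen-cleanup steps are assumptions made for illustration.

```typescript
// Sketch of the pre-fix behavior. Only the regex comes from the issue;
// the surrounding steps are assumed, not copied from the repository.
const ASCII_ONLY = /[^a-z0-9#@._+-]+/g;

function legacyHyphenSlug(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(ASCII_ONLY, "-") // each run of disallowed characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // hyphens at either end are dropped
}

console.log(JSON.stringify(legacyHyphenSlug("Dev Chat")));   // "dev-chat"
console.log(JSON.stringify(legacyHyphenSlug("技术讨论组"))); // ""
```

An ASCII name survives as a slug, while the Chinese name collapses to a single hyphen and is then trimmed away, which matches the empty token described in the issue.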

Fix Action

Fix / Workaround

Root cause: normalizeHyphenSlug in src/shared/string-normalization.ts (line 15) uses the regex /[^a-z0-9#@._+-]+/g, which only preserves ASCII Latin characters. All other Unicode characters (Chinese, Japanese, Korean, Cyrillic, Arabic, etc.) are replaced with -, and the result is then cleaned to an empty string.

Code path: normalizeHyphenSlug → normalizeGroupLabel → buildGroupDisplayName → deriveGroupSessionPatch → session displayName

Suggested fix: use the Unicode-aware regex /[^\p{L}\p{N}#@._+-]+/gu to preserve non-Latin scripts, or add a fallback in deriveGroupSessionPatch (metadata.ts) that uses ConversationLabel when slug normalization produces only the provider key.

This is not a regression: non-Latin group names have never worked correctly.
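The fallback alternative could look roughly like the sketch below. The helper name pickGroupDisplayName, its parameters, and the sample built name "telegram:dev-chat" are all hypothetical; the real deriveGroupSessionPatch signature is not shown in this report.

```typescript
// Hypothetical helper illustrating the fallback idea only.
// It is not the project's deriveGroupSessionPatch.
function pickGroupDisplayName(
  providerKey: string,
  builtName: string,
  conversationLabel?: string,
): string {
  const label = conversationLabel?.trim();
  // A built name equal to the bare provider key means the group name was lost.
  if (builtName === providerKey && label) {
    return label;
  }
  return builtName;
}

console.log(pickGroupDisplayName("telegram", "telegram", "技术讨论组")); // 技术讨论组
console.log(pickGroupDisplayName("telegram", "telegram:dev-chat", "Dev Chat")); // telegram:dev-chat
```

This keeps the slug logic untouched and only substitutes the raw label when normalization has discarded everything, so it treats the symptom rather than the regex itself.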

PR fix notes

PR #58942: fix: preserve non-Latin characters in normalizeHyphenSlug

Repository: openclaw/openclaw
Author: fengqing-git
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/58942

Description (problem / solution / changelog)

normalizeHyphenSlug strips all non-ASCII characters via /[^a-z0-9#@._+-]+/g, causing buildGroupDisplayName to lose CJK/Cyrillic/Arabic group names and fall back to just the provider key.

Switch to Unicode-aware character classes (\p{L}, \p{N}) so non-Latin scripts are preserved in group display names.

Fixes #58932

Changed files

src/shared/string-normalization.test.ts (modified, +13/-0)
src/shared/string-normalization.ts (modified, +1/-1)

PR #58973: Fix: preserve non-ASCII group names in normalizeHyphenSlug (Resolves #58932)

Repository: openclaw/openclaw
Author: Mlightsnow
State: closed | merged: False
Link: https://github.com/openclaw/openclaw/pull/58973

Description (problem / solution / changelog)

Fixes #58932.

normalizeHyphenSlug used /[^a-z0-9#@._+-]+/g which stripped all non-ASCII characters (CJK, Cyrillic, Arabic, etc.) to empty string. For groups with non-Latin names (e.g. Telegram "技术讨论组"), this caused buildGroupDisplayName to return just the provider key (e.g. telegram) since the normalized token was empty.

Fix: Updated the regex to use Unicode property escapes (\p{L}\p{N} with the u flag) so non-ASCII letters and digits are preserved:

/[^a-z0-9#@._+\-\p{L}\p{N}]+/gu

Tests added covering CJK (Chinese, Japanese), Cyrillic, Arabic, and mixed ASCII/Unicode group names.
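The pattern in this PR keeps a-z0-9 and an escaped hyphen alongside \p{L}\p{N}. Since \p{L} already covers a-z and \p{N} already covers 0-9, it should accept the same characters as the shorter pattern suggested in the issue. A quick check, using sample strings chosen here for illustration:

```typescript
// Both patterns are quoted from this report; the sample inputs are made up.
const prVariant = /[^a-z0-9#@._+\-\p{L}\p{N}]+/gu;
const issueVariant = /[^\p{L}\p{N}#@._+-]+/gu;

const samples = ["dev chat", "技术讨论组", "группа 42", "a_b!c?d"];

// Compare the replacement result of both patterns for every sample.
const identical = samples.every(
  (s) => s.replace(prVariant, "-") === s.replace(issueVariant, "-"),
);

console.log(identical); // true
```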

Changed files

src/shared/string-normalization.test.ts (modified, +8/-0)
src/shared/string-normalization.ts (modified, +2/-1)

PR #58995: fix: preserve non-Latin characters in normalizeHyphenSlug and normalizeAtHashSlug

Repository: openclaw/openclaw
Author: Starhappysh
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/58995

Description (problem / solution / changelog)

Summary

Fix normalizeHyphenSlug regex /[^a-z0-9#@._+-]+/g which strips all non-ASCII characters, replacing it with Unicode-aware /[^\p{L}\p{N}#@._+-]+/gu
Apply the same fix to normalizeAtHashSlug (/[^a-z0-9-]+/g -> /[^\p{L}\p{N}-]+/gu)
Add test cases for CJK (Chinese, Japanese), Cyrillic, and mixed-script inputs

This preserves non-Latin group names (e.g. Telegram/WhatsApp groups named in Chinese, Japanese, Korean, Cyrillic, Arabic) so they display correctly in the Sessions UI instead of being reduced to just the provider key.
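The normalizeAtHashSlug change can be illustrated by applying the two patterns directly. This sketch only exercises the regexes quoted above; the helper applyPattern is introduced here, and whatever else normalizeAtHashSlug does (such as handling @ or # prefixes) is not reproduced.

```typescript
// The two patterns quoted in this PR summary.
const asciiOnly = /[^a-z0-9-]+/g;
const unicodeAware = /[^\p{L}\p{N}-]+/gu;

// Hypothetical helper: replaces each disallowed run with a single hyphen.
function applyPattern(input: string, pattern: RegExp): string {
  return input.replace(pattern, "-");
}

console.log(applyPattern("技术讨论组", asciiOnly));      // -
console.log(applyPattern("技术讨论组", unicodeAware));   // 技术讨论组
console.log(applyPattern("general chat", asciiOnly));    // general-chat
console.log(applyPattern("general chat", unicodeAware)); // general-chat
```

ASCII input produces the same result under both patterns, which is consistent with the PR's claim of no behavioral change for ASCII inputs.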

Fixes #58932

Test plan

Added unit tests for CJK characters in normalizeHyphenSlug
Added unit tests for CJK characters in normalizeAtHashSlug
Existing tests still pass (no behavioral change for ASCII inputs)
Verify group display names with non-Latin names in Sessions UI

🤖 Generated with Claude Code

Changed files

src/shared/string-normalization.test.ts (modified, +12/-0)
src/shared/string-normalization.ts (modified, +2/-2)

PR #59068: fix(shared): preserve unicode group labels in slug normalization

Repository: openclaw/openclaw
Author: koen666
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/59068

Description (problem / solution / changelog)

fix(shared): preserve Unicode group names in slug normalization

Group display names with non-Latin scripts were being stripped during slug normalization, which made session subtitles collapse to provider-only labels (for example, only telegram).

Fixes #58932

Summary

Problem: normalizeHyphenSlug and normalizeAtHashSlug used ASCII-only regex and removed CJK/Cyrillic/Arabic characters
Why it matters: group session labels become indistinguishable when names are non-Latin
What changed: switched to Unicode-aware character classes (\p{L} / \p{N}) and added regression tests for non-Latin scripts
What did NOT change (scope boundary): no changes to session routing, metadata pipeline, provider logic, or UI rendering

Linked Issue/PR

Closes #58932
Related # N/A
This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

Root cause: slug normalization regex only allowed [a-z0-9...], so Unicode letters/numbers were replaced and often collapsed away
Missing detection / guardrail: tests covered ASCII behavior but not multilingual group names
Prior context: issue #58932 reported session subtitle degradation for non-Latin group names
Why this regressed now: not a new regression; this appears to be a long-standing behavior gap

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/shared/string-normalization.test.ts
Scenario the test should lock in: CJK/Cyrillic/Arabic characters are preserved in slug normalization paths
Why this is the smallest reliable guardrail: the bug is in shared normalization helpers; direct unit tests pin exact behavior

User-visible / Behavior Changes

Group display names with non-Latin scripts are now preserved in normalized labels
Sessions list can distinguish multilingual groups on the same provider

Diagram (if applicable)

Before:
"技术讨论组" -> normalizeHyphenSlug -> "" -> fallback display "telegram"

After:
"技术讨论组" -> normalizeHyphenSlug -> "技术讨论组" -> display keeps group name

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Repro + Verification

Environment

OS: macOS
Runtime/container: Node 22+, pnpm workspace
Model/provider: N/A
Integration/channel (if any): N/A
Relevant config (redacted): N/A

Steps

Run pnpm test -- src/shared/string-normalization.test.ts
Verify Unicode test cases pass for CJK/Cyrillic/Arabic inputs
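The kind of cases such a test run covers might look like the table-driven sketch below. The actual test file is not reproduced in this report, so the function normalizeHyphenSlugSketch, its trim and lowercase steps, and the sample inputs are all assumptions.

```typescript
// Sketch only: a stand-in for the fixed helper, using the Unicode-aware regex.
function normalizeHyphenSlugSketch(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/[^\p{L}\p{N}#@._+-]+/gu, "-")
    .replace(/^-+|-+$/g, "");
}

// Each pair is [input, expected slug].
const cases: [string, string][] = [
  ["Dev Chat", "dev-chat"],
  ["技术讨论组", "技术讨论组"],
  ["Группа", "группа"],
  ["مجموعة", "مجموعة"],
  ["Team 技术 2024", "team-技术-2024"],
];

for (const [input, expected] of cases) {
  const actual = normalizeHyphenSlugSketch(input);
  if (actual !== expected) {
    throw new Error(`${JSON.stringify(input)}: got ${JSON.stringify(actual)}`);
  }
}
console.log("all cases passed");
```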

Expected

Non-Latin group names remain in normalized output

Actual

After fix, normalization preserves Unicode letters/numbers and tests pass

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

pnpm test -- src/shared/string-normalization.test.ts
# Test Files  1 passed (1)
# Tests       9 passed (9)

Human Verification (required)

Verified scenarios:
- normalizeHyphenSlug preserves Chinese/Japanese/Korean/Cyrillic/Arabic strings
- normalizeAtHashSlug preserves CJK strings with #/@ prefixes
- existing ASCII normalization expectations remain green
What you did NOT verify:
- full end-to-end session subtitle rendering in a live multi-channel environment

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No

Risks and Mitigations

Risk: broader Unicode acceptance may retain more characters than before in edge-case labels
- Mitigation: normalization still collapses unsupported separators and trims boundary punctuation; tests lock intended behavior

Changed files

src/shared/string-normalization.test.ts (modified, +30/-0)
src/shared/string-normalization.ts (modified, +2/-2)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

Connect a Telegram or WhatsApp group that has a non-Latin name (e.g. Chinese "技术讨论组", Japanese "友達グループ")
Send a message in the group to trigger OpenClaw
Open Control UI → Sessions page
Observe the group session subtitle

Expected behavior

Group session display name should show the actual group name (e.g. "技术讨论组")

Actual behavior

OpenClaw version

2026.3.28

Operating system

all

Install method

No response

Model

pnpm install

Provider / routing chain

openclaw -> google

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Affected: Any channel (Telegram, WhatsApp, Discord, etc.) where group names use non-Latin scripts (CJK, Cyrillic, Arabic, etc.)
Severity: Annoying (UI display issue, does not block messaging)
Frequency: Always — 100% reproducible for any non-ASCII group name
Consequence: Group sessions are indistinguishable in the Sessions UI when multiple groups exist on the same provider, as they all show the same provider key (e.g. "telegram")

Impact and severity

No response

Additional information

extent analysis

TL;DR

Update the normalizeHyphenSlug regex to a Unicode-aware pattern to preserve non-Latin characters in group names.

Guidance

Locate the normalizeHyphenSlug function in src/shared/string-normalization.ts and change its regex to /[^\p{L}\p{N}#@._+-]+/gu so that letters and digits from any script are preserved.
Alternatively, consider adding a fallback in deriveGroupSessionPatch (metadata.ts) to use ConversationLabel when slug normalization produces only the provider key.
Verify the fix by testing with group names containing non-Latin characters (e.g., Chinese, Japanese, Cyrillic, Arabic) and checking that the group session display name shows the actual group name.
Review the code path normalizeHyphenSlug → normalizeGroupLabel → buildGroupDisplayName → deriveGroupSessionPatch → session displayName to ensure the updated regex or fallback is correctly applied.

Example

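// Updated normalizeHyphenSlug (sketch; the edge-hyphen trimming is assumed from the issue description)
function normalizeHyphenSlug(str) {
  // \p{L} and \p{N} match letters and digits in any script; they require the u flag.
  const slug = str.replace(/[^\p{L}\p{N}#@._+-]+/gu, '-');
  // Remove hyphens left at either end by stripped characters.
  return slug.replace(/^-+|-+$/g, '');
}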

Notes

The suggested fix assumes that the Unicode-aware regex pattern will correctly preserve non-Latin characters in group names. However, additional testing may be necessary to ensure the fix works for all supported languages and scripts.
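One edge case worth including in that testing: \p{L} and \p{N} do not cover combining marks (Unicode category M), which several scripts use for vowel signs and diacritics. The sketch below shows the effect on a Hindi word. The withMarks pattern is an illustration added here and is not part of any of the linked PRs.

```typescript
// Pattern proposed in the issue.
const lettersOnly = /[^\p{L}\p{N}#@._+-]+/gu;
// Hypothetical variant that also keeps combining marks (\p{M}).
const withMarks = /[^\p{L}\p{M}\p{N}#@._+-]+/gu;

// "हिंदी" spelled out as code points: HA, VOWEL SIGN I, ANUSVARA, DA, VOWEL SIGN II.
const hindi = "\u0939\u093F\u0902\u0926\u0940";

// The vowel signs and the anusvara are marks, so they are replaced by hyphens.
console.log(hindi.replace(lettersOnly, "-") === "\u0939-\u0926-"); // true
// With \p{M} allowed, the word passes through unchanged.
console.log(hindi.replace(withMarks, "-") === hindi); // true
```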

Recommendation

Update the normalizeHyphenSlug regex to the Unicode-aware pattern. This fixes the problem at its source with a one-line change, whereas a fallback in deriveGroupSessionPatch would only work around it at the display layer. PRs #58995 and #59068 apply the same change to normalizeAtHashSlug.
