openclaw - ✅(Solved) Fix [Bug]: normalizeHyphenSlug strips all CJK characters, breaking group display names for non-Latin languages [4 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#58932 (fetched 2026-04-08 02:31:03)
Comments: 2 · Participants: 3 · Timeline events: 9 · Reactions: 0
Timeline (top): cross-referenced ×4, commented ×2, referenced ×2, labeled ×1

normalizeHyphenSlug regex /[^a-z0-9#@._+-]+/g strips all non-ASCII characters, causing buildGroupDisplayName to return only the provider key for group names containing CJK/Cyrillic/Arabic characters.

Root Cause

Group session display name shows only the provider key (e.g. "telegram" or "whatsapp"). The group name is completely lost because normalizeHyphenSlug in src/shared/string-normalization.ts line 15 uses regex /[^a-z0-9#@._+-]+/g which strips all non-ASCII characters to empty string.
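The failure can be reproduced in isolation. The sketch below is a minimal stand-in, not the project's actual code: `asciiOnlySlug` and `displayNameFor` are hypothetical names, and the lowercasing, hyphen trimming, and `provider:label` joining are assumptions about what surrounds the regex quoted above.

```typescript
// Hypothetical stand-in for the pre-fix normalizeHyphenSlug.
// Only the regex comes from the issue; lowercasing and trimming are assumptions.
function asciiOnlySlug(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9#@._+-]+/g, "-") // every run of disallowed characters becomes "-"
    .replace(/^-+|-+$/g, ""); // assumed cleanup: drop leading/trailing hyphens
}

// Hypothetical display-name builder: falls back to the provider key when the slug is empty.
function displayNameFor(provider: string, groupName: string): string {
  const slug = asciiOnlySlug(groupName);
  return slug ? provider + ":" + slug : provider;
}

console.log(displayNameFor("telegram", "Dev Chat")); // telegram:dev-chat
console.log(displayNameFor("telegram", "技术讨论组")); // telegram
console.log(displayNameFor("telegram", "友達グループ")); // telegram
```

Because the whole CJK name is a single run of disallowed characters, it collapses to one hyphen, which the cleanup step then removes, leaving nothing but the provider key.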

Fix Action

Fix / Workaround

Root cause: normalizeHyphenSlug in src/shared/string-normalization.ts (line 15) uses the regex /[^a-z0-9#@._+-]+/g, which preserves only lowercase ASCII letters, digits, and a few symbols. All non-ASCII characters (Chinese, Japanese, Korean, Cyrillic, Arabic, etc.) are replaced with -, and the result is then cleaned to an empty string.

Code path: normalizeHyphenSlug → normalizeGroupLabel → buildGroupDisplayName → deriveGroupSessionPatch → session displayName

Suggested fix: Use the Unicode-aware regex /[^\p{L}\p{N}#@._+-]+/gu to preserve non-Latin scripts, or add a fallback in deriveGroupSessionPatch (metadata.ts) that uses ConversationLabel when slug normalization produces only the provider key.

This is not a regression: non-Latin group names have never worked correctly.
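The fallback alternative could look roughly like the following. This is only a sketch of the idea: `pickGroupDisplayName` is a hypothetical helper, and the real signature of deriveGroupSessionPatch and the shape of ConversationLabel are not shown in the issue, so the parameters here are assumptions.

```typescript
// Hypothetical helper illustrating the fallback idea; it is not the project's
// deriveGroupSessionPatch, whose actual signature is not shown in the issue.
function pickGroupDisplayName(
  providerKey: string,
  slugDisplayName: string,
  conversationLabel?: string
): string {
  const label = (conversationLabel || "").trim();
  // If normalization collapsed the name to the bare provider key, prefer the raw label.
  if (slugDisplayName === providerKey && label !== "") {
    return label;
  }
  return slugDisplayName;
}

console.log(pickGroupDisplayName("telegram", "telegram", "技术讨论组")); // 技术讨论组
console.log(pickGroupDisplayName("telegram", "telegram:dev-chat", "Dev Chat")); // telegram:dev-chat
console.log(pickGroupDisplayName("telegram", "telegram")); // telegram
```

A fallback like this would hide the symptom in the Sessions UI but leave the normalization helper itself unchanged, which is why the regex change is the more direct option.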

PR fix notes

PR #58942: fix: preserve non-Latin characters in normalizeHyphenSlug

Description (problem / solution / changelog)

normalizeHyphenSlug strips all non-ASCII characters via /[^a-z0-9#@._+-]+/g, causing buildGroupDisplayName to lose CJK/Cyrillic/Arabic group names and fall back to just the provider key.

Switch to Unicode-aware character classes (\p{L}, \p{N}) so non-Latin scripts are preserved in group display names.

Fixes #58932

Changed files

  • src/shared/string-normalization.test.ts (modified, +13/-0)
  • src/shared/string-normalization.ts (modified, +1/-1)

PR #58973: Fix: preserve non-ASCII group names in normalizeHyphenSlug (Resolves #58932)

Description (problem / solution / changelog)

Fixes #58932.

normalizeHyphenSlug used /[^a-z0-9#@._+-]+/g which stripped all non-ASCII characters (CJK, Cyrillic, Arabic, etc.) to empty string. For groups with non-Latin names (e.g. Telegram "技术讨论组"), this caused buildGroupDisplayName to return just the provider key (e.g. telegram) since the normalized token was empty.

Fix: Updated the regex to use Unicode property escapes (\p{L}\p{N} with the u flag) so non-ASCII letters and digits are preserved:

/[^a-z0-9#@._+\-\p{L}\p{N}]+/gu

Tests added covering CJK (Chinese, Japanese), Cyrillic, Arabic, and mixed ASCII/Unicode group names.
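The effect of this pattern can be checked with a small script. The sketch below is an illustration under assumptions: `unicodeSlug` is a hypothetical wrapper (the trim, lowercase, and hyphen-trimming steps are guesses at the surrounding code), and the sample names are not the PR's actual test cases. The pattern is built with the RegExp constructor only so the sketch does not depend on a particular TypeScript compile target; the runtime still needs to support Unicode property escapes.

```typescript
// Equivalent to the literal /[^a-z0-9#@._+\-\p{L}\p{N}]+/gu from the PR description.
const NON_SLUG_CHARS = new RegExp("[^a-z0-9#@._+\\-\\p{L}\\p{N}]+", "gu");

// Hypothetical wrapper: the trim, lowercase, and hyphen-trimming steps are assumptions.
function unicodeSlug(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(NON_SLUG_CHARS, "-")
    .replace(/^-+|-+$/g, "");
}

// Sample inputs in the spirit of the tests the PR describes (not the PR's actual cases).
const samples: string[] = [
  "技术讨论组",
  "友達グループ",
  "Чат Разработчиков",
  "مرحبا",
  "Dev 技术 Team 2",
];

samples.forEach((name) => {
  console.log(JSON.stringify(name) + " -> " + JSON.stringify(unicodeSlug(name)));
});
// "技术讨论组" -> "技术讨论组"
// "友達グループ" -> "友達グループ"
// "Чат Разработчиков" -> "чат-разработчиков"
// "مرحبا" -> "مرحبا"
// "Dev 技术 Team 2" -> "dev-技术-team-2"
```

Since \p{L} already covers a-z and \p{N} already covers 0-9, the explicit ASCII ranges in this variant are redundant but harmless.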

Changed files

  • src/shared/string-normalization.test.ts (modified, +8/-0)
  • src/shared/string-normalization.ts (modified, +2/-1)

PR #58995: fix: preserve non-Latin characters in normalizeHyphenSlug and normalizeAtHashSlug

Description (problem / solution / changelog)

Summary

  • Fix normalizeHyphenSlug regex /[^a-z0-9#@._+-]+/g which strips all non-ASCII characters, replacing it with Unicode-aware /[^\p{L}\p{N}#@._+-]+/gu
  • Apply the same fix to normalizeAtHashSlug (/[^a-z0-9-]+/g -> /[^\p{L}\p{N}-]+/gu)
  • Add test cases for CJK (Chinese, Japanese), Cyrillic, and mixed-script inputs

This preserves non-Latin group names (e.g. Telegram/WhatsApp groups named in Chinese, Japanese, Korean, Cyrillic, Arabic) so they display correctly in the Sessions UI instead of being reduced to just the provider key.
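The normalizeAtHashSlug change can be illustrated by applying both patterns to the same inputs. This is a sketch, not the project's function: `collapse` is a hypothetical helper, the lowercasing and hyphen trimming are assumptions, and how the real function treats a leading # or @ is not shown in the PR description.

```typescript
// Pre-fix pattern from the PR summary.
const ASCII_AT_HASH = /[^a-z0-9-]+/g;
// Equivalent to the literal /[^\p{L}\p{N}-]+/gu from the PR summary.
const UNICODE_AT_HASH = new RegExp("[^\\p{L}\\p{N}-]+", "gu");

// Hypothetical helper: applies a pattern, then trims hyphens (assumed cleanup).
function collapse(pattern: RegExp, raw: string): string {
  return raw.toLowerCase().replace(pattern, "-").replace(/^-+|-+$/g, "");
}

console.log(collapse(ASCII_AT_HASH, "#general")); // general
console.log(collapse(UNICODE_AT_HASH, "#general")); // general
console.log(JSON.stringify(collapse(ASCII_AT_HASH, "#技术讨论组"))); // ""
console.log(collapse(UNICODE_AT_HASH, "#技术讨论组")); // 技术讨论组
console.log(collapse(UNICODE_AT_HASH, "@Иван_Петров")); // иван-петров
```

ASCII inputs produce the same result under both patterns, which matches the PR's claim of no behavioral change for ASCII names.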

Fixes #58932

Test plan

  • Added unit tests for CJK characters in normalizeHyphenSlug
  • Added unit tests for CJK characters in normalizeAtHashSlug
  • Existing tests still pass (no behavioral change for ASCII inputs)
  • Verify group display names with non-Latin names in Sessions UI

🤖 Generated with Claude Code

Changed files

  • src/shared/string-normalization.test.ts (modified, +12/-0)
  • src/shared/string-normalization.ts (modified, +2/-2)

PR #59068: fix(shared): preserve unicode group labels in slug normalization

Description (problem / solution / changelog)

fix(shared): preserve Unicode group names in slug normalization

Group display names with non-Latin scripts were being stripped during slug normalization, which made session subtitles collapse to provider-only labels (for example, only telegram).

Fixes #58932

Summary

  • Problem: normalizeHyphenSlug and normalizeAtHashSlug used ASCII-only regex and removed CJK/Cyrillic/Arabic characters
  • Why it matters: group session labels become indistinguishable when names are non-Latin
  • What changed: switched to Unicode-aware character classes (\p{L} / \p{N}) and added regression tests for non-Latin scripts
  • What did NOT change (scope boundary): no changes to session routing, metadata pipeline, provider logic, or UI rendering

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #58932
  • Related # N/A
  • This PR fixes a bug or regression

Root Cause / Regression History (if applicable)

  • Root cause: slug normalization regex only allowed [a-z0-9...], so Unicode letters/numbers were replaced and often collapsed away
  • Missing detection / guardrail: tests covered ASCII behavior but not multilingual group names
  • Prior context: issue #58932 reported session subtitle degradation for non-Latin group names
  • Why this regressed now: not a new regression; this appears to be a long-standing behavior gap

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/shared/string-normalization.test.ts
  • Scenario the test should lock in: CJK/Cyrillic/Arabic characters are preserved in slug normalization paths
  • Why this is the smallest reliable guardrail: the bug is in shared normalization helpers; direct unit tests pin exact behavior

User-visible / Behavior Changes

  • Group display names with non-Latin scripts are now preserved in normalized labels
  • Sessions list can distinguish multilingual groups on the same provider

Diagram (if applicable)

Before:
"技术讨论组" -> normalizeHyphenSlug -> "" -> fallback display "telegram"

After:
"技术讨论组" -> normalizeHyphenSlug -> "技术讨论组" -> display keeps group name

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: Node 22+, pnpm workspace
  • Model/provider: N/A
  • Integration/channel (if any): N/A
  • Relevant config (redacted): N/A

Steps

  1. Run pnpm test -- src/shared/string-normalization.test.ts
  2. Verify Unicode test cases pass for CJK/Cyrillic/Arabic inputs

Expected

  • Non-Latin group names remain in normalized output

Actual

  • After fix, normalization preserves Unicode letters/numbers and tests pass

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

pnpm test -- src/shared/string-normalization.test.ts
# Test Files  1 passed (1)
# Tests       9 passed (9)

Human Verification (required)

  • Verified scenarios:
    • normalizeHyphenSlug preserves Chinese/Japanese/Korean/Cyrillic/Arabic strings
    • normalizeAtHashSlug preserves CJK strings with #/@ prefixes
    • existing ASCII normalization expectations remain green
  • What you did NOT verify:
    • full end-to-end session subtitle rendering in a live multi-channel environment

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: broader Unicode acceptance may retain more characters than before in edge-case labels
    • Mitigation: normalization still collapses unsupported separators and trims boundary punctuation; tests lock intended behavior
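To make this risk concrete, the sketch below probes a few edge cases against the Unicode-aware pattern. It is illustrative only: `probeSlug` is a hypothetical wrapper with assumed trim, lowercase, and hyphen-trimming steps, and the inputs are not taken from the PR's tests.

```typescript
// Equivalent to the literal /[^\p{L}\p{N}#@._+-]+/gu used by the fix.
const DISALLOWED = new RegExp("[^\\p{L}\\p{N}#@._+-]+", "gu");

// Hypothetical wrapper; the steps around the regex are assumptions.
function probeSlug(raw: string): string {
  return raw
    .trim()
    .toLowerCase()
    .replace(DISALLOWED, "-")
    .replace(/^-+|-+$/g, "");
}

// Emoji and whitespace are neither letters nor digits, so they still collapse.
console.log(probeSlug("🚀 Launch 团队")); // launch-团队

// Full-width digits are in \p{N}, so they are now retained as-is.
console.log(probeSlug("第３組")); // 第３組

// A precomposed accented letter is a single \p{L} code point and is kept.
console.log(probeSlug("r\u00E9sum\u00E9")); // résumé

// A decomposed accent (U+0301) is a combining mark, not a letter, so it is replaced.
console.log(probeSlug("re\u0301sume\u0301")); // re-sume
```

The last case suggests that input containing combining marks may still be altered; whether that matters depends on how the channels deliver group names, which the PR does not state.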

Changed files

  • src/shared/string-normalization.test.ts (modified, +30/-0)
  • src/shared/string-normalization.ts (modified, +2/-2)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

normalizeHyphenSlug regex /[^a-z0-9#@._+-]+/g strips all non-ASCII characters, causing buildGroupDisplayName to return only the provider key for group names containing CJK/Cyrillic/Arabic characters.

Steps to reproduce

  1. Connect a Telegram or WhatsApp group that has a non-Latin name (e.g. Chinese "技术讨论组", Japanese "友達グループ")
  2. Send a message in the group to trigger OpenClaw
  3. Open Control UI → Sessions page
  4. Observe the group session subtitle

Expected behavior

Group session display name should show the actual group name (e.g. "技术讨论组")

Actual behavior

Group session display name shows only the provider key (e.g. "telegram" or "whatsapp"). The group name is completely lost because normalizeHyphenSlug in src/shared/string-normalization.ts line 15 uses regex /[^a-z0-9#@._+-]+/g which strips all non-ASCII characters to empty string.

OpenClaw version

2026.3.28

Operating system

all

Install method

No response

Model

pnpm install

Provider / routing chain

openclaw -> google

Additional provider/model setup details

NOT_ENOUGH_INFO

Logs, screenshots, and evidence

Affected: Any channel (Telegram, WhatsApp, Discord, etc.) where group names use non-Latin scripts (CJK, Cyrillic, Arabic, etc.)
Severity: Annoying (UI display issue, does not block messaging)
Frequency: Always — 100% reproducible for any non-ASCII group name
Consequence: Group sessions are indistinguishable in the Sessions UI when multiple groups exist on the same provider, as they all show the same provider key (e.g. "telegram")

Impact and severity

No response

Additional information

Root cause: normalizeHyphenSlug in src/shared/string-normalization.ts (line 15) uses the regex /[^a-z0-9#@._+-]+/g, which preserves only lowercase ASCII letters, digits, and a few symbols. All non-ASCII characters (Chinese, Japanese, Korean, Cyrillic, Arabic, etc.) are replaced with -, and the result is then cleaned to an empty string.

Code path: normalizeHyphenSlug → normalizeGroupLabel → buildGroupDisplayName → deriveGroupSessionPatch → session displayName

Suggested fix: Use the Unicode-aware regex /[^\p{L}\p{N}#@._+-]+/gu to preserve non-Latin scripts, or add a fallback in deriveGroupSessionPatch (metadata.ts) that uses ConversationLabel when slug normalization produces only the provider key.

This is not a regression: non-Latin group names have never worked correctly.

extent analysis

TL;DR

Update the normalizeHyphenSlug regex to a Unicode-aware pattern to preserve non-Latin characters in group names.

Guidance

  • Locate the normalizeHyphenSlug function in src/shared/string-normalization.ts and change its regex to /[^\p{L}\p{N}#@._+-]+/gu, so that letters and digits from any script are kept instead of being replaced.
  • Alternatively, consider adding a fallback in deriveGroupSessionPatch (metadata.ts) to use ConversationLabel when slug normalization produces only the provider key.
  • Verify the fix by testing with group names containing non-Latin characters (e.g., Chinese, Japanese, Cyrillic, Arabic) and checking that the group session display name shows the actual group name.
  • Review the code path normalizeHyphenSlug → normalizeGroupLabel → buildGroupDisplayName → deriveGroupSessionPatch → session displayName to ensure the updated regex or fallback is correctly applied.

Example

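// Updated normalizeHyphenSlug (simplified sketch: the real function may also trim or lowercase)
function normalizeHyphenSlug(str: string): string {
  // \p{L} and \p{N} match letters and digits in any script; they require the u flag
  return str.replace(/[^\p{L}\p{N}#@._+-]+/gu, '-');
}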

Notes

The suggested fix relies on \p{L} and \p{N}, which match letters and digits in any script but not combining marks (\p{M}). Names that contain decomposed accents, or scripts that depend on combining vowel signs, may therefore still be split by hyphens, so additional testing across the supported languages and scripts is advisable.

Recommendation

Apply the fix by updating the normalizeHyphenSlug regex to a Unicode-aware pattern. This addresses the root cause in the shared helper and is more targeted than adding a fallback in deriveGroupSessionPatch.
