openclaw - 💡(How to fix) Fix [Bug] stripMarkdown() function incorrectly deletes underscores, breaking programming language identifiers [1 participants]

openclaw 2026-04-01 11:20:02

openclaw/openclaw#58969 • Fetched 2026-04-08 02:30:31

Author

nxajh

Participants

nxajh

The stripMarkdown() function in OpenClaw's core text runtime contains a regex bug that removes underscores from ordinary text, not just from Markdown italic markers. This corrupts identifiers in code (C, C++, Python, etc.) sent through channels that call this function, such as the Weixin channel.

Bug Report: stripMarkdown() function incorrectly deletes underscores, breaking programming language identifiers

Description

Expected Behavior:
Underscores in text should only be removed when they are part of valid Markdown italic syntax (_text_). Programming language identifiers should be preserved.

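Actual Behavior:
Single underscores are removed in pairs whenever a line contains two or more of them, which corrupts identifiers in code:

some_variable_name → somevariablename
RATE_LIMITER_H → RATELIMITERH
client_id → clientid (when another single underscore appears on the same line; on its own it is preserved)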

Affected Components

File: /home/user/.npm-global/lib/node_modules/openclaw/dist/text-runtime-B-kOpuLv.js
Function: stripMarkdown(text)
Channels Affected: Any channel that uses stripMarkdown() to convert Markdown to plain text (confirmed: openclaw-weixin)

Note: Telegram and WhatsApp channels are not affected because they use markdownToTelegramHtml() / markdownToWhatsApp() instead of stripMarkdown().

Root Cause

Problematic regex in stripMarkdown():

result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");

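Analysis:

This regex attempts to remove Markdown italic syntax: _text_
The lookarounds only check that each delimiter is a single underscore (not part of __); they do not check the surrounding characters, so an underscore inside a word qualifies as a delimiter
The lazy group (.+?) pairs each qualifying underscore with the next one on the same line (. does not match a newline), so in some_variable_name the match is _variable_
The g flag repeats this for every non-overlapping pair, so underscores disappear two at a time; an underscore left without a partner on its line is kept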
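To make the pairing behaviour concrete, the following sketch lists the substrings that the quoted rule treats as italic spans. The regex is copied from the report; the helper name `italicMatches` is made up for this illustration and is not part of OpenClaw.

```javascript
// The italic rule exactly as quoted in the report.
const ITALIC_UNDERSCORE = /(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g;

// Hypothetical helper: return every substring the rule matches as "_italic_".
function italicMatches(text) {
  return [...text.matchAll(ITALIC_UNDERSCORE)].map((m) => m[0]);
}

console.log(italicMatches("some_variable_name")); // [ '_variable_' ]
console.log(italicMatches("limiter_create"));     // [] (no partner underscore)
console.log(
  italicMatches("RateLimiter* limiter_create(int max_requests, int window_ms);"),
); // [ '_create(int max_' ]
```

In the last case the underscore in limiter_create is paired with the one in max_requests, and the underscore in window_ms survives because nothing is left on the line to pair it with.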

Test Results:

| Input | Output | Expected | Status |
| --- | --- | --- | --- |
| `some_variable_name` | `somevariablename` | `some_variable_name` | ❌ Broken |
| `RATE_LIMITER_H` | `RATELIMITERH` | `RATE_LIMITER_H` | ❌ Broken |
| `limiter_create` | `limiter_create` | `limiter_create` | ✅ Works (edge case) |
| `size_t` | `size_t` | `size_t` | ✅ Works (edge case) |

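Note: An identifier with one underscore is preserved when that is the only single underscore on its line, as in the two passing rows. On a line with other underscores it can still be paired with one of them, as happens to limiter_create in the reproduction below.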

Reproduction Steps

Send a message containing code with underscores through Weixin channel

Example C code:

RateLimiter* limiter_create(int max_requests, int window_ms);

Received message in Weixin shows:

RateLimiter* limitercreate(int maxrequests, int window_ms);

The paired underscores are deleted (window_ms keeps its underscore only because no partner remains on the line), making the code invalid

Or reproduce programmatically:

function stripMarkdown(text) {
  let result = text;
  result = result.replace(/\*\*(.+?)\*\*/g, "$1");
  result = result.replace(/__(.+?)__/g, "$1");
  result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
  result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");  // BUG
  result = result.replace(/~~(.+?)~~/g, "$1");
  result = result.replace(/^#{1,6}\s+(.+)$/gm, "$1");
  result = result.replace(/^>\s?(.*)$/gm, "$1");
  result = result.replace(/^[-*_]{3,}$/gm, "");
  result = result.replace(/`([^`]+)`/g, "$1");
  result = result.replace(/\n{3,}/g, "\n\n");
  return result.trim();
}

console.log(stripMarkdown("RATE_LIMITER_H"));  // Output: "RATELIMITERH"

Impact

Code sharing impossible: Cannot share C/C++/Python code via Weixin channel
Technical discussions broken: Variable names, function names, constants are corrupted
Severity: High - Complete failure for programming language use cases

Suggested Fix

Option 1: Improve regex with word boundaries

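// Only remove Markdown italic when the underscores sit at word boundaries.
// Because _ is itself a word character, \b_ cannot match inside an identifier.
// Limitation: [a-zA-Z]+ covers a single run of ASCII letters, so _two words_ or _v2_ is left as-is.
result = result.replace(/\b_([a-zA-Z]+)_\b/g, "$1");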
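If Option 1 proves too narrow, a variant along the following lines might be worth testing. This is only a sketch, not the project's fix: the name `stripItalicUnderscores` and the exact pattern are assumptions, loosely modelled on the idea that an underscore inside a word should not open or close emphasis.

```javascript
// Sketch only. Opening "_" must not follow a letter, digit, or "_";
// closing "_" must not precede one. The content must start and end with
// a character that is neither whitespace nor "_".
const ITALIC_FLANKED =
  /(?<![\p{L}\p{N}_])_(?=[^\s_])(.+?)(?<=[^\s_])_(?![\p{L}\p{N}_])/gu;

// Hypothetical replacement for the single-underscore italic rule.
function stripItalicUnderscores(text) {
  return text.replace(ITALIC_FLANKED, "$1");
}

console.log(stripItalicUnderscores("RATE_LIMITER_H"));        // RATE_LIMITER_H
console.log(stripItalicUnderscores("this is _two words_ ok")); // this is two words ok
console.log(stripItalicUnderscores("snake_case and _emph_"));  // snake_case and emph
```

Unlike the [a-zA-Z]+ version, this accepts spaces and digits inside the italic span, at the cost of a more complex pattern that would need its own tests before being adopted.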

Option 2: Protect code blocks

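// Don't process content inside code blocks
const codeBlocks = [];
result = result.replace(/```[\s\S]*?```/g, (match) => {
  codeBlocks.push(match);
  // The placeholder must not contain Markdown delimiters: a __CODE_BLOCK_n__ marker
  // would itself be rewritten by the bold rule /__(.+?)__/g before it is restored.
  return `\u0000CODEBLOCK${codeBlocks.length - 1}\u0000`;
});
// ... process markdown ...
result = result.replace(/\u0000CODEBLOCK(\d+)\u0000/g, (_, idx) => codeBlocks[Number(idx)]);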
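One detail to watch in any placeholder scheme is that the marker itself must survive the stripping rules; a marker wrapped in double underscores would be consumed by the bold rule /__(.+?)__/g shown in stripMarkdown(). The sketch below uses control characters as markers and also shields inline code spans. The wrapper name `withCodeProtected` and the marker format are hypothetical, and the transform passed in stands in for the real stripping logic.

```javascript
// Sketch: shield code from a text transform, then put it back.
function withCodeProtected(text, transform) {
  const saved = [];
  const stash = (code) => {
    saved.push(code);
    // NUL/SOH markers contain no Markdown delimiters.
    return `\u0000${saved.length - 1}\u0001`;
  };
  let result = text
    // Fenced blocks: keep the body, drop the fence and the language tag.
    .replace(/```[^\n`]*\n([\s\S]*?)```/g, (_, code) => stash(code.trim()))
    // Inline spans: keep the text between the backticks.
    .replace(/`([^`\n]+)`/g, (_, code) => stash(code));
  result = transform(result);
  return result.replace(/\u0000(\d+)\u0001/g, (_, idx) => saved[Number(idx)]);
}

// Stand-in transform: the italic rule quoted in the report.
const reportedItalicRule = (s) =>
  s.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");

console.log(
  withCodeProtected("Call `limiter_create(max_requests)` with _care_.", reportedItalicRule),
); // Call limiter_create(max_requests) with care.
```

This only helps for code that is marked up with backticks. In the markdownToPlainText() code quoted under Related Code, fenced blocks are unwrapped before stripMarkdown() is called, so the protection would have to wrap that whole function, and identifiers pasted without any code markup still depend on the italic rule itself being fixed.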

Option 3: Use a proper Markdown parser

Replace regex-based approach with a proper Markdown AST parser that can distinguish between Markdown syntax and programming code.

Environment

OpenClaw: 2026.3.24 (cff6dc9)
Node.js: v22.22.2
OS: Oracle Linux Server 10.0
Plugin: @tencent-weixin/openclaw-weixin v2.1.1

Related Code

Weixin channel usage:

// /home/user/.openclaw/extensions/openclaw-weixin/src/messaging/send.ts
import { stripMarkdown } from "openclaw/plugin-sdk/text-runtime";

export function markdownToPlainText(text: string): string {
  let result = text;
  result = result.replace(/```[^\n]*\n?([\s\S]*?)```/g, (_, code: string) => code.trim());
  result = result.replace(/!\[[^\]]*\]\([^)]*\)/g, "");
  result = result.replace(/\[([^\]]+)\]\([^)]*\)/g, "$1");
  result = result.replace(/^\|[\s:|-]+\|$/gm, "");
  result = result.replace(/^\|(.+)\|$/gm, (_, inner: string) =>
    inner.split("|").map((cell) => cell.trim()).join("  "),
  );
  result = stripMarkdown(result);  // ❌ Problematic call
  return result;
}

Telegram channel (NOT affected):

// Telegram uses markdownToTelegramHtml() instead of stripMarkdown()
function renderTelegramHtmlText(text, options = {}) {
  if ((options.textMode ?? "markdown") === "html") return text;
  return markdownToTelegramHtml(text, { tableMode: options.tableMode });
}

Additional Context

Issue discovered when sharing C language code via Weixin
Telegram and WhatsApp channels work correctly (they convert Markdown with markdownToTelegramHtml() / markdownToWhatsApp() instead of stripping it)
This is a core function that may affect other channels that use stripMarkdown()

extent analysis

TL;DR

The stripMarkdown() function can be fixed by improving the regex to correctly handle Markdown italic syntax and preserve programming language identifiers.

Guidance

Improve the regex: Update the regex in stripMarkdown() to use word boundaries, as suggested in Option 1, to only remove Markdown italic syntax.
Protect code blocks: Consider implementing Option 2, which protects code blocks from being processed by the Markdown removal logic.
Verify the fix: Test the updated stripMarkdown() function with various programming language code snippets to ensure that underscores are preserved correctly.
Consider using a Markdown parser: Evaluate the feasibility of replacing the regex-based approach with a proper Markdown AST parser, as mentioned in Option 3, for more accurate handling of Markdown syntax.
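The verification step could start from a small regression table such as the one below. It is a sketch under assumptions: `checkStripper` is a hypothetical helper, the cases are taken from the report plus one genuine italic span, and the function under test is passed in rather than imported from OpenClaw.

```javascript
// [input, expected] pairs: identifiers must survive, real italics must be stripped.
const CASES = [
  ["some_variable_name", "some_variable_name"],
  ["RATE_LIMITER_H", "RATE_LIMITER_H"],
  ["size_t", "size_t"],
  [
    "RateLimiter* limiter_create(int max_requests, int window_ms);",
    "RateLimiter* limiter_create(int max_requests, int window_ms);",
  ],
  ["this is _italic_ text", "this is italic text"],
];

// Hypothetical helper: return the cases that the given function gets wrong.
function checkStripper(strip) {
  return CASES
    .map(([input, expected]) => ({ input, expected, actual: strip(input) }))
    .filter((c) => c.actual !== c.expected);
}

// The rule quoted in the report, and the Option 1 rule, as stand-alone functions.
const reportedRule = (s) => s.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
const optionOneRule = (s) => s.replace(/\b_([a-zA-Z]+)_\b/g, "$1");

console.log(checkStripper(reportedRule).length);  // 3
console.log(checkStripper(optionOneRule).length); // 0
```

The table is deliberately small; cases such as italics containing spaces or digits would need to be added to expose the limits of each candidate rule.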

Example

// Improved regex with word boundaries
result = result.replace(/\b_([a-zA-Z]+)_\b/g, "$1");

Notes

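Options 1, 2, and 3 address the problem at different levels: Option 1 narrows what the italic rule accepts, Option 2 keeps marked-up code away from the rule, and Option 3 replaces the regex pipeline entirely. The right choice depends on the requirements and constraints of the OpenClaw project.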

Recommendation

Apply the improved regex with word boundaries (Option 1) as a temporary workaround, and consider upgrading to a proper Markdown parser (Option 3) for a more robust and long-term solution. This approach balances the need for a quick fix with the goal of improving the overall quality and accuracy of the Markdown processing logic.
