openclaw - 💡(How to fix) Fix [Bug] stripMarkdown() function incorrectly deletes underscores, breaking programming language identifiers [1 participants]

GitHub stats
openclaw/openclaw#58969 (fetched 2026-04-08 02:30:31)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0



Bug Report: stripMarkdown() function incorrectly deletes underscores, breaking programming language identifiers

Description

The stripMarkdown() function in OpenClaw's core text runtime contains a regex bug that incorrectly removes underscores from text. This corrupts source code (C, C++, Python, etc.) sent through channels that call this function, such as the Weixin channel.

Expected Behavior:
Underscores in text should only be removed when they are part of valid Markdown italic syntax (_text_). Programming language identifiers should be preserved.

Actual Behavior:
Underscores that are not Markdown syntax are removed from text, corrupting identifiers in source code:

  • some_variable_name → somevariablename
  • RATE_LIMITER_H → RATELIMITERH
  • client_id → clientid

Affected Components

  • File: /home/user/.npm-global/lib/node_modules/openclaw/dist/text-runtime-B-kOpuLv.js
  • Function: stripMarkdown(text)
  • Channels Affected: Any channel that uses stripMarkdown() to convert Markdown to plain text (confirmed: openclaw-weixin)

Note: Telegram and WhatsApp channels are not affected because they use markdownToTelegramHtml() / markdownToWhatsApp() instead of stripMarkdown().

Root Cause

Problematic regex in stripMarkdown():

result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");

Analysis:

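  • This regex attempts to remove Markdown italic syntax: _text_
  • The lookarounds only check that each delimiter is a single underscore (not part of __); they do not require the delimiters to sit at word boundaries, so an underscore inside an identifier qualifies as a delimiter
  • The non-greedy pattern (.+?) pairs each such underscore with the next single underscore on the same line, even when the two belong to different identifiers
  • The global flag g repeats this for every pair on the line, so only an unpaired (odd) underscore survives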
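
To see which spans the regex actually treats as italics, the matches can be listed directly. This is a minimal sketch: the regex is copied from the issue, while the helper name italicSpans is hypothetical and not part of OpenClaw.

```javascript
// The italic-stripping regex from stripMarkdown(), copied from the issue.
const ITALIC_RE = /(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g;

// Hypothetical helper (not part of OpenClaw): return every span the regex
// would treat as _italic_, so the pairing behaviour becomes visible.
function italicSpans(text) {
  return [...text.matchAll(ITALIC_RE)].map((m) => m[0]);
}

console.log(italicSpans("RATE_LIMITER_H"));
// → [ '_LIMITER_' ]
console.log(italicSpans("limiter_create(int max_requests, int window_ms)"));
// → [ '_create(int max_' ]
console.log(italicSpans("size_t"));
// → []
```

The second call shows a match that starts in one identifier and ends in another, which is why window_ms keeps its underscore in the reproduction while limiter_create and max_requests lose theirs.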

Test Results:

Input              | Output           | Expected           | Status
some_variable_name | somevariablename | some_variable_name | ❌ Broken
RATE_LIMITER_H     | RATELIMITERH     | RATE_LIMITER_H     | ❌ Broken
limiter_create     | limiter_create   | limiter_create     | ✅ Works (edge case)
size_t             | size_t           | size_t             | ✅ Works (edge case)

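Note: An identifier with a single underscore is preserved only when no other single underscore appears on the same line. As soon as a line contains two or more, they are stripped in pairs, which is why limiter_create survives on its own but not in the reproduction below.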

Reproduction Steps

  1. Send a message containing code with underscores through the Weixin channel
  2. Example C code:
    RateLimiter* limiter_create(int max_requests, int window_ms);
  3. The message received in Weixin shows:
    RateLimiter* limitercreate(int maxrequests, int window_ms);
  4. The underscores in limiter_create and max_requests are deleted, changing the identifiers and making the code invalid

Or reproduce programmatically:

function stripMarkdown(text) {
  let result = text;
  result = result.replace(/\*\*(.+?)\*\*/g, "$1");
  result = result.replace(/__(.+?)__/g, "$1");
  result = result.replace(/(?<!\*)\*(?!\*)(.+?)(?<!\*)\*(?!\*)/g, "$1");
  result = result.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");  // BUG
  result = result.replace(/~~(.+?)~~/g, "$1");
  result = result.replace(/^#{1,6}\s+(.+)$/gm, "$1");
  result = result.replace(/^>\s?(.*)$/gm, "$1");
  result = result.replace(/^[-*_]{3,}$/gm, "");
  result = result.replace(/`([^`]+)`/g, "$1");
  result = result.replace(/\n{3,}/g, "\n\n");
  return result.trim();
}

console.log(stripMarkdown("RATE_LIMITER_H"));  // Output: "RATELIMITERH"

Impact

  • Code sharing impossible: Cannot share C/C++/Python code via Weixin channel
  • Technical discussions broken: Variable names, function names, constants are corrupted
  • Severity: High - Complete failure for programming language use cases

Suggested Fix

Option 1: Improve regex with word boundaries

// Only remove Markdown italic with word boundaries
result = result.replace(/\b_([a-zA-Z]+)_\b/g, "$1");
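
As a rough comparison, the sketch below runs Option 1 next to a boundary-aware variant of the original regex. The variant, and the name BOUNDARY_AWARE, are assumptions made for illustration and do not come from the OpenClaw source. It keeps the original single-underscore checks but also requires that the opening underscore is not preceded by a word character and the closing one is not followed by one.

```javascript
// Option 1 from the issue: word-boundary anchors, letters only.
const OPTION_1 = /\b_([a-zA-Z]+)_\b/g;

// Assumed variant (not from the OpenClaw source): the opening "_" must not
// follow a word character, and the closing "_" must not precede one.
const BOUNDARY_AWARE = /(?<!\w)_(?!_)(.+?)(?<!_)_(?!\w)/g;

const samples = [
  "RATE_LIMITER_H",
  "limiter_create(int max_requests, int window_ms)",
  "an _italic_ word",
  "_two words_",
];

// Print each sample next to the result of both candidate rules.
for (const text of samples) {
  console.log(JSON.stringify(text));
  console.log("  option 1:       ", JSON.stringify(text.replace(OPTION_1, "$1")));
  console.log("  boundary-aware: ", JSON.stringify(text.replace(BOUNDARY_AWARE, "$1")));
}
```

Both rules leave the two identifier samples untouched and strip _italic_. Option 1 leaves "_two words_" as-is because its character class excludes spaces, while the variant strips it. Either rule is still a heuristic: identifiers that begin or end with an underscore can still be misread as delimiters.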

Option 2: Protect code blocks

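// Don't process content inside code blocks.
// The placeholder must not contain "_", "*", "~" or "`": a token such as
// __CODE_BLOCK_0__ would itself be rewritten by the bold and italic regexes
// and could no longer be restored. U+E000 (private use) is assumed to be
// absent from real messages.
const codeBlocks = [];
result = result.replace(/```[\s\S]*?```/g, (match) => {
  codeBlocks.push(match);
  return `\uE000${codeBlocks.length - 1}\uE000`;
});
// ... process markdown ...
result = result.replace(/\uE000(\d+)\uE000/g, (_, idx) => codeBlocks[Number(idx)]);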
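
The protection idea can be sketched as a self-contained wrapper around any stripping function. Everything named here (withProtectedCode, stripItalics, the U+E000 placeholder) is hypothetical and chosen for illustration. The sketch also shields inline code, which Option 2 as written does not cover.

```javascript
// Three backticks, built at runtime to keep this listing easy to embed.
const FENCE = "`".repeat(3);
// Matches a fenced block, or an inline code span on a single line.
const CODE_RE = new RegExp(FENCE + "[\\s\\S]*?" + FENCE + "|`[^`\\n]+`", "g");

// Hypothetical wrapper (not part of OpenClaw): shield code from a
// Markdown-stripping function, then restore it verbatim afterwards.
function withProtectedCode(text, strip) {
  const saved = [];
  // U+E000 is a private-use character; this sketch assumes it never occurs
  // in real messages. The placeholder avoids "_", "*", "~" and "`" so the
  // stripping regexes cannot rewrite it.
  const shielded = text.replace(CODE_RE, (match) => {
    saved.push(match);
    return `\uE000${saved.length - 1}\uE000`;
  });
  return strip(shielded).replace(/\uE000(\d+)\uE000/g, (_, i) => saved[Number(i)]);
}

// Stand-in for the buggy italic rule, so the example runs on its own.
const stripItalics = (s) => s.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");

console.log(withProtectedCode("_note_: call `limiter_create(max_requests)`", stripItalics));
// → note: call `limiter_create(max_requests)`
```

Two caveats apply. Code is restored with its backticks intact, so a caller that wants plain text would still need to unwrap it afterwards. And in the Weixin path shown under Related Code, markdownToPlainText() unwraps fenced blocks before it calls stripMarkdown(), so protection added only inside stripMarkdown() would not see those fences; it would likely have to happen in the caller, or be combined with a stricter italic rule.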

Option 3: Use a proper Markdown parser

Replace regex-based approach with a proper Markdown AST parser that can distinguish between Markdown syntax and programming code.

Environment

OpenClaw: 2026.3.24 (cff6dc9)
Node.js: v22.22.2
OS: Oracle Linux Server 10.0
Plugin: @tencent-weixin/openclaw-weixin v2.1.1

Related Code

Weixin channel usage:

// /home/user/.openclaw/extensions/openclaw-weixin/src/messaging/send.ts
import { stripMarkdown } from "openclaw/plugin-sdk/text-runtime";

export function markdownToPlainText(text: string): string {
  let result = text;
  result = result.replace(/```[^\n]*\n?([\s\S]*?)```/g, (_, code: string) => code.trim());
  result = result.replace(/!\[[^\]]*\]\([^)]*\)/g, "");
  result = result.replace(/\[([^\]]+)\]\([^)]*\)/g, "$1");
  result = result.replace(/^\|[\s:|-]+\|$/gm, "");
  result = result.replace(/^\|(.+)\|$/gm, (_, inner: string) =>
    inner.split("|").map((cell) => cell.trim()).join("  "),
  );
  result = stripMarkdown(result);  // ❌ Problematic call
  return result;
}

Telegram channel (NOT affected):

// Telegram uses markdownToTelegramHtml() instead of stripMarkdown()
function renderTelegramHtmlText(text, options = {}) {
  if ((options.textMode ?? "markdown") === "html") return text;
  return markdownToTelegramHtml(text, { tableMode: options.tableMode });
}

Additional Context

  • Issue discovered when sharing C language code via Weixin
  • Telegram and WhatsApp channels work correctly (they convert Markdown with markdownToTelegramHtml() / markdownToWhatsApp() instead of stripping it)
  • This is a core function that may affect other channels that use stripMarkdown()

extent analysis

TL;DR

The stripMarkdown() function can be fixed by improving the regex to correctly handle Markdown italic syntax and preserve programming language identifiers.

Guidance

  1. Improve the regex: Update the regex in stripMarkdown() to use word boundaries, as suggested in Option 1, to only remove Markdown italic syntax.
  2. Protect code blocks: Consider implementing Option 2, which protects code blocks from being processed by the Markdown removal logic.
  3. Verify the fix: Test the updated stripMarkdown() function with various programming language code snippets to ensure that underscores are preserved correctly.
  4. Consider using a Markdown parser: Evaluate the feasibility of replacing the regex-based approach with a proper Markdown AST parser, as mentioned in Option 3, for more accurate handling of Markdown syntax.
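
The verification step could start from a small table-driven check like the one below. It is a sketch under stated assumptions: the inputs come from the issue, while cases, failingCases, current and option1 are illustrative names, and only the italic rule is exercised rather than the full stripMarkdown().

```javascript
// Each case pairs an input with the output a correct stripper should produce.
const cases = [
  ["some_variable_name", "some_variable_name"],
  ["RATE_LIMITER_H", "RATE_LIMITER_H"],
  ["RateLimiter* limiter_create(int max_requests, int window_ms);",
   "RateLimiter* limiter_create(int max_requests, int window_ms);"],
  ["an _italic_ word", "an italic word"],
];

// Hypothetical helper: return the inputs a given strip function gets wrong.
function failingCases(strip) {
  return cases
    .filter(([input, expected]) => strip(input) !== expected)
    .map(([input]) => input);
}

// The current italic rule from the issue.
const current = (s) => s.replace(/(?<!_)_(?!_)(.+?)(?<!_)_(?!_)/g, "$1");
// Option 1 from the issue.
const option1 = (s) => s.replace(/\b_([a-zA-Z]+)_\b/g, "$1");

console.log(failingCases(current).length);
// → 3
console.log(failingCases(option1));
// → []
```

The current rule fails the three identifier cases and passes only the genuine italic case. Option 1 passes all four, although the table would need more cases (multi-word italics, leading underscores) before it says much about Markdown coverage.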

Example

// Improved regex with word boundaries
result = result.replace(/\b_([a-zA-Z]+)_\b/g, "$1");

Notes

The provided Options 1, 2, and 3 offer different approaches to fixing the issue. The choice of solution depends on the specific requirements and constraints of the OpenClaw project.

Recommendation

Apply the improved regex with word boundaries (Option 1) as a temporary workaround, and consider upgrading to a proper Markdown parser (Option 3) for a more robust and long-term solution. This approach balances the need for a quick fix with the goal of improving the overall quality and accuracy of the Markdown processing logic.
