Bug: Arabic (and other combining-mark scripts) render with gaps between every letter

Summary

Inside Claude Code, Arabic text that contains diacritics (harakat / tashkeel) — Quranic verses, vocalized prose, etc. — is rendered with visible whitespace between every letter, breaking contextual joining and making the text unreadable as Arabic. The same input rendered by plain cat in the same terminal looks correct.

The cause is upstream: Bun.stringWidth reports the wrong column width for any grapheme that contains a combining mark. Claude Code's renderer trusts that width and allocates extra empty cells per grapheme. The terminal then dutifully draws those empty cells.

Reproduction

// In Bun (Claude Code's runtime):
Bun.stringWidth("ٱللَّهِ");                          // → 7 (should be 4)
Bun.stringWidth("بِ",  { ambiguousIsNarrow: true });  // → 2 (should be 1)
Bun.stringWidth("لَّ", { ambiguousIsNarrow: true });  // → 3 (should be 1)
Bun.stringWidth("ِ",   { ambiguousIsNarrow: true });  // → 1 (should be 0)

Standard wcwidth implementations (string-width, wcwidth.js, unicode-eaw, etc.) return 0 for non-spacing combining marks, in line with their Unicode general category (Mn); the Arabic marks above are also East Asian Width N, so there is no ambiguity to resolve. Bun's stringWidth counts each of them as 1.
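The "should be" values above can be cross-checked without Bun. The following is a minimal sketch, not a full wcwidth: it assumes the samples contain no wide (CJK or emoji) characters and simply counts every codepoint that is not a combining mark as one column. The function name expectedWidth and the escaped spellings of the samples are assumptions made for illustration.

```javascript
// Simplified rule, valid only for text without wide characters:
// a codepoint in general category Mn or Me takes 0 columns, anything else 1.
const MARK = /^[\p{Mn}\p{Me}]$/u;

function expectedWidth(text) {
  let width = 0;
  for (const ch of text) {          // for...of iterates by codepoint
    if (!MARK.test(ch)) width += 1;
  }
  return width;
}

// Assumed codepoint spellings of the strings from the reproduction.
const samples = {
  allah: "\u0671\u0644\u0644\u064E\u0651\u0647\u0650", // 4 letters + 3 marks
  beKasra: "\u0628\u0650",                             // beh + kasra
  lamFathaShadda: "\u0644\u064E\u0651",                // lam + fatha + shadda
  kasraAlone: "\u0650",                                // kasra only
};

for (const [name, text] of Object.entries(samples)) {
  const expected = expectedWidth(text);
  // Only call Bun.stringWidth when actually running under Bun.
  const reported = typeof Bun !== "undefined"
    ? Bun.stringWidth(text, { ambiguousIsNarrow: true })
    : "n/a (not running under Bun)";
  console.log(name, { expected, reported });
}
```

Run under Bun, the reported column can be compared directly with the expected one; under Node only the expected values are printed.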

This is technically a Bun bug, but Claude Code can work around it without waiting for Bun to fix it.

Where it lands in Claude Code

In the bundled renderer (cli.js in the agent SDK, equivalent path in the compiled binary), the per-grapheme token build looks like:

for (const { segment } of new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(text)) {
  tokens.push({
    value: segment,
    width: Bun.stringWidth(segment, { ambiguousIsNarrow: true }),  // ← buggy
    styleId,
    hyperlink,
  });
}

The placement loop (jd1 in the minified binary) then iterates those tokens:

const D = j.width;            // width Bun reported
if (D === 0) continue;        // skip true zero-width
const f = D >= 2;             // is this a "wide" cell?
// ... draw j.value into column A ...
A += f ? D : 1;               // advance A by D for wide chars, 1 otherwise

For a grapheme like "لَّ" (lam + fatha + shadda, should be width 1), Bun returns 3. Since D === 3, f is true and the cell is treated as a wide grapheme spanning 3 columns. The letter glyph is drawn in column A, columns A+1 and A+2 get empty placeholder cells, and the next Arabic letter starts in column A+3.

Repeat that for every vocalized letter in a verse and you get the rendering in the screenshots: every letter visibly separated by empty cells.
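To make the effect concrete, here is a small simulation of the placement logic described above. It is a sketch under assumptions: the function place and the array-of-cells model are hypothetical stand-ins for the minified renderer, and the "reported" widths are the values from the reproduction.

```javascript
// Hypothetical model of the placement loop: returns one array entry per
// terminal column, with "" marking an empty placeholder cell.
function place(tokens) {
  const cells = [];
  let col = 0;
  for (const token of tokens) {
    const width = token.width;
    if (width === 0) continue;            // skip true zero-width tokens
    const wide = width >= 2;              // treated as a "wide" cell
    cells[col] = token.value;             // glyph goes in the first column
    if (wide) {
      for (let i = 1; i < width; i++) cells[col + i] = ""; // placeholders
    }
    col += wide ? width : 1;
  }
  return cells;
}

const be = "\u0628\u0650";        // beh + kasra
const lam = "\u0644\u064E\u0651"; // lam + fatha + shadda

// Widths as reported in the reproduction (2 and 3) versus the correct ones.
const reported = place([{ value: be, width: 2 }, { value: lam, width: 3 }]);
const correct = place([{ value: be, width: 1 }, { value: lam, width: 1 }]);

console.log(reported.length, correct.length);
```

With the reported widths, two letters occupy five columns, three of them empty; with the correct widths they occupy two adjacent columns.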

(Note: text whose graphemes are single codepoints, with no combining marks following the base codepoint, isn't affected. That's why headings without harakat render correctly inside Claude Code while verses with full tashkeel don't.)

Affected scripts

Any combining-mark script:

Arabic with tashkeel (kasra / fatha / damma / sukun / shadda / superscript alef): U+064B–U+065F, U+0670, U+06D6–U+06ED
Hebrew with nikkud: U+05B0–U+05BD, U+05BF, U+05C1–U+05C2, U+05C4–U+05C5, U+05C7
Devanagari / Tamil / Telugu / Kannada / Malayalam / Bengali vowel signs and viramas
Thai / Lao / Khmer tone & vowel marks
Vietnamese when written with decomposed tone marks (NFD)
Korean Hangul written as decomposed jamos
Latin + combining diacritics (NFD-form é, ñ, etc.) — same bug, just usually not noticed because most Latin text is NFC
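What these cases share is that one user-perceived character is stored as several codepoints. A quick way to see this, sketched with a hypothetical helper countUnits, is to compare the codepoint count with the grapheme count from Intl.Segmenter:

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: how many codepoints and how many grapheme clusters
// a string contains.
function countUnits(text) {
  return {
    codepoints: [...text].length,
    graphemes: [...segmenter.segment(text)].length,
  };
}

const cases = {
  latinNFC: "\u00E9",                  // precomposed e-acute
  latinNFD: "\u00E9".normalize("NFD"), // e + combining acute accent
  hebrew: "\u05D1\u05B0",              // bet + sheva
  hangulJamo: "\u1112\u1161\u11AB",    // decomposed jamo for one syllable
};

for (const [name, text] of Object.entries(cases)) {
  console.log(name, countUnits(text));
}
```

Any string where the two counts differ is a candidate for the width bug, since a per-codepoint width sum over-counts the cluster.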

Proposed fix in Claude Code (does not require waiting for Bun)

Replace the per-grapheme width call with a wcwidth-style scan that respects combining marks. Conceptually:

function graphemeWidth(grapheme) {
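  // Approximation: width of a grapheme cluster = width of its first codepoint
  // (base widths per Unicode TR#11). Combining marks don't add width.
  // wcwidth stands for any per-codepoint width lookup, e.g. from the packages below.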
  const cp = grapheme.codePointAt(0);
  if (cp === undefined) return 0;
  return wcwidth(cp);           // 0, 1, or 2
}

// then:
for (const { segment } of segmenter.segment(text)) {
  tokens.push({ value: segment, width: graphemeWidth(segment), styleId, hyperlink });
}

Any of these npm packages does this correctly already, and they're tiny:

string-width (3 kB): most popular, well-tested, used by ink and other CLI tooling.
wcwidth.js (2 kB): direct port of Markus Kuhn's wcwidth
@xterm/headless's unicode service: the most thorough, includes Unicode 11 wide-char tables

Even an inline if (cp >= 0x0300 && cp <= 0x036F || cp >= 0x0590 && cp <= 0x05CF || cp >= 0x0610 && cp <= 0x06FF && isCombiningMark(cp) || ...) return 0 would close the issue for the most common scripts.
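A dependency-free variant of the same idea is to strip combining marks from each grapheme before measuring it, so the existing width function only ever sees base characters. The sketch below is an assumption about how such a wrapper could look, not Claude Code's actual code: withMarksStripped and buggyMeasure are hypothetical names, and buggyMeasure merely imitates the reported behaviour (one column per codepoint) so the example runs outside Bun.

```javascript
// Combining marks by general category: Mn (non-spacing) and Me (enclosing).
const COMBINING = /[\p{Mn}\p{Me}]/gu;

// Wraps any width function so that combining marks never reach it.
function withMarksStripped(measure) {
  return (grapheme) => measure(grapheme.replace(COMBINING, ""));
}

// Stand-in for the buggy behaviour: one column per codepoint.
const buggyMeasure = (text) => [...text].length;

// Under Bun, the real width function could be wrapped instead:
//   withMarksStripped((s) => Bun.stringWidth(s, { ambiguousIsNarrow: true }))
const graphemeWidth = withMarksStripped(buggyMeasure);

console.log(graphemeWidth("\u0644\u064E\u0651")); // lam + fatha + shadda
console.log(graphemeWidth("\u0650"));             // lone kasra
```

One caveat worth checking before adopting this: variation selectors such as U+FE0F are also category Mn, so stripping them may change how some emoji sequences are measured. Restricting the pattern to the script ranges listed above would avoid that.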

Why this doesn't get caught by `Bun.stringWidth` having `ambiguousIsNarrow: true`

ambiguousIsNarrow only affects East Asian ambiguous width characters — it doesn't touch combining marks at all. Combining marks are unambiguously zero-width per Unicode; Bun's table apparently doesn't encode that.

Reproducible across terminals

The same input bytes rendered by plain cat look correct in any standard terminal. The same bytes rendered by claude show the gaps in every terminal tested (Apple Terminal, Ghostty, iTerm2 — anything that draws what was written). That places the bug upstream of the terminal: Claude Code is emitting extra empty cells, and the terminals are correctly drawing them.

TL;DR: Bun.stringWidth is wrong for combining marks. Claude Code trusts it. Result: Arabic verses with tashkeel render with empty cells between every letter. Fix is one function swap in the renderer; doesn't need Bun to ship a fix first.

claude-code - 💡(How to fix) Fix Arabic with diacritics renders with gap between every letter (Bun.stringWidth treats combining marks as width 1) [2 comments, 2 participants]
