claude-code: Arabic with diacritics renders with a gap between every letter (Bun.stringWidth treats combining marks as width 1) [2 comments, 2 participants]

anthropics/claude-code#60701 (fetched 2026-05-20 03:51:44)

Bug: Arabic (and other combining-mark scripts) render with gaps between every letter

Summary

Inside Claude Code, Arabic text that contains diacritics (harakat / tashkeel) — Quranic verses, vocalized prose, etc. — is rendered with visible whitespace between every letter, breaking contextual joining and making the text unreadable as Arabic. The same input rendered by plain cat in the same terminal looks correct.

The cause is upstream: Bun.stringWidth reports the wrong column width for any grapheme that contains a combining mark. Claude Code's renderer trusts that width and allocates extra empty cells per grapheme. The terminal then dutifully draws those empty cells.

Reproduction

// In Bun (Claude Code's runtime):
Bun.stringWidth("ٱللَّهِ");                          // → 7 (should be 4)
Bun.stringWidth("بِ",  { ambiguousIsNarrow: true });  // → 2 (should be 1)
Bun.stringWidth("لَّ", { ambiguousIsNarrow: true });  // → 3 (should be 1)
Bun.stringWidth("ِ",   { ambiguousIsNarrow: true });  // → 1 (should be 0)

Standard wcwidth implementations (string-width, wcwidth.js, unicode-eaw, etc.) all return 0 for non-spacing combining marks, in line with Unicode general category Mn / East Asian Width N. Bun's stringWidth counts them as 1.

This is technically a Bun bug, but Claude Code can work around it without waiting for Bun to fix it.
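To see which codepoints in a failing string are the non-spacing marks in question, a small check along these lines can help. It is only a sketch: `describeCodepoints` is a hypothetical helper name, and it relies on the standard `\p{Mn}` / `\p{Me}` Unicode property escapes rather than on anything in Bun or Claude Code.

```javascript
// Matches a single codepoint whose general category is Mn (non-spacing mark)
// or Me (enclosing mark). Such marks take no terminal column of their own.
const COMBINING_MARK = /^[\p{Mn}\p{Me}]$/u;

// Hypothetical helper: list each codepoint with a "combining" flag.
function describeCodepoints(text) {
  return Array.from(text, (ch) => ({
    codepoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
    combining: COMBINING_MARK.test(ch),
  }));
}

// Beh (U+0628) followed by kasra (U+0650), written with escapes for clarity.
const parts = describeCodepoints("\u0628\u0650");
console.log(parts);
```

The base letter is reported as non-combining and the kasra as combining, which is the distinction a width function needs to respect.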

Where it lands in Claude Code

In the bundled renderer (cli.js in the agent SDK, equivalent path in the compiled binary), the per-grapheme token build looks like:

for (const { segment } of new Intl.Segmenter(undefined, { granularity: "grapheme" }).segment(text)) {
  tokens.push({
    value: segment,
    width: Bun.stringWidth(segment, { ambiguousIsNarrow: true }),  // ← buggy
    styleId,
    hyperlink,
  });
}

The placement loop (jd1 in the minified binary) then iterates those tokens:

const D = j.width;            // width Bun reported
if (D === 0) continue;        // skip true zero-width
const f = D >= 2;             // is this a "wide" cell?
// ... draw j.value into column A ...
A += f ? D : 1;               // advance A by D for wide chars, 1 otherwise

For a grapheme like "لَّ" (lam + fatha + shadda, should be width 1), Bun returns 3. Since D === 3, f is true and the cell is treated as a wide grapheme spanning 3 columns. The letter glyph is drawn in column A, columns A+1 and A+2 get empty placeholder cells, and the next Arabic letter starts in column A+3.

Repeat that for every vocalized letter in a verse and you get the rendering in the screenshots: every letter visibly separated by empty cells.
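The column arithmetic can be replayed outside the renderer. The sketch below mirrors the advance rule quoted from the placement loop. `placeTokens` is a hypothetical stand-in, not the minified function itself, and the buggy widths (2 and 3) are the values reported under Reproduction.

```javascript
// Hypothetical re-creation of the advance rule: skip zero-width tokens,
// draw at the current column, then advance by the width if it is >= 2, else by 1.
function placeTokens(tokens) {
  const starts = [];
  let column = 0;
  for (const token of tokens) {
    if (token.width === 0) continue;
    starts.push(column);
    column += token.width >= 2 ? token.width : 1;
  }
  return { starts, endColumn: column };
}

const beh = "\u0628\u0650";       // beh + kasra
const lam = "\u0644\u064E\u0651"; // lam + fatha + shadda

// Widths as reported by the buggy call versus the expected widths.
const buggy = placeTokens([{ value: beh, width: 2 }, { value: lam, width: 3 }]);
const fixed = placeTokens([{ value: beh, width: 1 }, { value: lam, width: 1 }]);

console.log(buggy); // starts [0, 2], endColumn 5
console.log(fixed); // starts [0, 1], endColumn 2
```

With the buggy widths, two letters occupy five columns instead of two, and the three extra columns are the visible gaps.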

(Note: text whose graphemes are single codepoints, with no base codepoint followed by combining marks, isn't affected. That's why headings without harakat render correctly inside Claude Code while verses with full tashkeel don't.)

Affected scripts

Any combining-mark script:

  • Arabic with tashkeel (kasra / fatha / damma / sukun / shadda / superscript alef): U+064B–U+065F, U+0670, U+06D6–U+06ED
  • Hebrew with nikkud: U+05B0–U+05BD, U+05BF, U+05C1–U+05C5, U+05C7
  • Devanagari / Tamil / Telugu / Kannada / Malayalam / Bengali vowel signs and viramas
  • Thai / Lao / Khmer tone & vowel marks
  • Vietnamese when written with decomposed tone marks (NFD)
  • Korean Hangul written as decomposed jamos
  • Latin + combining diacritics (NFD-form é, ñ, etc.) — same bug, just usually not noticed because most Latin text is NFC
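The Latin case in the list above is easy to check in isolation. This sketch uses only standard `String.prototype.normalize` and `Intl.Segmenter`; `clusterSummary` is a hypothetical helper name.

```javascript
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

// Hypothetical helper: for each grapheme cluster, count its codepoints.
function clusterSummary(text) {
  return Array.from(segmenter.segment(text), ({ segment }) => ({
    codepoints: Array.from(segment).length,
  }));
}

const nfc = "\u00E9";             // precomposed e with acute accent
const nfd = nfc.normalize("NFD"); // "e" followed by U+0301 combining acute

console.log(clusterSummary(nfc)); // one cluster, one codepoint
console.log(clusterSummary(nfd)); // one cluster, two codepoints
```

Both forms are a single grapheme and should occupy one column. Only the NFD form contains a combining mark, so only it reaches the faulty width path.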

Proposed fix in Claude Code (does not require waiting for Bun)

Replace the per-grapheme width call with a wcwidth-style scan that respects combining marks. Conceptually:

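// wcwidth is assumed to be a per-codepoint width function returning 0, 1, or 2;
// it is not defined here.
function graphemeWidth(grapheme) {
  // Width of a grapheme cluster = width of its first codepoint
  // (per Unicode UAX #11). Combining marks don't add width.
  const cp = grapheme.codePointAt(0);
  if (cp === undefined) return 0;
  return wcwidth(cp);           // 0, 1, or 2
}

// then:
const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });
for (const { segment } of segmenter.segment(text)) {
  tokens.push({ value: segment, width: graphemeWidth(segment), styleId, hyperlink });
}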

Small npm packages already do this correctly; string-width, wcwidth.js, and unicode-eaw are named above as returning 0 for combining marks.

Even an inline if (cp >= 0x0300 && cp <= 0x036F || cp >= 0x0590 && cp <= 0x05CF || cp >= 0x0610 && cp <= 0x06FF && isCombiningMark(cp) || ...) return 0 would close the issue for the most common scripts.
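For illustration, the workaround can also be written without any dependency. The following is a sketch under stated assumptions, not the renderer's actual code: `codepointWidth`, `graphemeWidth`, and `textWidth` are hypothetical names, the zero-width test covers only general categories Mn and Me, and `WIDE_RANGES` is an approximate, incomplete subset of double-width ranges that a real implementation would replace with a full East Asian Width table.

```javascript
// Combining marks (general categories Mn and Me) occupy no column of their own.
const COMBINING_MARK = /^[\p{Mn}\p{Me}]$/u;

// Assumption: approximate, incomplete list of double-width ranges.
const WIDE_RANGES = [
  [0x1100, 0x115f], // Hangul Jamo leading consonants
  [0x2e80, 0x303e], // CJK radicals through CJK symbols and punctuation
  [0x3040, 0xa4cf], // Hiragana through Yi (approximate)
  [0xac00, 0xd7a3], // Hangul syllables
  [0xf900, 0xfaff], // CJK compatibility ideographs
  [0xff00, 0xff60], // Fullwidth forms
  [0xffe0, 0xffe6], // Fullwidth signs
];

function codepointWidth(cp) {
  if (COMBINING_MARK.test(String.fromCodePoint(cp))) return 0;
  return WIDE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi) ? 2 : 1;
}

// Width of a cluster is taken from its first codepoint only.
function graphemeWidth(grapheme) {
  const cp = grapheme.codePointAt(0);
  return cp === undefined ? 0 : codepointWidth(cp);
}

const segmenter = new Intl.Segmenter(undefined, { granularity: "grapheme" });

function textWidth(text) {
  let total = 0;
  for (const { segment } of segmenter.segment(text)) total += graphemeWidth(segment);
  return total;
}

// Alef wasla, lam, lam + shadda + fatha, heh + kasra: four clusters.
console.log(textWidth("\u0671\u0644\u0644\u0651\u064E\u0647\u0650"));
```

Control characters, emoji sequences, and regional-indicator flags are deliberately left out of this sketch; they would need their own handling.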

Why this doesn't get caught by Bun.stringWidth having ambiguousIsNarrow: true

ambiguousIsNarrow only affects East Asian ambiguous width characters — it doesn't touch combining marks at all. Combining marks are unambiguously zero-width per Unicode; Bun's table apparently doesn't encode that.

Reproducible across terminals

The same input bytes rendered by plain cat look correct in any standard terminal. The same bytes rendered by claude show the gaps in every terminal tested (Apple Terminal, Ghostty, iTerm2 — anything that draws what was written). That places the bug upstream of the terminal: Claude Code is emitting extra empty cells, and the terminals are correctly drawing them.


TL;DR: Bun.stringWidth is wrong for combining marks. Claude Code trusts it. Result: Arabic verses with tashkeel render with empty cells between every letter. Fix is one function swap in the renderer; doesn't need Bun to ship a fix first.
