claude-code: Streaming output corrupts CJK characters at UTF-8 chunk boundaries (displays ���) [2 comments, 3 participants]

GitHub stats
anthropics/claude-code#45508, fetched 2026-04-09 08:03:47
Comments: 2 · Participants: 3 · Timeline events: 7 (labeled ×5, commented ×2) · Reactions: 0

When Claude Code streams long Chinese (CJK) text output in the terminal, some characters are occasionally replaced with � (U+FFFD, the replacement character).

Example: 大厂 renders as ���厂

Root Cause

Most CJK characters occupy 3 bytes in UTF-8 (e.g., 大 = E5 A4 A7). When the SSE streaming layer splits output into chunks and a chunk boundary falls in the middle of a multi-byte character, the terminal receives incomplete byte sequences and renders them as replacement characters.

Normal:    [...E5 A4 A7...] → 大
Truncated: [...E5] | [A4 A7...] → ���
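
The effect can be reproduced outside Claude Code. The sketch below is only an illustration in Python: the helper `decode_chunks_independently` is a hypothetical stand-in for any consumer that decodes each chunk on its own with no carry-over buffer, and it is not taken from Claude Code's source. A terminal may emit a different number of replacement characters than Python's decoder does.

```python
text = "大厂"
data = text.encode("utf-8")  # b'\xe5\xa4\xa7\xe5\x8e\x82'

def decode_chunks_independently(chunks):
    """Decode each chunk on its own, with no buffer carried between chunks."""
    return "".join(chunk.decode("utf-8", errors="replace") for chunk in chunks)

# Boundary on a character edge: both chunks are valid UTF-8 by themselves.
aligned = decode_chunks_independently([data[:3], data[3:]])

# Boundary after the first byte of 大: E5 is separated from A4 A7.
split = decode_chunks_independently([data[:1], data[1:]])

print(aligned)  # 大厂
print(split)    # ���厂
```

The bytes are identical in both cases; only the position of the boundary changes the result.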


Steps to Reproduce

  1. Use Claude Code CLI in a terminal (macOS, zsh)
  2. Ask Claude to generate a long response in Chinese (e.g., a multi-section research report with tables)
  3. Observe occasional ��� characters in the streaming output

Expected Behavior

All CJK characters should render correctly during streaming output.

Actual Behavior

Random CJK characters are corrupted to ��� during streaming. The corruption is intermittent and depends on where chunk boundaries fall.
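
To see why the corruption is intermittent, it helps to list which byte offsets are safe split points. The following sketch assumes the input is valid UTF-8; `is_char_boundary` is a hypothetical helper name introduced here for illustration.

```python
def is_char_boundary(data: bytes, offset: int) -> bool:
    """True if splitting data at offset does not cut a UTF-8 sequence in two."""
    if offset <= 0 or offset >= len(data):
        return True
    # Continuation bytes have the form 10xxxxxx. A split is safe only when
    # the byte right after the split point is not a continuation byte.
    return (data[offset] & 0xC0) != 0x80

data = "大厂".encode("utf-8")  # 6 bytes, two 3-byte characters
unsafe = [i for i in range(1, len(data)) if not is_char_boundary(data, i)]
print(unsafe)  # [1, 2, 4, 5]
```

For text made up of 3-byte characters, two out of every three interior offsets fall inside a character, so whether a given response is corrupted depends on where the chunk boundaries happen to land.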

Environment

  • OS: macOS (Darwin 25.3.0, Apple Silicon)
  • Terminal: iTerm2 / Terminal.app / tmux
  • Claude Code: Latest version
  • Shell: zsh

Notes

  • Writing to files (via the Write tool) is NOT affected — the content is correct
  • Only terminal streaming display is affected
  • This appears to be a missing UTF-8 boundary alignment in the streaming output layer
  • The fix would be to buffer incomplete multi-byte sequences at chunk boundaries before flushing to stdout

Suggested Fix

Ensure the streaming output layer aligns chunk boundaries to complete UTF-8 character boundaries. When a chunk ends with an incomplete multi-byte sequence, buffer those trailing bytes and prepend them to the next chunk before writing to stdout.
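
One way to get this buffering without hand-written byte logic is a stateful (incremental) decoder. The sketch below uses Python's standard `codecs.getincrementaldecoder` purely to illustrate the technique. The issue does not show Claude Code's streaming code, so the function name `decode_stream` and the shape of the chunk source are assumptions.

```python
import codecs

def decode_stream(chunks):
    """Yield decoded text per chunk, carrying incomplete sequences forward."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    for chunk in chunks:
        text = decoder.decode(chunk)  # holds back a trailing partial sequence
        if text:
            yield text
    # At end of stream, flush anything still buffered (replaced if invalid).
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail

data = "大厂".encode("utf-8")
# Boundaries fall after byte 1 and after byte 4, both inside a character.
pieces = list(decode_stream([data[:1], data[1:4], data[4:]]))
print(pieces)  # ['大', '厂']
```

The decoder emits nothing for the first chunk because it holds only a lead byte, then releases each character once its final byte arrives.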

extent analysis

TL;DR

Buffer incomplete multi-byte UTF-8 sequences at chunk boundaries to prevent corruption of CJK characters during terminal streaming output.

Guidance

  • Identify the chunk size used by the SSE streaming layer to understand how often chunk boundaries may fall within multi-byte characters.
  • Modify the streaming output layer to detect and buffer incomplete UTF-8 sequences at chunk boundaries.
  • Prepend buffered bytes to the next chunk before writing to stdout to ensure complete character rendering.
  • Test with various CJK character sequences and chunk sizes to verify the fix.
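
The last point can be sketched as a small harness that re-chunks sample strings at every fixed chunk size and checks that the decoded output matches the input. The sample strings and the names `stream_decode`, `fixed_size_chunks`, and `check_all_chunk_sizes` are illustrative assumptions. The decoder under test here is Python's incremental decoder, and a real test would plug in the actual streaming layer instead.

```python
import codecs

def stream_decode(chunks):
    """Candidate under test: decode chunks with state carried across boundaries."""
    decoder = codecs.getincrementaldecoder("utf-8")()  # strict: raises on bad input
    parts = [decoder.decode(chunk) for chunk in chunks]
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)

def fixed_size_chunks(data: bytes, size: int):
    """Split data into consecutive chunks of at most size bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]

# Mix of 1-byte, 3-byte, and 4-byte UTF-8 sequences.
SAMPLES = ["大厂", "日本語のテキスト", "한국어", "mixed ASCII 和 中文", "emoji 😀 end"]

def check_all_chunk_sizes(decode=stream_decode):
    """Assert that every chunk size from 1 byte up reproduces each sample."""
    for sample in SAMPLES:
        data = sample.encode("utf-8")
        for size in range(1, len(data) + 1):
            result = decode(fixed_size_chunks(data, size))
            assert result == sample, (sample, size, result)
    return True

print(check_all_chunk_sizes())  # True
```

Passing a decoder that handles each chunk independently as `decode` makes the assertion fail at the first chunk size that cuts a character, which is a quick way to confirm the harness detects the bug.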

Example

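import sys

def split_incomplete_utf8_tail(data):
    # Simplified example, actual implementation depends on the streaming layer.
    # Returns (complete, tail): tail is a trailing UTF-8 sequence cut off by the
    # chunk boundary, or b'' if the data ends on a character boundary.
    for back in range(1, min(3, len(data)) + 1):
        byte = data[-back]
        if (byte & 0xC0) == 0x80:  # continuation byte, keep looking for the lead byte
            continue
        if byte >= 0xF0:
            needed = 4  # lead byte of a 4-byte sequence
        elif byte >= 0xE0:
            needed = 3  # lead byte of a 3-byte sequence (most CJK)
        elif byte >= 0xC0:
            needed = 2  # lead byte of a 2-byte sequence
        else:
            needed = 1  # ASCII
        if needed > back:  # fewer bytes present than the lead byte announces
            return data[:-back], data[-back:]
        break
    return data, b''

# Usage: chunks is any iterable of bytes objects from the streaming layer
def write_stream(chunks, out=None):
    out = out or sys.stdout.buffer
    buffered_bytes = b''
    for chunk in chunks:
        # Prepend bytes held back from the previous chunk, then split again
        complete, buffered_bytes = split_incomplete_utf8_tail(buffered_bytes + chunk)
        out.write(complete)
    out.write(buffered_bytes)  # end of stream: flush whatever is left
    out.flush()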

Notes

The provided example is a simplified illustration and may require adjustments based on the actual implementation of the SSE streaming layer and the language used.

Recommendation

Apply the workaround by modifying the streaming output layer to align chunk boundaries with UTF-8 character boundaries, since this directly addresses the identified root cause.
