claude-code: Edit tool corrupts multi-byte UTF-8 characters at replacement string boundary [2 comments, 2 participants]

GitHub stats
anthropics/claude-code#46712 · Fetched 2026-04-12 13:35:02
Comments: 2 · Participants: 2 · Timeline: 7 · Reactions: 0
Timeline (top): labeled ×4, commented ×2, closed ×1


Description

The Edit tool occasionally corrupts multi-byte UTF-8 characters (e.g., CJK characters) when they appear in the new_string parameter. The last multi-byte character in a string gets truncated at a byte boundary, producing replacement characters (��).

Steps to Reproduce

  1. Use the Edit tool to replace text in a file containing CJK (Chinese/Japanese/Korean) characters
  2. Include multi-byte UTF-8 characters in the new_string parameter
  3. The final multi-byte character in the replacement string may be truncated

Example

Expected new_string content:

**進行中**

Actual result written to file:

**進行��**

The character 中 (U+4E2D, 3-byte UTF-8 sequence E4 B8 AD) was truncated: only the first byte(s) were written, and the incomplete sequence is displayed as the replacement character U+FFFD (��).
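The byte-level failure can be reproduced in isolation. The following is a minimal Python sketch, not the Edit tool's actual code path (which the report does not show). It assumes the corruption amounts to dropping the final byte of 中 and shows that the result is no longer valid UTF-8. The helper name `is_valid_utf8` is made up for this illustration.

```python
# Illustrative only: simulates a write that loses the last byte of one character.
expected = "**進行中**"
data = expected.encode("utf-8")

target = "中".encode("utf-8")      # the three bytes E4 B8 AD
corrupted = data.replace(target, target[:-1])  # assumption: final byte AD is dropped


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if `raw` decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# A lenient decoder (as used by most editors and terminals) substitutes U+FFFD
# for the incomplete sequence instead of raising an error.
shown = corrupted.decode("utf-8", errors="replace")
print(is_valid_utf8(data), is_valid_utf8(corrupted), shown)
```

The surrounding ASCII and the two earlier CJK characters survive, which matches the reported output where only the last multi-byte character is damaged.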

Environment

  • OS: macOS (Darwin 25.4.0)
  • Model: claude-opus-4-6 (1M context)
  • Tool: Edit (exact string replacement mode)

Impact

  • Silently corrupts file content — no error is raised
  • Particularly affects CJK languages where most characters are 3-byte UTF-8 sequences
  • Requires manual detection and a follow-up commit to fix
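Because no error is raised, detection has to happen after the fact. One possible approach is a small scanner that flags files that either fail strict UTF-8 decoding or already contain a literal U+FFFD. This is a sketch under stated assumptions: the function names `check_file` and `scan_tree` and the default suffix list are hypothetical, and a literal U+FFFD can also be legitimate content, so hits need a human look.

```python
from pathlib import Path

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def check_file(path: Path) -> list[str]:
    """Return a list of problems found in one file (empty if it looks clean)."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")  # strict: raises on a truncated sequence
    except UnicodeDecodeError as exc:
        return [f"{path}: invalid UTF-8 at byte offset {exc.start}"]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if REPLACEMENT in line:
            problems.append(f"{path}:{lineno}: contains U+FFFD")
    return problems


def scan_tree(root: Path, suffixes=(".md", ".txt")) -> list[str]:
    """Check every file under `root` whose suffix is in `suffixes`."""
    problems = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            problems.extend(check_file(path))
    return problems
```

Running `scan_tree(Path("."))` before committing would surface damaged files without a manual read-through; the suffix filter keeps binary files out of the check.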

extent analysis

TL;DR

The fix most likely belongs in the Edit tool itself: new_string needs to be handled as whole UTF-8 characters from end to end, so that no step cuts the string at a byte offset inside a multi-byte sequence.
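To make "handling UTF-8 correctly" concrete: if any step has to cut encoded bytes at a size limit, the cut should back up to a character boundary. The sketch below shows that technique in Python. It rests on an assumption, since the report does not say whether the Edit tool slices bytes anywhere, and `utf8_safe_cut` is a hypothetical helper rather than part of any real API.

```python
def utf8_safe_cut(raw: bytes, limit: int) -> bytes:
    """Return at most `limit` bytes of `raw` without splitting a character.

    Assumes `raw` is valid UTF-8. Continuation bytes have the bit pattern
    10xxxxxx, so the cut point moves left until the first excluded byte is
    the start of a character.
    """
    if limit >= len(raw):
        return raw
    cut = max(limit, 0)
    while cut > 0 and (raw[cut] & 0xC0) == 0x80:
        cut -= 1
    return raw[:cut]


data = "**進行中**".encode("utf-8")
# A naive data[:10] would end in the partial sequence E4 B8.
safe = utf8_safe_cut(data, 10)
print(safe.decode("utf-8"))
```

The trade-off is that the result can be up to three bytes shorter than the limit, but it always decodes cleanly.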

Guidance

  • Verify that the Edit tool is configured to support UTF-8 encoding and that the new_string parameter is being processed as a UTF-8 encoded string.
  • Check the Edit tool's documentation for any specific requirements or limitations related to handling multi-byte characters.
  • Test the Edit tool with different input strings containing multi-byte characters to determine if the issue is specific to certain characters or sequences.
  • Consider adding error handling or logging to detect and report any encoding-related issues during the editing process.
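For the testing suggestion above, it helps to cover every UTF-8 sequence length instead of CJK text alone. The following sketch builds candidate new_string values with 1-, 2-, 3- and 4-byte characters in the same position as the report's example. The sample characters, the `**...**` wrapper and the function names are illustrative choices, not taken from the issue.

```python
# One representative character per UTF-8 sequence length.
SAMPLES = {
    "ASCII": "A",                  # 1 byte
    "Latin-1 letter": "\u00e9",    # é, 2 bytes
    "CJK": "\u4e2d",               # 中, 3 bytes
    "Emoji": "\U0001F600",         # 4 bytes
}


def utf8_width(ch: str) -> int:
    """Number of bytes `ch` occupies when encoded as UTF-8."""
    return len(ch.encode("utf-8"))


def make_cases(wrapper: str = "**{}**") -> list[tuple[str, str, int]]:
    """Return (label, candidate new_string, byte width of the test character)."""
    return [(label, wrapper.format(ch), utf8_width(ch)) for label, ch in SAMPLES.items()]


for label, candidate, width in make_cases():
    print(f"{label}: {candidate!r} ({width}-byte character)")
```

If only the 3- and 4-byte cases come back damaged, that would point toward byte-offset arithmetic; if all non-ASCII cases fail, a wrong encoding assumption somewhere is more plausible.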

Example

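The Edit tool's implementation is not shown in the report, so no targeted patch can be given. As a generic illustration, a string that has been encoded to UTF-8 should decode back to exactly the same text:

# Hypothetical example: check that UTF-8 bytes round-trip intact (Python)
new_string = "**進行中**"
new_string_encoded = new_string.encode("utf-8")
# Strict decoding raises UnicodeDecodeError if a multi-byte sequence was cut short.
assert new_string_encoded.decode("utf-8") == new_string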

Notes

The report points to a problem in how the Edit tool handles UTF-8, but without details of the tool's implementation no definitive fix can be given. Only one environment is listed (macOS, claude-opus-4-6), so it is unclear whether the issue is specific to that setup.

Recommendation

Apply a workaround by manually verifying and correcting the output of the Edit tool, especially when working with CJK characters, until a more permanent solution can be implemented to ensure proper UTF-8 encoding handling.
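The manual check can be partly automated for a single edit: after the tool reports success, re-read the file and confirm that the intended text actually landed. This is a sketch of that workaround in Python; `verify_edit` is a hypothetical helper, and it assumes the caller still has the exact new_string that was passed to the Edit tool.

```python
from pathlib import Path


def verify_edit(path: Path, expected: str) -> None:
    """Raise ValueError if `expected` is not present, intact, in `path`."""
    raw = path.read_bytes()
    try:
        text = raw.decode("utf-8")  # strict decoding, no silent substitution
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"{path} is not valid UTF-8 (byte offset {exc.start})"
        ) from exc
    if expected not in text:
        raise ValueError(f"{path} does not contain the expected replacement text")
```

Calling `verify_edit(Path("status.md"), "**進行中**")` right after the edit (the file name here is only an example) turns the silent corruption into an immediate, visible failure.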
