claude-code - 💡(How to fix) Fix [BUG] Model drifts full-width CJK punctuation to half-width in Edit's old_string, causing silent "String to replace not found" [1 comments, 2 participants]

GitHub stats
anthropics/claude-code#52482 (fetched 2026-04-24 06:06:01)
Comments: 1
Participants: 2
Timeline events: 7
Reactions: 0
Timeline (top): labeled ×4, commented ×1, mentioned ×1, subscribed ×1

Root Cause

When editing a file that contains full-width CJK punctuation (e.g. （, ）, ：, ；, and the full-width comma and full stop), the Edit tool frequently fails with String to replace not found, even though the content visibly matches. The Edit tool itself is not at fault: it matches bytes correctly. The root cause is that the model silently generates half-width ASCII punctuation such as ( ) , . : ; where it intended full-width, producing an old_string whose bytes do not match the file.

Fix Action

Fix / Workaround

  • Verification that Edit is innocent: I ran a small experiment with short test strings (全角（カッコ）テスト). Edit matched byte-for-byte when old_string contained full-width （）, and correctly failed when old_string used half-width () while the file had full-width （）. The tool does no normalization; the issue is entirely upstream in the model output.
  • Workaround: For bulk updates I gave up on having the model re-emit Japanese-laden old_string verbatim. Instead I wrote a small Python script that does line-level regex replacement (e.g. ^\| item \|[^\n]*$) and supplies only the new content. This bypasses the drift because the original line never needs to be regenerated.
  • Observed affected punctuation (same drift direction as #50975):
    • （ U+FF08 ↔ ( U+0028
    • ） U+FF09 ↔ ) U+0029
    • ： U+FF1A ↔ : U+003A
    • Likely also 、。；？！ based on priors, though I only verified （）： directly.
  • Unaffected: 「」『』, per #50975's observation.
  • User-visible impact: Any workflow that mixes Japanese (or Chinese/Korean) prose with English identifiers (technical writing, PRDs, LaTeX sources, localized docs) hits this often enough that Edit becomes unreliable without a workaround.
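
The workaround script itself is not included in the report. A minimal sketch of the idea, assuming the target row can be identified by an ASCII-only prefix, might look like the following. The helper name replace_line and the sample table are hypothetical; only the regex shape comes from the report.

```python
import re


def replace_line(text: str, line_pattern: str, new_line: str) -> tuple[str, int]:
    """Replace every whole line matching line_pattern with new_line.

    The old line is never retyped: it is selected by a pattern that only
    needs an ASCII prefix, so full-width punctuation in it cannot drift.
    """
    pattern = re.compile(line_pattern, flags=re.MULTILINE)
    # A callable replacement keeps any backslashes in new_line literal.
    return pattern.subn(lambda _match: new_line, text)


# Hypothetical file content; the middle row contains full-width parentheses.
original = (
    "| header | 状態 |\n"
    "| item | 未 | — | 説明（前提条件）と技術用語（id / state / timestamp）が交じる行 |\n"
    "| other | 済 |\n"
)

updated, count = replace_line(
    original,
    r"^\| item \|[^\n]*$",
    "| item | 済 | — | updated |",
)
print(count)
print(updated)
```

Checking the returned count before writing the file back guards against a pattern that matches no line or more lines than intended.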

Code Example

<tool_use_error>String to replace not found in file.
String: | 項目 | 未 | — | 説明(前提条件)と技術用語(ID / 状態 / 時刻)が交じる行 |
</tool_use_error>

---

| item | 未 | — | 説明(前提条件)と技術用語(id / state / timestamp)が交じる行 |

What's Wrong?

When editing a file that contains full-width CJK punctuation (e.g. （, ）, ：, ；, and the full-width comma and full stop), the Edit tool frequently fails with String to replace not found, even though the content visibly matches. The Edit tool itself is not at fault: it matches bytes correctly. The root cause is that the model silently generates half-width ASCII punctuation such as ( ) , . : ; where it intended full-width, producing an old_string whose bytes do not match the file.

This is especially destructive because:

  • The model's self-report is "I'm writing full-width punctuation" — it doesn't know it drifted.
  • Debugging looks like a tool bug: the user sees a mismatch between the visible text and the reported old_string, and assumes Edit has a normalization pass.
  • The drift is context-dependent: short pure-Japanese sentences tend to keep full-width; long technical paragraphs that interleave Japanese with English tech terms drift to half-width at a much higher rate.
  • It is specific to generation in the old_string parameter. Characters copied via a Python helper using \uff08 / \uff09 escapes reproduce fine; characters generated token-by-token as part of a long Japanese sentence do not.

The same drift direction exists in #50975 but that is a Write-tool bug (tool silently half-widthing on overwrite). This report is a separate issue about model-level drift affecting Edit's old_string, not tool behavior. #50975's author initially conflated the two and then corrected themselves in a follow-up comment, which is the exact same misdiagnosis path users will walk on first encounter.

What Should Happen?

The model should generate the full-width punctuation it intends. When the target file contains （） (U+FF08 / U+FF09), the generated old_string should contain （）, not () (U+0028 / U+0029). At minimum, the model's self-narration and its old_string bytes should agree.

Error Messages/Logs

<tool_use_error>String to replace not found in file.
String: | 項目 | 未 | — | 説明(前提条件)と技術用語(ID / 状態 / 時刻)が交じる行 |
</tool_use_error>

The file actually contained full-width （ (U+FF08) and ） (U+FF09), but the model emitted the half-width ( and ) shown above while claiming to output full-width. Edit correctly rejected the mismatch.
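
To make the mismatch concrete, the short sketch below compares the UTF-8 encodings of the two parenthesis forms. It assumes only that Edit performs exact matching, as the report describes; the sample strings are illustrative.

```python
FULL_OPEN, FULL_CLOSE = "\uff08", "\uff09"  # full-width parentheses
HALF_OPEN, HALF_CLOSE = "\u0028", "\u0029"  # ASCII parentheses

# What the file holds versus what a drifted old_string would hold.
file_line = f"説明{FULL_OPEN}前提条件{FULL_CLOSE}"
drifted = f"説明{HALF_OPEN}前提条件{HALF_CLOSE}"

full_bytes = FULL_OPEN.encode("utf-8")
half_bytes = HALF_OPEN.encode("utf-8")

# Exact byte matching cannot succeed: the encodings differ in length and value.
drift_found = drifted.encode("utf-8") in file_line.encode("utf-8")
size_gap = len(file_line.encode("utf-8")) - len(drifted.encode("utf-8"))

print(full_bytes, half_bytes)
print(drift_found, size_gap)
```

Each full-width parenthesis occupies three bytes in UTF-8 against one byte for its ASCII counterpart, so the two strings look alike on screen but cannot match as byte sequences.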

Steps to Reproduce

  1. Create a file with a line that mixes Japanese prose with English identifiers, using full-width CJK parentheses in the Japanese prose. Example line (ASCII () below are placeholders — substitute U+FF08 / U+FF09 so the file actually contains full-width):
    | item | 未 | — | 説明(前提条件)と技術用語(id / state / timestamp)が交じる行 |
  2. Ask Claude Code to edit that row: Edit old_string=<row verbatim>, new_string=<anything>.
  3. Observe that the model's generated old_string contains half-width () instead of the file's full-width （）, producing String to replace not found.
  4. Verify by inspecting the bytes the model actually emitted (e.g. by instrumenting the transport, or by running an Edit with old_string constructed via Python \uFF08 / \uFF09 escapes — that one matches correctly).
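
Once a failed old_string has been captured, the verification in step 4 can be partly automated. The sketch below is one possible diagnostic, not part of Claude Code: it folds the Fullwidth Forms range U+FF01..U+FF5E onto ASCII, locates the span the model presumably meant, and lists the characters that differ. The function names are hypothetical.

```python
def fold_fullwidth(text: str) -> str:
    """Map U+FF01..U+FF5E onto U+0021..U+007E, one code point for one."""
    return "".join(
        chr(ord(ch) - 0xFEE0) if 0xFF01 <= ord(ch) <= 0xFF5E else ch
        for ch in text
    )


def find_width_drift(file_text: str, old_string: str):
    """Return (offset, file_char, emitted_char) for each drifted character.

    Returns None when old_string does not match even after folding, and an
    empty list when it already matches the file exactly.
    """
    start = fold_fullwidth(file_text).find(fold_fullwidth(old_string))
    if start == -1:
        return None
    # Folding is one-to-one, so offsets in the folded text equal offsets
    # in the original text.
    actual = file_text[start:start + len(old_string)]
    return [
        (offset, file_char, emitted_char)
        for offset, (file_char, emitted_char) in enumerate(zip(actual, old_string))
        if file_char != emitted_char
    ]


file_text = "説明（前提）と用語（id）\n"
emitted = "説明(前提)と用語(id)"
drift = find_width_drift(file_text, emitted)
print(drift)
```

The fold covers only the Fullwidth Forms block, so drift involving 、 (U+3001) or 。 (U+3002) would need an extra mapping.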

Reproduction is probabilistic — the shorter and more purely Japanese the text, the more reliably the model emits full-width; the longer and more technical (with English identifiers interleaved), the more reliably it drifts.

Additional Information


Related but distinct: #50975 (Write silently half-widths on overwrite).

✍️ Author: Claude Code with @carrotRakko (AI-written, human-approved)

extent analysis

TL;DR

The fix belongs in the model: the old_string it generates must preserve the file's full-width CJK punctuation byte-for-byte so that Edit can match it.

Guidance

  • Verify the model's output by inspecting the bytes emitted for old_string to confirm the presence of half-width punctuation instead of full-width.
  • Use a Python script with regex replacement as a temporary workaround to bypass the model's drift issue.
  • Test the model with short, purely Japanese sentences to observe the reliable emission of full-width punctuation, and longer technical paragraphs to see the drift to half-width.
  • Check for other affected punctuation characters, such as ： (U+FF1A), ， (U+FF0C), 。 (U+3002), and 、 (U+3001), in addition to （ (U+FF08) and ） (U+FF09).
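
One way to act on the last point is to inventory the wide punctuation a file contains before editing it. The sketch below uses Python's standard unicodedata module; treating "punctuation category plus East Asian Width F or W" as the at-risk set is an assumption made here, not something the report establishes.

```python
import unicodedata
from collections import Counter


def wide_punctuation(text: str) -> Counter:
    """Count punctuation characters whose East Asian Width is F or W."""
    return Counter(
        ch
        for ch in text
        if unicodedata.category(ch).startswith("P")
        and unicodedata.east_asian_width(ch) in ("F", "W")
    )


sample = "説明（前提）：値、以上。abc(def)"
counts = wide_punctuation(sample)
for ch, n in sorted(counts.items()):
    print(f"U+{ord(ch):04X} {ch} x{n}")
```

The result also includes brackets such as 「」, which the report lists as unaffected, so it is better read as a list of characters to watch than as a list of confirmed failures.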

Example

The issue concerns the model's output, so there is no code fix to apply. A Python script that builds old_string from \uFF08 and \uFF09 escapes can be used to confirm that a correctly encoded string matches the file.
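
A minimal sketch of that check follows. It writes a sample row to a temporary file and tests a string assembled from escapes against it; the helper build_old_string, the file name, and the row are illustrative assumptions, and the substring test merely stands in for Edit's exact matching.

```python
import tempfile
from pathlib import Path

FULL_OPEN, FULL_CLOSE = "\uFF08", "\uFF09"


def build_old_string(description: str, terms: str) -> str:
    """Assemble the Japanese fragment with escaped full-width parentheses."""
    return (
        f"説明{FULL_OPEN}{description}{FULL_CLOSE}"
        f"と技術用語{FULL_OPEN}{terms}{FULL_CLOSE}が交じる行"
    )


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample.md"
    path.write_text(
        "| item | 未 | — | 説明（前提条件）と技術用語（id / state / timestamp）が交じる行 |\n",
        encoding="utf-8",
    )
    old_string = build_old_string("前提条件", "id / state / timestamp")
    # Exact substring test against the file as read back from disk.
    found = old_string in path.read_text(encoding="utf-8")

print(found)
```

Because the parentheses come from escape sequences rather than from generated text, the resulting string cannot drift to half-width.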

Notes

The issue is specific to the model's generation of old_string and does not affect the Edit tool itself. The drift is context-dependent and more pronounced in longer technical paragraphs with interleaved English identifiers.

Recommendation

Apply a workaround using a Python script with regex replacement until the model is adjusted to correctly generate full-width CJK punctuation. This will ensure reliable output and prevent errors when using the Edit tool.
