claude-code: [MCP/Drive] read_file_content drops 4th UTF-8 byte of emojis whose continuation byte is in the C1 control range

Bug

The mcp__claude_ai_Google_Drive__read_file_content tool (the Google Drive integration exposed via Claude.ai connectors) silently corrupts UTF-8 emojis whose 4th byte falls in the C1 control range (0x80-0x9F). The byte is dropped before the response is returned, which makes distinct emojis indistinguishable in the output.

Affected emojis

Many emojis in the U+1F9XX Supplemental Symbols and Pictographs block, plus any other emoji whose 4th UTF-8 byte falls in 0x80-0x9F. The following examples all collapse to ð§ or ð in the response:

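| Emoji | Codepoint | UTF-8 bytes | What read_file_content returns |
| --- | --- | --- | --- |
| 🧈 | U+1F9C8 | F0 9F A7 88 | ð§ |
| 🧊 | U+1F9CA | F0 9F A7 8A | ð§ (same! 4th byte dropped) |
| 🌑 | U+1F311 | F0 9F 8C 91 | ð |
| 🍏 | U+1F34F | F0 9F 8D 8F | ð |
| 🏆 | U+1F3C6 | F0 9F 8F 86 | ð |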

Emojis whose 4th byte is outside the C1 range come through correctly:

| Emoji | Codepoint | UTF-8 bytes | What read_file_content returns |
| --- | --- | --- | --- |
| 🧱 | U+1F9F1 | F0 9F A7 B1 | ð§± (B1 = ± survives) |
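
To check whether some other emoji belongs to the affected set, its UTF-8 encoding can be inspected directly. The following is a minimal Python sketch for illustration only; the helper names `utf8_hex` and `fourth_byte_in_c1` are hypothetical and not part of any tool mentioned in this report.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a single character as spaced hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def fourth_byte_in_c1(ch: str) -> bool:
    """True if ch encodes to 4 UTF-8 bytes and the last one is in 0x80-0x9F."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 4 and 0x80 <= encoded[3] <= 0x9F


# Print codepoint, bytes, and the C1 check for the emojis from the tables.
for emoji in "🧈🧊🌑🍏🏆🧱":
    print(emoji, f"U+{ord(emoji):X}", utf8_hex(emoji), fourth_byte_in_c1(emoji))
```

Running it over a sheet's distinct emoji values before relying on read_file_content gives a rough idea of which ones are at risk of colliding.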

Repro

  1. Create a Google Sheet with one cell 🧈 Oro and another cell 🧊 Hierro.
  2. Call mcp__claude_ai_Google_Drive__read_file_content with that sheet's fileId.
  3. In the response, both cells contain the same ð§ prefix — they cannot be told apart.

Likely root cause

This looks like a UTF-8 → Latin-1 charset confusion in the natural-language conversion pipeline. Bytes 0x80-0x9F are undefined in pure ISO-8859-1 (the "C1 control" range). If the pipeline interprets the UTF-8 bytes as Latin-1 at any step and then runs them through a control-character sanitizer before re-encoding to UTF-8, exactly this behaviour appears: emojis with a "printable Latin-1" 4th byte (e.g. B1 = ±) survive, while emojis with a "C1 control" 4th byte get truncated. The observed output is consistent with this: the 2nd byte 0x9F, itself in the C1 range, is missing from every returned value, while 0xF0 and 0xA7 show up as their Latin-1 characters ð and §.
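
The hypothesis can be simulated locally. The Python sketch below is not the actual pipeline, whose implementation is not visible from the outside; `simulate_suspected_pipeline` is a hypothetical name, and the sanitizer is assumed to simply drop characters U+0080 through U+009F.

```python
def simulate_suspected_pipeline(text: str) -> str:
    """Mimic the hypothesised bug: misread UTF-8 as Latin-1, strip C1 controls."""
    raw = text.encode("utf-8")        # bytes as stored in the source document
    misread = raw.decode("latin-1")   # wrong charset: one character per byte
    # Assumed sanitizer: drop C1 control characters U+0080..U+009F.
    return "".join(c for c in misread if not 0x80 <= ord(c) <= 0x9F)


for cell in ["🧈 Oro", "🧊 Hierro", "🌑", "🧱"]:
    print(repr(cell), "->", repr(simulate_suspected_pipeline(cell)))
```

Under these assumptions both 🧈 and 🧊 reduce to ð§, 🌑 reduces to ð, and 🧱 comes out as ð§±, which matches the values observed from read_file_content.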

Workaround

mcp__claude_ai_Google_Drive__download_file_content with exportMimeType="application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" (XLSX) or "text/csv" returns the raw bytes in base64. Once decoded, all emojis are intact — the corruption is specifically inside the natural-language conversion path of read_file_content.
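
Decoding the exported payload might look like the following Python sketch. It assumes the base64 string has already been extracted from the tool response (the response shape is not shown in this report), and `rows_from_base64_csv` is a hypothetical helper name.

```python
import base64
import csv
import io


def rows_from_base64_csv(payload_b64: str) -> list[list[str]]:
    """Decode a base64-encoded CSV export and parse it as UTF-8 text."""
    raw = base64.b64decode(payload_b64)
    text = raw.decode("utf-8-sig")  # tolerate an optional byte order mark
    return list(csv.reader(io.StringIO(text, newline="")))


# Stand-in for the tool's base64 output, built locally for illustration.
sample_b64 = base64.b64encode("🧈 Oro,🧊 Hierro\r\n".encode("utf-8")).decode("ascii")
print(rows_from_base64_csv(sample_b64))
```

Because the bytes are decoded as UTF-8 exactly once and never pass through a Latin-1 step, the two cells stay distinct.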

Impact

For users who store distinct emojis as semantic data in Sheets (e.g. game design spreadsheets where 🧈 means gold and 🧊 means iron), read_file_content silently merges values that are different in the source. The result is wrong but plausible-looking, so the bug can go unnoticed and lead to incorrect downstream decisions / generated code. In our case it pushed me to confidently report wrong building costs to the user, which they had to push back on multiple times before we tracked down the cause.

Discovered via

Claude Code session reading a multi-sheet Google Sheet.
