claude-code - [MCP/Drive] read_file_content drops 4th UTF-8 byte of emojis whose continuation byte is in the C1 control range

claude-code · 2026-05-08 09:19:17

Workaround

Calling mcp__claude_ai_Google_Drive__download_file_content with exportMimeType set to "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet" (XLSX) or "text/csv" returns the raw file bytes as base64. Once decoded, all emojis are intact, so the corruption is specific to the natural-language conversion path of read_file_content.
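A minimal sketch of the decoding step for the "text/csv" export, assuming you have already extracted the base64 string from the tool response. The response field layout is not documented here, so the helper name `decode_exported_csv` and the sample payload are hypothetical stand-ins rather than the connector's actual API.

```python
import base64
import csv
import io


def decode_exported_csv(b64_payload: str) -> list[list[str]]:
    """Decode a base64 CSV export into rows of cell strings.

    Assumption: b64_payload is the base64 string taken from the
    download_file_content response; how it is keyed is not shown here.
    """
    raw = base64.b64decode(b64_payload)
    # Strict UTF-8 decoding raises UnicodeDecodeError instead of silently
    # dropping bytes, so any corruption would surface immediately.
    text = raw.decode("utf-8")
    return list(csv.reader(io.StringIO(text)))


# Hypothetical payload standing in for a real tool response.
sample_payload = base64.b64encode(
    "🧈 Oro,10\n🧊 Hierro,20\n".encode("utf-8")
).decode("ascii")

rows = decode_exported_csv(sample_payload)
for row in rows:
    print(row)
```

Decoding strictly rather than with `errors="ignore"` is deliberate: a lossy decode would reintroduce the same kind of silent data loss this workaround is meant to avoid.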

Bug

The mcp__claude_ai_Google_Drive__read_file_content tool (the Google Drive integration exposed via Claude.ai connectors) silently corrupts UTF-8 emojis whose 4th byte falls in the C1 control range (0x80–0x9F). That byte is dropped before the response is returned, which makes distinct emojis indistinguishable in the output. In the examples below the second byte, 0x9F, is missing from the output as well, so the loss appears to affect every byte in that range and not only the 4th.

Affected emojis

Half of the code points in the Supplemental Symbols and Pictographs block (U+1F900–U+1F9FF), namely those whose 4th UTF-8 byte is 0x80–0x9F, plus any other emoji whose 4th UTF-8 byte falls in that range. Examples that all collapse to ð§ or ð in the response:

| Emoji | Codepoint | UTF-8 bytes | What `read_file_content` returns |
| --- | --- | --- | --- |
| 🧈 | U+1F9C8 | `F0 9F A7 88` | `ð§` |
| 🧊 | U+1F9CA | `F0 9F A7 8A` | `ð§` (same! 4th byte dropped) |
| 🌑 | U+1F311 | `F0 9F 8C 91` | `ð` |
| 🍏 | U+1F34F | `F0 9F 8D 8F` | `ð` |
| 🏆 | U+1F3C6 | `F0 9F 8F 86` | `ð` |

Emojis whose 4th byte is outside the C1 range come through correctly:

| Emoji | Codepoint | UTF-8 bytes | What `read_file_content` returns |
| --- | --- | --- | --- |
| 🧱 | U+1F9F1 | `F0 9F A7 B1` | `ð§±` (`B1` = `±` survives) |
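To check whether a given emoji is at risk, you can inspect its UTF-8 encoding directly. The sketch below reproduces the byte columns of the tables above and counts how many code points in U+1F900–U+1F9FF have a 4th byte in 0x80–0x9F. The helper names `utf8_hex` and `fourth_byte_in_c1` are illustrative and not part of any tool.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of ch as space-separated hex bytes."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))


def fourth_byte_in_c1(ch: str) -> bool:
    """True if ch encodes to 4 UTF-8 bytes and the last is in 0x80-0x9F."""
    encoded = ch.encode("utf-8")
    return len(encoded) == 4 and 0x80 <= encoded[3] <= 0x9F


for emoji in ["🧈", "🧊", "🌑", "🍏", "🏆", "🧱"]:
    print(emoji, f"U+{ord(emoji):04X}", utf8_hex(emoji), fourth_byte_in_c1(emoji))

# Count code points in the Supplemental Symbols and Pictographs block
# whose 4th UTF-8 byte falls in the C1 range.
block = [chr(cp) for cp in range(0x1F900, 0x1FA00)]
affected_in_block = sum(fourth_byte_in_c1(ch) for ch in block)
print(affected_in_block, "of", len(block))
```

The 4th byte carries only the low six bits of the code point, so within any run of 64 consecutive code points the first 32 are affected and the last 32 are not.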

Repro

1. Create a Google Sheet with one cell 🧈 Oro and another cell 🧊 Hierro.
2. Call mcp__claude_ai_Google_Drive__read_file_content with that sheet's fileId.
3. In the response, both cells contain the same ð§ prefix, so they cannot be told apart.

Likely root cause

This looks like a UTF-8 → Latin-1 charset confusion in the natural-language conversion pipeline. In ISO-8859-1, bytes 0x80–0x9F carry no printable characters; they correspond to the "C1 control" range. If the pipeline interprets the UTF-8 bytes as Latin-1 at any step and then runs the result through a control-character sanitizer before re-encoding to UTF-8, exactly this behaviour appears: bytes that map to printable Latin-1 characters (e.g. B1 = ±) survive, while bytes in the C1 range are stripped, so emojis with a C1 4th byte get truncated.
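The hypothesis can be tested offline with a few lines of Python. This is a simulation of the suspected behaviour, not the connector's actual code: it reads the UTF-8 bytes as Latin-1 and then strips every character in U+0080–U+009F. The function name `simulate_suspected_pipeline` is made up for illustration.

```python
def simulate_suspected_pipeline(text: str) -> str:
    """Mimic the suspected bug: misread UTF-8 as Latin-1, drop C1 controls."""
    # Step 1: each UTF-8 byte becomes one Latin-1 character.
    misread = text.encode("utf-8").decode("latin-1")
    # Step 2: a sanitizer removes the C1 control characters (U+0080-U+009F).
    return "".join(ch for ch in misread if not 0x80 <= ord(ch) <= 0x9F)


for emoji in ["🧈", "🧊", "🌑", "🍏", "🏆", "🧱"]:
    print(emoji, repr(simulate_suspected_pipeline(emoji)))
```

Under this model 🧈 and 🧊 both come out as ð§, 🌑 comes out as a bare ð, and 🧱 keeps its trailing ±, which matches the observed outputs in the tables. That agreement supports the hypothesis but does not prove it; other pipelines could produce the same result.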

Impact

For users who store distinct emojis as semantic data in Sheets (e.g. game design spreadsheets where 🧈 means gold and 🧊 means iron), read_file_content silently merges values that differ in the source. The result is wrong but plausible-looking, so the bug can go unnoticed and lead to incorrect downstream decisions or generated code. In this case it led me to confidently report wrong building costs to the user, who had to push back several times before we tracked down the cause.
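Because the corrupted output looks plausible, a cheap guard on the response text can help catch it early. The following is a heuristic sketch only: it flags ð (U+00F0, the Latin-1 reading of lead byte 0xF0) when it does not follow a letter, together with up to three characters in U+00A0–U+00BF that surviving continuation bytes would turn into. The names `SUSPECT_EMOJI_RESIDUE` and `find_suspect_residue` are hypothetical, and the pattern can misfire on text that legitimately uses ð in that position.

```python
import re

# U+00F0 not preceded by a letter, followed by up to three characters in
# U+00A0..U+00BF (continuation bytes 0xA0-0xBF read as Latin-1).
SUSPECT_EMOJI_RESIDUE = re.compile(r"(?<![^\W\d_])\u00f0[\u00a0-\u00bf]{0,3}")


def find_suspect_residue(text: str) -> list[str]:
    """Return substrings that look like truncated 4-byte emoji residue."""
    return SUSPECT_EMOJI_RESIDUE.findall(text)


print(find_suspect_residue("\u00f0\u00a7 Oro"))   # residue of a truncated emoji
print(find_suspect_residue("\U0001F9C8 Oro"))     # intact emoji, nothing flagged
print(find_suspect_residue("ma\u00f0ur"))         # ð inside a word, nothing flagged
```

If anything is flagged, re-reading the file through the base64 export path is a reasonable next step before trusting the values.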

Discovered via

Claude Code session reading a multi-sheet Google Sheet.
