hermes - [Bug] Windows: subprocess env missing PYTHONUTF8=1 causes UnicodeEncodeError on non-ASCII output in CP936/legacy locales

Summary

On Windows, when Hermes spawns a Python subprocess without explicitly setting PYTHONUTF8=1 in the child's env, the child encodes its piped stdio with the system legacy (ANSI) code page: CP936 on zh-CN, CP1252 on en-US, CP932 on ja-JP, etc. PYTHONUTF8 only affects Python children; other Unicode-aware tools have their own encoding settings.

Result: any subprocess that emits non-ASCII output (Chinese / Japanese / Korean / Arabic characters, emoji, mathematical symbols, …) raises UnicodeEncodeError or produces mojibake.

Severity

  • Platform: Windows only (on Linux / macOS the locale set through LANG/LC_ALL is normally UTF-8, so Python's stdio is already UTF-8)
  • Frequency: 100% of Windows users in non-en-US locales whose subprocesses emit non-ASCII text. Triggers on pytest traces with Chinese assert messages, file-listing of non-ASCII paths, LLM-quoted Chinese content, etc.
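  • Symptom: UnicodeEncodeError: 'gbk' codec can't encode character '\U0001f389' in position N when a character falls outside the code page (the codec name and the offending character differ on other code pages, e.g. 'charmap' and '\u4e2d' on CP1252); a subprocess crash with a cryptic message; or silent mojibake in captured output when the text fits the code page but the parent decodes it as UTF-8.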
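The symptom can be observed without a Windows box by encoding the same text with a legacy codec directly. The sketch below is only an illustration; the helper name `first_unencodable` is made up here and is not part of Hermes.

```python
from typing import Optional


def first_unencodable(text: str, codec: str) -> Optional[str]:
    """Return the first character `codec` cannot encode, or None if all fit."""
    try:
        text.encode(codec)
    except UnicodeEncodeError as exc:
        # exc.start indexes the first offending character in the input string.
        return exc.object[exc.start]
    return None


sample = "中文 emoji 🎉"
for codec in ("gbk", "cp1252", "cp932", "utf-8"):
    # ascii() keeps the report printable whatever encoding this terminal uses.
    print(codec, ascii(first_unencodable(sample, codec)))
```

With GBK the CJK characters encode fine and the emoji is the character that fails; with CP1252 the first CJK character already fails; UTF-8 encodes everything.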

Reproduction

# Windows + non-en-US locale (e.g., zh-CN with CP936 default)
import asyncio

async def main():
    proc = await asyncio.create_subprocess_exec(
        "python", "-c", "print('中文 emoji 🎉')",
        stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate()
    print("rc:", proc.returncode)
    print("out:", out)
    print("err:", err)

asyncio.run(main())

Without PYTHONUTF8=1, on a CP936 box this typically produces a non-zero return code with UnicodeEncodeError in stderr; with PYTHONUTF8=1, the child exits cleanly and out holds the UTF-8 encoded text.

Workaround currently used downstream

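import asyncio
import os

env = os.environ.copy()
env.setdefault("PYTHONUTF8", "1")  # keeps an explicit user setting if one exists
proc = await asyncio.create_subprocess_exec(*cmd, env=env, ...)  # "..." = the call's other kwargs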

This is what my own subprocess-executor work (#31385) applies before every spawn. It should be the default behavior for any spawn site, not opt-in per integration.
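For reference, the workaround can be wrapped into a self-contained coroutine. This is a minimal sketch rather than the code from #31385: the name `run_utf8` and the choice to decode with `errors="replace"` are assumptions made for illustration.

```python
import asyncio
import os
import sys
from typing import Sequence, Tuple


async def run_utf8(cmd: Sequence[str]) -> Tuple[int, str, str]:
    """Spawn cmd with PYTHONUTF8=1 defaulted and decode its pipes as UTF-8."""
    env = os.environ.copy()
    env.setdefault("PYTHONUTF8", "1")  # an explicit user setting still wins
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        env=env,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    out, err = await proc.communicate()
    # errors="replace" keeps a stray non-UTF-8 byte from raising in the parent.
    return (
        proc.returncode,
        out.decode("utf-8", "replace"),
        err.decode("utf-8", "replace"),
    )


if __name__ == "__main__":
    # ASCII-only child source, so the command line itself needs no special encoding.
    child = r"print('\u4e2d\u6587 emoji \U0001f389')"
    rc, out, err = asyncio.run(run_utf8([sys.executable, "-c", child]))
    print(rc, ascii(out), ascii(err))
```

Using `setdefault` rather than plain assignment means a user who deliberately exported PYTHONUTF8=0 keeps that choice.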

Proposed fix

Centralize subprocess env preparation in a small helper (hermes_subprocess_env() or similar) that:

  1. Copies os.environ
  2. Sets PYTHONUTF8=1 (Windows only; no-op on Linux/macOS)
  3. Optionally applies the env-strip blocklist (separate security issue I'm about to file: ANTHROPIC_API_KEY / OAuth token strip)
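The three steps above could be sketched as follows. Only the name `hermes_subprocess_env` comes from this issue; the `base`, `strip`, and `platform` parameters are hypothetical and exist mainly to make the helper testable on any OS.

```python
import os
import sys
from typing import Dict, Iterable, Mapping, Optional


def hermes_subprocess_env(
    base: Optional[Mapping[str, str]] = None,
    strip: Iterable[str] = (),
    platform: str = sys.platform,
) -> Dict[str, str]:
    """Build the environment for a child process (sketch of the three steps)."""
    # 1. Copy the parent environment so the caller's os.environ is never mutated.
    env = dict(os.environ if base is None else base)
    # 2. Default to UTF-8 mode on Windows only; an explicit PYTHONUTF8 is kept.
    if platform == "win32":
        env.setdefault("PYTHONUTF8", "1")
    # 3. Optionally drop variables named in the (hypothetical) blocklist.
    for name in strip:
        env.pop(name, None)
    return env


if __name__ == "__main__":
    demo = hermes_subprocess_env(
        {"PATH": "/bin", "ANTHROPIC_API_KEY": "secret"},
        strip=["ANTHROPIC_API_KEY"],
        platform="win32",
    )
    print(sorted(demo))
```

A spawn site would then pass `env=hermes_subprocess_env()` instead of building the mapping inline.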

Then audit all subprocess.* / asyncio.create_subprocess_* call sites in agent/, gateway/, tools/ to use this helper.

Reference: PEP 540 / Python 3.7+ UTF-8 mode — PYTHONUTF8=1 is the standard knob.
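To confirm that a child actually picked up UTF-8 mode, it can report `sys.flags.utf8_mode` and its stdout encoding. The probe below is a sketch; the names `PROBE` and `probe_child` are made up for illustration.

```python
import os
import subprocess
import sys

# ASCII-only probe: report the UTF-8 mode flag and the stdout encoding.
PROBE = "import sys; print(sys.flags.utf8_mode, sys.stdout.encoding)"


def probe_child(utf8: str) -> str:
    """Run the probe in a child interpreter with PYTHONUTF8 set to `utf8`."""
    env = os.environ.copy()
    env["PYTHONUTF8"] = utf8
    # PYTHONIOENCODING would override the stdio encoding, so drop it for the probe.
    env.pop("PYTHONIOENCODING", None)
    result = subprocess.run(
        [sys.executable, "-c", PROBE], env=env, capture_output=True, check=True
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    print("PYTHONUTF8=1:", probe_child("1"))
    print("PYTHONUTF8=0:", probe_child("0"))
```

With PYTHONUTF8=1 the child should report the flag as 1 and a UTF-8 stdout encoding; with PYTHONUTF8=0 the encoding falls back to whatever the platform locale dictates.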

Happy to PR the helper + audit if there's interest. Filing as a bug for triage first.

Related: #31385 (bridge), #31417 (StreamReader 64 KiB), and a sibling .cmd shim metacharacter bug I'm filing alongside this.
