codex - Windows: shell wrapper via `powershell.exe -Command` mojibakes UTF-8 files on non-English locales (PowerShell 5.1 `Get-Content` defaults to ANSI codepage)



What version of Codex CLI is running?

@openai/codex-sdk 0.130.0 (programmatic JSON-RPC via codex app-server; not the codex CLI binary itself). Same wrapper behavior is observable on the codex CLI build of the same vintage.

What subscription do you have?

ChatGPT Plus

Which model were you using?

gpt-5.5

What platform is your computer?

Windows 10/11, Chinese (Simplified) zh-CN system locale. PowerShell version: Windows PowerShell 5.1 (the default Windows ships with — C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe).

What terminal emulator and version are you using (if applicable)?

custom Tauri-based editor that embeds codex app-server via the TypeScript SDK. The same mojibake reproduces from a plain Windows Terminal session driving codex CLI directly.

Codex doctor report

not available (using @openai/codex-sdk programmatically; `codex doctor` is the CLI-only entrypoint)

What issue are you seeing?

On non-English Windows (any CJK / Arabic / Cyrillic locale), the codex shell tool produces mojibake whenever it reads a UTF-8-without-BOM text file. The model receives the corrupted text and silently answers as if the corrupted text were the file's actual content — the failure mode is "confident wrong answer," not an exception, so it can be mistaken for a model hallucination.

Concrete observation: a skill file containing the literal string 貝克街15號 (UTF-8 bytes E8 B2 9D E5 85 8B E8 A1 97 31 35 E8 99 9F) was read by the model as the mojibake 璨濆厠琛?5铏. The model's final user-facing answer became 貝克街 5號 — note that "1" is gone and a space appears, because the mojibake bytes still form valid (but wrong) Unicode codepoints which the model then tried to reason about.
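The byte-level effect can be checked without codex or any file on disk. The following is a minimal sketch, assuming Windows PowerShell 5.1 and using code page 936 purely as the example ANSI codepage; the variable names are illustrative only. It decodes the bytes listed above once as UTF-8 and once as CP936, and the second result should resemble the mojibake quoted in the observation.

```powershell
# UTF-8 bytes of the literal string, exactly as listed in the report.
$bytes = [byte[]](0xE8,0xB2,0x9D, 0xE5,0x85,0x8B, 0xE8,0xA1,0x97, 0x31, 0x35, 0xE8,0x99,0x9F)

# Decoding with the encoding the file was written in recovers the text.
$asUtf8 = [System.Text.Encoding]::UTF8.GetString($bytes)

# Decoding the same bytes as CP936 (GBK) simulates what an ANSI-codepage
# read does on a zh-CN host. Code page 936 is only the example here.
$asCp936 = [System.Text.Encoding]::GetEncoding(936).GetString($bytes)

"UTF-8 decode : $asUtf8"
"CP936 decode : $asCp936"
"Same text?   : $($asUtf8 -ceq $asCp936)"
```

Because the bytes are built from hex values rather than typed as a string literal, the sketch does not depend on how the script file itself is encoded.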

Root cause

The codex Windows shell wrapper invokes:


powershell.exe -Command "<model_emitted_command>"

(seen verbatim in our shell-tool execution logs: "C:\\WINDOWS\\System32\\WindowsPowerShell\\v1.0\\powershell.exe" -Command "Get-Content -Path E:\\...")

The wrapped command commonly contains Get-Content, or aliases like cat (which is a PowerShell alias for Get-Content). On Windows PowerShell 5.1, Get-Content defaults to [System.Text.Encoding]::Default when -Encoding is not specified — and Encoding::Default is the system ANSI codepage, NOT UTF-8:

| Windows locale | ANSI codepage | Behavior reading UTF-8 file |
| --- | --- | --- |
| zh-CN (Simplified Chinese) | CP936 (GBK) | mojibake |
| zh-TW / zh-HK (Traditional Chinese) | CP950 (Big5) | mojibake |
| ja-JP (Japanese) | CP932 (Shift-JIS) | mojibake |
| ko-KR (Korean) | CP949 | mojibake |
| en-US (English) | CP1252 | hidden: CP1252 ≡ ASCII for 0x00–0x7F, so pure-ASCII files appear fine; only files containing Latin-1 high bits manifest. Most en-US testing never hits a manifesting file. |

This is the well-known PowerShell 5.1 default-encoding behavior that Microsoft fixed in PowerShell 7+ (which defaults to UTF-8 across cmdlets). Codex on Windows ends up sitting on top of the broken 5.1 default because powershell.exe is the only PowerShell guaranteed to be installed on every Windows machine.
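To see which defaults a given host actually has, a short diagnostic along these lines may help. It is a sketch that only reads standard .NET and PowerShell properties; the variable names and output labels are illustrative. On Windows PowerShell 5.1 the default encoding is expected to be the system ANSI codepage, while PowerShell 7+ is expected to report UTF-8.

```powershell
# Which PowerShell is this? 5.1 is the version affected by the ANSI default.
"PowerShell version : $($PSVersionTable.PSVersion)"

# Encoding that .NET reports as the process default. On Windows PowerShell 5.1
# this is the system ANSI codepage (for example 936 on zh-CN).
$default = [System.Text.Encoding]::Default
"Default encoding   : $($default.WebName) (code page $($default.CodePage))"

# Encoding used for text written to the console or to a redirected stdout.
$console = [Console]::OutputEncoding
"Console output     : $($console.WebName) (code page $($console.CodePage))"
```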

The existing prefix_powershell_script_with_utf8() utility is incomplete

codex-rs/shell-command/src/powershell.rs already has a helper that prepends:

[Console]::OutputEncoding=[System.Text.Encoding]::UTF8;

That fixes the stdout-write side, but it does NOT change how Get-Content reads files. To fix the read side, the prefix must additionally set:

$PSDefaultParameterValues['Get-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Set-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Add-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Out-File:Encoding']    = 'utf8'

So even the codex code paths that DO call the existing UTF-8 prefix utility are still letting Get-Content mojibake on read.
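As an illustration of what a combined prefix could look like, the sketch below builds the console statement and the four parameter defaults into one `;`-joined string and prepends it to a command. The names `$utf8Init`, `$modelCommand` and `$wrapped` are hypothetical, and this is not the string that prefix_powershell_script_with_utf8() currently emits.

```powershell
# Cmdlets whose -Encoding default should be forced to UTF-8.
$cmdlets = 'Get-Content', 'Set-Content', 'Add-Content', 'Out-File'

# Start with the stdout-side statement the existing helper already prepends.
$statements = @('[Console]::OutputEncoding=[System.Text.Encoding]::UTF8')

# Add one $PSDefaultParameterValues assignment per cmdlet.
# The backtick keeps the $ literal; ${c} delimits the loop variable before ':'.
foreach ($c in $cmdlets) {
    $statements += "`$PSDefaultParameterValues['${c}:Encoding']='utf8'"
}

# Join everything into a single logical line.
$utf8Init = $statements -join ';'

# Hypothetical model-emitted command, taken from the reproduction steps.
$modelCommand = 'Get-Content -Path C:\tmp\repro.md'
$wrapped = "$utf8Init;$modelCommand"
$wrapped

# The wrapper would then run something like:
# powershell.exe -Command $wrapped
```

One caveat on the write side: in Windows PowerShell 5.1, `-Encoding utf8` on Set-Content, Add-Content and Out-File writes a BOM, so a fix that also cares about BOM-free output may need to treat those three cmdlets differently from Get-Content.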

What steps can reproduce the bug?

System: Windows 10 or 11, set to a non-Latin-1 locale (zh-CN reproduces; zh-TW / ja-JP / ko-KR equally affected; on en-US the bug stays hidden for pure-ASCII files because of the CP1252/ASCII overlap).

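  1. Create a UTF-8-without-BOM text file containing non-ASCII text. The call below works in Windows PowerShell 5.1 and writes exactly the bytes listed. (Out-File -Encoding utf8NoBOM is only accepted by PowerShell 7+, and it appends a line ending.)

    [System.IO.File]::WriteAllText('C:\tmp\repro.md', '貝克街15號', (New-Object System.Text.UTF8Encoding $false))

    Verify with Format-Hex -Path C:\tmp\repro.md: the bytes should be E8 B2 9D E5 85 8B E8 A1 97 31 35 E8 99 9F (clean UTF-8).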

  2. In a codex thread (CLI or app-server, any thread ID), ask:

    What's inside C:\tmp\repro.md?

  3. The model will emit a read command (typically cat C:\tmp\repro.md or Get-Content C:\tmp\repro.md).

  4. Codex wraps it as:

    powershell.exe -Command "Get-Content -Path C:\tmp\repro.md"

  5. PowerShell 5.1 decodes the UTF-8 file bytes through CP936 (or the relevant ANSI codepage) and returns mojibake to the model.

  6. The model's final user-facing answer reports the mojibake content as if it were the file content — e.g. our run produced 貝克街 5號 (the digit 1 is gone, a space appears) when the actual file content is 貝克街15號. The corrupted bytes still represent valid Unicode codepoints, so the model never realizes the data is wrong.

This reproduces 100% of the time on a zh-CN Windows host when reading any UTF-8 file containing non-ASCII content. On en-US Windows the bug stays hidden for pure-ASCII files, because CP1252 and UTF-8 agree on bytes 0x00–0x7F; a file with non-ASCII bytes, such as this one, is still decoded through CP1252 there.
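The read-side difference can also be observed directly in a PowerShell session, outside codex. The sketch below is written under the assumption that it runs in Windows PowerShell 5.1 on an affected locale; the temp-file name and variable names are illustrative. It writes the bytes from step 1 and reads them back with and without an explicit encoding.

```powershell
# Bytes from step 1 of the reproduction (UTF-8, no BOM, no trailing newline).
$bytes = [byte[]](0xE8,0xB2,0x9D, 0xE5,0x85,0x8B, 0xE8,0xA1,0x97, 0x31, 0x35, 0xE8,0x99,0x9F)

# Write them to a scratch file in the temp directory.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'codex-utf8-repro.md'
[System.IO.File]::WriteAllBytes($path, $bytes)

# Read without -Encoding: Windows PowerShell 5.1 falls back to the ANSI codepage.
$defaultRead = Get-Content -Path $path

# Read with an explicit encoding: the text should come back intact.
$utf8Read = Get-Content -Path $path -Encoding UTF8

"Default read : $defaultRead"
"UTF-8 read   : $utf8Read"
"Reads match  : $($defaultRead -ceq $utf8Read)"

# Clean up the scratch file.
Remove-Item -Path $path
```

On PowerShell 7+ both reads are expected to match, which is consistent with the version-specific nature of the problem described here.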

What is the expected behavior?

codex should return the file's actual content (貝克街15號) to the model, not the ANSI-codepage-misinterpreted byte string. The end-user-facing answer should accordingly be based on the actual content.

The fix is to make codex's Windows shell wrapper produce UTF-8 reads regardless of the user's system ANSI codepage. Concretely, any of the following would resolve the issue (in order of robustness):

  1. Always inject the full UTF-8 init in the -Command prefix: Console plus $PSDefaultParameterValues together, applied unconditionally on every wrapped command rather than optionally via the existing prefix utility. This is roughly six statements joined by ; on one logical line.
  2. Prefer pwsh.exe (PS 7+) over powershell.exe (PS 5.1) when present. PS 7+ defaults to UTF-8 for cmdlets and would dodge the issue entirely. Probe once at startup (where.exe pwsh), cache the result, and fall back to powershell.exe only when absent.
  3. Switch the Windows wrapper to cmd.exe + chcp 65001 >nul && <cmd>. This loses PowerShell's Unix-style aliases (cat, ls, etc.) and is probably not worth the disruption.
  4. Prefer Git Bash / WSL bash when available. This matches the Linux/macOS code path most directly, but requires bash to be installed, which is not universal on Windows.

Recommended combination: fix 1 unconditionally (cheap, handles legacy 5.1 thoroughly) plus fix 2 when pwsh is present (makes the problem disappear entirely for PS 7+ users).
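The probe in fix 2 could look roughly like the following when expressed in PowerShell. This is only a sketch of the selection logic, with `$pwsh` and `$shellExe` as illustrative names; the real wrapper lives in Rust and would need its own PATH lookup and caching.

```powershell
# Look for a PowerShell 7+ executable on PATH; stay silent if it is absent.
$pwsh = Get-Command -Name pwsh.exe -CommandType Application -ErrorAction SilentlyContinue |
    Select-Object -First 1

# Prefer pwsh.exe when found, otherwise fall back to Windows PowerShell 5.1.
$shellExe = if ($pwsh) { $pwsh.Path } else { 'powershell.exe' }

"Selected shell: $shellExe"
```

Even when pwsh.exe is selected, keeping the UTF-8 prefix from fix 1 costs little and covers hosts where only powershell.exe exists.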

Additional information

Workaround we currently use (not from inside codex)

Since the wrapper cannot be changed from outside codex, we write a user-level PowerShell profile at

%UserProfile%\Documents\WindowsPowerShell\Profile.ps1

containing the full UTF-8 init shown in "Expected behavior" above. PowerShell auto-loads this on any powershell.exe spawn that does NOT pass -NoProfile. The codex-rs/app-server shell wrapper (the one reproduced above) does not pass -NoProfile, so the profile applies and self-heals the bug on that code path.
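A helper along these lines can install that profile content. It is a sketch rather than the exact script we use: `Add-Utf8ProfileInit` is a hypothetical function name, and the path shown in the usage comment assumes the default Documents location.

```powershell
function Add-Utf8ProfileInit {
    param([Parameter(Mandatory)][string]$Path)

    # The UTF-8 init: console output plus read/write defaults for file cmdlets.
    $init = @'
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
$PSDefaultParameterValues['Get-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Set-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Add-Content:Encoding'] = 'utf8'
$PSDefaultParameterValues['Out-File:Encoding']    = 'utf8'
'@

    # Skip if the profile already contains the Get-Content default.
    if ((Test-Path -LiteralPath $Path) -and
        (Select-String -LiteralPath $Path -Pattern 'Get-Content:Encoding' -SimpleMatch -Quiet)) {
        return
    }

    # Create the profile directory when it does not exist yet.
    $dir = Split-Path -Parent $Path
    if (-not (Test-Path -LiteralPath $dir)) {
        New-Item -ItemType Directory -Path $dir -Force | Out-Null
    }

    # The init text is pure ASCII, so an ASCII append is safe in 5.1 and 7+.
    Add-Content -LiteralPath $Path -Value $init -Encoding ascii
}

# Usage (uncomment to apply to the Windows PowerShell 5.1 user profile):
# $docs = [Environment]::GetFolderPath('MyDocuments')
# Add-Utf8ProfileInit -Path (Join-Path $docs 'WindowsPowerShell\Profile.ps1')
```

Calling the function twice leaves the profile unchanged the second time, so it can be run from a setup script without piling up duplicate lines.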

This workaround does NOT work for the codex IDE Extension wrapper described in #17208, which DOES pass -NoProfile. That code path therefore still mojibakes on non-English Windows even with the profile in place. Fixing it requires a UTF-8 init inside the wrapped command itself (fix 1 above); no external workaround is available.

Related issues

  • #15967: "Codex use UTF-8 BOM by default and change Chinese to unreadable nonsense" (opened 2026-03-27, open, no maintainer reply). Same underlying domain: the body describes Get-Content mojibake, though the title focuses on BOM-on-write. This new issue is filed because #15967's title and discussion are too vague to point precisely at the root cause; happy to close one or the other after maintainer triage if you'd prefer to consolidate.
  • #17208: "Codex Extension running Powershell behind every command". Describes the IDE Extension wrapper exactly. Different argv (-NoLogo -NoProfile -NonInteractive -EncodedCommand); the same underlying fix (a UTF-8 init inside the wrapped command) applies.
  • #19629: "Windows Codex app command execution still depends on PowerShell even when Integrated Terminal Shell is set to Command Prompt". Adjacent: once UTF-8 is fixed inside the PowerShell path, the urgency of honouring the configured shell drops, but the configuration override should still be honoured.

Why this has been hard to catch upstream

  • Locale-hidden: CP1252/ASCII overlap means pure-ASCII files on en-US Windows appear fine, so en-US testing with ASCII-only files never manifests this bug.
  • No exception: The model "succeeds" with a confidently-wrong answer rather than failing loudly, so it can be mistaken for a model hallucination rather than a wire-level encoding bug.
  • Estimated impact: ~25% of Windows users globally (any CJK / Arabic / Cyrillic / Greek / Hebrew locale).

Source pointers for the fix

  • codex-rs/shell-command/src/powershell.rs — existing prefix_powershell_script_with_utf8() is the place to extend with $PSDefaultParameterValues defaults (or to add a probe for pwsh.exe).
  • codex-rs/app-server/ — the wrapper invocation site (where to ensure the extended prefix is always applied, not optional).

Happy to verify any candidate fix against our zh-CN test setup. Thanks for codex — this is the only sustained rough edge we've hit in Windows use.
