Root Cause

In codex review runs, the column-0 codex line that historically delimited the agent's final-verdict block is being written to stderr, while stdout receives only the verdict prose with no preceding marker. Downstream parsers that key off the codex marker on stdout (the documented final-verdict delimiter) see UNKNOWN where they should see PASS/BLOCKED.

Reproducible on 100% of 80 real CLI runs in our test corpus (v0.133.0-alpha.1, gpt-5.5, xhigh reasoning).

Fix Action

Fix / Workaround

codex review writes its final-verdict block to stdout, prefixed by a column-0 codex marker line. The marker serves as the delimiter between the agent's preceding exec/reasoning log and the user-facing review prose. Downstream tooling (CI bots, PR labelers, paste-into-comment helpers) parses stdout, finds the last codex marker, and treats everything after it as the canonical review.

We've spent ~4 working sessions (~16 hours) building a parser-side workaround (stderr-fallback parsing + a head-SHA-verified "captured-PASS via dump" merge-auth protocol). The workaround is stable but adds ~5 manual operator steps per audited PR indefinitely. A 1-line fix upstream would eliminate it.

Workaround we currently use

Code Example

OpenAI Codex v0.133.0-alpha.1
model: gpt-5.5
provider: openai
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none

---

codex review  # in a worktree, against `origin/main`

---

The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

---

…
exec
/bin/bash -lc 'git diff --stat 3b25daf76edc684d63730294f12215f0ca65212a' in /private/var/folders/…/wt
 succeeded in 0ms:
…

codex
The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

`codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Q: Expected behavior

`codex review` writes its final-verdict block to **stdout**, prefixed by a column-0 `codex` marker line. The marker serves as the delimiter between the agent's preceding exec/reasoning log and the user-facing review prose. Downstream tooling (CI bots, PR labelers, paste-into-comment helpers) parses stdout, finds the last `codex` marker, and treats everything after it as the canonical review.

Summary

Reproducible on 100% of 80 real CLI runs in our test corpus (v0.133.0-alpha.1, gpt-5.5, xhigh reasoning).

Environment

OpenAI Codex v0.133.0-alpha.1
model: gpt-5.5
provider: openai
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none

Invoked as:

codex review  # in a worktree, against `origin/main`

(Wrapper context: we run this inside a temporary git worktree checked out to a PR head, then parse codex review's output to extract the final review block and classify PASS/BLOCKED. Wrapper source: https://github.com/jeffhuber/cube-snap/blob/main/tools/codex_audit_pr.py.)

Expected behavior

Actual behavior

stdout (typically ~100-800 bytes): contains only the verdict prose, with no preceding codex marker line.
stderr (typically ~60-180 KB): contains the full session log — exec tool calls, reasoning, AND the column-0 codex marker followed by the same verdict prose that lands on stdout.

The verdict prose is therefore duplicated across the two streams (once on stdout, once on stderr), but the marker that delimits it appears only on stderr.

Empirical evidence

We captured raw stdout+stderr for every codex review invocation in our parser-failure path (using Popen(stdin=DEVNULL, stdout=PIPE, stderr=PIPE)). Over 111 captured dumps:

Marker location	Dump count	% of real runs
stdout has column-0 `codex`	0	0.0%
stderr has column-0 `codex`	80	100.0%
Neither	31	(all bot-setup test fixtures: 442–459 bytes each, not real CLI runs)

Among the 80 real runs: every single one has the marker on stderr, none on stdout. That's deterministic, not flaky.

Minimal canonical example

From dump jeffhuber_cube-snap_pr206_690669b0ec9d_20260528T011833_133584Z.log (282-byte stdout, 108,404-byte stderr):

stdout (entire content):

The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

stderr (relevant tail — same prose, but with the marker):

…
exec
/bin/bash -lc 'git diff --stat 3b25daf76edc684d63730294f12215f0ca65212a' in /private/var/folders/…/wt
 succeeded in 0ms:
…

codex
The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

Impact

The marker location used to be the contract between codex review and any consumer that wanted to extract the verdict programmatically. With the marker on stderr:

Bots that only pipe stdout cannot find the delimiter and have to either (a) treat all of stdout as the verdict (lossy if the model ever prepends anything), or (b) fall back to scraping stderr (which mixes session log + verdict and requires a much fuzzier parser).
CI/labeler pipelines that classify PASS vs BLOCKED via the codex marker silently fall through to "unparseable" on every run.
The duplication (verdict on both streams) suggests the verdict-emission code was split into "echo to stderr for the session log, also echo to stdout for the consumer" — but stdout lost the delimiter in that split.

Proposed fix

Re-add the column-0 codex marker to stdout immediately before the verdict block. Either:

Option A (preferred): emit the full \ncodex\n<verdict>\n block to stdout (matches the historical contract; consumers don't need to change).
Option B: leave stdout as verdict-only and document the new shape — codex review's stdout is now "verdict-only, no marker; consume stderr for session log + delimiter". This is a breaking change for existing consumers but at least makes the new contract explicit.

We'd vote A.

Workaround we currently use

Wrapper-side: https://github.com/jeffhuber/cube-snap/blob/main/tools/codex_audit_pr.py (parse_codex_output), with stderr-fallback that requires strict Codex-verdict shape (substantive summary line + [Pn] title — file:line finding bullets) before accepting a stderr codex block as authoritative. PR-comment-side: a captured-PASS-via-dump protocol that re-verifies head-match before treating a UNKNOWN-with-PASS-prose dump as merge-authorizing.

Happy to share fixture dumps if useful; we have 80 deterministic real-run dumps on disk.

(Filed by a downstream consumer building a parallel PR-audit lane on top of codex review. Not affiliated with OpenAI. No reproduction-environment access to the CLI's source.)

FAQ

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

codex - 💡(How to fix) Fix `codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Workaround we currently use

Code Example

`codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Summary

Environment

Expected behavior

Actual behavior

Empirical evidence

Minimal canonical example

Impact

Proposed fix

Workaround we currently use

FAQ

Expected behavior

Still need to ship something?

TRENDING

codex - 💡(How to fix) Fix `codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Workaround we currently use

Code Example

codex review: final-verdict codex marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Summary

Environment

Expected behavior

Actual behavior

Empirical evidence

Minimal canonical example

Impact

Proposed fix

Workaround we currently use

FAQ

Expected behavior

Still need to ship something?

TRENDING

`codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)