codex - 💡(How to fix) Fix `codex review`: final-verdict `codex` marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

In codex review runs, the column-0 codex line that historically delimited the agent's final-verdict block is being written to stderr, while stdout receives only the verdict prose with no preceding marker. Downstream parsers that key off the codex marker on stdout (the documented final-verdict delimiter) see UNKNOWN where they should see PASS/BLOCKED.

Reproducible on 100% of 80 real CLI runs in our test corpus (v0.133.0-alpha.1, gpt-5.5, xhigh reasoning).

Root Cause

In codex review runs, the column-0 codex line that historically delimited the agent's final-verdict block is being written to stderr, while stdout receives only the verdict prose with no preceding marker. Downstream parsers that key off the codex marker on stdout (the documented final-verdict delimiter) see UNKNOWN where they should see PASS/BLOCKED.

Reproducible on 100% of 80 real CLI runs in our test corpus (v0.133.0-alpha.1, gpt-5.5, xhigh reasoning).

Fix Action

Fix / Workaround

codex review writes its final-verdict block to stdout, prefixed by a column-0 codex marker line. The marker serves as the delimiter between the agent's preceding exec/reasoning log and the user-facing review prose. Downstream tooling (CI bots, PR labelers, paste-into-comment helpers) parses stdout, finds the last codex marker, and treats everything after it as the canonical review.

We've spent ~4 working sessions (~16 hours) building a parser-side workaround (stderr-fallback parsing + a head-SHA-verified "captured-PASS via dump" merge-auth protocol). The workaround is stable but adds ~5 manual operator steps per audited PR indefinitely. A 1-line fix upstream would eliminate it.

Workaround we currently use

Code Example

OpenAI Codex v0.133.0-alpha.1
model: gpt-5.5
provider: openai
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none

---

codex review  # in a worktree, against `origin/main`

---

The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

---

exec
/bin/bash -lc 'git diff --stat 3b25daf76edc684d63730294f12215f0ca65212a' in /private/var/folders//wt
 succeeded in 0ms:

codex
The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.
RAW_BUFFERClick to expand / collapse

codex review: final-verdict codex marker emitted on stderr instead of stdout (v0.133.0-alpha.1)

Summary

In codex review runs, the column-0 codex line that historically delimited the agent's final-verdict block is being written to stderr, while stdout receives only the verdict prose with no preceding marker. Downstream parsers that key off the codex marker on stdout (the documented final-verdict delimiter) see UNKNOWN where they should see PASS/BLOCKED.

Reproducible on 100% of 80 real CLI runs in our test corpus (v0.133.0-alpha.1, gpt-5.5, xhigh reasoning).

Environment

OpenAI Codex v0.133.0-alpha.1
model: gpt-5.5
provider: openai
approval: never
sandbox: read-only
reasoning effort: xhigh
reasoning summaries: none

Invoked as:

codex review  # in a worktree, against `origin/main`

(Wrapper context: we run this inside a temporary git worktree checked out to a PR head, then parse codex review's output to extract the final review block and classify PASS/BLOCKED. Wrapper source: https://github.com/jeffhuber/cube-snap/blob/main/tools/codex_audit_pr.py.)

Expected behavior

codex review writes its final-verdict block to stdout, prefixed by a column-0 codex marker line. The marker serves as the delimiter between the agent's preceding exec/reasoning log and the user-facing review prose. Downstream tooling (CI bots, PR labelers, paste-into-comment helpers) parses stdout, finds the last codex marker, and treats everything after it as the canonical review.

Actual behavior

  • stdout (typically ~100-800 bytes): contains only the verdict prose, with no preceding codex marker line.
  • stderr (typically ~60-180 KB): contains the full session log — exec tool calls, reasoning, AND the column-0 codex marker followed by the same verdict prose that lands on stdout.

The verdict prose is therefore duplicated across the two streams (once on stdout, once on stderr), but the marker that delimits it appears only on stderr.

Empirical evidence

We captured raw stdout+stderr for every codex review invocation in our parser-failure path (using Popen(stdin=DEVNULL, stdout=PIPE, stderr=PIPE)). Over 111 captured dumps:

Marker locationDump count% of real runs
stdout has column-0 codex00.0%
stderr has column-0 codex80100.0%
Neither31(all bot-setup test fixtures: 442–459 bytes each, not real CLI runs)

Among the 80 real runs: every single one has the marker on stderr, none on stdout. That's deterministic, not flaky.

Minimal canonical example

From dump jeffhuber_cube-snap_pr206_690669b0ec9d_20260528T011833_133584Z.log (282-byte stdout, 108,404-byte stderr):

stdout (entire content):

The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

stderr (relevant tail — same prose, but with the marker):

exec
/bin/bash -lc 'git diff --stat 3b25daf76edc684d63730294f12215f0ca65212a' in /private/var/folders/…/wt
 succeeded in 0ms:

codex
The workflow changes consistently expand the synchronize handler to recognize the same Codex audit signals used by the labeler and preserve/remove labels accordingly. I did not identify any introduced correctness, security, or maintainability issue that warrants an inline finding.

Impact

The marker location used to be the contract between codex review and any consumer that wanted to extract the verdict programmatically. With the marker on stderr:

  • Bots that only pipe stdout cannot find the delimiter and have to either (a) treat all of stdout as the verdict (lossy if the model ever prepends anything), or (b) fall back to scraping stderr (which mixes session log + verdict and requires a much fuzzier parser).
  • CI/labeler pipelines that classify PASS vs BLOCKED via the codex marker silently fall through to "unparseable" on every run.
  • The duplication (verdict on both streams) suggests the verdict-emission code was split into "echo to stderr for the session log, also echo to stdout for the consumer" — but stdout lost the delimiter in that split.

We've spent ~4 working sessions (~16 hours) building a parser-side workaround (stderr-fallback parsing + a head-SHA-verified "captured-PASS via dump" merge-auth protocol). The workaround is stable but adds ~5 manual operator steps per audited PR indefinitely. A 1-line fix upstream would eliminate it.

Proposed fix

Re-add the column-0 codex marker to stdout immediately before the verdict block. Either:

  • Option A (preferred): emit the full \ncodex\n<verdict>\n block to stdout (matches the historical contract; consumers don't need to change).
  • Option B: leave stdout as verdict-only and document the new shape — codex review's stdout is now "verdict-only, no marker; consume stderr for session log + delimiter". This is a breaking change for existing consumers but at least makes the new contract explicit.

We'd vote A.

Workaround we currently use

Wrapper-side: https://github.com/jeffhuber/cube-snap/blob/main/tools/codex_audit_pr.py (parse_codex_output), with stderr-fallback that requires strict Codex-verdict shape (substantive summary line + [Pn] title — file:line finding bullets) before accepting a stderr codex block as authoritative. PR-comment-side: a captured-PASS-via-dump protocol that re-verifies head-match before treating a UNKNOWN-with-PASS-prose dump as merge-authorizing.

Happy to share fixture dumps if useful; we have 80 deterministic real-run dumps on disk.


(Filed by a downstream consumer building a parallel PR-audit lane on top of codex review. Not affiliated with OpenAI. No reproduction-environment access to the CLI's source.)

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

codex review writes its final-verdict block to stdout, prefixed by a column-0 codex marker line. The marker serves as the delimiter between the agent's preceding exec/reasoning log and the user-facing review prose. Downstream tooling (CI bots, PR labelers, paste-into-comment helpers) parses stdout, finds the last codex marker, and treats everything after it as the canonical review.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING