codex - 💡 (How to fix) Fix Destructive cleanup can be auto-approved during premature rewrite "finalization" under guardian auto-review [1 comment, 2 participants]

GitHub stats
openai/codex#18840 • Fetched 2026-04-22 07:51:41
Comments: 1
Participants: 2
Timeline: 6
Reactions: 0
Timeline (top): labeled ×3, unlabeled ×2, commented ×1


What version of Codex CLI is running?

0.122.0

What subscription do you have?

Plus

Which model were you using?

gpt-5.4-mini

What platform is your computer?

linux

What terminal emulator and version are you using (if applicable)?

alacritty

What issue are you seeing?

I ran into a real destructive incident while using Codex CLI with the experimental guardian/auto-review approval flow enabled.

I asked Codex to rewrite a Python-based project A as a Go-based project B. During the session, Codex appeared to decide that the rewrite was already in its "wrapping up" phase, even though in reality the Go project only had an initial skeleton and was not a complete replacement yet.

At that point, Codex proposed deleting the original project A with a command equivalent to rm -rf ....

The approval flow showed the action as roughly:

  • user_authorization: high
  • risk: medium

Even though the action was destructive and irreversible, it was still auto-approved instead of falling back to a manual user confirmation step. The original project was then deleted.

From the user side, this is a serious safety failure in the guardian/auto-review approval path: once Codex prematurely concludes that the rewrite is complete enough, a destructive cleanup command can be auto-approved under a combination like risk=medium and user_authorization=high, with no mandatory fallback to explicit human confirmation.

What steps can reproduce the bug?

  1. Start a rewrite or migration task where the user asks Codex to rewrite an existing project into a new implementation or a new language.
  2. Enable the guardian/auto-review approval flow.
  3. Let Codex create only an initial skeleton of the replacement project, without actually reaching feature parity with the original project.
  4. Continue the session until Codex behaves as if the rewrite is already effectively complete and starts proposing cleanup actions.
  5. Observe Codex requesting a destructive command such as rm -rf <old-project>.
  6. Observe guardian/auto-review returning a combination like risk=medium and user_authorization=high, and the destructive command being auto-approved instead of being escalated to manual confirmation.

I do not have a minimal public repro yet, but the real-world pattern is:

  1. "Rewrite project A as project B."
  2. Codex creates only the skeleton of B.
  3. Codex behaves as if the rewrite is already in a safe cleanup/finalization phase.
  4. Codex proposes deleting A.
  5. Guardian/auto-review returns a medium-risk / high-authorization assessment and the deletion is auto-approved.

What is the expected behavior?

For destructive actions, especially recursive deletion of an existing project, Codex CLI should support a hard safety boundary that prevents automatic approval above a configurable guardian risk threshold.

At minimum, I would expect one of these behaviors:

  1. A config option such as guardian_max_auto_allow_risk = "low" | "medium" | ... that says: if guardian risk is above this threshold, do not auto-approve and always fall back to the standard manual approval UI.
  2. A built-in default that always requires manual approval for obviously destructive operations such as recursive deletion, even if user intent is assessed as high.
  3. A distinct guardian outcome like "requires manual approval" so the UI/history does not treat this case as a normal denial or a silent auto-approval.

In short: guardian should be allowed to say "this seems authorized, but it is still too risky to auto-approve."
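
Option 2 above (a built-in default for obviously destructive operations) could be approximated with a small command check that runs before any auto-approval. The following is a minimal sketch, not Codex's actual implementation: the function name `is_recursive_delete` is hypothetical, and the heuristic only recognizes a direct `rm` invocation with a recursive flag. It would not catch wrappers such as `sudo`, shell pipelines, or other deletion tools.

```python
import os
import shlex


def is_recursive_delete(command: str) -> bool:
    """Heuristically detect a recursive `rm` invocation (hypothetical helper)."""
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes: treat as suspicious so a human reviews it.
        return True
    if not tokens or os.path.basename(tokens[0]) != "rm":
        return False
    for tok in tokens[1:]:
        if tok == "--":
            break  # everything after `--` is an operand, not an option
        if tok == "--recursive":
            return True
        # Short option clusters such as -rf, -fr, or -R
        if tok.startswith("-") and not tok.startswith("--") and ("r" in tok or "R" in tok):
            return True
    return False
```

A caller could force the manual approval UI whenever this returns True, regardless of the assessed user authorization.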

Additional information

I have been experimenting with a local patch to address this by introducing a configurable maximum risk level for auto-approval and by falling back to manual approval when guardian returns allow but the assessed risk exceeds that threshold.

The main idea is:

  • keep guardian review
  • allow guardian to classify an action as authorized
  • but require manual approval when risk is above a configured ceiling

That seems closer to the expected safety model for destructive commands.
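
The idea can be sketched as a three-outcome decision function. This is an illustrative sketch only and is not the reporter's patch: the names `Risk`, `Outcome`, `Assessment`, and `decide` are hypothetical, and the risk scale is assumed to be low < medium < high (the issue only names "low" and "medium").

```python
from dataclasses import dataclass
from enum import Enum, IntEnum


class Risk(IntEnum):
    # Assumed ordering of guardian risk levels
    LOW = 0
    MEDIUM = 1
    HIGH = 2


class Outcome(Enum):
    AUTO_APPROVE = "auto_approve"
    MANUAL_APPROVAL = "requires_manual_approval"  # distinct from a denial
    DENY = "deny"


@dataclass(frozen=True)
class Assessment:
    allowed: bool  # guardian considers the action authorized
    risk: Risk     # guardian's assessed risk for the action


def decide(assessment: Assessment, max_auto_allow_risk: Risk) -> Outcome:
    """Keep guardian review, but cap what it may auto-approve."""
    if not assessment.allowed:
        return Outcome.DENY
    if assessment.risk > max_auto_allow_risk:
        # Authorized, but too risky to approve without a human.
        return Outcome.MANUAL_APPROVAL
    return Outcome.AUTO_APPROVE
```

With a ceiling of `Risk.LOW`, the combination from the incident (allowed, risk medium) would map to `Outcome.MANUAL_APPROVAL` rather than an automatic approval.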

Version/naming note: this incident happened while using the experimental guardian approval flow. In current alpha builds, the user-facing name seems to be Auto-review, while the underlying code/config still refers to guardian_approval and approvals_reviewer = "guardian_subagent".

This report is based on a real incident that actually happened to me; it is not a hypothetical scenario.

Drafted with assistance from Codex CLI.

extent analysis

TL;DR

To address the safety failure in the guardian/auto-review approval flow, introduce a configurable maximum risk level for auto-approval, falling back to manual approval when the assessed risk exceeds this threshold.

Guidance

  • Review the guardian/auto-review approval flow configuration to understand the current risk assessment and approval thresholds.
  • Consider introducing a guardian_max_auto_allow_risk config option to set a maximum risk level for auto-approval, ensuring that destructive actions like recursive deletion require manual approval above this threshold.
  • Evaluate the effectiveness of the proposed local patch that introduces a configurable maximum risk level for auto-approval and falls back to manual approval when the assessed risk exceeds this threshold.
  • Assess the feasibility of implementing a distinct guardian outcome like "requires manual approval" to handle high-risk actions.

Example

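# Example config option (proposed in the issue) to set a maximum risk level for auto-approval
guardian_max_auto_allow_risk = "low"

# Rank the risk labels; comparing the raw strings would order them alphabetically.
# The low < medium < high scale is an assumption (the issue only names "low" and "medium").
RISK_ORDER = {"low": 0, "medium": 1, "high": 2}
assessed_risk = "medium"  # example value, as reported in the incident

# Example logic to fall back to manual approval when risk exceeds the threshold
if RISK_ORDER[assessed_risk] > RISK_ORDER[guardian_max_auto_allow_risk]:
    # Require manual approval
    print("Manual approval required: risk exceeds the auto-approval threshold")
else:
    # Proceed with auto-approval
    print("Auto-approval granted")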

Notes

The proposed solution requires changes to both the guardian/auto-review configuration and its approval logic. It should be tested against destructive commands such as recursive deletion to confirm that they fall back to manual approval.

Recommendation

Introduce a configurable maximum risk level for auto-approval, as proposed in the issue. It lets guardian classify an action as authorized while still routing anything above the configured ceiling to manual approval.
