codex: Cyber-safety filter still triggers on Codex Business plan after individual chatgpt.com/cyber verification (verified identity, own application code)

GitHub stats
openai/codex#22554 (fetched 2026-05-14 03:34:31)
Comments: 0
Participants: 1
Timeline events: 2 (labeled ×2)
Reactions: 0

Error Message

  • Terminated with: ERROR: This content was flagged for possible cybersecurity risk. If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access for Cyber program: https://chatgpt.com/cyber
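
For pipelines that orchestrate Codex runs, it may help to detect this termination explicitly rather than treat it as a generic failure. The following is a minimal sketch in Python, assuming the captured output of a run is available as a string. The names `CYBER_FLAG_MARKER` and `is_cyber_flag` are hypothetical, and the marker is simply the message text quoted above, not an official error code.

```python
# Hypothetical helper: recognizes the flag message quoted above in captured output.
# The marker is copied from the error text; it is an assumption that the wording
# stays stable across CLI versions.
CYBER_FLAG_MARKER = "This content was flagged for possible cybersecurity risk"


def is_cyber_flag(output: str) -> bool:
    """Return True if the captured output contains the cyber-safety flag message."""
    return CYBER_FLAG_MARKER.lower() in output.lower()


if __name__ == "__main__":
    sample = (
        "ERROR: This content was flagged for possible cybersecurity risk. "
        "If this seems wrong, try rephrasing your request."
    )
    print(is_cyber_flag(sample))
    print(is_cyber_flag("exec finished with exit code 0"))
```

A caller could use the result to stop retrying and log the session ID, since each flagged attempt is still billed.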

Root Cause

Not confirmed by OpenAI. Based on the two incidents described below, the reporter's diagnosis is twofold:

  • Individual identity verification at chatgpt.com/cyber does not appear to propagate to Codex Business requests authenticated under the organization's plan.
  • The classifier appears to trigger on standard pentester vocabulary in the prompt ("adversarial", "exploit", "POC", "attack", "bypass") rather than on intent or workspace ownership.

Fix Action

Fix / Workaround

No confirmed fix is available yet. Individual verification at chatgpt.com/cyber did not lift the filter for requests made under the Codex Business plan. The path the reporter proposes, still awaiting confirmation from OpenAI, is the enterprise track at openai.com/form/enterprise-trusted-access-for-cyber/.

What version of Codex CLI is running?

codex-cli 0.124.0

What subscription do you have?

Codex Business (Team plan, 2 seats)

Which model were you using?

gpt-5.5 with reasoning effort xhigh, context tag adversarial-review

What platform is your computer?

Windows 11

What terminal emulator and version are you using (if applicable)?

Git Bash via Claude Code orchestration

What did you expect to happen?

I expected to be able to run a defensive adversarial review of my own Laravel application code owned by my own legal entity (Start Europa sp. z o.o.). The codebase is local, the workspace is read-only, the prompt is framed as senior pentester reviewing OUR application's AI integration for blindspots so we can patch them.

I have already completed individual identity verification at chatgpt.com/cyber on my user account ([email protected]) earlier today after a similar flag occurred yesterday. I expected that verification to lift the cyber-safety filter.

What actually happened?

Two separate flagged sessions, even AFTER individual identity verification was completed.

Incident 1 (2026-05-12, pre-verification):

  • Session ID: 019e1c28-2858-7720-80b3-c6ce89753a4a (related thread; specific failed sub-thread b2dkiy9gq)
  • Prompt: adversarial security review of OUR BridgeTest Laravel codebase, workspace at C:\Users\micha\gpt-workspace\bridgetest-readonly\
  • Approx 367 K input tokens consumed during workspace exploration
  • Terminated with: ERROR: This content was flagged for possible cybersecurity risk. If this seems wrong, try rephrasing your request. To get authorized for security work, join the Trusted Access for Cyber program: https://chatgpt.com/cyber

I then completed identity verification at chatgpt.com/cyber as instructed.

Incident 2 (2026-05-13, POST-verification):

  • Codex CLI background task: b3dun7y33
  • Same prompt structure as a prior successful StartEuropa adversarial-review run from 2026-05-12
  • Workspace: same C:\Users\micha\gpt-workspace\bridgetest-readonly\ (read-only copy of OUR own code)
  • The model began executing normally, ran 2 PowerShell exec calls to enumerate AI/Chatbot controllers, then was terminated with the SAME flag message after only ~6 K tokens

This is the second incident, post-verification. Identity verification at chatgpt.com/cyber clearly does NOT propagate to Codex Business API requests authenticated under the organization's plan.

The classifier appears to over-trigger on standard pentester vocabulary in our prompt: "adversarial", "exploit", "POC", "attack", "bypass". These are unavoidable when describing a defensive code review.
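
To document which of these terms a given prompt contains when filing a report, a small scan can help. This is a sketch only: the term list is taken from the paragraph above, the real classifier's inputs are unknown, and the function name `find_suspected_terms` is hypothetical. Matching is whole-word and case-insensitive, so "exploits" or "attacker" would not be counted.

```python
import re

# Terms this report lists as likely triggers. This is the reporter's hypothesis,
# not a documented list; the actual classifier behavior is not public.
SUSPECTED_TERMS = ("adversarial", "exploit", "POC", "attack", "bypass")


def find_suspected_terms(prompt: str) -> dict[str, int]:
    """Count whole-word, case-insensitive occurrences of each suspected term."""
    counts: dict[str, int] = {}
    for term in SUSPECTED_TERMS:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = re.findall(pattern, prompt, flags=re.IGNORECASE)
        if hits:
            counts[term] = len(hits)
    return counts


if __name__ == "__main__":
    prompt = "Adversarial review: find any bypass, then write a POC for the bypass."
    print(find_suspected_terms(prompt))
```

Including the resulting counts alongside the session ID gives support staff a concrete data point without posting the full prompt or any source code.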

Context

I am performing this as part of a continuous security validation workflow for an HR-tech / recruiting platform that holds candidate PII (passports, ID scans, employment records). We have GDPR obligations to demonstrate ongoing security due diligence. The audit is on OUR OWN code; no third-party system is targeted; no offensive payloads are generated by the prompts.

We previously had ONE clean successful adversarial-review session (519 K + 760 KB analytical output earlier on 2026-05-12) that delivered 15 distinct defensive findings on our StartEuropa application. We are using those findings to ship security fixes right now. Today's blockage is preventing the same workflow on our sister BridgeTest codebase.

We are also building an open-source (Apache-2.0) security-review-pro-mcp pipeline that uses GPT-5.5 calls in this mode to run adversarial reviews on customer-owned codebases.

Specific product feedback (echoing the closed issue #19594)

  1. Individual chatgpt.com/cyber identity verification does NOT carry over to Codex Business plan API requests. The documentation does not make this clear. Please either fix it or document the requirement explicitly.

  2. Filter triggers on prompt vocabulary, not on intent or workspace ownership. Standard pentester / defensive-research vocabulary (adversarial, exploit, POC, attack, bypass) trips the classifier even when the prompt explicitly says "review OUR application code at THIS local read-only path".

  3. Token cost during a flagged session is fully charged to the customer. Cumulative ~373 K tokens over two incidents with zero useful output for us. Per OpenAI's public $10M commitment to accelerate cyber defense, false-positive token burn on verified-identity legitimate defensive workflows should be eligible for credit refund.

  4. No interim allowlist for verified users. Once an account has cleared identity verification, the system should offer at least a 24-72h cooldown period where requests from that user are routed through a stronger semantic adjudication before the cheap classifier flag.

  5. No private diagnostics channel. Our workspace contains internal application code we cannot post publicly here. We need a private channel (or a guarantee that session IDs are sufficient for OpenAI staff to investigate without the user posting source).
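
Points 3 and 5 both depend on a compact incident record that can be shared without source code. The sketch below shows one possible shape for it, using only the dates, IDs, and approximate token counts stated in this report. The class and function names are hypothetical and not part of any OpenAI tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedIncident:
    date: str                # ISO date of the incident
    reference: str           # session or task ID as shown by the CLI
    approx_tokens: int       # approximate tokens consumed before termination
    post_verification: bool  # True if it happened after chatgpt.com/cyber verification


# Values taken from the two incidents described in this report.
INCIDENTS = [
    FlaggedIncident("2026-05-12", "019e1c28-2858-7720-80b3-c6ce89753a4a", 367_000, False),
    FlaggedIncident("2026-05-13", "b3dun7y33", 6_000, True),
]


def total_tokens(incidents: list[FlaggedIncident]) -> int:
    """Sum the approximate token counts across incidents."""
    return sum(i.approx_tokens for i in incidents)


def summary_lines(incidents: list[FlaggedIncident]) -> list[str]:
    """Build one line per incident plus a total, containing IDs only (no source code)."""
    lines = []
    for i in incidents:
        stage = "post-verification" if i.post_verification else "pre-verification"
        lines.append(f"{i.date} {i.reference} ~{i.approx_tokens // 1000} K tokens ({stage})")
    lines.append(f"total ~{total_tokens(incidents) // 1000} K tokens")
    return lines


if __name__ == "__main__":
    print("\n".join(summary_lines(INCIDENTS)))
```

The total works out to ~373 K tokens, which matches the cumulative figure quoted in point 3.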

Resolution we are seeking

  • Confirm the path to lift the cyber-safety filter for the entire Codex Business plan (likely enterprise track at openai.com/form/enterprise-trusted-access-for-cyber/).
  • Refund the token cost of the two flagged sessions per the $10M cyber-defense API credit commitment.
  • Document plan-tier vs user-tier verification behavior clearly so other Business customers know what to expect.

CC: I have also sent this via [email protected] and [email protected] with the same session/thread IDs.

Happy to provide the full prompt text, output sample from the prior successful run, and our open-source pipeline repo on request.
