codex - 💡(How to fix) Fix Recurrent instruction-to-UI leakage in frontier GPT models: development criteria and agent instructions appear verbatim in final user-facing copy [3 comments, 4 participants]

GitHub stats
openai/codex#17224 (fetched 2026-04-10 03:43:46)
Comments: 3
Participants: 4
Timeline: 6 (commented ×3, labeled ×2, closed ×1)
Reactions: 0

This is not about a single isolated example. The concrete shell-script case below is only one instance of a broader, recurrent failure mode across recent GPT frontier models, largely independent of reasoning effort.

The recurring issue is that the model fails to reliably distinguish between:

  • instructions to the agent/model about how to implement something
  • design/development criteria
  • literal end-user copy that should appear in the final product

In practice, this causes process instructions, implementation notes, or development criteria to leak into the generated artifact as if they were user-facing text.

This is especially damaging when building user interfaces. It is routine to find strings that belong to the development process directly rendered into final HTML views, often inside h* headings or p elements.

Why this matters

This is not just a cosmetic copy bug.

It is a prompt-boundary failure that degrades product quality in a systematic way:

  • implementation instructions are mistaken for UX copy
  • developer-facing constraints leak into end-user surfaces
  • generated UI feels prompt-shaped rather than intentionally designed
  • cleanup cost is high because these errors are semantically wrong, not just stylistically weak

The issue is particularly harmful in UI work because once these strings land in headings, labels, helper text, cards, or paragraphs, they distort the product itself rather than merely the implementation.

Recurrent Pattern

Across recent GPT models, including the newest frontier variants, it is common to see behavior like:

  • "criteria" text showing up as visible labels
  • implementation notes becoming explanatory paragraphs in the final page
  • developer-oriented caveats rendered into headings or content blocks
  • instructions about permission handling, validation, fallback behavior, or structure turned into literal UI copy

This happens even when the user's intent is clearly about implementation behavior, not about the exact wording to render.

Concrete Example

In one recent Codex interaction, the user asked for a CLI script that would toggle FortiClient auto-reconnection through on and off subcommands.

The user also said, in effect: if the script needs sudo, make that clear; if not, handle it appropriately.

Codex generated a script that printed:

No hace falta sudo.

(Spanish for "sudo is not needed") in normal success-path output for commands like status, on, and off.

That string was not appropriate end-user copy for the artifact. It was a mistaken literalization of an instruction about behavior and permissions handling.

This specific case was corrected manually, but the important point is that the same class of failure repeatedly appears in UI generation:

  • a requirement about implementation is transformed into visible copy
  • a design or development criterion becomes rendered content
  • model instructions bleed into final user-facing output
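
One way to treat the sudo request as behavior rather than copy is sketched below: the script mentions sudo only when an action that needs elevated privileges is run without them, and prints nothing about it on the success path. This is a minimal sketch, not the script from the report. `ACTIONS_NEEDING_ROOT`, `privilege_message`, and `main` are hypothetical names, and the real FortiClient toggle logic is not shown because the report does not include it.

```python
import os
import sys

# Assumption: which subcommands would need root. The report does not say,
# so this set is purely illustrative.
ACTIONS_NEEDING_ROOT = {"on", "off"}


def privilege_message(action, is_root):
    """Return an error message only when privileges are actually missing."""
    if action in ACTIONS_NEEDING_ROOT and not is_root:
        return f"'{action}' requires root privileges; re-run with sudo."
    return None  # success path: nothing about sudo is printed


def main(argv):
    """Hypothetical entry point; returns a process exit code."""
    if len(argv) != 2 or argv[1] not in {"status", "on", "off"}:
        print("usage: toggle.py {status|on|off}", file=sys.stderr)
        return 2
    action = argv[1]
    # os.geteuid exists on POSIX only; treat other platforms as non-root.
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    message = privilege_message(action, is_root)
    if message:
        print(message, file=sys.stderr)
        return 1
    # Placeholder for the real toggle logic, which the report does not show.
    print(f"{action}: ok")
    return 0
```

Calling `main(["toggle.py", "status"])` prints only the status line. The permission requirement surfaces as an error on stderr, and only when it is relevant.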

UI-Specific Examples of the Same Failure Class

The most damaging versions happen in generated HTML/UI work, where the model places development-time criteria directly into visible elements such as:

  • h1, h2, h3
  • p
  • helper text
  • section intros
  • empty-state copy
  • cards and labels

Examples of the kind of leakage seen in practice:

  • headings that read like implementation goals
  • paragraphs that explain what the page "should do" rather than what the user needs
  • copy that contains development constraints, fallback notes, validation notes, or design-system instructions
  • visible text that sounds like a prompt annotation instead of product copy
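
A post-generation check for this kind of leakage could scan the visible nodes listed above for process language. The sketch below uses Python's standard `html.parser`. The tag set and the `PROCESS_MARKERS` pattern are assumptions chosen for illustration, and a keyword list like this will produce both false positives and false negatives.

```python
import re
from html.parser import HTMLParser

# Assumption: the visible elements worth checking.
VISIBLE_TAGS = {"h1", "h2", "h3", "p", "label", "button"}

# Assumption: a few words that often signal developer-facing language.
PROCESS_MARKERS = re.compile(
    r"\b(should|must|fallback|validation|implement\w*|design system)\b",
    re.IGNORECASE,
)


class LeakScanner(HTMLParser):
    """Collect (tag, text) pairs whose text looks like process language."""

    def __init__(self):
        super().__init__()
        self._stack = []    # currently open visible tags
        self.findings = []  # suspicious (tag, text) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VISIBLE_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and self._stack[-1] == tag:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._stack and text and PROCESS_MARKERS.search(text):
            self.findings.append((self._stack[-1], text))


def scan_html(html):
    """Return suspicious (tag, text) pairs found in visible nodes."""
    scanner = LeakScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.findings
```

For example, `scan_html("<h2>The page should validate input before submit</h2><p>Welcome back</p>")` flags the heading and leaves the paragraph alone.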

Expected Behavior

The model should robustly distinguish among at least these categories:

  • implementation constraint
  • internal reasoning or execution instruction
  • product behavior requirement
  • literal UX/UI copy
  • operator/developer note

A requirement should not become visible copy unless there is strong evidence that the user intended it as actual displayed text.

For UI generation specifically, the model should be conservative about placing explanatory text into visible h* and p nodes unless that copy is clearly product-facing.
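
The categories above can be written down as an explicit type, so that "may this text be rendered?" becomes a single rule. The sketch below is one possible encoding. `Category`, `Statement`, `display_evidence`, and `may_render` are hypothetical names, and it describes nothing about how any model works internally.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Category(Enum):
    IMPLEMENTATION_CONSTRAINT = auto()
    EXECUTION_INSTRUCTION = auto()
    BEHAVIOR_REQUIREMENT = auto()
    LITERAL_COPY = auto()
    OPERATOR_NOTE = auto()


@dataclass(frozen=True)
class Statement:
    text: str
    category: Category
    # True only when there is strong evidence that the user wants this
    # exact text displayed (for example, they quoted it as wording).
    display_evidence: bool = False


def may_render(statement):
    """Literal copy may be rendered; anything else needs explicit evidence."""
    return statement.category is Category.LITERAL_COPY or statement.display_evidence
```

Under this rule, a statement such as "if the script needs sudo, make that clear", classified as a behavior requirement with no display evidence, would never be emitted as visible text.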

Actual Behavior

Recent GPT frontier models recurrently collapse these categories.

As a result:

  • agent instructions leak into generated artifacts
  • development criteria appear in final copy
  • UI strings contain process-language rather than product-language
  • the artifact reflects the prompt structure instead of the user intent

Suggested Direction

This likely needs a stronger generation-time boundary check before emitting user-facing text.

Possible mitigations:

  • classify instructions before generation into implementation constraint vs visible copy
  • require stronger evidence before converting requirements into rendered strings
  • add a dedicated "user-facing copy eligibility" check for generated UI text
  • apply extra scrutiny to visible HTML nodes such as h*, p, labels, buttons, and helper text
  • bias toward minimal visible copy unless the user explicitly requests wording
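
As a rough illustration of a "user-facing copy eligibility" check, the sketch below accepts a candidate string when the user quoted it in the prompt, and otherwise rejects it if it shares a run of words with the prompt. Both rules are assumptions made for this example: double quotes stand in for "strong evidence", and a 4-word overlap stands in for "leak". Paraphrased leaks would pass this check unnoticed.

```python
import re


def _words(text):
    """Lowercased word tokens, punctuation dropped."""
    return re.findall(r"[a-z0-9']+", text.lower())


def _ngrams(words, n):
    """Set of all n-word windows; empty when there are fewer than n words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def quoted_spans(prompt):
    """Text in double quotes, taken as explicitly requested wording."""
    return {m.strip().lower() for m in re.findall(r'"([^"]+)"', prompt)}


def is_eligible(candidate, prompt, n=4):
    """Decide whether a candidate UI string may be shown to end users."""
    if candidate.strip().lower() in quoted_spans(prompt):
        return True  # the user asked for this wording
    shared = _ngrams(_words(candidate), n) & _ngrams(_words(prompt), n)
    return not shared  # reject strings that echo the instructions
```

With the prompt `If the script needs sudo, make that clear. The button should say "Save changes".`, the string `Save changes` is eligible, while a heading that repeats `If the script needs sudo, make that clear` is not.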

Scope

Again, this report is not about one shell-script message.

That example is just an easy reproduction of a broader recurrent issue:

  • affects Codex/codegen workflows
  • affects latest/frontier GPT models as well
  • appears across effort levels
  • is especially destructive in UI implementation because leakage lands directly in rendered user experiences

Reproduction Heuristic

This can often be reproduced by asking the model to build a UI or CLI artifact while also giving it a mix of:

  • implementation constraints
  • UX intent
  • operational caveats
  • conditional behavior notes

The model too often promotes some of those instructions into final copy.
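
The heuristic above can be turned into a small, model-agnostic harness: build a prompt from the four kinds of notes, obtain the generated artifact by whatever means, then report which notes reappear in it. The sketch below covers only the prompt building and the check. `build_prompt` and `leaked_notes` are hypothetical helpers, no model API is called, and matching is whitespace-insensitive substring search, so paraphrased leaks are not detected.

```python
import re


def normalize(text):
    """Collapse whitespace and lowercase for tolerant comparison."""
    return re.sub(r"\s+", " ", text).strip().lower()


def build_prompt(task, notes):
    """Combine a task with notes grouped by kind into one prompt string."""
    lines = [task]
    for kind, items in notes.items():
        lines.append(f"{kind}:")
        lines.extend(f"- {item}" for item in items)
    return "\n".join(lines)


def leaked_notes(notes, artifact):
    """Return (kind, note) pairs whose text appears in the artifact."""
    haystack = normalize(artifact)
    return [
        (kind, item)
        for kind, items in notes.items()
        for item in items
        if normalize(item) in haystack
    ]
```

A non-empty result from `leaked_notes` on a generated page or script is a quick signal that an instruction was promoted into the artifact.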

Requested Outcome

Please treat this as a model-quality issue around instruction-boundary handling, not as a one-off wording bug.

The key problem is recurrent instruction leakage into final user-facing artifacts, particularly generated interfaces.

Extended analysis

TL;DR

The model should be modified to include a stronger generation-time boundary check to distinguish between implementation instructions and literal user-facing copy.

Guidance

  • Implement a classification system to categorize instructions into implementation constraints and visible copy before generation.
  • Require stronger evidence before converting requirements into rendered strings, especially for visible HTML nodes such as h*, p, labels, buttons, and helper text.
  • Add a dedicated "user-facing copy eligibility" check for generated UI text to prevent instruction leakage.
  • Bias toward minimal visible copy unless the user explicitly requests wording to reduce the likelihood of instruction leakage.
  • Apply extra scrutiny to visible HTML nodes to ensure they only contain product-facing copy.

Example

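A possible implementation could involve adding a preprocessing step to classify instructions and a post-processing step to filter out non-user-facing copy. The heuristics below (quoted text counts as copy, a short list of process-language markers) are placeholder assumptions, not a tested classifier. For example:

import re

# Assumption: words that tend to mark developer-facing language.
PROCESS_PATTERN = re.compile(
    r"\b(should|must|ensure|implement\w*|fallback|validate|todo)\b",
    re.IGNORECASE,
)

def classify_instructions(prompt):
    """Split prompt lines into implementation constraints and visible copy."""
    implementation_constraints = []
    visible_copy = []
    for line in prompt.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: only text the user put in double quotes is literal copy.
        quoted = re.findall(r'"([^"]+)"', line)
        if quoted:
            visible_copy.extend(quoted)
        else:
            implementation_constraints.append(line)
    return implementation_constraints, visible_copy

def is_user_facing(text):
    """Return True when the text contains no process-language marker."""
    return PROCESS_PATTERN.search(text) is None

def filter_copy(copy):
    """Keep only the strings that pass the user-facing check."""
    return [text for text in copy if is_user_facing(text)]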

Notes

The provided solution is a general outline and may require modifications to fit the specific model architecture and requirements. The key is to implement a robust boundary check to prevent instruction leakage into final user-facing artifacts.

Recommendation

Apply a workaround by implementing a stronger generation-time boundary check to distinguish between implementation instructions and literal user-facing copy. This will help prevent instruction leakage into final user-facing artifacts, particularly generated interfaces.
