Runtime transparency inconsistencies affecting Codex-adjacent ChatGPT sessions: model metadata, memory visibility, personalization weighting, and surface divergence

openai/codex#21297 (fetched 2026-05-07 03:42:28)

Summary

This issue reports runtime/product transparency inconsistencies observed in a ChatGPT GPT-5.5 session on May 6, 2026, including behavior that occurred before the session auto-switched from GPT-5.5 Instant to GPT-5.5 Thinking. I am filing this here because the observed behavior is not limited to general chat quality: it directly affects Codex-adjacent workflows where users rely on stable model identity, memory/context application, tool behavior, and consistent instruction-following across ChatGPT, Codex, and GitHub-connected sessions.

The assistant itself identified and acknowledged several inconsistencies during the session, including while operating in the faster/non-thinking mode before automatic escalation to the thinking environment. This is notable because the diagnostic behavior was not limited to a deliberate deep-reasoning pass; the assistant was already able to recognize and articulate product/runtime discrepancies at the standard conversational layer. This issue records those observations for investigation or routing to the appropriate OpenAI product/runtime team.

Environment

  • Product surface: ChatGPT web on iOS browser
  • Plan: Plus
  • Model shown/used: GPT-5.5 with automatic switching enabled
  • Observed mode sequence: GPT-5.5 Instant initially, followed by automatic switch/escalation to GPT-5.5 Thinking
  • Date observed: May 6, 2026
  • Connected tool context: GitHub connector available and used during the reporting conversation
  • Relevant downstream workflow: Codex/GitHub-assisted development and repository maintenance, where predictable runtime metadata and instruction adherence matter

Observed Issues

1. Knowledge cutoff / model metadata mismatch

The assistant initially stated that its built-in knowledge cutoff was January 2025. When challenged, it acknowledged that public GPT-5.5 documentation may indicate a later cutoff, while the live runtime/session metadata available to the assistant stated an earlier cutoff. In the broader system context, another cutoff value may also be present, creating further ambiguity.

This creates a user-facing contradiction between:

  • the model name shown in ChatGPT;
  • public model documentation;
  • runtime/system metadata available to the assistant;
  • actual behavior, which may sometimes appear more current than the stated cutoff.

Impact for Codex-adjacent workflows:

When using ChatGPT or Codex to reason about current OpenAI docs, SDK behavior, model migration guidance, GitHub workflows, or repository configuration, the user cannot tell whether the assistant is using current knowledge, stale metadata, live tools, or latent inference.
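
The contradiction described above can be made concrete by recording each cutoff claim together with its source and checking whether the known values agree. The following is a minimal sketch, not an existing OpenAI interface: the `CutoffClaim` type, the source labels, and the function names are hypothetical, and the `public_documentation` value is a placeholder, since this report does not state the documented cutoff. Only the January 2025 value comes from the session described here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CutoffClaim:
    source: str            # hypothetical label for where the claim came from
    cutoff: Optional[str]  # "YYYY-MM", or None when no exact value is known


def group_sources_by_cutoff(claims: List[CutoffClaim]) -> Dict[str, List[str]]:
    """Map each known cutoff value to the sources that report it."""
    by_cutoff: Dict[str, List[str]] = {}
    for claim in claims:
        if claim.cutoff is None:
            continue  # an unknown value can neither confirm nor contradict
        by_cutoff.setdefault(claim.cutoff, []).append(claim.source)
    return by_cutoff


def has_conflict(claims: List[CutoffClaim]) -> bool:
    """True when at least two sources report different known cutoffs."""
    return len(group_sources_by_cutoff(claims)) > 1


claims = [
    CutoffClaim("assistant_statement", "2025-01"),   # stated in the session
    CutoffClaim("runtime_metadata", None),           # exact value not given in this report
    CutoffClaim("public_documentation", "2099-01"),  # PLACEHOLDER, not a real value
]

print(group_sources_by_cutoff(claims))
print("conflict:", has_conflict(claims))
```

A check like this only detects disagreement. It cannot say which source is authoritative, which is the gap this issue asks the product to close.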

2. Memory appears internally available but externally under-disclosed

The assistant stated that saved memory/context was internally loaded and visible to it, including long-term user preferences related to:

  • output format;
  • coding and prompt-writing workflows;
  • GitHub/Codex repository practices;
  • tone and brevity preferences;
  • legal/professional drafting preferences;
  • PDF/tooling workflows;
  • recruiter and interview formats.

However, the user no longer consistently sees UI indicators showing that memory was used. Previously, the product surfaced memory usage more visibly.

Impact:

The user cannot distinguish among:

  • memory not being loaded;
  • memory being loaded but underweighted;
  • memory being overridden by higher-priority instructions;
  • memory being applied silently without UI disclosure;
  • model drift or product experiment behavior.

This reduces trust in long-running Codex/project workflows, where memory and project continuity are expected to stabilize output.
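
The memory states listed above could be told apart if the product reported a few facts per response. The sketch below assumes four hypothetical boolean signals (`loaded`, `applied`, `overridden`, `shown_in_ui`); none of them is a documented ChatGPT or Codex field. Model drift and experiment behavior are not modeled, because they cannot be derived from memory signals alone.

```python
from enum import Enum


class MemoryDisclosure(Enum):
    NOT_LOADED = "memory was not loaded"
    OVERRIDDEN = "memory was loaded but overridden by a higher-priority instruction"
    UNDERWEIGHTED = "memory was loaded but not applied to the answer"
    APPLIED_SILENTLY = "memory was applied without any UI disclosure"
    APPLIED_DISCLOSED = "memory was applied and disclosed in the UI"


def classify_memory_state(
    loaded: bool, applied: bool, overridden: bool, shown_in_ui: bool
) -> MemoryDisclosure:
    """Derive one disclosure state from hypothetical per-response signals."""
    if not loaded:
        return MemoryDisclosure.NOT_LOADED
    if overridden:
        return MemoryDisclosure.OVERRIDDEN
    if not applied:
        return MemoryDisclosure.UNDERWEIGHTED
    if not shown_in_ui:
        return MemoryDisclosure.APPLIED_SILENTLY
    return MemoryDisclosure.APPLIED_DISCLOSED


state = classify_memory_state(
    loaded=True, applied=True, overridden=False, shown_in_ui=False
)
print(state.name, "-", state.value)
```

Today the user sees only the final answer, so all of these states look the same from the outside.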

3. Personalization weighting is inconsistent

The assistant acknowledged that user preferences are present but not always applied reliably. For example, the user has stable saved preferences requiring:

  • direct, non-patronizing communication;
  • no therapeutic framing unless expressly requested;
  • concise, high-signal answers;
  • specific legal and professional drafting registers;
  • strict formatting rules for prompts, code blocks, and interview answers;
  • not rewriting already-sent messages unless explicitly asked.

Despite those preferences, answers can drift into a generic product-safety tone, overexplanation, corporate politeness, emotional cushioning, or formatting deviations.

Impact for Codex:

Codex-style work depends on exact instruction adherence: repository edits, prompts, audits, migration plans, and code reviews can materially degrade when stable instructions are underweighted or overridden without explanation.

4. Surface divergence across ChatGPT, Codex, GitHub tools, voice, and API documentation

The assistant acknowledged that different OpenAI surfaces may expose different model labels, routing behavior, tool access, memory behavior, and runtime metadata.

Examples of problematic divergence:

  • ChatGPT may show one model identity while runtime metadata suggests another snapshot or cutoff.
  • API documentation may describe one model behavior while ChatGPT runtime behaves differently.
  • Voice mode, ChatGPT web/app, Codex, and GitHub-connected sessions may not clearly disclose their effective runtime configuration.
  • Tool availability and memory visibility vary by surface.
  • Auto-switching can move a conversation from Instant to Thinking without clearly exposing which observations, tool calls, or reasoning statements came from which mode.

Impact:

Users cannot reliably know whether a behavioral difference is caused by:

  • model routing;
  • a stale runtime wrapper;
  • a product experiment;
  • tool availability;
  • memory application;
  • safety/runtime policy weighting;
  • Instant vs. Thinking mode;
  • or a genuine model limitation.

5. The assistant can infer inconsistencies, but the product does not expose them cleanly

During the conversation, the assistant was able to infer or explain several likely inconsistencies, including:

  • cutoff metadata mismatch;
  • memory loaded but not visibly cited;
  • safety/product-tone layers competing with saved user preferences;
  • inconsistent behavior across OpenAI product surfaces;
  • dynamic routing or policy-weighting differences across sessions;
  • web freshness versus latent model knowledge ambiguity;
  • ambiguity caused by automatic switching between GPT-5.5 Instant and GPT-5.5 Thinking.

However, the user has no stable UI mechanism to inspect this session state directly.

Impact:

The user must interrogate the assistant to reconstruct runtime state. That is fragile and unreliable. A user should not need the assistant to identify runtime inconsistencies conversationally in order to understand which configuration is active.

Requested Improvements

Please consider the following improvements, especially for Codex-adjacent ChatGPT workflows:

  1. Expose the effective model snapshot and knowledge cutoff for the active session.
  2. Clarify when ChatGPT runtime metadata differs from public API/model documentation.
  3. Restore or improve visible memory/context usage indicators where memory materially influences output.
  4. Provide a user-facing explanation when saved memory is loaded but overridden, underweighted, or deemed irrelevant.
  5. Improve personalization weighting so stable saved preferences are followed more consistently unless a clear higher-priority constraint applies.
  6. Add a session diagnostics panel or exportable debug summary showing non-sensitive runtime facts, such as:
    • model family;
    • effective runtime/snapshot label;
    • active mode, including Instant vs. Thinking;
    • whether auto-switching occurred;
    • active tools;
    • memory enabled/disabled status;
    • whether memory was consulted;
    • knowledge cutoff metadata;
    • browsing/tool availability;
    • whether the session is in a special routing or experiment bucket.
  7. Clarify how ChatGPT, Codex, API, voice mode, and GitHub-connected sessions differ in runtime configuration.
  8. When auto-switching is enabled, make it clearer which model/mode produced a given answer or diagnostic statement.
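
As an illustration of item 6, the exportable debug summary could be as small as one flat record. The sketch below is a hypothetical shape, not an existing OpenAI schema: every field name is an assumption drawn from the bullet list above, and the example values reflect only what the Environment section reports. Fields this report cannot confirm are left as `None`.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class SessionDiagnostics:
    """Hypothetical non-sensitive runtime facts for one session."""

    model_family: str
    active_mode: str                          # e.g. "instant" or "thinking"
    auto_switched: bool
    snapshot_label: Optional[str] = None      # effective runtime/snapshot label
    active_tools: List[str] = field(default_factory=list)
    memory_enabled: Optional[bool] = None
    memory_consulted: Optional[bool] = None
    knowledge_cutoff: Optional[str] = None
    browsing_available: Optional[bool] = None
    experiment_bucket: Optional[str] = None   # special routing or experiment bucket

    def to_json(self) -> str:
        """Serialize the summary so it can be exported or attached to a report."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Values taken from the Environment section; unknown facts stay None.
example = SessionDiagnostics(
    model_family="GPT-5.5",
    active_mode="thinking",
    auto_switched=True,
    active_tools=["github"],
    memory_enabled=True,
)

print(example.to_json())
```

Keeping unknown fields explicit as `null` in the export, rather than omitting them, lets a reader distinguish "not disclosed" from "disabled".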

Why this matters

This is not just a UX polish issue. It affects trust and correctness in long-running technical workflows. Codex and GitHub-connected work often depends on precise instruction-following, stable context, current documentation, and clear tool/model boundaries.

When the assistant can see saved memory but does not reliably follow it, and when the UI no longer shows whether memory was used, users reasonably conclude that memory or personalization is broken. When the assistant gives conflicting knowledge cutoff information, users cannot know whether current technical advice is reliable.

These inconsistencies were identified during the initial GPT-5.5 Instant conversational layer, not only after escalation into GPT-5.5 Thinking, which makes the lack of user-facing runtime transparency more important. The model can display advanced meta-diagnostic behavior, but the product does not give users an official way to inspect the runtime facts behind that behavior.

The product should not require users to reverse-engineer runtime state through repeated questioning. The runtime should disclose enough non-sensitive session information for users to understand what they are actually interacting with.
