codex - App-server sessions need stable prompt-cache key support for identical startup context

codex, 2026-05-08 18:29:38

What version of Codex CLI is running?

0.129.0

Which model were you using?

gpt-5.3-codex-spark

What platform is your computer?

Windows, using Codex app-server through an external multi-session orchestrator.

What issue are you seeing?

Independent fresh Codex app-server sessions do not appear to get full first-turn prompt-cache reuse even when the client sends the same stable startup context byte-for-byte.

From local instrumentation in a multi-session app-server workflow:

Session A first turn: 11,749 input tokens, 6,528 cached input tokens (~55.6%).
Session B first turn: 11,749 input tokens, 6,400 cached input tokens (~54.5%).
The model-visible first-turn startup prefix sent by the client was byte-identical across both sessions.
Both sessions were independent fresh Codex app-server threads with different Codex thread IDs.
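
The percentages above follow directly from the reported token counts. A minimal Python sketch of that arithmetic, where the function name `cached_fraction` is illustrative and not part of Codex:

```python
def cached_fraction(input_tokens: int, cached_input_tokens: int) -> float:
    """Return the share of input tokens reported as cached (0.0 to 1.0)."""
    if input_tokens <= 0:
        raise ValueError("input_tokens must be positive")
    return cached_input_tokens / input_tokens


# Token counts taken from the two first turns reported above.
sessions = {
    "A": (11_749, 6_528),
    "B": (11_749, 6_400),
}

for name, (total, cached) in sessions.items():
    # Prints 55.6% for session A and 54.5% for session B.
    print(f"Session {name}: {cached_fraction(total, cached):.1%} cached")
```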

The likely cause is that Codex currently derives prompt_cache_key from the unique thread id:

https://github.com/openai/codex/blob/872b8b15b38acbcc19457ef96b171819a56206db/codex-rs/core/src/client.rs#L713-L728

That is reasonable for cache reuse within a single long-lived thread, but it prevents app-server clients from intentionally sharing cache for stable startup/base context across multiple independent sessions. This is especially visible for multi-session / multi-agent orchestrators that start several fresh Codex sessions with identical base instructions, developer instructions, project context, and first-turn setup.
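
To illustrate why per-thread keys would prevent cross-session reuse, here is a hedged Python sketch. It assumes the cache key is simply the thread id, as the linked code suggests, and contrasts that with a key derived from a hash of the stable startup context. Both function names are hypothetical, and neither is Codex's actual implementation.

```python
import hashlib
import uuid


def per_thread_cache_key(thread_id: str) -> str:
    # Assumption: the key is the thread id itself, so every fresh thread gets a new key.
    return thread_id


def content_derived_cache_key(model: str, startup_context: bytes) -> str:
    # Hypothetical alternative: hash the model name and the stable startup bytes.
    digest = hashlib.sha256()
    digest.update(model.encode("utf-8"))
    digest.update(b"\x00")  # separator so model and context cannot run together
    digest.update(startup_context)
    return digest.hexdigest()


# Placeholder bytes standing in for the real startup context.
startup = b"base instructions + developer instructions + project context"
thread_a, thread_b = str(uuid.uuid4()), str(uuid.uuid4())

# Per-thread keys differ even though the startup context is identical.
print(per_thread_cache_key(thread_a) == per_thread_cache_key(thread_b))

# Content-derived keys match whenever the model and startup bytes match.
key_a = content_derived_cache_key("gpt-5.3-codex-spark", startup)
key_b = content_derived_cache_key("gpt-5.3-codex-spark", startup)
print(key_a == key_b)
```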

I did not find a current app-server option or schema field that lets the client provide a stable prompt cache key or split the cache key for the reusable startup prefix.

This is related to, but more specific than, the broader low-cache-hit issue in #20301. That issue appears to cover general GPT-5.5 cache-hit anomalies; this report is about the app-server architecture using per-thread cache keys for otherwise identical fresh sessions.

What steps can reproduce the bug?

1. Start two independent fresh Codex app-server sessions for the same project.
2. Use the same model and runtime configuration for both sessions.
3. Send the same startup/base/developer/project context in the first turn of each session.
4. Confirm locally that the model-visible first-turn prefix is byte-identical across the two sessions.
5. Compare input_tokens and cached_input_tokens for the first completed turn of each session.

In the observed case, both sessions had exactly 11,749 input tokens, but only about 55% of input tokens were reported as cached, despite the stable prefix being byte-identical.
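
One way to confirm locally that two prefixes are byte-identical is sketched below in Python. It assumes the orchestrator can capture each session's model-visible first-turn prefix as raw bytes. The helper names `first_divergence` and `prefix_fingerprint` are illustrative only.

```python
import hashlib
from typing import Optional


def first_divergence(a: bytes, b: bytes) -> Optional[int]:
    """Return the first byte offset where a and b differ, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def prefix_fingerprint(prefix: bytes) -> str:
    """Short, loggable fingerprint that avoids writing the full prefix to logs."""
    return hashlib.sha256(prefix).hexdigest()[:16]


# Placeholder captures; in practice these come from the orchestrator's instrumentation.
prefix_a = b"system: base instructions\ndeveloper: project context\n"
prefix_b = b"system: base instructions\ndeveloper: project context\n"

offset = first_divergence(prefix_a, prefix_b)
if offset is None:
    print("prefixes are byte-identical:", prefix_fingerprint(prefix_a))
else:
    print("prefixes diverge at byte offset", offset)
```

Logging the offset rather than only a hash mismatch helps locate which part of the startup context changed, for example a timestamp or a session-specific identifier.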

What is the expected behavior?

App-server clients should have a way to opt into stable prompt-cache reuse for stable startup/base context across independent fresh sessions.

Possible fixes:

expose a client-provided prompt_cache_key / cache namespace for app-server sessions;
allow app-server clients to provide a stable cache key for base/developer/startup context while keeping per-thread state isolated;
or otherwise avoid tying reusable startup-context caching exclusively to the unique thread id.

The important property is that two independent sessions with identical stable startup context can share cache for that prefix without sharing conversation state.
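
The opt-in behaviour described above could look roughly like the following Python sketch. Everything here is hypothetical: `SessionStartParams`, its `prompt_cache_key` field, and `effective_cache_key` are illustrative names, and the report did not find any such field in the current app-server schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SessionStartParams:
    # Hypothetical request shape, not the real app-server schema.
    thread_id: str
    prompt_cache_key: Optional[str] = None


def effective_cache_key(params: SessionStartParams) -> str:
    # Opt-in: use the client-provided key when present,
    # otherwise keep the per-thread default.
    return params.prompt_cache_key or params.thread_id


# Conversation state stays keyed by thread id, so sharing a cache key
# does not share history between sessions.
histories: Dict[str, List[str]] = {}

a = SessionStartParams("thread-a", prompt_cache_key="project-x-startup-v1")
b = SessionStartParams("thread-b", prompt_cache_key="project-x-startup-v1")
c = SessionStartParams("thread-c")  # no opt-in, keeps the per-thread key

for params in (a, b, c):
    histories[params.thread_id] = []
histories["thread-a"].append("user: hello")

print(effective_cache_key(a) == effective_cache_key(b))
print(effective_cache_key(c))
print(histories["thread-b"])
```

Falling back to the thread id keeps existing clients unchanged, while clients that opt in take responsibility for choosing a key that changes whenever the stable startup context changes.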

Additional information

A previous PR appears to have explored passing a prompt cache key when starting Codex, but it was closed without merging: #5949.

Related cache-key propagation work for compaction: #21249.

Related fork prompt-cache-state work: #20909.

If maintainers want private thread/session IDs or uploaded feedback bundles for the two observed sessions, I can provide them separately. I avoided putting those IDs in this public issue body.
