openclaw - 💡(How to fix) Fix [Bug]: Codex app-server rewrites prompt on tool-call continuation turns, busting OpenAI prompt cache mid-turn (cache ratio 93% → 47%) [1 comments, 2 participants]

Q: Expected behavior

Pre-update behavior (≤ 2026-05-14, codex-cli < 0.130.0): Codex app-server appends tool_call_output to the existing cached prompt prefix, preserving the OpenAI prompt cache across tool-call continuations. Session-wide cache ratio: ~93% (measured across 15 sessions, 2026-05-10 to 2026-05-14).

Fix Action

Fix / Workaround

Note: PI-runtime workaround was attempted (agentRuntime.id: "pi") but was not effective in this environment due to env-OPENAI_API_KEY taking precedence over the OAuth profile in the PI code path (/status showed api-key instead of oauth despite registered openai-codex profile). This is a separate issue and NOT the subject of this report.

No effective client-side workaround found. PI-runtime path not available in this environment (see provider setup details above).

Code Example

Session-level cache bust pattern (post-update, representative session):
    Turn 1 (user message):       cached=1, input=~38k tokens
    Turn 2 (exec tool call):     cached=1
    Turn 3 (tool continuation):  cached=0, input=~36.9k tokens  ← bust, -1.1k tokens
    Turn 4 (user message):       cached=0
    Turn 5 (exec tool call):     cached=0
    Turn 6 (tool continuation):  cached=0  ← never recovers
   
  Pre-update comparison (same model, same config, codex-cli < 0.130.0):
    All continuation turns: cached=1, no token-count shrink observed.

  Aggregate (35 sessions):
    Pre-update  (15 sessions, 2026-05-10–14): cache ratio ~93%
    Post-update (20 sessions, 2026-05-15–18): cache ratio ~47–49%
    Mid-turn bust rate post-update: 100% of turns following custom_tool_call name=exec

  Analysis performed with: /tmp/bust_analyzer.py (Codex trajectory format), /tmp/post_aggregator.py (multi-session aggregation).

Bug type

Regression (worked before, now fails)

Beta release blocker

Summary

Since OC v2026.5.12 + codex-cli 0.130.0 (first observed 2026-05-15), the Codex app-server rewrites the full prompt between a user turn and its tool-call continuation turn instead of appending only the tool_call_output, invalidating the OpenAI prompt cache on every exec tool call and raising effective per-token costs by ~3.5×.

Steps to reproduce

Run OpenClaw 2026.5.12 with model openai/gpt-5.4, Codex runtime (default, no agentRuntime.id override), auth via openai-codex OAuth profile.
Send any message that triggers a custom_tool_call name=exec (e.g. a shell command via a cron job or assistant task).
Observe cached=0 on the turn immediately following the exec tool call.
Note that the input token count on the continuation turn is ~1.1k tokens smaller than on the preceding user turn — indicating a full prompt reorganization, not a simple append.
Repeat across multiple tool calls in the same session: every tool-call continuation shows cached=0; cache never recovers within the session.
Compare against a pre-update session (codex-cli < 0.130.0, same model/config): continuation turns show cached=1, cache ratio ~93%.

Expected behavior

Pre-update behavior (≤ 2026-05-14, codex-cli < 0.130.0): Codex app-server appends tool_call_output to the existing cached prompt prefix, preserving the OpenAI prompt cache across tool-call continuations. Session-wide cache ratio: ~93% (measured across 15 sessions, 2026-05-10 to 2026-05-14).

Actual behavior

Post-update (≥ 2026-05-15, OC v2026.5.12 / codex-cli 0.130.0): On every turn following custom_tool_call name=exec (or function_call:wait), the Codex app-server rewrites the full prompt rather than appending. Result: cached=0 on all tool-call continuation turns. Input token count shrinks by ~1.1k tokens vs. the preceding turn — a reorganization signal, not an append.

Session-wide cache ratio drops from ~93% to ~47–49%. Effective per-token cost is ~3.5× higher.

Root cause confirmed NOT to be dev_instructions / inbound_meta variation: all turn_contexts in the affected sessions are byte-identical, dev_len=37177 constant across all 9 cached=0 bust events in the analyzed problem session.

Evidence: 35 sessions analyzed (2026-05-10 to 2026-05-18). 100% of mid-turn cache busts occur immediately after custom_tool_call name=exec or function_call:wait. Zero mid-turn busts in pre-update sessions.

OpenClaw version

2026.5.12 +

Operating system

macOS 26.3.1

Install method

docker / orbstack

Model

openai/gpt-5.x

Provider / routing chain

openclaw -> openai-codex (ChatGPT-Plus OAuth) -> OpenAI Codex runtime

Additional provider/model setup details

Auth profile: openai-codex (ChatGPT-Plus subscription OAuth — not API key). Runtime: Codex-Default (agentRuntime.id not set; Codex app-server is the effective runtime). thinkingDefault: medium, primary model openai/gpt-5.4 (alias GPT5). OPENAI_API_KEY env var present in environment but session auth resolves via OAuth profile.

Logs, screenshots, and evidence

Session-level cache bust pattern (post-update, representative session):
    Turn 1 (user message):       cached=1, input=~38k tokens
    Turn 2 (exec tool call):     cached=1
    Turn 3 (tool continuation):  cached=0, input=~36.9k tokens  ← bust, -1.1k tokens
    Turn 4 (user message):       cached=0
    Turn 5 (exec tool call):     cached=0
    Turn 6 (tool continuation):  cached=0  ← never recovers
   
  Pre-update comparison (same model, same config, codex-cli < 0.130.0):
    All continuation turns: cached=1, no token-count shrink observed.

  Aggregate (35 sessions):
    Pre-update  (15 sessions, 2026-05-10–14): cache ratio ~93%
    Post-update (20 sessions, 2026-05-15–18): cache ratio ~47–49%
    Mid-turn bust rate post-update: 100% of turns following custom_tool_call name=exec

  Analysis performed with: /tmp/bust_analyzer.py (Codex trajectory format), /tmp/post_aggregator.py (multi-session aggregation).

Impact and severity

Affected: All users running openai/gpt-5.4 (or any openai/gpt-* model) via Codex runtime with frequent exec/shell tool calls — particularly personal-assistant and cron-based workloads. Severity: High — blocks efficient use of ChatGPT-Plus subscription quota. Frequency: 100% reproducible post-update (every session, every exec tool call). Consequence: Effective per-token cost ~3.5× higher than pre-update; ChatGPT-Plus quota depleted ~3.5× faster for identical workloads. Users on API-key billing incur proportionally higher charges.

Additional information

Last known good: codex-cli < 0.130.0 / OC < 2026.5.12 (cache ratio ~93%) First known bad: codex-cli 0.130.0 / OC v2026.5.12 (cache ratio ~47–49%, first observed 2026-05-15)

This matches the root cause of upstream issue #64217 (OPEN): "Source-dependent instructions/tools drift invalidates OpenAI prompt cache within the same session." That issue covers inter-turn drift at the user-instruction level; this report covers the same underlying mechanism (prompt rewriting instead of appending) triggered specifically by tool-call continuations in the Codex app-server.

Related upstream issues for context: #66251 (umbrella), #82710 (app-server probe requests), #80131 (per-request tool bundling), #63030 (same pattern on Anthropic-side).

No effective client-side workaround found. PI-runtime path not available in this environment (see provider setup details above).

FAQ

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

openclaw - 💡(How to fix) Fix [Bug]: Codex app-server rewrites prompt on tool-call continuation turns, busting OpenAI prompt cache mid-turn (cache ratio 93% → 47%) [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

FAQ

Expected behavior

Still need to ship something?

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Codex app-server rewrites prompt on tool-call continuation turns, busting OpenAI prompt cache mid-turn (cache ratio 93% → 47%) [1 comments, 2 participants]

Recommended Tools

GitHub issue graph ai analysis

Root Cause

Fix Action

Fix / Workaround

Code Example

Bug type

Beta release blocker

Summary

Steps to reproduce

Expected behavior

Actual behavior

OpenClaw version

Operating system

Install method

Model

Provider / routing chain

Additional provider/model setup details

Logs, screenshots, and evidence

Impact and severity

Additional information

FAQ

Expected behavior

Still need to ship something?

RELATED_DISCOVERY

TRENDING