codex: Goal sessions can pin cached_input_tokens to a small fixed prefix (e.g. 2432)



What version of Codex is running?

  • codex-cli 0.135.0-alpha.1
  • Models: observed with both gpt-5.3-codex and gpt-5.5

Platform

  • Linux x86_64

Summary

In long-running goal-driven threads, last_token_usage.cached_input_tokens can stay pinned to a small constant (commonly 2432) for many consecutive turns.

The same thread later recovers to much higher cache hits after a context_compacted event, which strongly suggests this is a prompt-stability/composition issue (prefix invalidation), not a cache service outage.


Why this matters

When this happens, prompt-cache efficiency drops sharply for long threads: nearly all of each turn's input is processed uncached, increasing token usage and latency even though most of the context repeats from turn to turn.


Reproduction (generic)

  1. Start a thread and set an active goal (/goal) with a non-trivial objective.
  2. Continue for many turns with normal user messages and tool activity.
  3. Inspect the token_count events in the thread's local rollout JSONL file.
  4. Track payload.info.last_token_usage.cached_input_tokens per turn.
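Steps 3 and 4 can be scripted. The sketch below is a minimal Python reader for a rollout file. It assumes each JSONL line is an object with a `timestamp` field and a `payload` whose `type` is `"token_count"` for these events; only the `payload.info.last_token_usage.cached_input_tokens` path comes from this report, so the rest of the record layout, and the helper names `iter_turn_usage` and `format_turn`, are hypothetical and may not match every Codex version.

```python
import json
from typing import Iterable, Iterator, NamedTuple


class TurnUsage(NamedTuple):
    timestamp: str
    input_tokens: int
    cached_input_tokens: int
    output_tokens: int


def iter_turn_usage(lines: Iterable[str]) -> Iterator[TurnUsage]:
    """Yield last_token_usage from every token_count event in rollout JSONL lines.

    ASSUMPTION: records look like
      {"timestamp": ..., "payload": {"type": "token_count",
                                     "info": {"last_token_usage": {...}}}}
    Only the payload.info.last_token_usage path is taken from the issue.
    """
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip truncated or non-JSON lines
        if not isinstance(record, dict):
            continue
        payload = record.get("payload") or {}
        if not isinstance(payload, dict) or payload.get("type") != "token_count":
            continue
        last = (payload.get("info") or {}).get("last_token_usage")
        if not last:
            continue  # token_count events without usage info are skipped
        yield TurnUsage(
            timestamp=str(record.get("timestamp", "?")),
            input_tokens=int(last.get("input_tokens", 0)),
            cached_input_tokens=int(last.get("cached_input_tokens", 0)),
            output_tokens=int(last.get("output_tokens", 0)),
        )


def format_turn(u: TurnUsage) -> str:
    """Render one turn in the same shape as the log excerpts in this report."""
    return (f"{u.timestamp}  input={u.input_tokens}  "
            f"cached={u.cached_input_tokens}  output={u.output_tokens}")
```

For example, opening a rollout file with `open(path, encoding="utf-8")` and printing `format_turn(u)` for each `u` in `iter_turn_usage(f)` yields one line per turn in the format used below.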

Observed behavior

A) Long plateau at 2432

For a long span, every turn reports the same low cached prefix:

2026-05-31T01:36:00Z  input=206240  cached=2432  output=635
2026-05-31T01:37:07Z  input=207059  cached=2432  output=199
2026-05-31T01:37:54Z  input=207829  cached=2432  output=355
2026-05-31T01:39:13Z  input=208362  cached=2432  output=37
2026-05-31T01:40:35Z  input=208455  cached=2432  output=37
2026-05-31T01:41:19Z  input=208540  cached=2432  output=224
... (many similar turns)
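The arithmetic behind "Why this matters" is visible in these rows: 2432 / 206240 ≈ 1.18%, so roughly 99% of each turn's input misses the cache. Below is a minimal Python sketch for flagging such plateaus in excerpts of this shape. The regular expression targets the `input=N cached=N output=N` format used in this report, and the names `parse_rows`, `find_plateaus`, and the `min_len` threshold are hypothetical.

```python
import re
from itertools import groupby
from typing import List, NamedTuple

# Matches lines like "2026-05-31T01:36:00Z  input=206240  cached=2432  output=635"
ROW_RE = re.compile(r"^(\S+)\s+input=(\d+)\s+cached=(\d+)\s+output=(\d+)")


class Row(NamedTuple):
    timestamp: str
    input_tokens: int
    cached: int
    output_tokens: int


class Plateau(NamedTuple):
    cached: int
    turns: int
    first: str
    last: str


def parse_rows(text: str) -> List[Row]:
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line.strip())
        if m:  # lines such as "... (many similar turns)" do not match and are ignored
            ts, inp, cached, out = m.groups()
            rows.append(Row(ts, int(inp), int(cached), int(out)))
    return rows


def cached_fraction(row: Row) -> float:
    """Share of this turn's input tokens that were served from the prompt cache."""
    return row.cached / row.input_tokens if row.input_tokens else 0.0


def find_plateaus(rows: List[Row], min_len: int = 3) -> List[Plateau]:
    """Report runs of at least min_len consecutive turns with the same nonzero cached count."""
    plateaus = []
    for cached, group in groupby(rows, key=lambda r: r.cached):
        run = list(group)
        if cached > 0 and len(run) >= min_len:
            plateaus.append(Plateau(cached, len(run), run[0].timestamp, run[-1].timestamp))
    return plateaus


EXCERPT = """\
2026-05-31T01:36:00Z  input=206240  cached=2432  output=635
2026-05-31T01:37:07Z  input=207059  cached=2432  output=199
2026-05-31T01:37:54Z  input=207829  cached=2432  output=355
2026-05-31T01:39:13Z  input=208362  cached=2432  output=37
2026-05-31T01:40:35Z  input=208455  cached=2432  output=37
2026-05-31T01:41:19Z  input=208540  cached=2432  output=224
... (many similar turns)
"""

rows = parse_rows(EXCERPT)
print(find_plateaus(rows))
# [Plateau(cached=2432, turns=6, first='2026-05-31T01:36:00Z', last='2026-05-31T01:41:19Z')]
print(f"{cached_fraction(rows[0]):.2%}")  # 1.18%
```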

B) Immediate recovery after compaction boundary

Same thread, around compaction:

2026-05-31T02:02:40Z  token_count: cached=0
2026-05-31T02:02:40Z  event: compacted
2026-05-31T02:02:40Z  event: context_compacted

The following turns then ramp the cache up quickly:

2026-05-31T02:03:06Z  cached=4480
2026-05-31T02:03:15Z  cached=15232
2026-05-31T02:03:31Z  cached=21376
2026-05-31T02:04:01Z  cached=26496
2026-05-31T02:04:48Z  cached=39296
2026-05-31T02:05:54Z  cached=49536
2026-05-31T02:06:40Z  cached=52608

This “plateau -> compaction -> ramp-up” pattern is repeatable in our logs.
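To check whether the plateau, compaction, ramp-up sequence recurs across a longer log, the per-turn cached counts can be split at each compaction boundary and each segment classified. The sketch below hard-codes the values shown in excerpts A and B. It treats `context_compacted` as the boundary and ignores the companion `compacted` event; the `(kind, cached)` event tuples and the helper names `split_at_compaction` and `describe_segment` are assumptions about how one might pre-process a rollout, not Codex's own format.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Event = Tuple[str, Optional[int]]  # (kind, cached_input_tokens or None)


def split_at_compaction(events: Sequence[Event]) -> List[List[int]]:
    """Group token_count values into segments separated by context_compacted events."""
    segments: List[List[int]] = [[]]
    for kind, cached in events:
        if kind == "context_compacted":
            segments.append([])
        elif kind == "token_count" and cached is not None:
            segments[-1].append(cached)
        # other kinds (e.g. "compacted") are ignored
    return segments


def describe_segment(cached_values: Sequence[int]) -> str:
    """Label a segment as a plateau, a ramp, or mixed; zero readings are ignored."""
    hits = [c for c in cached_values if c > 0]
    if not hits:
        return "no cache hits"
    value, count = Counter(hits).most_common(1)[0]
    if count == len(hits) and count > 1:
        return f"plateau at {value} over {count} turns"
    if all(a < b for a, b in zip(hits, hits[1:])):
        return f"ramp {hits[0]} -> {hits[-1]} over {len(hits)} turns"
    return f"mixed (most common value {value}, seen {count} times)"


# Values from excerpts A and B (the turns elided as "many similar turns" are omitted).
EVENTS: List[Event] = (
    [("token_count", 2432)] * 6
    + [("token_count", 0), ("compacted", None), ("context_compacted", None)]
    + [("token_count", c) for c in (4480, 15232, 21376, 26496, 39296, 49536, 52608)]
)

for segment in split_at_compaction(EVENTS):
    print(describe_segment(segment))
# plateau at 2432 over 6 turns
# ramp 4480 -> 52608 over 7 turns
```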


Evidence that goal metadata is highly dynamic

During the plateau, thread/goal/updated fires every few seconds, and each notification carries new values for:

  • tokensUsed
  • timeUsedSeconds
  • updatedAt

Redacted example (objective omitted):

{
  "threadId": "<redacted>",
  "objective": "<redacted>",
  "status": "active",
  "tokensUsed": 10039897,
  "timeUsedSeconds": 3214,
  "createdAt": 1780188796,
  "updatedAt": 1780193170
}

The next update, 13 seconds later according to updatedAt:

{
  "threadId": "<redacted>",
  "objective": "<redacted>",
  "status": "active",
  "tokensUsed": 10041221,
  "timeUsedSeconds": 3227,
  "createdAt": 1780188796,
  "updatedAt": 1780193183
}
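Diffing consecutive notifications makes the volatile fields explicit. Between the two redacted updates above, `tokensUsed` grows by 1,324, and `timeUsedSeconds` and `updatedAt` each advance by 13, while `threadId`, `objective`, `status`, and `createdAt` stay the same. A small Python sketch of that diff (the helper name `changed_fields` is hypothetical):

```python
from typing import Any, Dict, Tuple


def changed_fields(prev: Dict[str, Any], curr: Dict[str, Any]) -> Dict[str, Tuple[Any, Any]]:
    """Return {field: (old, new)} for every field whose value differs between two goal snapshots."""
    return {
        key: (prev.get(key), curr.get(key))
        for key in sorted(prev.keys() | curr.keys())
        if prev.get(key) != curr.get(key)
    }


first = {"threadId": "<redacted>", "objective": "<redacted>", "status": "active",
         "tokensUsed": 10039897, "timeUsedSeconds": 3214,
         "createdAt": 1780188796, "updatedAt": 1780193170}
second = {**first, "tokensUsed": 10041221, "timeUsedSeconds": 3227, "updatedAt": 1780193183}

# All fields that change in this example are integers, so the delta can be printed directly.
for field, (old, new) in changed_fields(first, second).items():
    print(f"{field}: {old} -> {new} (+{new - old})")
# timeUsedSeconds: 3214 -> 3227 (+13)
# tokensUsed: 10039897 -> 10041221 (+1324)
# updatedAt: 1780193170 -> 1780193183 (+13)
```

If any of these values is rendered into model-visible text, each notification would produce a prompt that differs from the previous turn's at that position.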

Source-level context (same version family)

Current code/templates appear consistent with this behavior:

  • codex-rs/core/templates/goals/continuation.md
    • includes dynamic budget lines (Tokens used, Token budget)
  • codex-rs/core/templates/goals/objective_updated.md
    • includes dynamic budget lines
  • codex-rs/app-server/src/request_processors/thread_goal_processor.rs
    • thread/goal/updated contains time_used_seconds and updated_at
  • codex-rs/app-server/README.md
    • documents thread/goal/updated as including the full current goal

This issue does not claim these fields are always wrong; the concern is whether volatile data is entering model-visible context too early in the prompt, ahead of stable content, and thereby fragmenting cache prefix reuse.
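A toy model shows how volatile text early in the prompt can pin the cached count no matter how much stable history follows it. It assumes OpenAI-style prompt caching (hits only on an exact prefix match, a 1024-token minimum, cached length counted in 128-token steps) and treats each list element as one token. The 2500-token stable head is a made-up figure chosen so the result lines up with the 2432 in this report (2432 = 19 × 128); the report does not establish where Codex actually places the goal text, and `build_prompt` and `simulated_cached_tokens` are hypothetical helpers.

```python
from typing import List, Sequence

MIN_CACHEABLE = 1024  # ASSUMPTION: shortest prefix eligible for a cache hit
BLOCK = 128           # ASSUMPTION: cached length is reported in 128-token steps
STABLE_HEAD = 2500    # HYPOTHETICAL size of the unchanging instructions


def common_prefix_len(a: Sequence[str], b: Sequence[str]) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def simulated_cached_tokens(prev: Sequence[str], curr: Sequence[str]) -> int:
    """Cached tokens for `curr` if only `prev` is in the cache (toy model)."""
    lcp = common_prefix_len(prev, curr)
    return 0 if lcp < MIN_CACHEABLE else (lcp // BLOCK) * BLOCK


def build_prompt(tokens_used: int, history_len: int, volatile_first: bool) -> List[str]:
    head = [f"sys{i}" for i in range(STABLE_HEAD)]   # stable instructions
    goal = ["Tokens", "used:", str(tokens_used)]      # changes on every goal update
    history = [f"h{i}" for i in range(history_len)]  # append-only conversation
    return head + goal + history if volatile_first else head + history + goal


for volatile_first in (True, False):
    prev = build_prompt(10039897, 200_000, volatile_first)
    curr = build_prompt(10041221, 200_800, volatile_first)
    print(volatile_first, simulated_cached_tokens(prev, curr))
# True 2432
# False 202496
```

In this model, a counter placed right after the stable head caps reuse at 2432 tokens on every turn, while the same counter placed after the append-only history lets almost the whole prompt be reused, which is the intent of fix directions 1 and 2 below.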


Expected behavior

Adjacent turns with stable instructions/history should typically reuse a larger cached prefix and not stay pinned to a tiny constant for long stretches.

Actual behavior

Long runs in which the cache repeatedly hits only a tiny fixed prefix (e.g. 2432 tokens) until compaction rewrites the context.


Possible fix directions

  1. Keep model-visible goal context stable; avoid placing rapidly changing counters/timestamps in cache-critical prefix segments.
  2. Separate UI/runtime telemetry (tokensUsed, timeUsedSeconds, updatedAt) from prompt-injection content.
  3. Add diagnostics that report the first prompt-diff boundary between adjacent turns, so the root cause of cache invalidation is obvious.
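Fix direction 3 could start as a comparison of the serialized request items of two adjacent turns that reports where they first diverge. The sketch below is a hypothetical helper, not an existing Codex API. It assumes each request can be captured as an ordered list of strings (e.g. one per message), and the goal text in the example is invented, loosely modeled on the "Tokens used" line mentioned for continuation.md.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DiffBoundary:
    item_index: int   # first request item that differs between the two turns
    char_offset: int  # first differing character inside that item
    before: str       # excerpt around the boundary in the previous turn
    after: str        # excerpt around the boundary in the current turn


def first_diff_boundary(prev_items: Sequence[str], curr_items: Sequence[str],
                        context: int = 20) -> Optional[DiffBoundary]:
    """Locate the first divergence; None means one request is an unchanged prefix of the other."""
    for i, (a, b) in enumerate(zip(prev_items, curr_items)):
        if a == b:
            continue
        # First differing character, or the shorter length if one item merely extends the other.
        off = next((k for k, (x, y) in enumerate(zip(a, b)) if x != y), min(len(a), len(b)))
        lo = max(0, off - context)
        return DiffBoundary(i, off, a[lo:off + context], b[lo:off + context])
    return None


# HYPOTHETICAL requests from two adjacent turns.
prev = ["<instructions>", "Goal: ship X\nTokens used: 10039897", "user: run the tests"]
curr = ["<instructions>", "Goal: ship X\nTokens used: 10041221", "user: run the tests",
        "assistant: done"]
print(first_diff_boundary(prev, curr))
```

Here the boundary is item 1, character 29, where the counter's digits first differ; everything after that point, including the unchanged user message, falls outside the reusable prefix. Logging this boundary whenever it lands well before the end of the previous request would point directly at the invalidating content.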
