codex - 💡(How to fix) Fix Forked subagents lose prompt-cache lineage for inherited parent context

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Root Cause

fork_context=true is most useful when the parent has already accumulated important project/task context and a subagent needs to perform a bounded task using that context.

Without cache-lineage preservation, that pattern becomes unexpectedly expensive:

  • each fork can rebill a large inherited prefix as uncached input;
  • multi-subagent workflows multiply the cost;
  • users may see rapid quota/cost drain even when the actual per-subagent task prompt is small.

Fix Action

Fix / Workaround

  1. Start a normal Codex session using gpt-5.5.
  2. Add enough stable context to the parent thread, for example by discussing or loading a large repo/file/document so the parent has a substantial cached prefix.
  3. Spawn a subagent with inherited/forked context for a small task.
  4. Inspect the usage events in the rollout/session JSONL, especially input_tokens and cached_input_tokens, for the forked child turn.
  5. Compare that with either:
    • a normal continuation in the parent thread, or
    • a patched build where the forked child preserves the parent prompt-cache lineage.
RAW_BUFFERClick to expand / collapse

What version of Codex CLI is running?

codex-cli 0.133.0

Which model were you using?

gpt-5.5

What platform is your computer?

macOS

What issue are you seeing?

Forked subagents appear to lose the parent thread's prompt-cache lineage even though they inherit a large prefix of the parent conversation.

When a subagent is spawned with inherited context, the child thread usually starts from a long parent prefix that should be highly cacheable. However, the child currently gets its own thread-derived prompt_cache_key, so the backend appears to treat the fork as an unrelated request instead of a continuation/fork of the parent cache lineage.

The practical effect is that fork_context=true subagents can pay mostly uncached input-token cost for a prefix that was already present and cached in the parent thread.

This is especially expensive for workflows that intentionally load substantial project context in the main thread and then delegate small follow-up tasks to forked subagents.

What steps can reproduce the bug?

  1. Start a normal Codex session using gpt-5.5.
  2. Add enough stable context to the parent thread, for example by discussing or loading a large repo/file/document so the parent has a substantial cached prefix.
  3. Spawn a subagent with inherited/forked context for a small task.
  4. Inspect the usage events in the rollout/session JSONL, especially input_tokens and cached_input_tokens, for the forked child turn.
  5. Compare that with either:
    • a normal continuation in the parent thread, or
    • a patched build where the forked child preserves the parent prompt-cache lineage.

Expected behavior

A forked subagent that inherits parent context should preserve the relevant parent prompt-cache lineage for the inherited prefix.

The child should still have its own thread/session identity, but the prompt-cache key used for the reusable inherited prefix should not be replaced solely because the fork has a new child thread id.

In other words, forked subagents should be isolated as conversation threads, but not forced into unrelated prompt-cache namespaces for the parent prefix they inherit.

Actual behavior

The forked child receives a new thread-derived prompt_cache_key.

Because the cache key changes at fork startup, the inherited parent prefix is not reused as effectively as expected. The observed cached_input_tokens for forked subagents is much lower than it should be for a mostly identical inherited prefix.

Why this matters

fork_context=true is most useful when the parent has already accumulated important project/task context and a subagent needs to perform a bounded task using that context.

Without cache-lineage preservation, that pattern becomes unexpectedly expensive:

  • each fork can rebill a large inherited prefix as uncached input;
  • multi-subagent workflows multiply the cost;
  • users may see rapid quota/cost drain even when the actual per-subagent task prompt is small.

Related issues / PRs

This is related to, but more specific than:

  • #21796: stable prompt-cache key support for identical startup context across app-server sessions.
  • #20301: broader low cache hit rate reports with GPT-5.5.
  • #18683: feature request for batch-spawning forked subagents from shared preloaded context.
  • #20909: a closed PR that attempted to preserve fork prompt-cache state.

Unlike #20301, this report is not about a general model-level cache anomaly. It is specifically about forked subagents losing cache lineage because the fork gets a new thread-derived prompt-cache key.

Suggested fix

When starting a forked subagent, preserve the parent's prompt-cache lineage for the inherited context instead of deriving a fresh key only from the child thread id.

A narrow fix is to carry an inherited prompt-cache key through fork startup and use it when constructing the child session/client, while keeping the child thread id and conversation state separate.

Reference implementation: john-lu-ptc/codex#1

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

A forked subagent that inherits parent context should preserve the relevant parent prompt-cache lineage for the inherited prefix.

The child should still have its own thread/session identity, but the prompt-cache key used for the reusable inherited prefix should not be replaced solely because the fork has a new child thread id.

In other words, forked subagents should be isolated as conversation threads, but not forced into unrelated prompt-cache namespaces for the parent prefix they inherit.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

codex - 💡(How to fix) Fix Forked subagents lose prompt-cache lineage for inherited parent context