codex: Remote compaction times out [1 comment, 2 participants]

GitHub stats
openai/codex#18829 · Fetched 2026-04-22 07:51:58
Comments: 1 · Participants: 2 · Timeline events: 7 · Reactions: 0
Timeline (top): labeled ×4, closed ×1, commented ×1, renamed ×1

Error Message

Error running remote compact task: timeout waiting for child process to exit



What version of Codex CLI is running?

codex-cli 0.121.0

What subscription do you have?

Pro

Which model were you using?

gpt-5.4

What platform is your computer?

Primary repro environment:

  • Ubuntu 24.04.4 guest (Linux 6.17.0-20-generic x86_64) in VMware Workstation Pro on a Windows host
  • GNOME Terminal in the guest

I am filing this because the issue appears to be in Codex's remote compaction path, not VMware specifically.

What terminal emulator and version are you using (if applicable)?

GNOME Terminal

What issue are you seeing?

Error running remote compact task: timeout waiting for child process to exit

I realize there are already compaction timeout issues such as #14860. This report adds a concrete live-session reproduction showing that:

  1. The error still occurs on 0.121.0
  2. It happens in a long-lived interactive CLI session
  3. It also happens in a nested Codex orchestration setup where one Codex session launches child Codex runs and reuses the child thread with resume --last --all

The key point is that the workload is heavy but not unreasonable for an orchestrator/controller use case. The failure is in remote compaction, not in an obviously invalid input shape.

What steps can reproduce the bug?

There are two closely related manifestations.

A. Long-lived interactive CLI session

In a normal interactive CLI session, after enough investigation turns / enough accumulated context, Codex starts a compact turn and then fails after about 158 seconds.

From the live codex-tui.log in my environment:

  • compact start: 2026-04-21T09:22:06Z
  • failure: 2026-04-21T09:24:44Z
  • error: compact_error=timeout waiting for child process to exit
  • effective outer model_context_window: 950000

This suggests the general remote compaction bug still exists even without any custom harnessing.
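The roughly 158-second figure follows directly from the two log timestamps above. A minimal sketch of that arithmetic, assuming both values are UTC in the ISO form shown in the log (the function name `elapsed_seconds` is illustrative and not part of Codex):

```python
from datetime import datetime

# Timestamp layout as it appears in the codex-tui.log excerpts above.
LOG_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def elapsed_seconds(start_iso: str, end_iso: str) -> float:
    """Return the number of seconds between two log timestamps."""
    start = datetime.strptime(start_iso, LOG_TIME_FORMAT)
    end = datetime.strptime(end_iso, LOG_TIME_FORMAT)
    return (end - start).total_seconds()


COMPACT_START = "2026-04-21T09:22:06Z"
COMPACT_FAILURE = "2026-04-21T09:24:44Z"

print(elapsed_seconds(COMPACT_START, COMPACT_FAILURE))  # 158.0
```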

B. Nested Codex orchestration (much easier to trigger)

I have a controller-managed workflow where Codex is used as an orchestrator and then launches child Codex runs for roles like orchestrator / builder / verifier / reviewer.

The child orchestrator is run in same-thread append-only mode, using:

  • first turn: codex exec ...
  • follow-up turns: codex exec resume --last --all ...

In a failing live run:

  • builder-turn-001 succeeded
  • orchestrator-turn-002 failed only because remote compaction timed out
  • the builder result was already READY_FOR_CONTROLLER_TESTS

From the child orchestrator terminal log:

  • remote compaction failed ... compact_error=timeout waiting for child process to exit
  • child model_context_window_tokens=Some(258400)
  • failing turn last_token_usage.total_tokens=265904
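For reference, the reported usage on the failing turn is larger than the child's context window. The sketch below compares the two numbers from the log. It assumes `total_tokens` and the window are counted on the same basis, which the report does not confirm, and the helper name is hypothetical:

```python
def window_overage(total_tokens: int, context_window: int) -> tuple[int, float]:
    """Return (tokens over the window, overage as a fraction of the window)."""
    over = total_tokens - context_window
    return over, over / context_window


CHILD_CONTEXT_WINDOW = 258_400  # model_context_window_tokens from the child log
FAILING_TURN_TOTAL = 265_904    # last_token_usage.total_tokens from the child log

over, fraction = window_overage(FAILING_TURN_TOTAL, CHILD_CONTEXT_WINDOW)
print(f"over by {over} tokens ({fraction:.1%} of the window)")
```

With the logged values this comes to 7,504 tokens, or about 2.9% of the window.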

The generated orchestrator packets in that same run were:

  • turn 1 prompt file: 285,248 bytes, 219,510 chars, 59,328 tokens
  • turn 2 prompt file: 357,207 bytes, 291,463 chars, 78,774 tokens

Those are large, but still plausible for a controller/orchestrator packet that contains:

  • spec text
  • acceptance criteria
  • repo map
  • full files / excerpts
  • JSON output schema

One especially relevant breakdown from the failing orchestrator-turn-002 packet:

  • SPEC: ~44 KB
  • FULL FILES: ~202 KB

This is exactly the sort of packet a controller/orchestrator can generate while still being a legitimate Codex use case.
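To put the packet sizes in proportion, the sketch below relates each prompt file's token count to the child's 258,400-token window. All figures come from the report; the `Packet` type and the `window_share` helper are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    name: str
    size_bytes: int
    chars: int
    tokens: int


CHILD_CONTEXT_WINDOW = 258_400  # child model_context_window_tokens from the log

PACKETS = [
    Packet("turn 1", 285_248, 219_510, 59_328),
    Packet("turn 2", 357_207, 291_463, 78_774),
]


def window_share(packet: Packet, context_window: int) -> float:
    """Fraction of the context window taken by the packet's tokens alone."""
    return packet.tokens / context_window


for packet in PACKETS:
    share = window_share(packet, CHILD_CONTEXT_WINDOW)
    print(f"{packet.name}: {packet.tokens} tokens, {share:.1%} of the window")

combined_tokens = sum(packet.tokens for packet in PACKETS)
print(f"combined: {combined_tokens} tokens")
```

The two packets together total 138,102 tokens, a little over half the child window. That is consistent with the report's view that the packets alone are within a plausible range and that accumulated same-thread history contributes the rest.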

What is the expected behavior?

One of these should happen reliably:

  1. Remote compaction succeeds, even for large same-thread orchestrator/controller packets
  2. Codex fails with a clearer, more actionable error and/or a documented limit

But it should not spend ~158 seconds and then fail with timeout waiting for child process to exit during compaction.

Additional information

A few details that may help narrow root cause:

1. The same environment had both a failing outer session and a failing inner child session

That is why I do not think this is just a VMware quirk or just a custom-controller bug.

  • outer interactive CLI session failed compaction at 950000 context window
  • inner orchestrator child session failed compaction at 258400

The inner case is easier to trigger, but the outer case shows the upstream compaction path is still unstable.

2. There is a local amplifier in my controller harness, but it is not the whole story

My controller-generated project config explicitly drops these keys from child harness homes:

"harness_drop_base_config_keys": [
  "model_context_window",
  "model_auto_compact_token_limit"
]

That causes the child harness to lose the larger base settings and run with an effective model_context_window of 258400.

This clearly makes the issue easier to trigger.

However, the same remote compact timeout also occurred in the outer interactive CLI session at 950000, so this local config stripping looks like an amplifier, not the root cause.
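The amplifier effect can be illustrated with a small model of the key stripping. This is a sketch under stated assumptions, not the harness's real code. The helper names are hypothetical, the base `model_auto_compact_token_limit` value is invented for illustration, and treating 258400 as the fallback when the key is absent is inferred from the effective value observed in the child log.

```python
from typing import Any

# Assumption: effective child window observed in the report once the key is dropped.
FALLBACK_CONTEXT_WINDOW = 258_400


def build_child_config(base: dict[str, Any], drop_keys: list[str]) -> dict[str, Any]:
    """Copy the base config without the keys the harness is told to drop."""
    return {key: value for key, value in base.items() if key not in drop_keys}


def effective_context_window(config: dict[str, Any]) -> int:
    """Use the configured window, or the assumed fallback when the key is missing."""
    return config.get("model_context_window", FALLBACK_CONTEXT_WINDOW)


base_config = {
    "model_context_window": 950_000,            # outer session value from the report
    "model_auto_compact_token_limit": 900_000,  # hypothetical value, not from the report
}
harness_drop_base_config_keys = [
    "model_context_window",
    "model_auto_compact_token_limit",
]

child_config = build_child_config(base_config, harness_drop_base_config_keys)
print(effective_context_window(base_config))   # 950000
print(effective_context_window(child_config))  # 258400
```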

3. Successful control cases in the same environment did exist

I ran isolated fresh-home codex exec tests in the same guest, including forced compaction scenarios. Those could succeed repeatedly. The failure showed up when the usage matched the real operational pattern:

  • long-lived same-thread sessions
  • large orchestrator packets
  • nested Codex orchestration

That suggests the failure mode depends on session shape and accumulated history, not just on OS/VM presence.

4. Possible areas worth looking at

  • remote compaction timeout handling / retry behavior
  • compaction behavior in long-lived same-thread sessions
  • nested Codex / child-session orchestration support expectations
  • whether the current error message is masking a lower-level transport timeout again (similar to earlier reports)
  • whether packet/historical context size interacts poorly with the remote compact path even when the packet itself is within a plausible token range
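On the controller side, one way to explore the timeout and retry question is to detect the specific compaction error in a child turn's log and retry only in that case. The following is a hypothetical sketch: `run_turn` stands in for whatever launches the child turn and returns its log text, and nothing here is a Codex API. Whether a retry actually helps is not established by the report.

```python
import time
from typing import Callable

# Error text as it appears in the logs quoted in this report.
COMPACT_TIMEOUT_MARKER = "compact_error=timeout waiting for child process to exit"


def is_compaction_timeout(log_text: str) -> bool:
    """True when the log contains the remote compaction timeout error."""
    return COMPACT_TIMEOUT_MARKER in log_text


def run_with_compaction_retry(
    run_turn: Callable[[], str],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> tuple[str, int]:
    """Run a turn, retrying only when it failed on the compaction timeout.

    Returns the final log text and the number of attempts used.
    """
    for attempt in range(1, max_attempts + 1):
        log_text = run_turn()
        if not is_compaction_timeout(log_text):
            return log_text, attempt
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)  # simple linear backoff
    raise RuntimeError(
        f"remote compaction timed out on all {max_attempts} attempts"
    )


# Usage with a fake runner that fails once and then succeeds.
_fake_logs = iter([
    "remote compaction failed ... " + COMPACT_TIMEOUT_MARKER,
    "turn completed",
])
final_log, attempts_used = run_with_compaction_retry(lambda: next(_fake_logs))
print(final_log, attempts_used)
```

Keeping the check narrow means other child failures still surface immediately instead of being retried.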

If useful, I can provide more precise local evidence excerpts, but I wanted to file a concise report first because this still looks like a real upstream compaction stability problem in 0.121.0, especially for controller/orchestrator style usage.

extent analysis

TL;DR

A plausible workaround for the child-session case is to keep model_context_window and model_auto_compact_token_limit at their larger base values instead of dropping them in the child harness. This is unlikely to be a complete fix: the report shows the same timeout in the outer interactive session at a 950000 context window.

Guidance

  • Investigate the remote compaction timeout handling and retry behavior to determine if adjustments can be made to prevent timeouts.
  • Review the compaction behavior in long-lived same-thread sessions to identify potential issues with accumulated history and session shape.
  • Examine the nested Codex / child-session orchestration support expectations to ensure that the current implementation meets the requirements for controller/orchestrator style usage.
  • Consider increasing the model_context_window and model_auto_compact_token_limit settings to allow for larger packets and more tokens.

Example

No code snippet is provided as the issue is related to configuration and settings rather than code.

Notes

The report confirms the failure on Codex CLI 0.121.0 and cites earlier compaction timeout reports such as #14860, so it is probably not specific to this version. The evidence points at remote compaction in long-lived same-thread sessions, but the root cause is not established and more data and testing are needed to confirm it.

Recommendation

As a mitigation, stop stripping model_context_window and model_auto_compact_token_limit from child harness configs (or set them explicitly) so that child sessions do not fall back to the smaller 258400 window. Treat this as reducing how often the timeout triggers, not as eliminating it, since the underlying remote compaction path still needs an upstream fix.
