openclaw: Codex app-server rotates context-engine bootstrap threads after large first turns


Summary

On the current main branch and the latest stable release I verified (v2026.5.22, published 2026-05-24), Codex app-server sessions can repeatedly lose their warmed native thread after a large context-engine bootstrap turn. The observed release log shape is:

codex app-server native transcript exceeded active token limit; starting a fresh thread

This is not Discord losing messages and it is not the model context limit. It is OpenClaw clearing the saved Codex app-server native thread binding before the context-engine compatibility path can decide whether that thread is still valid.

Impact

For long-running Codex-backed agents with contextEngine projection mode thread_bootstrap, a large first/bootstrap native turn can exceed the local 70k native active-token guard. Once that happens, each later turn can cold-start the native Codex thread instead of using thread/resume, causing repeated bootstrap/projection work and loss of the warmed app-server fast path.

That matches the token/latency symptom we are seeing in long Discord sessions: the Gateway still routes the turn, but the Codex-side native wrapper repeatedly starts fresh threads and burns tokens/CPU.

Root Cause

extensions/codex/src/app-server/run-attempt.ts calls rotateOversizedCodexAppServerStartupBinding(...) immediately after reading the startup binding. That helper reads the native Codex rollout/session token stats and clears the binding when the latest usage is at or above CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS (70_000).

For context-engine thread_bootstrap, that ordering is wrong: the bootstrap turn is expected to be large, and later turns should be able to reuse the same native thread as long as the stored context-engine projection metadata still matches the current engine/policy/epoch. The later context-engine reuse logic already knows how to decide whether the binding is compatible, but it never gets the chance because the startup guard deletes the binding first.
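The ordering problem can be modeled in a few lines. This is a minimal TypeScript sketch, not the code in run-attempt.ts: only the 70_000 limit and the thread_bootstrap mode string come from this report, while the StartupBinding shape and the rotateIfOversized helper are hypothetical stand-ins for the real startup guard.

```typescript
// Limit quoted in this report; every other name here is hypothetical.
const CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000;

interface StartupBinding {
  threadId: string;
  // Present only for context-engine sessions (shape assumed for illustration).
  contextEngine?: { projection: { mode: string } };
}

// Models the current startup guard: it looks only at the latest native
// token usage and never consults the context-engine metadata.
function rotateIfOversized(
  binding: StartupBinding | undefined,
  latestTokens: number,
): StartupBinding | undefined {
  if (binding === undefined) return undefined;
  if (latestTokens >= CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS) {
    return undefined; // binding cleared, so the next turn cold-starts
  }
  return binding;
}

const bootstrapBinding: StartupBinding = {
  threadId: "thread-1",
  contextEngine: { projection: { mode: "thread_bootstrap" } },
};

// An 86k-token bootstrap turn is over the limit, so the binding is gone
// before any compatibility check gets a chance to run.
const afterGuard = rotateIfOversized(bootstrapBinding, 86_000);
```

In this model afterGuard is undefined even though the projection metadata was never inspected, which is the behavior described above.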

Expected Behavior

A saved Codex native thread binding with contextEngine.projection.mode === "thread_bootstrap" should survive the startup native transcript size guard. Compatibility should then be decided by the context-engine projection/epoch checks and the existing per-turn overflow recovery path. If the epoch or policy changes, OpenClaw should still rotate and reproject.
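A compatibility decision of this kind might look like the sketch below. The ProjectionMetadata fields (engineId, policyId, epoch) and both function names are assumptions made for illustration; the real context-engine reuse logic in OpenClaw may store and compare different fields.

```typescript
// Hypothetical shape of the projection metadata saved with a binding.
interface ProjectionMetadata {
  mode: string;
  engineId: string;
  policyId: string;
  epoch: number;
}

// A binding is reusable only when every stored field matches the
// current engine configuration.
function isProjectionCompatible(
  stored: ProjectionMetadata | undefined,
  current: ProjectionMetadata,
): boolean {
  if (stored === undefined) return false;
  return (
    stored.mode === current.mode &&
    stored.engineId === current.engineId &&
    stored.policyId === current.policyId &&
    stored.epoch === current.epoch
  );
}

// "resume" stands for reusing the native thread (thread/resume in the
// report); "rotate" stands for starting fresh and reprojecting.
function chooseThreadAction(
  stored: ProjectionMetadata | undefined,
  current: ProjectionMetadata,
): "resume" | "rotate" {
  return isProjectionCompatible(stored, current) ? "resume" : "rotate";
}

const current: ProjectionMetadata = {
  mode: "thread_bootstrap",
  engineId: "engine-a",
  policyId: "policy-1",
  epoch: 3,
};

const sameEpoch = chooseThreadAction({ ...current }, current);
const staleEpoch = chooseThreadAction({ ...current, epoch: 2 }, current);
```

Note that native transcript size plays no part in this decision: an oversized but matching binding resumes, and a small but stale one rotates.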

Proposed Fix

Defer the startup native token/byte guard for context-engine thread_bootstrap bindings. Keep the existing guard behavior for non-context-engine and non-bootstrap native sessions.
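One way to express the deferral is sketched below. It is not the in-progress patch: the StartupBinding shape and the helper names are hypothetical, the byte-limit half of the guard is omitted for brevity, and only the 70_000 limit and the thread_bootstrap mode string are taken from this report.

```typescript
// Limit quoted in this report; every other name here is hypothetical.
const CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS = 70_000;

interface StartupBinding {
  threadId: string;
  contextEngine?: { projection: { mode: string } };
}

function isThreadBootstrapBinding(binding: StartupBinding): boolean {
  return binding.contextEngine?.projection.mode === "thread_bootstrap";
}

// Startup guard with the proposed deferral: thread_bootstrap bindings are
// passed through untouched so the later projection/epoch checks can decide;
// every other binding keeps the existing over-budget rotation.
function applyStartupGuard(
  binding: StartupBinding | undefined,
  latestTokens: number,
): StartupBinding | undefined {
  if (binding === undefined) return undefined;
  if (isThreadBootstrapBinding(binding)) return binding;
  return latestTokens >= CODEX_APP_SERVER_NATIVE_THREAD_MAX_TOKENS
    ? undefined
    : binding;
}

const bootstrap: StartupBinding = {
  threadId: "thread-1",
  contextEngine: { projection: { mode: "thread_bootstrap" } },
};
const plain: StartupBinding = { threadId: "thread-2" };

const keptBootstrap = applyStartupGuard(bootstrap, 86_000); // survives the guard
const rotatedPlain = applyStartupGuard(plain, 86_000); // still cleared
```

Keeping the deferral inside the guard, rather than removing the guard, leaves non-context-engine sessions with exactly the rotation behavior they have today.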

I have a focused regression test and a patch in progress that together show:

  • an 86k-token bootstrap rollout still resumes with thread/resume
  • the following turn sends only the current user prompt, not the assembled bootstrap context again
  • existing non-bootstrap native over-budget rotation tests still pass

Validation So Far

Local validation ran from the Lexar-backed worktree:

/Volumes/LEXAR/repos/worktrees/openclaw-codex-native-thread-reuse

Focused checks:

OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.context-engine.test.ts --run
OPENCLAW_VITEST_MAX_WORKERS=1 node scripts/run-vitest.mjs extensions/codex/src/app-server/run-attempt.test.ts --run -t "starts a fresh Codex thread before resume when the native rollout is over budget|uses current rollout token usage before cumulative usage|clears native rollouts at the configured byte limit"
pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.context-engine.test.ts
git diff --check

Parallel review also checked Pi runtime risk. The proposed change is limited to the Codex app-server startup binding guard and should not change Pi embedded-runner compaction semantics.
