codex: thread/rollback numTurns uses inconsistent units between core history and app-server replay

What version of Codex CLI is running?

v0.131.0-alpha.16

What subscription do you have?

Pro

Which model were you using?

gpt-5.5-xhigh

What platform is your computer?

Linux 6.8.0-1050-aws x86_64 x86_64

What terminal emulator and version are you using (if applicable)?

No response

Codex doctor report

No response

What issue are you seeing?

thread/rollback can make the TUI-local transcript and the app-server replayed transcript disagree about which history survived a successful rewind.

The root cause appears to be that the same numTurns value is interpreted in different units:

  • core/protocol treat numTurns as a count of user/instruction-turn rollback boundaries;
  • app-server ThreadHistory replay treats numTurns as a count of raw app-server Turn objects;
  • TUI backtrack computes numTurns from visible user messages.

This only works when every raw app-server Turn contains exactly one rollback boundary. Valid histories can violate that invariant.
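To make the three units concrete, here is a minimal sketch that counts each of them over the same history. `Item`, `Turn`, and every function name below are hypothetical simplifications for illustration, not the real codex-rs types; the boundary rule is assumed from the description in this issue.

```rust
/// Hypothetical, simplified item kinds (assumption: not the real codex-rs types).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    /// Ordinary non-contextual user message, visible in the TUI.
    UserMessage,
    /// Structured assistant inter-agent instruction.
    InterAgentInstruction,
    /// Contextual or hidden user message, such as goal context.
    ContextualUserMessage,
    /// Assistant output.
    AgentMessage,
}

/// One raw app-server turn: an ordered list of items.
#[derive(Debug, Clone, PartialEq)]
pub struct Turn {
    pub items: Vec<Item>,
}

/// What the TUI is described as counting.
fn is_visible_user_message(item: &Item) -> bool {
    matches!(item, Item::UserMessage)
}

/// What core is described as counting.
fn is_rollback_boundary(item: &Item) -> bool {
    matches!(item, Item::UserMessage | Item::InterAgentInstruction)
}

/// Unit used by app-server replay: raw Turn objects.
pub fn raw_turn_count(turns: &[Turn]) -> usize {
    turns.len()
}

/// Unit used by TUI backtrack: visible user messages.
pub fn visible_user_message_count(turns: &[Turn]) -> usize {
    turns
        .iter()
        .flat_map(|turn| turn.items.iter())
        .filter(|item| is_visible_user_message(item))
        .count()
}

/// Unit used by core: rollback boundaries.
pub fn rollback_boundary_count(turns: &[Turn]) -> usize {
    turns
        .iter()
        .flat_map(|turn| turn.items.iter())
        .filter(|item| is_rollback_boundary(item))
        .count()
}

/// True only when every raw turn holds exactly one boundary and that
/// boundary is a visible user message, so all three units coincide.
pub fn units_agree(turns: &[Turn]) -> bool {
    turns.iter().all(|turn| {
        let boundaries = turn.items.iter().filter(|i| is_rollback_boundary(i)).count();
        let visible = turn.items.iter().filter(|i| is_visible_user_message(i)).count();
        boundaries == 1 && visible == 1
    })
}
```

Under these assumptions, a history with one steering turn or one inter-agent instruction already makes `units_agree` return false, even though each count looks plausible on its own.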

  • codex-rs/protocol/src/protocol.rs documents Op::ThreadRollback as “drop the last N user turns from in-memory context”.
  • ThreadRolledBackEvent.num_turns is documented as “Number of user turns that were removed from context”.
  • codex-rs/core/src/context_manager/history.rs::drop_last_n_user_turns drops the last N instruction turns.
  • is_user_turn_boundary treats ordinary non-contextual user messages and structured assistant inter-agent instructions as rollback boundaries.
  • codex-rs/app-server/src/request_processors/thread_processor.rs forwards num_turns unchanged to Op::ThreadRollback { num_turns }.
  • codex-rs/app-server-protocol/src/protocol/thread_history.rs::ThreadHistoryBuilder::handle_thread_rollback clears/truncates self.turns by raw Turn count.

Problematic cases include:

  1. a raw app-server turn with zero visible UserMessage items, such as a goal/autonomous continuation turn or other agent-only/tool-only turn;
  2. a raw app-server turn with multiple UserMessage items, such as mid-turn steering;
  3. a structured assistant inter-agent instruction, which core counts as an instruction-turn boundary but TUI-visible user-message counting does not.

This can make core model history, app-server thread/read includeTurns, persisted rollout reconstruction, resume/fork reconstruction, and TUI backtrack disagree about which conversation prefix survived rollback.

What steps can reproduce the bug?

This can be reproduced with histories where raw app-server Turn objects do not map 1:1 to visible user-message rollback boundaries.

Below are two simplified scenarios. T1, T2, etc. are raw app-server turns. U1, U2, etc. are visible user messages. A1, A2, etc. are assistant messages.

The same class of mismatch may also affect any feature that creates raw turns with zero visible user messages, or rollback boundaries that only core counts, such as standalone !cmd, compaction/review/hook/abort-only turns, and subagent/multi-agent structured inter-agent messages. The two scenarios below are the clearest examples.


Scenario 1: multiple steer messages inside one raw turn

Suppose the persisted history has this shape:

T1: U1 -> A1
T2: U2 -> A2
T3: U3
    U4  // steer
    U5  // steer
    A3
T4: U6 -> A6

Now use the TUI backtrack/rewind UI and rewind to U3.

From the TUI's point of view, the visible user messages are:

U1, U2, U3, U4, U5, U6

So rewinding to U3 asks to remove the last four visible user-message boundaries:

num_turns = 4   // U3, U4, U5, U6

The TUI-local transcript is then trimmed to the prefix before U3:

T1: U1 -> A1
T2: U2 -> A2

That is also what core user/instruction-boundary semantics would imply, assuming U3, U4, U5, and U6 are ordinary non-contextual user messages.

However, app-server ThreadHistory replay currently treats the same num_turns = 4 as four raw app-server turns. Since the raw turns are:

T1, T2, T3, T4

replaying the rollback as raw-turn truncation removes all four turns:

app-server replayed transcript: empty

So after TUI rewind:

TUI shows:                  T1, T2
app-server/session replay:  empty

This is a visible data mismatch: the TUI appears to preserve U1/A1 and U2/A2, but thread/read includeTurns, resume, fork, or another client rebuilding from app-server ThreadHistory can see a different transcript.
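Scenario 1 can be replayed with a toy model. This is only a sketch: `Item`, `Turn`, `scenario_1`, `tui_num_turns_for_rewind`, and `replay_rollback_by_raw_turns` are hypothetical names, and the two functions only approximate the TUI and ThreadHistory behavior described above.

```rust
/// Hypothetical, simplified item kinds (assumption: not the real codex-rs types).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    User(&'static str),
    Agent(&'static str),
}

/// One raw app-server turn.
pub type Turn = Vec<Item>;

/// The Scenario 1 history: T3 holds U3 plus two steer messages.
pub fn scenario_1() -> Vec<Turn> {
    use Item::{Agent, User};
    vec![
        vec![User("U1"), Agent("A1")],
        vec![User("U2"), Agent("A2")],
        vec![User("U3"), User("U4"), User("U5"), Agent("A3")],
        vec![User("U6"), Agent("A6")],
    ]
}

/// Approximates the TUI: num_turns is the number of visible user messages
/// from the rewind target to the end of the transcript.
pub fn tui_num_turns_for_rewind(turns: &[Turn], target: &str) -> Option<usize> {
    let users: Vec<&str> = turns
        .iter()
        .flatten()
        .filter_map(|item| match item {
            Item::User(name) => Some(*name),
            Item::Agent(_) => None,
        })
        .collect();
    let position = users.iter().position(|name| *name == target)?;
    Some(users.len() - position)
}

/// Approximates the current replay: num_turns is applied as a raw Turn count.
pub fn replay_rollback_by_raw_turns(mut turns: Vec<Turn>, num_turns: usize) -> Vec<Turn> {
    let keep = turns.len().saturating_sub(num_turns);
    turns.truncate(keep);
    turns
}
```

In this model, rewinding to U3 yields `num_turns = 4`, and feeding that value to `replay_rollback_by_raw_turns` returns an empty history, while the TUI still shows T1 and T2. The replay would need `num_turns = 2` in raw-turn units to match.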


Scenario 2: goal / agent-only raw turn with zero visible user messages

A similar mismatch can happen in the opposite direction when the rollback span contains a raw turn with no visible user message, for example a goal-driven / autonomous / agent-only continuation turn.

Suppose the persisted history has this shape:

T1: U1 -> A1
T2: goal / agent-only output
    // no visible UserMessage in this raw turn
T3: U2 -> A2

Now use the TUI backtrack/rewind UI and rewind to the first visible user message, U1.

From the TUI's point of view, the visible user messages are:

U1, U2

So rewinding to U1 asks to remove the last two visible user-message boundaries:

num_turns = 2   // U1, U2

The TUI-local transcript is then trimmed to the prefix before U1:

TUI-local transcript: empty

Core user/instruction-boundary semantics would also remove both visible user turns, leaving no prior user conversation.

However, app-server ThreadHistory replay sees three raw turns:

T1, T2, T3

and treats num_turns = 2 as “remove the last two raw turns”. It removes T2 and T3, but leaves T1:

app-server replayed transcript:

T1: U1 -> A1

So after TUI rewind:

TUI shows:                  empty
app-server/session replay:  T1: U1 -> A1

In both scenarios, rollback can look successful in the TUI because the TUI trims its local transcript by visible user-message position. The persisted/app-server replay can still disagree because it applies the same num_turns value to a different unit: raw app-server Turn objects. The same class of unit mismatch can also become a model-context problem when future requests are built from a mismatched reconstructed history, or when core has rollback boundaries that are not visible user messages.
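Both directions of the mismatch show up in a small numeric model where each raw turn is reduced to the number of rollback boundaries it contains. This is a sketch with hypothetical function names; Scenario 1 becomes `[1, 1, 3, 1]` and Scenario 2 becomes `[1, 0, 1]`.

```rust
/// Raw turns that survive when `num_turns` is read as a raw Turn count.
pub fn surviving_turns_raw(boundaries_per_turn: &[usize], num_turns: usize) -> usize {
    boundaries_per_turn.len().saturating_sub(num_turns)
}

/// Raw turns that survive in full when `num_turns` is read as a count of
/// rollback boundaries. A turn that contains the cut boundary is not counted,
/// even if some of its earlier items would remain.
/// Assumption: if `num_turns` exceeds the number of boundaries, nothing survives.
pub fn surviving_turns_by_boundary(boundaries_per_turn: &[usize], num_turns: usize) -> usize {
    if num_turns == 0 {
        return boundaries_per_turn.len();
    }
    let mut remaining = num_turns;
    // Walk from the newest turn to the oldest, consuming boundaries.
    for (index, &count) in boundaries_per_turn.iter().enumerate().rev() {
        if count >= remaining {
            // The cut boundary lies inside turn `index`;
            // only the turns before it survive in full.
            return index;
        }
        remaining -= count;
    }
    0
}
```

For `[1, 1, 3, 1]` with `num_turns = 4`, the raw reading keeps 0 turns and the boundary reading keeps 2. For `[1, 0, 1]` with `num_turns = 2`, the raw reading keeps 1 turn and the boundary reading keeps 0. The two readings coincide only for shapes like `[1, 1, 1]`.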

What is the expected behavior?

thread/rollback should use one consistent rollback boundary across:

  • core in-memory history,
  • persisted rollout reconstruction,
  • app-server ThreadHistoryBuilder / thread/read includeTurns,
  • resume/fork reconstruction,
  • TUI backtrack/local transcript trimming.

Since the protocol and core already document/use numTurns as user/instruction-turn count, the least surprising fix is to keep that semantic and make app-server transcript replay follow it.

Suggested fix:

  • Define numTurns explicitly as “number of rollback boundaries / instruction turns”, not raw app-server Turn objects.
  • Keep app-server replay aligned with core’s rollback-boundary semantics:
    • ordinary non-contextual user messages are rollback boundaries;
    • structured assistant inter-agent instructions are rollback boundaries;
    • contextual/hidden user messages such as goal context are not ordinary visible user-message boundaries.
  • Update ThreadHistoryBuilder::handle_thread_rollback to truncate by rollback boundary position rather than by raw Turn count.
  • If the rollback boundary falls inside a raw app-server Turn, truncate only that turn’s item suffix instead of deleting the whole raw turn.

One maintainable implementation approach would be to record rollback-boundary positions while replaying rollout items into ThreadHistory. A boundary position could point to a raw turn index plus an item index inside that turn. Then ThreadRolledBack { num_turns } can look up the Nth boundary from the end and truncate at that exact position.
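The approach above could look roughly like the following. This is a sketch under stated assumptions, not the actual `ThreadHistoryBuilder`: `HistorySketch`, `BoundaryPosition`, `Item`, and `Turn` are hypothetical, and the behavior when `num_turns` exceeds the number of recorded boundaries (cut at the first boundary) is an assumption rather than documented core behavior.

```rust
/// Hypothetical, simplified item kinds (assumption: not the real codex-rs types).
#[derive(Debug, Clone, PartialEq)]
pub enum Item {
    UserMessage(&'static str),
    InterAgentInstruction(&'static str),
    ContextualUserMessage(&'static str),
    AgentMessage(&'static str),
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Turn {
    pub items: Vec<Item>,
}

/// Where a rollback boundary sits: raw turn index plus item index in that turn.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct BoundaryPosition {
    pub turn_index: usize,
    pub item_index: usize,
}

#[derive(Debug, Default)]
pub struct HistorySketch {
    pub turns: Vec<Turn>,
    boundaries: Vec<BoundaryPosition>,
}

impl HistorySketch {
    pub fn start_turn(&mut self) {
        self.turns.push(Turn::default());
    }

    /// Appends an item to the current turn, recording a boundary position
    /// when the item counts as a rollback boundary.
    pub fn push_item(&mut self, item: Item) {
        if self.turns.is_empty() {
            self.start_turn();
        }
        let turn_index = self.turns.len() - 1;
        let item_index = self.turns[turn_index].items.len();
        if matches!(item, Item::UserMessage(_) | Item::InterAgentInstruction(_)) {
            self.boundaries.push(BoundaryPosition { turn_index, item_index });
        }
        self.turns[turn_index].items.push(item);
    }

    /// Truncates at the Nth boundary from the end instead of by raw Turn count.
    pub fn rollback(&mut self, num_turns: usize) {
        if num_turns == 0 {
            return;
        }
        let keep = self.boundaries.len().saturating_sub(num_turns);
        let cut = match self.boundaries.get(keep) {
            Some(cut) => *cut,
            None => return, // no boundaries recorded, nothing to roll back
        };
        self.boundaries.truncate(keep);
        // Drop every turn after the one holding the cut boundary, then drop
        // only the suffix of that turn starting at the boundary.
        self.turns.truncate(cut.turn_index + 1);
        self.turns[cut.turn_index].items.truncate(cut.item_index);
        if self.turns[cut.turn_index].items.is_empty() {
            self.turns.pop();
        }
    }
}
```

With the Scenario 1 history, `rollback(4)` leaves T1 and T2, and `rollback(3)` leaves T1, T2, and a T3 that contains only U3. With the Scenario 2 history, `rollback(2)` leaves nothing, which matches what the TUI shows.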

Additional information

No response
