openclaw - ✅(Solved) Fix Codex app-server: evaluate post-tool raw assistant completion semantics [1 pull request, 1 comment, 2 participants]

openclaw • 2026-05-19 13:16:50

GitHub stats

openclaw/openclaw#84137 • Fetched 2026-05-20 03:43:39

Author

rozmiarD

Participants

clawsweeper[bot]

rozmiarD

Timeline (top)

commented ×1, cross-referenced ×1

While preparing #84135, we verified that Codex app-server currently treats two raw assistant completion cases differently:

Pre-tool raw assistant completion without turn/completed can represent final assistant output. The current short assistant idle release is useful here because it delivers captured assistant text instead of waiting for a terminal idle timeout.
Post-tool raw assistant completion or progress after a dynamic/native tool handoff can represent synthesis or a terminal-event wait rather than a final answer. Today that path arms the completion-idle guard with turnAssistantCompletionIdleTimeoutMs, which defaults to 10 seconds.

Fix Action

Proposed fix (PR open, not merged at fetch time): fix(codex): make post-tool raw assistant timeout configurable (https://github.com/openclaw/openclaw/pull/84135)

PR fix notes

PR #84135: fix(codex): make post-tool raw assistant timeout configurable

Repository: openclaw/openclaw
Author: rozmiarD
State: open (not merged)
Link: https://github.com/openclaw/openclaw/pull/84135

Description (problem / solution / changelog)

Motivation

Codex app-server currently uses the assistant completion idle timeout for the post-tool raw assistant completion guard. That keeps stuck post-tool turns fail-fast, but couples two different behaviors: final assistant release and post-tool completion wait. Heavy or trusted local workloads may need a longer post-tool wait without weakening final assistant release.

Root cause / diagnosis

Pre-tool raw assistant completion can represent final assistant output and should still release the session after the assistant idle budget.

Post-tool raw assistant completion or progress happens after a tool handoff and can represent post-tool synthesis or terminal-event wait.

Current code uses turnAssistantCompletionIdleTimeoutMs for the post-tool guard. That timeout defaults to 10s. Existing tests show this was intentional fail-fast behavior, so this PR adds a separate configurable budget rather than globally changing assistant release semantics.

Implementation

Added appServer.postToolRawAssistantCompletionIdleTimeoutMs.

Added runtime option and config resolution for the post-tool raw assistant completion guard.

Post-tool raw assistant guard now uses the dedicated timeout. If unset, behavior remains compatible by falling back to the assistant completion idle timeout.

Pre-tool raw assistant release remains unchanged.

Updated both Codex harness docs pages so the post-tool raw assistant paragraph mentions the dedicated completion-idle guard instead of only turn/completed or the terminal watchdog.
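The fallback described above can be sketched as follows. This is a minimal illustration, not the PR's code: the interface, function, and constant names are hypothetical, and only the two timeout field names and the 10s default come from the PR description.

```typescript
// Hypothetical config shape; only the two field names come from the PR text.
interface AppServerTimeoutConfig {
  turnAssistantCompletionIdleTimeoutMs?: number;
  postToolRawAssistantCompletionIdleTimeoutMs?: number;
}

interface ResolvedTimeouts {
  assistantCompletionIdleMs: number;
  postToolRawAssistantCompletionIdleMs: number;
}

// The PR description states that the assistant completion idle timeout defaults to 10s.
const DEFAULT_ASSISTANT_COMPLETION_IDLE_TIMEOUT_MS = 10_000;

function resolveTimeouts(config: AppServerTimeoutConfig): ResolvedTimeouts {
  const assistantCompletionIdleMs =
    config.turnAssistantCompletionIdleTimeoutMs ??
    DEFAULT_ASSISTANT_COMPLETION_IDLE_TIMEOUT_MS;
  return {
    assistantCompletionIdleMs,
    // When the dedicated field is unset, inherit the assistant budget so
    // existing deployments keep their current fail-fast behavior.
    postToolRawAssistantCompletionIdleMs:
      config.postToolRawAssistantCompletionIdleTimeoutMs ??
      assistantCompletionIdleMs,
  };
}
```

With an empty config both budgets resolve to 10 000 ms; setting only the dedicated field lengthens the post-tool wait and leaves the assistant release budget alone.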

What did not change

Pre-tool raw assistant release.
turn/completed handling.
Raw tool-output completion handling.
Dynamic tool diagnostics.
Dynamic tool execution timeout.
requestTimeoutMs.
Terminal idle hard fallback.

Tests

git diff --check passed.
git diff --check HEAD~1..HEAD passed after the docs follow-up commit.
pnpm docs:list completed and confirmed the touched Codex harness docs are in the docs inventory.
Focused Vitest run passed before the docs-only follow-up (46 passed, 180 skipped):
NODE_OPTIONS=<process-only path.matchesGlob shim> node --no-maglev ./node_modules/vitest/vitest.mjs --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/config.test.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose -t 'post-tool raw assistant|raw assistant response item without turn completion|Codex app-server config'
A full targeted-files Vitest run reached the relevant Codex tests and the new regression test passed, but the run failed on an unrelated existing local timeout in the test "exposes Docker sandbox shell tools when native Code Mode cannot honor sandbox paths".
pnpm check:changed was not run locally. This contributor checkout is a Codex worktree where broad pnpm gates should run in Testbox/Crabbox, and this contributor account cannot run Blacksmith Testbox because Blacksmith requires a GitHub organization app install. Direct AWS Crabbox credentials are also not available in this environment.

Regression coverage

Existing pre-tool raw assistant release test still passes.

New test proves post-tool raw assistant completion can use a longer dedicated timeout than assistant release.

Config test proves the appServer field is accepted and resolved, and unknown fields remain rejected.

Existing post-tool timeout tests remain valid; the default behavior still falls back to the assistant completion idle timeout.
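The "accepted and resolved, unknown fields rejected" property that the config test covers can be illustrated with a strict parser. This is a sketch under assumptions: the function name, the error messages, and the positive-number check are hypothetical and are not taken from extensions/codex/src/app-server/config.ts.

```typescript
// Field names from the PR; everything else in this block is illustrative.
const KNOWN_TIMEOUT_KEYS = [
  "turnAssistantCompletionIdleTimeoutMs",
  "postToolRawAssistantCompletionIdleTimeoutMs",
] as const;

type KnownTimeoutKey = (typeof KNOWN_TIMEOUT_KEYS)[number];
type ParsedTimeoutConfig = Partial<Record<KnownTimeoutKey, number>>;

function parseTimeoutConfig(raw: Record<string, unknown>): ParsedTimeoutConfig {
  const parsed: ParsedTimeoutConfig = {};
  for (const [key, value] of Object.entries(raw)) {
    // Strict mode: any key outside the allow-list is an error, not ignored.
    if (!(KNOWN_TIMEOUT_KEYS as readonly string[]).includes(key)) {
      throw new Error(`Unknown appServer field: ${key}`);
    }
    // Assumption: timeouts must be positive finite numbers.
    if (typeof value !== "number" || !Number.isFinite(value) || value <= 0) {
      throw new Error(`Expected a positive number for ${key}`);
    }
    parsed[key as KnownTimeoutKey] = value;
  }
  return parsed;
}
```

Rejecting unknown keys means a misspelled timeout name fails loudly instead of silently falling back to the 10s budget.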

Risk

A longer post-tool timeout may delay stuck-turn detection when operators configure it too high.

A timeout that is too short can interrupt legitimate post-tool synthesis.

Keeping a separate config avoids weakening final assistant release and avoids global timeout changes.

Proof / evidence

git diff --stat origin/main...HEAD:

CHANGELOG.md                                       |   1 +
docs/plugins/codex-harness-reference.md            |  44 ++++----
docs/plugins/codex-harness.md                      |  44 ++++----
extensions/codex/openclaw.plugin.json              |   9 ++
extensions/codex/src/app-server/config.test.ts     |  13 +++
extensions/codex/src/app-server/config.ts          |  12 +++
extensions/codex/src/app-server/run-attempt.test.ts | 113 +++++++++++++++++++++
extensions/codex/src/app-server/run-attempt.ts     |  22 +++-
8 files changed, 217 insertions(+), 41 deletions(-)

Before: post-tool raw assistant completion armed turn.completion_idle_timeout with turnAssistantCompletionIdleTimeoutMs, so the guard was tied to the final assistant release budget.

After: post-tool raw assistant completion arms the same completion-idle guard with postToolRawAssistantCompletionIdleTimeoutMs, falling back to turnAssistantCompletionIdleTimeoutMs when unset. Pre-tool raw assistant release still uses the assistant release path.
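The before/after difference comes down to which budget the guard receives in each phase. The sketch below contrasts the two; the `Phase` type and both function names are hypothetical, and the real logic lives in run-attempt.ts and is not reproduced here.

```typescript
// "pre-tool" and "post-tool" mirror the two cases described in the PR.
type Phase = "pre-tool" | "post-tool";

interface Timeouts {
  turnAssistantCompletionIdleTimeoutMs: number;
  postToolRawAssistantCompletionIdleTimeoutMs?: number;
}

// Before: both phases shared the assistant completion idle budget.
function guardBudgetBefore(_phase: Phase, t: Timeouts): number {
  return t.turnAssistantCompletionIdleTimeoutMs;
}

// After: post-tool prefers the dedicated budget and falls back when unset;
// pre-tool is unchanged.
function guardBudgetAfter(phase: Phase, t: Timeouts): number {
  if (phase === "post-tool") {
    return (
      t.postToolRawAssistantCompletionIdleTimeoutMs ??
      t.turnAssistantCompletionIdleTimeoutMs
    );
  }
  return t.turnAssistantCompletionIdleTimeoutMs;
}
```

With the values used in the regression test (5 and 100), `guardBudgetBefore("post-tool", ...)` yields 5 while `guardBudgetAfter("post-tool", ...)` yields 100; the pre-tool result is 5 in both versions.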

Docs after the follow-up commit now state that post-tool raw assistant progress keeps waiting for turn/completed while a completion-idle guard stays armed, using appServer.postToolRawAssistantCompletionIdleTimeoutMs when configured and falling back to the assistant completion idle timeout otherwise.

Real behavior proof

Behavior addressed: Post-tool raw assistant completion after a tool handoff can now use a dedicated completion-idle timeout instead of being hard-coupled to the final assistant release timeout. Pre-tool raw assistant completion still releases captured final assistant text after the assistant idle budget.

Real environment tested: Local OpenClaw checkout on the PR branch, exercising runCodexAppServerAttempt with the real app-server bridge code path and a mocked Codex app-server client that sends the relevant item/tool/call, rawResponseItem/completed, and no turn/completed notifications. No live external Codex app-server or model request was used.

Exact steps or command run after this patch:
git diff --check
git diff --check HEAD~1..HEAD
pnpm docs:list
NODE_OPTIONS=<process-only path.matchesGlob shim> node --no-maglev ./node_modules/vitest/vitest.mjs --config test/vitest/vitest.extensions.config.ts extensions/codex/src/app-server/config.test.ts extensions/codex/src/app-server/run-attempt.test.ts --reporter=verbose -t 'post-tool raw assistant|raw assistant response item without turn completion|Codex app-server config'

Evidence after fix: Focused regression run passed with 46 tests passed and 180 skipped. The passing set included "times out post-tool raw assistant progress after the assistant idle timeout", "uses configured post-tool raw assistant completion timeout instead of assistant release timeout", "releases the session after a raw assistant response item without turn completion", and the Codex app-server config/manifest alignment tests. The docs follow-up commit also passed git diff --check HEAD~1..HEAD.

Observed result after fix: With turnAssistantCompletionIdleTimeoutMs: 5 and postToolRawAssistantCompletionIdleTimeoutMs: 100, the post-tool raw assistant completion case did not finish after the 5ms assistant release budget, then timed out through turn.completion_idle_timeout after the dedicated 100ms budget with lastActivityReason: "notification:rawResponseItem/completed". The pre-tool raw assistant completion test still returned captured assistant text ("Done.") without timing out.
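The observed timing can be reproduced in miniature with a deterministic clock. The `CompletionIdleGuard` class below is a hypothetical stand-in for the real guard; only the `turn.completion_idle_timeout` reason, the `lastActivityReason` value, and the 5 ms / 100 ms budgets come from the report above.

```typescript
interface GuardTimeout {
  reason: "turn.completion_idle_timeout";
  lastActivityReason: string;
}

// Illustrative guard driven by explicit timestamps instead of real timers.
class CompletionIdleGuard {
  private deadlineMs: number | null = null;
  private lastActivityReason = "";

  // Arm (or re-arm) the guard when activity is observed.
  arm(nowMs: number, timeoutMs: number, activityReason: string): void {
    this.deadlineMs = nowMs + timeoutMs;
    this.lastActivityReason = activityReason;
  }

  // Returns a timeout record once the deadline has passed, otherwise null.
  poll(nowMs: number): GuardTimeout | null {
    if (this.deadlineMs === null || nowMs < this.deadlineMs) {
      return null;
    }
    this.deadlineMs = null;
    return {
      reason: "turn.completion_idle_timeout",
      lastActivityReason: this.lastActivityReason,
    };
  }
}

const guard = new CompletionIdleGuard();
// Post-tool raw assistant completion arrives at t=0 with the dedicated 100 ms budget.
guard.arm(0, 100, "notification:rawResponseItem/completed");
const atAssistantBudget = guard.poll(5); // null: the 5 ms assistant budget no longer applies
const atDedicatedBudget = guard.poll(100); // timeout record with the reason and last activity
```

Passing timestamps in explicitly keeps the sketch deterministic; a real guard would presumably rely on timers and be disarmed by turn/completed.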

What was not tested: A live Codex app-server/model turn was not tested. Contributor-side Blacksmith Testbox is unavailable because Blacksmith requires installing the app into a GitHub organization, and this contributor account does not have an OpenClaw organization Blacksmith setup. Direct AWS Crabbox proof is also unavailable because no AWS Crabbox credentials or instance role are present. A full targeted-files Vitest run reached and passed the relevant Codex timeout tests, but failed on an unrelated existing local timeout in the test "exposes Docker sandbox shell tools when native Code Mode cannot honor sandbox paths".

AI assistance

This PR was AI-assisted. I reviewed the diff, tests, and behavior before submitting.

Changed files

CHANGELOG.md (modified, +1/-0)
docs/plugins/codex-harness-reference.md (modified, +24/-20)
docs/plugins/codex-harness.md (modified, +24/-20)
extensions/codex/openclaw.plugin.json (modified, +9/-0)
extensions/codex/src/app-server/config.test.ts (modified, +13/-0)
extensions/codex/src/app-server/config.ts (modified, +12/-0)
extensions/codex/src/app-server/run-attempt.test.ts (modified, +113/-0)
extensions/codex/src/app-server/run-attempt.ts (modified, +21/-1)

Context

While preparing #84135, we verified that Codex app-server currently treats two raw assistant completion cases differently:

Pre-tool raw assistant completion without turn/completed can represent final assistant output. The current short assistant idle release is useful here because it delivers captured assistant text instead of waiting for a terminal idle timeout.
Post-tool raw assistant completion or progress after a dynamic/native tool handoff can represent synthesis or a terminal-event wait rather than a final answer. Today that path arms the completion-idle guard with turnAssistantCompletionIdleTimeoutMs, which defaults to 10 seconds.

Conservative fix in #84135

#84135 takes the low-risk path:

Adds appServer.postToolRawAssistantCompletionIdleTimeoutMs.
Uses it only for the post-tool raw assistant completion guard.
Falls back to turnAssistantCompletionIdleTimeoutMs when unset.
Preserves the current default behavior and the pre-tool raw assistant release semantics.

That makes the timeout configurable for heavy/trusted workloads without changing existing fail-fast behavior.

Alternative considered

The more complete long-term fix may be to change the semantics, not only expose a config knob.

One possible direction:

Treat post-tool raw assistant completion/progress as a distinct post-tool synthesis or terminal-wait state.
Give that state its own default budget, for example 60s, instead of inheriting the 10s assistant release budget.
Keep pre-tool raw assistant release on the short assistant idle budget.
Update the existing tests that currently encode the 10s post-tool fail-fast behavior.

This may better match real heavy local/tool workloads, but it is a behavior change. It could delay stuck-turn detection and needs maintainer agreement on the intended default before changing shipped semantics.
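The alternative direction could look roughly like this: classify the raw assistant completion by whether a tool handoff has occurred, then give each state its own default. All names are hypothetical, and the 60s value is only the example figure mentioned above, not a shipped default.

```typescript
// Two states the alternative proposal would distinguish.
type RawAssistantState = "assistant-release" | "post-tool-synthesis";

const ASSISTANT_RELEASE_DEFAULT_MS = 10_000; // current default per the issue
const POST_TOOL_SYNTHESIS_DEFAULT_MS = 60_000; // example value from the proposal

// Assumption: "tool handoff seen earlier in this turn" is the only signal used.
function classifyRawAssistantCompletion(toolHandoffSeen: boolean): RawAssistantState {
  return toolHandoffSeen ? "post-tool-synthesis" : "assistant-release";
}

function defaultBudgetMs(state: RawAssistantState): number {
  switch (state) {
    case "post-tool-synthesis":
      return POST_TOOL_SYNTHESIS_DEFAULT_MS;
    case "assistant-release":
      return ASSISTANT_RELEASE_DEFAULT_MS;
  }
}
```

Under these example defaults, a stuck post-tool turn would be detected after 60s instead of 10s, which is the latency trade-off that needs maintainer agreement.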

Decision needed

Should post-tool raw assistant completion keep the current fail-fast default unless operators opt in via config, or should OpenClaw change the default semantics so post-tool synthesis waits longer by default?

Suggested follow-up proof

A future semantic-change PR should include:

A regression test showing pre-tool raw assistant completion still releases captured final text quickly.
A regression test showing post-tool raw assistant completion uses the new default budget.
Evidence from a real or close-to-real Codex app-server workload where post-tool synthesis legitimately stays quiet longer than the assistant release budget.
A clear before/after note for stuck-turn detection latency.
