openclaw - ✅(Solved) Fix Track Codex app-server terminal notification hardening [1 pull request, 4 comments, 3 participants]

GitHub stats
openclaw/openclaw#75205 · Fetched 2026-05-01 05:36:53
Comments: 4 · Participants: 3 · Timeline: 12 · Reactions: 2
Timeline (top): commented ×4, cross-referenced ×2, mentioned ×2, subscribed ×2

OpenClaw hit a stuck Discord reply lane after Codex app-server accepted a turn/start request and then failed to deliver the terminal turn/completed or abort notification OpenClaw was waiting for. The active embedded-run handle stayed registered, so diagnostics correctly treated the session as an active run and did not release the lane.

Observed stuck session key: agent:main:discord:channel:1456744319972282449
Observed run/session id: b5e075cc-bf19-4f91-83e3-79e32f338bb5
OpenClaw workaround commit: 54e6e3d7daf5d0d857edf756b35628a29d11c7f5


Root Cause

Codex app-server treats turn/start as accepted as soon as it submits the user input, but the terminal notification depends on a separate listener path. If that path fails or idles, OpenClaw receives turn/start success and never receives turn/completed or an abort notification. The active embedded-run handle then stays registered, so diagnostics keep treating the session as an active run and the Discord reply lane is never released.

Fix / Workaround

OpenClaw worked around the problem with a terminal-progress watchdog that times out a silent Codex turn and releases the session lane. Workaround commit: 54e6e3d7daf5d0d857edf756b35628a29d11c7f5

PR fix notes

PR #75308: Prefer Codex native workspace tools

Description (problem / solution / changelog)

This came out of the stuck Codex app-server lane investigation in https://github.com/openclaw/openclaw/issues/75205. The specific incident still deserves its own lifecycle analysis, but one thing was clearly wrong in our default shape: Codex mode was giving the model two different ways to operate on the same local workspace.

Codex already has native file, patch, process, and planning tools. OpenClaw was also injecting its PI-era dynamic tools for read, write, edit, apply_patch, exec, process, and update_plan into the Codex app-server turn. That split ownership makes the runtime boundary harder to reason about and creates room for duplicated command or file state when Codex should be the single owner of those local operations.

This PR adds a native-first dynamic-tool profile for the Codex plugin and makes it the default. In native-first mode, OpenClaw keeps the integration tools that are actually OpenClaw-owned: messaging, sessions, cron, media, gateway, nodes, browser, and web_search. It stops advertising the redundant workspace, process, patch, and planning tools to Codex app-server, so Codex owns that lifecycle end to end.

The old shape remains available for deployments that intentionally need it. codexDynamicToolsProfile: "openclaw-compat" restores the full OpenClaw dynamic tool catalog, and codexDynamicToolsExclude gives us an additive way to remove more OpenClaw dynamic tools from Codex turns without another one-off filter.
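The profile and exclude list described above can be pictured as a filter over the dynamic tool catalog. The following is a minimal sketch, not the code from the PR: the function name, the literal "native-first", and the flat string catalog are assumptions made for illustration. Only "openclaw-compat", the exclude option, and the tool names come from the PR description.

```typescript
// Hypothetical sketch: names and shapes here are assumptions, not the PR's code.
type DynamicToolsProfile = "native-first" | "openclaw-compat";

// Tools the PR describes as redundant with Codex-native capabilities.
const CODEX_NATIVE_OVERLAP = new Set([
  "read", "write", "edit", "apply_patch", "exec", "process", "update_plan",
]);

function selectDynamicTools(
  catalog: readonly string[],
  profile: DynamicToolsProfile,
  exclude: readonly string[] = [],
): string[] {
  const excluded = new Set(exclude);
  return catalog.filter((tool) => {
    // The exclude list applies in every profile.
    if (excluded.has(tool)) return false;
    // native-first additionally hides tools that Codex already owns.
    if (profile === "native-first" && CODEX_NATIVE_OVERLAP.has(tool)) return false;
    return true;
  });
}

const catalog = ["read", "exec", "update_plan", "sessions", "cron", "web_search", "web_fetch"];
// Keeps only the integration tools that are not excluded.
const nativeFirstTools = selectDynamicTools(catalog, "native-first", ["web_fetch"]);
// Restores the full catalog.
const compatTools = selectDynamicTools(catalog, "openclaw-compat");
```

Under these assumptions the exclude list composes with either profile, which matches the PR's description of it as a way to remove further tools without a one-off filter.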

I also tightened the Codex developer instruction so it points the model at OpenClaw integration tools rather than vague host actions, and added a regression that checks the actual thread/start dynamic tool catalog sent to Codex.

What I checked before handing this over:

  • Reviewed the OpenClaw Codex app-server path and the local Codex harness source so the split is based on the actual native tool boundary, not a guess.

  • Ran Pash's dev gateway from this branch with the Codex runtime forced on the native-first profile, then confirmed a live Codex turn only received OpenClaw integration tools. The OpenClaw dynamic read, write, edit, apply_patch, exec, and process tools were absent; update_plan was absent too.

  • Temporarily set codexDynamicToolsExclude to ["web_fetch"], restarted the gateway, and confirmed a new live turn omitted web_fetch; then restored the config to [].

  • Ran a live native workspace smoke where the agent created, patched, read, and shell-verified a file through Codex-native apply_patch and exec_command while the OpenClaw file/process dynamic tools were still excluded.

  • Confirmed /verbose full is handled on the gateway chat path, then captured the live gateway WebSocket event stream for a native Codex turn. The stream showed native command items, native patch items, Codex hook events, lifecycle start/end, and a successful agent.wait completion.

  • Re-ran the PR parity gate after its first Opus mock lane timeout. The rerun passed the OpenAI candidate lane, the Opus 4.6 baseline lane, and the generated parity report.

  • Addressed the fresh ClawSweeper changelog placement review by moving the PR-linked changelog entry to the ### Changes section tail, then ran git diff --check, git diff --check origin/main...HEAD -- CHANGELOG.md, pnpm check:changelog-attributions, and the merge-helper validate_changelog_entry_for_pr 75308 pashpashpash check.

  • Final PR head 0d72cd5ca4ae5292bcf41b3f376d7421dc94fba6 has no failed or pending GitHub checks, including the OpenAI / Opus 4.6 parity gate; ClawSweeper's durable review is updated to needs-human with no remaining repair blocker.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/.generated/config-baseline.sha256 (modified, +2/-2)
  • docs/plugins/codex-harness.md (modified, +13/-0)
  • extensions/codex/openclaw.plugin.json (modified, +20/-0)
  • extensions/codex/src/app-server/config.test.ts (modified, +12/-0)
  • extensions/codex/src/app-server/config.ts (modified, +6/-0)
  • extensions/codex/src/app-server/run-attempt.test.ts (modified, +74/-0)
  • extensions/codex/src/app-server/run-attempt.ts (modified, +42/-3)
  • extensions/codex/src/app-server/thread-lifecycle.ts (modified, +1/-1)
  • src/agents/pi-tools.model-provider-collision.test.ts (modified, +21/-0)
  • src/agents/pi-tools.ts (modified, +5/-0)

Summary

OpenClaw hit a stuck Discord reply lane after Codex app-server accepted a turn/start request and then failed to deliver the terminal turn/completed or abort notification OpenClaw was waiting for. The active embedded-run handle stayed registered, so diagnostics correctly treated the session as an active run and did not release the lane.


What OpenClaw did

  • Added a Codex app-server terminal-progress watchdog after turn/start returns an in-progress turn.
  • The watchdog resets on Codex app-server notifications and request/response activity.
  • If a Codex turn remains silent before any terminal event, OpenClaw marks the attempt timed out, sends best-effort turn/interrupt, resolves the attempt, clears the active embedded-run handle, and releases the session lane.
  • Added regression coverage for the accepted-but-silent turn case in extensions/codex/src/app-server/run-attempt.test.ts.
  • Documented the behavior in the agent-loop and queue docs.
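The watchdog behavior listed above can be sketched as a small state machine. This is an illustrative sketch only, not the code from the workaround commit: the class name, the polling style, and the 1000 ms limit are assumptions, and the cleanup steps are recorded as strings rather than performed.

```typescript
// Hypothetical sketch; not the code from the OpenClaw workaround commit.
type WatchdogState = "running" | "terminal" | "timed_out";

class TerminalProgressWatchdog {
  private state: WatchdogState = "running";
  private lastActivityMs: number;
  private readonly idleLimitMs: number;
  private readonly onTimeout: () => void;

  constructor(idleLimitMs: number, startedAtMs: number, onTimeout: () => void) {
    this.idleLimitMs = idleLimitMs;
    this.lastActivityMs = startedAtMs;
    this.onTimeout = onTimeout;
  }

  // Any app-server notification or request/response activity resets the idle clock.
  recordActivity(nowMs: number): void {
    if (this.state === "running") this.lastActivityMs = nowMs;
  }

  // A terminal event (for example turn/completed) disarms the watchdog.
  recordTerminal(): void {
    if (this.state === "running") this.state = "terminal";
  }

  // Called periodically; fires onTimeout at most once.
  poll(nowMs: number): WatchdogState {
    if (this.state === "running" && nowMs - this.lastActivityMs >= this.idleLimitMs) {
      this.state = "timed_out";
      this.onTimeout();
    }
    return this.state;
  }
}

// Simulates an accepted turn that goes silent after one notification.
function simulateSilentTurn(): string[] {
  const steps: string[] = [];
  const watchdog = new TerminalProgressWatchdog(1000, 0, () => {
    steps.push("mark attempt timed out");
    steps.push("send best-effort turn/interrupt");
    steps.push("resolve attempt");
    steps.push("clear active embedded-run handle");
    steps.push("release session lane");
  });
  watchdog.recordActivity(400); // a notification arrives and resets the idle clock
  watchdog.poll(1200);          // 800 ms of silence: still running
  watchdog.poll(1400);          // 1000 ms of silence: the timeout fires
  watchdog.poll(5000);          // already timed out: cleanup does not run again
  return steps;
}
```

Taking the current time as an argument keeps the sketch deterministic; a real implementation would more likely drive the same logic from a timer.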

What Codex should fix

  • turn/start should not accept work unless the app-server listener/subscription path is healthy for that conversation.
  • App-server should guarantee a terminal notification (turn/completed, turn/aborted, or an explicit terminal error event) for every accepted turn/start, even when the underlying Responses/SSE stream idles or fails.
  • App-server should expose enough read-back state for clients to reconcile an accepted turn after listener failure, for example via thread/read turn status or a dedicated active-turn status endpoint.
  • Listener failure and SSE idle timeout paths should be surfaced as terminal app-server events, not just internal logs.
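The guarantee requested above amounts to "exactly one terminal event per accepted turn". The sketch below illustrates that invariant in TypeScript for consistency with the OpenClaw code, even though Codex app-server itself is written in Rust. Every name here is hypothetical, including the "turn/error" method, which stands in for the "explicit terminal error event" the issue asks for.

```typescript
// Hypothetical sketch of the invariant; not Codex app-server code.
type TerminalEvent = {
  method: "turn/completed" | "turn/aborted" | "turn/error";
  turnId: string;
  reason?: string;
};

class TurnTerminalTracker {
  private readonly openTurns = new Set<string>();
  private readonly emit: (event: TerminalEvent) => void;

  constructor(emit: (event: TerminalEvent) => void) {
    this.emit = emit;
  }

  // Called when turn/start is accepted.
  accept(turnId: string): void {
    this.openTurns.add(turnId);
  }

  // Emits a terminal event only if the turn is still open, so each
  // accepted turn gets at most one terminal notification.
  finish(event: TerminalEvent): boolean {
    if (!this.openTurns.delete(event.turnId)) return false;
    this.emit(event);
    return true;
  }

  // Listener failure or SSE idle timeout: close every open turn with an
  // explicit terminal error instead of only logging internally.
  failAll(reason: string): void {
    for (const turnId of Array.from(this.openTurns)) {
      this.finish({ method: "turn/error", turnId, reason });
    }
  }
}

const sent: TerminalEvent[] = [];
const tracker = new TurnTerminalTracker((event) => sent.push(event));
tracker.accept("turn-1");
tracker.accept("turn-2");
tracker.finish({ method: "turn/completed", turnId: "turn-1" });
tracker.failAll("listener failed");
// Ignored: turn-2 was already closed by failAll.
const lateAbortEmitted = tracker.finish({ method: "turn/aborted", turnId: "turn-2" });
```

With this shape, a client waiting on turn-2 would receive a terminal error as soon as the listener fails, rather than waiting indefinitely.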

Evidence from code read

Codex app-server currently treats turn/start as accepted once it submits Op::UserInput; the turn lifecycle notifications depend on the separate listener path reading conversation.next_event and translating TurnStarted/TurnComplete/TurnAborted into app-server events. That creates a gap where OpenClaw can receive turn/start success but never receive the terminal notification it needs to release the channel lane.
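If the app-server exposed read-back state for an accepted turn, a client could close this gap by reconciling instead of waiting indefinitely. The sketch below assumes a hypothetical set of status values and a hypothetical decision function; the issue only proposes that such state be exposed and does not define its shape.

```typescript
// Hypothetical read-back statuses and actions; none of these are a real Codex API.
type TurnReadBackStatus = "inProgress" | "completed" | "aborted" | "failed" | "notFound";
type LaneAction = "keep-waiting" | "release-lane" | "interrupt-then-release";

function reconcileAcceptedTurn(
  status: TurnReadBackStatus,
  silentForMs: number,
  idleLimitMs: number,
): LaneAction {
  switch (status) {
    // The turn already ended (or is unknown to the server): the terminal
    // notification was lost, so the lane can be released right away.
    case "completed":
    case "aborted":
    case "failed":
    case "notFound":
      return "release-lane";
    // The server still reports the turn as running: wait until the idle
    // limit is reached, then interrupt and release.
    case "inProgress":
      return silentForMs >= idleLimitMs ? "interrupt-then-release" : "keep-waiting";
  }
}

const lostNotification = reconcileAcceptedTurn("completed", 50, 1000);
const stillWorking = reconcileAcceptedTurn("inProgress", 200, 1000);
const silentTooLong = reconcileAcceptedTurn("inProgress", 1500, 1000);
```

Treating "notFound" as releasable is a design choice of this sketch; a stricter client might surface it as an error instead.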

Relevant Codex paths inspected locally:

  • codex-rs/app-server/src/codex_message_processor.rs: turn/start, thread/start, thread/resume, listener loop.
  • codex-rs/app-server/src/bespoke_event_handling.rs: mapping core turn lifecycle events into app-server notifications.
  • codex-rs/core/src/tasks/regular.rs and codex-rs/core/src/tasks/mod.rs: core TurnStarted, TurnComplete, and TurnAborted emission.

Verification

  • pnpm test extensions/codex/src/app-server/run-attempt.test.ts passed locally: 40 tests.
  • pnpm exec oxfmt --check --threads=1 extensions/codex/src/app-server/run-attempt.ts extensions/codex/src/app-server/run-attempt.test.ts docs/concepts/agent-loop.md docs/concepts/queue.md passed locally.
  • git diff --check origin/main...HEAD passed after rebase.
  • Testbox pnpm check:changed passed for lanes extensions, extensionTests, and docs.

extent analysis

TL;DR

Codex app-server should guarantee a terminal notification for every accepted turn/start to prevent stuck Discord reply lanes.

Guidance

  • Review the codex-rs/app-server/src/codex_message_processor.rs and codex-rs/app-server/src/bespoke_event_handling.rs files to ensure that terminal notifications are sent for all accepted turn/start requests.
  • Implement a mechanism to expose read-back state for clients to reconcile an accepted turn after listener failure, such as via thread/read turn status or a dedicated active-turn status endpoint.
  • Surface listener failure and SSE idle timeout paths as terminal app-server events, rather than just internal logs, to provide better visibility and handling of these cases.
  • Consider adding additional logging or monitoring to detect and diagnose stuck sessions, such as the one observed with session key agent:main:discord:channel:1456744319972282449.


Notes

The issue points to the Codex app-server's handling of turn/start requests and terminal notifications. The code read identifies the separate listener path as the likely gap, but the specific incident has not yet had its own lifecycle analysis, so the server-side fix remains open.

Recommendation

Apply the workaround implemented in OpenClaw commit 54e6e3d7daf5d0d857edf756b35628a29d11c7f5 to prevent stuck Discord reply lanes, while also working on implementing the necessary changes in the Codex app-server to guarantee terminal notifications for all accepted turn/start requests.
