openclaw: Cross-agent Codex app-server abort still reproduces on 2026.5.16-beta.4 (38c3a8d); #82805 not resolving; broader than #82758 (any mid-turn overlap, not just simultaneous)

Fix Action

Fixed

Summary

The cross-agent Codex app-server abort tracked in #82758 (canonical #79495, fix PR #82805 "isolate app-server clients per runtime key") still reproduces on 2026.5.16-beta.4 (38c3a8d). A second agent's Codex turn starting at any point during a first agent's in-flight turn evicts the first agent: the running agent's Codex rollout receives <turn_aborted> "user interrupted the previous turn" (the user did not interrupt) within seconds, the codex app-server process is re-created, and the victim never self-recovers. Filing fresh per @steipete's request; this also shows the failure is broader than the original report — it does not require near-simultaneous submits.

Environment

OpenClaw: 2026.5.16-beta.4 (38c3a8d)
Harness: native Codex app-server
Model / auth: openai/gpt-5.5, OAuth subscription
Setup: multi-agent Telegram (9 agents), fresh sessions per test
Last-known-good: none. #82758 was filed on beta.3; #82805 (claimed fix) merged to main; this report is the verification that beta.4 does not resolve it

Expected

Concurrent Codex turns for different agents are isolated. A turn for agent B must not abort agent A's in-flight turn, and must never leave the victim unrecoverable.

Actual

A second agent's turn starting while another agent is mid-turn aborts the already-running agent: its Codex rollout records <turn_aborted> with "user interrupted the previous turn", although the user did not interrupt. The codex app-server process is re-created and the victim does not recover; it stays wedged until it hits the turn timeout or, on default config, until a manual openclaw gateway restart. The agent that started second "wins" the shared client and completes.

Controlled, instrumented, 3-test reproduction (single log, 5s sampling, ground truth = per-agent Codex rollout growth + turn_aborted marker + codex app-server pid identity). Sanitized timeline (lifecycle/timing/opaque-IDs only — no message bodies, no user/chat IDs, no client data):

https://gist.github.com/PashaGanson/8221c87a4e62bcb5d3421d9985968f62

Decisive points:

  • Test 1 (control, single agent solo): completes, 0 turn_aborted. Isolates concurrency as the variable.
  • Test 2 (~7s apart): victim user_message 05:27:02.384 → reasoning 05:27:07.362 → turn_aborted 05:27:07.502 ("user interrupted the previous turn"); rollout frozen at 142926 B. The codex app-server pid was re-created twice in ~11s at the collision (05:26:59, 05:27:10). Winner completed.
  • Test 3 (~91s apart — second agent starts mid-turn of the first): victim ran normally ~90s (function_call_output/reasoning at 05:42:07–09), then the second agent submitted at ~05:42:13; codex pid re-created 05:42:11; victim turn_aborted 05:42:11.015 the same second. Winner task_complete 05:50:14. Victim never recovered.

The success-vs-fail asymmetry holds in every window: the agent already running is the one aborted, the moment any other agent's Codex turn starts.
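
The gaps between the timestamps quoted in the decisive points can be checked with a small helper. This is only a minimal sketch: `toMillis` and `deltaMs` are illustrative names, not part of OpenClaw or the linked gist, and the code assumes both timestamps fall on the same day.

```typescript
// Parse "HH:MM:SS" or "HH:MM:SS.mmm" into milliseconds since midnight.
function toMillis(ts: string): number {
  const match = /^(\d{2}):(\d{2}):(\d{2})(?:\.(\d{1,3}))?$/.exec(ts);
  if (match === null) {
    throw new Error(`unrecognized timestamp: ${ts}`);
  }
  // Right-pad the fractional part to three digits so ".5" means 500 ms.
  const frac = ((match[4] ?? "") + "000").slice(0, 3);
  const seconds =
    (Number(match[1]) * 60 + Number(match[2])) * 60 + Number(match[3]);
  return seconds * 1000 + Number(frac);
}

// Elapsed milliseconds from `start` to `end` (same-day assumption).
function deltaMs(start: string, end: string): number {
  return toMillis(end) - toMillis(start);
}
```

For Test 2, `deltaMs("05:27:07.362", "05:27:07.502")` gives 140, so the abort landed 140 ms after the victim's last reasoning event, and the two pid re-creations (05:26:59, 05:27:10) are 11000 ms apart.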

Source-side (calibrated — not asserted, please confirm against the build)

  • beta.4 (38c3a8d) shipped dist/ still contains the legacy process-global singleton symbol Symbol.for("openclaw.codexAppServerClientState") (same as the #82758-era beta.3).
  • No keyed-pool / per-runtime-key fingerprints from #82805 were found in the shipped dist/ (searched clientsByKey, keyed-pool, "isolate app-server clients per runtime key", PR refs). This suggests that #82805 may not be present in the 38c3a8d build at all (as opposed to "present but insufficient"). The finding comes from a minified bundle, so it is not asserted; it is worth confirming that the fix actually landed in the beta.4 cut.
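
Why a `Symbol.for` singleton would be shared by every agent in the process can be illustrated with a toy model. `Symbol.for` returns the same symbol from the global registry for a given key, so any code reading that slot on `globalThis` gets the same object. Only the key string below comes from the report; the state shape, `getSharedClientState`, and `startTurn` are hypothetical and model the observed behavior rather than OpenClaw's actual code.

```typescript
// Key string taken from the report; everything else is a hypothetical model.
const STATE_KEY = Symbol.for("openclaw.codexAppServerClientState");

interface SharedClientState {
  activeAgent: string | null; // agent whose turn currently owns the single client
  aborted: string[];          // agents whose in-flight turn was evicted
}

// Every caller in the process resolves the same slot on globalThis.
function getSharedClientState(): SharedClientState {
  const slots = globalThis as unknown as Record<symbol, SharedClientState | undefined>;
  let state = slots[STATE_KEY];
  if (state === undefined) {
    state = { activeAgent: null, aborted: [] };
    slots[STATE_KEY] = state;
  }
  return state;
}

// Modeled failure: a turn from a different agent takes over the shared client,
// and the agent that was already running is the one that gets aborted.
function startTurn(agentId: string): void {
  const state = getSharedClientState();
  if (state.activeAgent !== null && state.activeAgent !== agentId) {
    state.aborted.push(state.activeAgent);
  }
  state.activeAgent = agentId;
}
```

In this model, calling `startTurn("agent-A")` and then `startTurn("agent-B")` leaves `agent-B` as the owner and records `agent-A` as aborted, which matches the asymmetry seen in the reproduction.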

Repro (minimal)

  1. Two Telegram agents, native Codex app-server, same model/auth, fresh sessions.
  2. Send a real tool-using task to agent A.
  3. While A is mid-turn (even ~1–2 min in), send a task to agent B.
  4. A's Codex rollout gets <turn_aborted> within seconds; A wedges and does not recover; B completes.

Suggested fix direction

Same root cause as #79495 / #82805 (process-global shared Codex app-server client keyed in a way that evicts across agents). The keyed-client-pool fix should be verified to actually ship in the next beta cut, and a regression test covering "agent B turn started mid agent-A turn must not abort A" (the offset case, not just simultaneous) would lock this — the offset case is the common real-world path (messaging one agent while another is working).
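
The requested regression test for the offset case could look roughly like the sketch below. It is written under stated assumptions: `FakeAppServerClient`, `KeyedClientPool`, and `offsetOverlapAbortsA` are hypothetical stand-ins, not the real OpenClaw types or the code in #82805, and the runtime key is assumed to differ per agent. The `clientsByKey` name is borrowed from the search terms listed in this report.

```typescript
// Hypothetical stand-in for a Codex app-server client; not the real OpenClaw type.
class FakeAppServerClient {
  activeTurn: string | null = null;
  abortedTurns: string[] = [];

  // Starting a turn on a client that already has one aborts the in-flight turn.
  startTurn(turnId: string): void {
    if (this.activeTurn !== null) {
      this.abortedTurns.push(this.activeTurn);
    }
    this.activeTurn = turnId;
  }
}

// One client per runtime key, so agents never share (or evict) each other's client.
class KeyedClientPool {
  private readonly clientsByKey = new Map<string, FakeAppServerClient>();

  clientFor(runtimeKey: string): FakeAppServerClient {
    let client = this.clientsByKey.get(runtimeKey);
    if (client === undefined) {
      client = new FakeAppServerClient();
      this.clientsByKey.set(runtimeKey, client);
    }
    return client;
  }
}

// Offset case: agent B starts while agent A is already mid-turn.
// Returns true if A's in-flight turn was aborted (the regression).
function offsetOverlapAbortsA(pool: KeyedClientPool): boolean {
  const a = pool.clientFor("agent-A");
  a.startTurn("A-turn-1");
  // A is still mid-turn when B submits its own task.
  const b = pool.clientFor("agent-B");
  b.startTurn("B-turn-1");
  return a.abortedTurns.indexOf("A-turn-1") !== -1;
}
```

With per-key isolation, `offsetOverlapAbortsA(new KeyedClientPool())` returns false. Routing both agents through a single `FakeAppServerClient` instead records A's turn as aborted, which is the behavior a real regression test would need to rule out against the actual client.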

Cross-reference

  • #82758 (this report's predecessor — closed/superseded, source-confirmed)
  • #79495 (canonical tracking issue)
  • #82805 (keyed shared-client pool fix — verified here as NOT resolving on beta.4 38c3a8d)

Secondary, separate (not conflated)

Independently, the winning agent's completed reply sometimes arrives truncated in Telegram while the same output is complete/normal in the Control UI. Intermittent, hard to instrument (only the Telegram delivery is cut). Mentioned only for maintainer awareness — root cause unpinned, should be a separate issue, not part of this one.
