openclaw - 💡(How to fix) Fix [Bug]: Codex app-server stalls after `item/completed`, then aborts without recovery/status [4 comments, 3 participants]

GitHub stats
openclaw/openclaw#84076 · Fetched 2026-05-20 03:44:22
Comments: 4 · Participants: 3 · Timeline: 15 · Reactions: 1
Timeline (top): labeled ×10, commented ×4, cross-referenced ×1


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

OpenClaw 2026.5.18 still loses productive Codex app-server turns when the last observed current-turn notification is item/completed and no turn/completed follows.

The already-merged fixes for #78756 and #82171 appear to be present in this installation. The current behavior is therefore not a missing-fix case, but a remaining recovery/turn-semantics problem:

  • the session lane enters processing
  • diagnostics report active_work_without_progress
  • lastProgress=codex_app_server:notification:item/completed
  • recovery=none
  • after turnCompletionIdleTimeoutMs, OpenClaw aborts the run
  • no useful visible recovery/status is delivered for the failed work
  • already-started work is not resumed

This makes chat lanes look silent or stuck and can drop real work after a completed tool call.

Steps to reproduce

  1. Run OpenClaw with a user-facing chat lane (reproduced here in a Discord channel and in a Telegram direct chat).
  2. Configure an OpenAI GPT model to use the Codex app-server runtime.
  3. Disable model fallbacks so that an Anthropic fallback does not hide the Codex failure.
  4. Set plugins.entries.codex.config.appServer.turnCompletionIdleTimeoutMs to 180000 to prove which watchdog fires.
  5. In Discord, ask the agent to do a multi-step file-producing task, for example building a static multi-page web presence from existing project drafts.
  6. Observe that the assistant completes one tool item and then no turn/completed arrives.
  7. Watch diagnostics until the completion-idle timeout fires.

Relevant redacted config used during the test:

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai/gpt-5.5",
        "fallbacks": []
      },
      "timeoutSeconds": 900
    }
  },
  "plugins": {
    "entries": {
      "codex": {
        "config": {
          "appServer": {
            "turnCompletionIdleTimeoutMs": 180000
          }
        }
      }
    }
  }
}
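
To confirm that a config matches the reproduction conditions, the two relevant settings can be read back by dotted path. This is only a sketch: `get_path` is a hypothetical helper, not part of OpenClaw, and the config is inlined here rather than loaded from OpenClaw's real config file.

```python
import json

# The redacted reproduction config from above, inlined for the example.
CONFIG_TEXT = """
{
  "agents": {"defaults": {"model": {"primary": "openai/gpt-5.5", "fallbacks": []},
                          "timeoutSeconds": 900}},
  "plugins": {"entries": {"codex": {"config": {"appServer":
      {"turnCompletionIdleTimeoutMs": 180000}}}}}
}
"""


def get_path(config, dotted_path, default=None):
    """Walk a nested dict along a dotted key path (hypothetical helper)."""
    node = config
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return default
        node = node[key]
    return node


config = json.loads(CONFIG_TEXT)
idle_timeout_ms = get_path(
    config, "plugins.entries.codex.config.appServer.turnCompletionIdleTimeoutMs"
)
fallbacks = get_path(config, "agents.defaults.model.fallbacks", default=[])

print(f"completion-idle timeout: {idle_timeout_ms} ms")
print(f"fallbacks disabled: {fallbacks == []}")
```

A missing key returns the default instead of raising, so the same check can be run against configs that do not set the Codex plugin options at all.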

Discord reproduction sequence from the session JSONL:

2026-05-19T08:07:43.604Z user prompt from Discord
2026-05-19T08:07:43.995Z assistant toolCall: bash mkdir -p /home/casper/.openclaw/workspace/artifacts/maria-ward-smartphone-start/site/assets/img
2026-05-19T08:07:44.092Z toolResult: completed exitCode 0 durationMs 0

No subsequent assistant work was written for the requested site build before timeout. The only filesystem result was directory creation.

Expected behavior

OpenClaw should not silently drop a productive Codex app-server turn after a completed tool item if the turn is still expected to continue.

At minimum, if OpenClaw decides the app-server turn is unrecoverably incomplete because turn/completed never arrived, it should:

  • release the session lane
  • send a visible channel status explaining the failed turn
  • preserve enough state to allow the user to retry/resume
  • avoid misleading explanations such as user/UI interruption when the log cause is turn_completion_idle_timeout
  • avoid losing already-started work without a user-visible failure/recovery message
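
The "visible channel status" requirement above could look roughly like the following. This is a sketch under stated assumptions, not OpenClaw's implementation: `TurnTimeout` and `build_channel_status` are hypothetical names, the message wording is invented for illustration, and only the field values come from the gateway log for the Discord run.

```python
from dataclasses import dataclass


@dataclass
class TurnTimeout:
    """Fields mirrored from the 'turn idle timed out' gateway log entry."""
    thread_id: str
    turn_id: str
    idle_ms: int
    timeout_ms: int
    last_notification_method: str


def build_channel_status(event: TurnTimeout) -> str:
    """Render a user-visible failure notice (hypothetical wording).

    The text names the logged cause so that it cannot be mistaken for a
    user/UI abort.
    """
    idle_s = event.idle_ms // 1000
    return (
        f"The agent turn stopped: no turn/completed arrived within {idle_s}s "
        f"after the last {event.last_notification_method} notification "
        f"(cause: turn_completion_idle_timeout, turn {event.turn_id}). "
        "Work that had already started was not finished; retry to continue."
    )


# Values taken from the Discord gateway log excerpt.
event = TurnTimeout(
    thread_id="019e3f2d-b7f2-7443-ab96-4e72fe219fe1",
    turn_id="019e3f43-8034-7001-88af-70ffeb9bdb43",
    idle_ms=180003,
    timeout_ms=180000,
    last_notification_method="item/completed",
)
status = build_channel_status(event)
print(status)
```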

Better behavior would distinguish:

  • completed tool call followed by expected assistant continuation
  • genuinely terminal item completion
  • missing/late turn/completed
  • app-server still computing vs. app-server protocol dead-air
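
One way to picture the distinction above is a small classifier. Everything here is hypothetical: the enum names, the `last_item_type` values, and especially the `app_server_busy` input, since the issue does not establish that the app-server exposes such a liveness signal.

```python
from enum import Enum
from typing import Optional


class StallKind(Enum):
    CONTINUATION_EXPECTED = "tool_result_then_expected_continuation"
    TERMINAL_ITEM = "terminal_item_missing_turn_completed"
    STILL_COMPUTING = "app_server_still_computing"
    PROTOCOL_DEAD_AIR = "protocol_dead_air"


def classify_stall(last_item_type: Optional[str], app_server_busy: bool) -> StallKind:
    """Classify a turn that went quiet after item/completed.

    last_item_type: kind of the last completed item (assumed labels
        "tool_result" and "assistant_message").
    app_server_busy: assumed liveness signal from the app-server.
    """
    if app_server_busy:
        return StallKind.STILL_COMPUTING
    if last_item_type == "tool_result":
        # A tool result normally precedes more assistant work.
        return StallKind.CONTINUATION_EXPECTED
    if last_item_type == "assistant_message":
        # Final text was produced; only turn/completed is missing or late.
        return StallKind.TERMINAL_ITEM
    return StallKind.PROTOCOL_DEAD_AIR


# The Discord run: last item was a completed bash tool call, then silence.
print(classify_stall("tool_result", app_server_busy=False))
```

Under these assumptions, the Discord reproduction falls into the "continuation expected" bucket, which is the case where aborting without recovery loses the most work.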

Actual behavior

The run is aborted after the completion idle timeout. Diagnostics explicitly say recovery=none.

In the Discord reproduction, only a directory was created; none of the requested site files were produced. The user saw the typing/activity indicator disappear, and no useful recovery message surfaced.

Subsequent status questions can create confusing assistant explanations that imply a user/UI abort, even though the durable gateway evidence for the original run points to turn_completion_idle_timeout.

OpenClaw version

OpenClaw 2026.5.18 (50a2481)

Operating system

Ubuntu

Install method

npm global

Model

gpt-5.5

Provider / routing chain

openai-codex/gpt-5.5 -> Codex app-server harness -> OpenClaw embedded run -> Discord/Telegram chat lane

Additional provider/model setup details

Fallbacks were disabled during the primary test:

"fallbacks": []

This was intentional to avoid an Anthropic fallback hiding the Codex app-server failure.

turnCompletionIdleTimeoutMs was deliberately raised to 180000 during testing. The same pattern had previously been observed with the shorter default idle timeout; raising the value made it clear which watchdog fired.

Earlier tests with fallbacks enabled produced additional confusing behavior: OpenClaw fell back to Anthropic, then hit context overflow/compaction and, separately, message tool delivery errors.

Related issues/PRs:

  • #78756: Codex app-server turns time out after 60s despite meaningful progress
  • #79667: fix(codex): ignore account updates for turn liveness
  • #82171: Codex app-server can stall after the last current-turn item completes without turn/completed
  • #82172: fix(codex): fail fast after quiescent turn completion stalls

Logs, screenshots, and evidence

Gateway log excerpts:


2026-05-19T08:04:04.805Z [agent/embedded]
strict-agentic execution contract active:
runId=fa6f5365-411f-4028-8985-a9ec7a9b35a4
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
provider=openai-codex/gpt-5.5 harness=codex



2026-05-19T08:07:04.822Z [diagnostic]
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=142s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=141s
recovery=none



2026-05-19T08:07:34.819Z [diagnostic]
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=172s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=171s
recovery=none



2026-05-19T08:07:43.435Z [agent/embedded]
codex app-server turn idle timed out waiting for completion
{
  threadId: "019e3f2d-b7f2-7443-ab96-4e72fe219fe1",
  turnId: "019e3f43-8034-7001-88af-70ffeb9bdb43",
  idleMs: 180003,
  timeoutMs: 180000,
  lastActivityReason: "notification:item/completed",
  lastNotificationMethod: "item/completed"
}



2026-05-19T08:07:43.457Z [agent/embedded]
codex app-server client retired after timed-out turn
{
  threadId: "019e3f2d-b7f2-7443-ab96-4e72fe219fe1",
  turnId: "019e3f43-8034-7001-88af-70ffeb9bdb43",
  reason: "turn_completion_idle_timeout",
  clearedSharedClient: true
}



2026-05-19T08:07:44.198Z [agent/embedded]
embedded run failover decision
{
  runId: "fa6f5365-411f-4028-8985-a9ec7a9b35a4",
  stage: "assistant",
  decision: "surface_error",
  failoverReason: "timeout",
  profileFailureReason: "timeout",
  provider: "openai-codex",
  model: "gpt-5.5",
  fallbackConfigured: false,
  timedOut: true,
  aborted: true
}


While diagnosing the Discord stall from Telegram, the Telegram direct session itself hit the same failure mode.


2026-05-19T08:14:59.977Z [agent/embedded]
strict-agentic execution contract active:
runId=6e9f7eb1-5418-4d5c-aabc-df8a1e7f7619
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
provider=openai-codex/gpt-5.5 harness=codex



2026-05-19T08:17:38.070Z [diagnostic]
stalled session:
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
sessionKey=agent:main:telegram:direct:287384854
state=processing age=129s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=129s
recovery=none



2026-05-19T08:18:08.068Z [diagnostic]
stalled session:
sessionId=9578d939-b2fd-4ec9-b65b-8a93348ca570
sessionKey=agent:main:telegram:direct:287384854
state=processing age=159s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=159s
recovery=none



2026-05-19T08:18:29.525Z [agent/embedded]
codex app-server turn idle timed out waiting for completion
{
  threadId: "019e3ef4-0e36-7b32-b9e1-36b98cc115a8",
  turnId: "019e3f4d-7f38-74e2-82fc-2557e24a98b1",
  idleMs: 180001,
  timeoutMs: 180000,
  lastActivityReason: "notification:item/completed",
  lastNotificationMethod: "item/completed"
}



2026-05-19T08:18:30.061Z [agent/embedded]
embedded run failover decision
{
  runId: "6e9f7eb1-5418-4d5c-aabc-df8a1e7f7619",
  stage: "assistant",
  decision: "surface_error",
  failoverReason: "timeout",
  profileFailureReason: "timeout",
  provider: "openai-codex",
  model: "gpt-5.5",
  fallbackConfigured: false,
  timedOut: true,
  aborted: true
}
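
When scanning longer gateway logs for this pattern, the `key=value` layout of the "stalled session" entries can be parsed mechanically. The sketch below assumes only the layout visible in the excerpts above; `parse_fields` and `is_unrecovered_item_completed_stall` are hypothetical helper names.

```python
import re

# One diagnostic entry copied from the Discord excerpt above.
STALL_ENTRY = """\
stalled session:
sessionId=ac54314e-d1ad-4145-b8fe-932309953759
sessionKey=agent:main:discord:channel:1497109509825626232
state=processing age=142s queueDepth=1
reason=active_work_without_progress
classification=stalled_agent_run
activeWorkKind=embedded_run
lastProgress=codex_app_server:notification:item/completed
lastProgressAge=141s
recovery=none
"""


def parse_fields(entry: str) -> dict:
    """Collect key=value tokens; each value runs up to the next whitespace."""
    return dict(re.findall(r"(\w+)=(\S+)", entry))


def is_unrecovered_item_completed_stall(fields: dict) -> bool:
    """True when the entry matches the failure signature from this report."""
    return (
        fields.get("reason") == "active_work_without_progress"
        and fields.get("lastProgress", "").endswith("notification:item/completed")
        and fields.get("recovery") == "none"
    )


fields = parse_fields(STALL_ENTRY)
print(fields["lastProgress"], fields["lastProgressAge"], fields["recovery"])
print(is_unrecovered_item_completed_stall(fields))
```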

Impact and severity

Severity: high for user-facing chat lanes using Codex app-server.

Impact:

  • User-facing Discord/Telegram lanes can appear silent or stuck.
  • Real work may be dropped after a completed tool call.
  • Diagnostics say recovery=none, leaving no clear user-facing recovery path.
  • The failure can be confused with a user/UI abort even though logs show turn_completion_idle_timeout.
  • Increasing turnCompletionIdleTimeoutMs only delays the abort; it does not solve recovery.
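
The last point can be checked against the Discord log timestamps. The arithmetic below assumes that the "strict-agentic execution contract active" entry marks the start of the run; the other two values are the timestamp and `idleMs` of the "turn idle timed out" entry.

```python
from datetime import datetime, timedelta


def parse_utc(stamp: str) -> datetime:
    """Parse an ISO-8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(stamp.replace("Z", "+00:00"))


run_started = parse_utc("2026-05-19T08:04:04.805Z")  # execution contract active
timed_out = parse_utc("2026-05-19T08:07:43.435Z")    # turn idle timed out
idle = timedelta(milliseconds=180003)                 # idleMs from that entry

# Subtracting the idle span gives the moment of the last observed activity.
last_activity = timed_out - idle
productive = last_activity - run_started

print("last activity:", last_activity.isoformat(timespec="milliseconds"))
print("productive seconds:", productive.total_seconds())
print("idle seconds:", idle.total_seconds())
```

Under that assumption the run was active for roughly 38.6 seconds and then sat idle for the full 180 seconds before the abort, so a larger timeout would only have extended the idle span.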

Additional information

Why #78756 and #82171 do not fully cover this:

The fixes appear to be present and working in a narrow sense:

  • account/rate-limit updates are not prolonging this stall indefinitely
  • the session does not wait for the 30-minute terminal cap
  • the configured completion-idle watchdog fires

However, that still leaves a correctness/recovery gap:

  • productive work can be aborted after the last observed item/completed
  • no useful visible recovery is emitted
  • no resume/retry path is provided
  • the lane is not self-healing in a user-meaningful way

This looks like a remaining bug adjacent to #82171: the fail-fast behavior prevents long hangs, but it does not provide correct turn semantics or recovery when turn/completed is missing.

Suggested fix direction:

  1. Preserve and expose a structured recovery result when turn_completion_idle_timeout fires after item/completed.
  2. Emit a visible channel message when a user-facing lane aborts due to missing turn/completed, including the last completed item/tool and retry guidance.
  3. Add a retry/resume mechanism that restarts the turn with a compact summary of already-completed tool calls and their results.
  4. Improve app-server protocol handling so that if the final observed current-turn item is a tool result, OpenClaw does not treat silence as terminal without preserving recovery.
  5. Add diagnostics that distinguish:
    • turn/completed missing after assistant final text
    • turn/completed missing after tool result where more assistant work is expected
    • raw response completion stalls
    • user/UI aborts
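
Items 1 to 3 above could be combined into a single recovery record that is kept when the watchdog fires. The sketch below is hypothetical throughout: `RecoveryResult`, `CompletedToolCall`, and the prompt wording are illustrative names, not OpenClaw APIs, and the original request text is a paraphrase. The tool call and reason values come from the Discord reproduction.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompletedToolCall:
    tool: str
    command: str
    exit_code: int


@dataclass
class RecoveryResult:
    """State preserved when a turn is aborted (hypothetical structure)."""
    reason: str
    last_notification_method: str
    completed: List[CompletedToolCall] = field(default_factory=list)

    def retry_prompt(self, original_request: str) -> str:
        """Build a compact summary for restarting the turn."""
        lines = [
            "The previous turn was aborted "
            f"({self.reason} after {self.last_notification_method}).",
            "Already completed tool calls:",
        ]
        for call in self.completed:
            lines.append(f"- {call.tool}: {call.command} (exit {call.exit_code})")
        lines.append("Continue the original request without repeating them:")
        lines.append(original_request)
        return "\n".join(lines)


recovery = RecoveryResult(
    reason="turn_completion_idle_timeout",
    last_notification_method="item/completed",
    completed=[
        CompletedToolCall(
            tool="bash",
            command="mkdir -p /home/casper/.openclaw/workspace/artifacts/"
                    "maria-ward-smartphone-start/site/assets/img",
            exit_code=0,
        )
    ],
)
prompt = recovery.retry_prompt("Build the static site from the existing drafts.")
print(prompt)
```

Keeping the completed tool calls in the record lets both the visible channel message and the retry prompt be derived from the same data.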

Workaround in this environment: avoid the Codex app-server runtime for user-facing chat lanes until this recovery gap is fixed. For OpenAI GPT models, forcing harness=pi is only viable if the OpenAI provider credentials have api.responses.write; otherwise the normal OpenAI Responses API path fails with HTTP 401.
