openclaw: 2026.5.12 Codex app-server runtime stalls/timeouts despite healthy gateway


Bug description

After upgrading a mature OpenClaw fleet from 2026.5.7 to 2026.5.12, direct gateway health stays green and auth validates, but user-visible Telegram/interactive lanes and long-running scheduled/tool lanes can stall or time out on the Codex app-server runtime path.

This looks related to the existing stuck-session / Codex runtime timeout family, but I am opening a dedicated report because the failure is reproducible on the latest npm release and because base Onboard/default config appears to route into the same Codex runtime path.

Please close as duplicate if this is already fully covered by another issue.

Environment

  • OpenClaw package: 2026.5.12
  • OpenClaw version probe: OpenClaw 2026.5.12 (f066dd2)
  • Prior stable fleet version: 2026.5.7
  • OS: macOS 26.2
  • Node: v22.22.2
  • npm: 10.9.7
  • Gateway mode: local loopback
  • Primary affected runtime path: official OpenAI/Codex runtime via openai/gpt-5.5 / Codex app-server
  • Auth: OAuth/token-backed OpenAI Codex profiles; no API-key failure during the reproduced incidents

Expected behavior

  • Interactive Telegram messages should start promptly and return a final response when gateway health is live.
  • Long-running agent/tool calls should either complete, surface progress, or fail cleanly without wedging/stalling sessions.
  • A healthy gateway should not hide stuck Codex app-server sessions or delayed run start.

Actual behavior

Observed after controlled rollout to 2026.5.12:

  • Gateway /health stayed live.
  • Codex auth/profile checks passed.
  • Direct short smokes passed.
  • Some Telegram/interactive lanes had delayed starts or no timely visible reply.
  • Long-running or silent exec tool work in scheduled, isolated agent runs could be killed or stalled around the dynamic-tool watchdog window.
  • Session/runtime logs showed Codex app-server timeout/stall classes rather than Telegram ingress or auth errors.

Representative local symptoms/log classes:

codex app-server turn idle timed out waiting for completion
codex app-server turn idle timed out waiting for turn/completed
FailoverError: LLM request timed out
stalled_agent_run
active_work_without_progress
blocked_tool_call
stopReason=aborted
errorMessage=codex app-server attempt timed out
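
To make these classes easier to spot during postflight, the strings above can be counted in session/runtime logs. The following is a minimal sketch, not part of OpenClaw: the marker strings are copied from the list above, while the function name and the idea of feeding it log lines from a file are assumptions for illustration.

```python
from collections import Counter

# Marker strings copied verbatim from the log classes listed above.
STALL_MARKERS = [
    "codex app-server turn idle timed out waiting for completion",
    "codex app-server turn idle timed out waiting for turn/completed",
    "FailoverError: LLM request timed out",
    "stalled_agent_run",
    "active_work_without_progress",
    "blocked_tool_call",
    "stopReason=aborted",
    "errorMessage=codex app-server attempt timed out",
]


def count_stall_markers(lines):
    """Count how many lines contain each stall/timeout marker.

    `lines` is any iterable of strings (hypothetical input: an open log file).
    A single line may match more than one marker.
    """
    counts = Counter()
    for line in lines:
        for marker in STALL_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts
```

A non-empty result on a host whose gateway /health is live would match the pattern described in this report. The log location is deployment-specific and is not assumed here.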

One reproducible canary shape: a scheduled/isolated agent run invoking a silent exec longer than ~60 seconds can trip the Codex dynamic tool/runtime watchdog path. Short cron canaries and direct short smokes can pass, so this is easy to miss if postflight only checks health and short requests.

Reproduction shape

  1. Run OpenClaw 2026.5.12 with official OpenAI/Codex model routing, e.g. openai/gpt-5.5 using Codex runtime/app-server.
  2. Send normal interactive Telegram messages while the fleet has existing persistent sessions and multiple agent lanes.
  3. Run an isolated scheduled agent/tool lane that performs long/silent exec work (>60s) or otherwise produces no app-server/tool progress for a while.
  4. Observe that gateway health can remain OK while user-visible sessions are delayed/stalled/aborted and logs show Codex app-server timeout/stall classes above.
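
Step 3 needs a command that stays silent for longer than the watchdog window. A minimal sketch of such a payload is below, assuming the scheduled lane can exec a Python script; the function names and the 75-second suggestion are illustrative choices, not anything shipped with OpenClaw.

```python
import time


def silent_work(seconds: float) -> float:
    """Block without writing to stdout/stderr, then return elapsed seconds."""
    start = time.monotonic()
    time.sleep(seconds)
    return time.monotonic() - start


def run_canary(seconds: float) -> str:
    """Do the silent work, then build one final line for the caller to print.

    No output is produced while waiting, so the run shows no progress
    for the whole duration.
    """
    elapsed = silent_work(seconds)
    return f"silent-canary finished after {elapsed:.1f}s"
```

Calling `print(run_canary(75))` from the exec step gives a silent phase longer than 60 seconds. If the final line never reaches the session, the run was likely aborted or stalled before completion.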

Source-level clues from installed package

In the installed 2026.5.12 package:

  • Official OpenAI provider traffic resolves to Codex runtime by default when runtime is auto.
  • openai-codex provider also resolves to Codex runtime by default.
  • Codex app-server default request/turn idle timeouts are 60000 ms (60 s).
  • The dynamic tool call timeout constant is 30000 ms (30 s), with a maximum of 600000 ms (10 min).
  • The observed log strings exist in the Codex app-server runtime path.
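
The arithmetic behind these constants can be sketched as follows. This is only a model under stated assumptions: the three values come from the findings above, but treating 30000 ms as a default that callers may raise up to the maximum, and comparing a silent period directly against the idle timeout, are assumptions about how the runtime applies them, not verified behavior. All names are hypothetical.

```python
# Values reported above for the installed 2026.5.12 package.
TURN_IDLE_TIMEOUT_MS = 60_000
DYNAMIC_TOOL_TIMEOUT_MS = 30_000
DYNAMIC_TOOL_TIMEOUT_MAX_MS = 600_000


def effective_tool_timeout_ms(requested_ms=None):
    """Assumed model: use the default unless a value is requested,
    and never exceed the maximum."""
    if requested_ms is None:
        return DYNAMIC_TOOL_TIMEOUT_MS
    return min(requested_ms, DYNAMIC_TOOL_TIMEOUT_MAX_MS)


def silent_period_exceeds(silent_ms, timeout_ms=TURN_IDLE_TIMEOUT_MS):
    """True if a period with no progress is longer than the given timeout."""
    return silent_ms > timeout_ms
```

Under this model a 75-second silent exec exceeds the 60-second idle timeout while a 20-second one does not, which is consistent with short smokes passing and long silent work stalling.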

Onboard/base-config check

I also simulated openclaw onboard in an isolated temp HOME to compare base config vs the mature fleet config.

Findings:

  • Non-interactive base Onboard produces a minimal config: workspace, gateway, session scope, tools profile, wizard metadata.
  • OpenAI Codex Onboard auth is interactive-only and registers OAuth/device-code flows; API key is labeled backup.
  • The Codex auth config patch adds/normalizes openai/gpt-5.5, but does not appear to add any hidden reliability/timeout/recovery config that would avoid this runtime path.
  • Therefore this does not appear to be caused by skipping broad Onboard during upgrade. Base Onboard would still route official OpenAI/Codex usage into the Codex runtime path.

Impact

This is high-impact for mature multi-agent fleets because:

  • Direct health checks stay green.
  • Short smokes can pass.
  • Auth can be fully healthy.
  • User-visible Telegram lanes can still stall or respond very late.
  • Scheduled long-work lanes can silently fail/stall unless explicitly canaried.

Current mitigation

We are mitigating locally by pinning interactive and long-running work away from Codex runtime toward the embedded/Pi runtime path while keeping 2026.5.12 installed where possible.

Related issues that may overlap

  • Stuck sessions / recovery incomplete
  • Gateway/session deadlock cases
  • Codex app-server timeout after progress / blocked tool call recovery

I am filing this to provide latest-release evidence and the Onboard/base-config comparison, not to duplicate noise.
