openclaw - 💡(How to fix) Fix [Bug]: Codex bundled harness initialize still hangs in 2026.5.18 isolated cron — surfaces via #64744 timeout-wrapping as 'isolated agent setup timed out before runner start'

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

In OpenClaw 2026.5.18, isolated cron jobs running payload.kind: agentTurn with openai/gpt-5.5 via the bundled codex harness consistently fail with cron: isolated agent setup timed out before runner start ~60 s before the runner starts, while the same job-config completes cleanly in ~26 s when switched to claude-cli/claude-opus-4-7.

Error Message

Diagnostic isolation matrix (all five variations on the same job-id, gateway, and cron-config):

#model / runtimeresultduration
1openai/gpt-5.5 (codex), scheduled fire❌ setup-timeout~60 s
2openai/gpt-5.5 (codex), manual cron run❌ setup-timeout~60 s
3openai/gpt-5.5, --clear-agent (agentId removed)❌ setup-timeout~60 s
4openai/gpt-5.5, payload.timeoutSeconds: 1800❌ setup-timeout~60 s (raising outer timeout does not help — setup hangs BEFORE runner)
5--model claude-cli/claude-opus-4-7✅ delivered cleanly~26 s

Log excerpt (one representative cycle, manual fire 08:10:00 CEST 2026-05-20):

2026-05-20T08:10:00 cron run enqueued runId=manual:f8d337e0-...:1779257403555:3 2026-05-20T08:11:03 cron: job run returned error status (lastError: "cron: isolated agent setup timed out before runner start") 2026-05-20T08:11:03 cron: applying error backoff

Codex worker process at time of failure (ps):

1889996 17:09 0.0 0.2 node .../codex.js app-server --listen stdio:// 1890003 17:09 8.4 0.8 .../codex-linux-arm64/.../codex app-server --listen stdio://

(uptime 17 minutes, idle CPU, both processes alive — no obvious worker-side hang)

openclaw doctor reports clean before, during, and after the failure cycle.

Symptom-onset: First setup-timeout observed at 2026-05-19 02:10 CEST on the 2026.5.12 build (cron-server dist file server-cron-C6BmZ1gR.js). Persisted across the 2026.5.12 → 2026.5.18 auto-upgrade at 03:03 CEST (dist file changed to server-cron-CKQu4czX.js), with all subsequent ~30 scheduled fires affected over ~14 hours.

Root Cause

In OpenClaw 2026.5.18, isolated cron jobs running payload.kind: agentTurn with openai/gpt-5.5 via the bundled codex harness consistently fail with cron: isolated agent setup timed out before runner start ~60 s before the runner starts, while the same job-config completes cleanly in ~26 s when switched to claude-cli/claude-opus-4-7.

Fix Action

Fix / Workaround

Consequence: ~14 h of failure-DM-spam to chat channel before mitigation. Effective workaround: route the heartbeat or affected cron jobs through claude-cli/claude-opus-4-7 (or another non-codex runtime). Primary main-conversation can remain on codex.

Happy to provide additional logs, run further diagnostics, or test patches against this setup.

Code Example

{
     "schedule": { "kind": "cron", "expr": "0 20,21,22,23,0,1,2,3,4,5,6 * * *", "tz": "Europe/Berlin" },
     "sessionTarget": "isolated",
     "payload": {
       "kind": "agentTurn",
       "message": "<any agent prompt>",
       "timeoutSeconds": 1800,
       "lightContext": true
     }
   }

---

"lastErrorReason": "timeout",
"lastDiagnosticSummary": "cron: isolated agent setup timed out before runner start",
"consecutiveErrors": 16

---

**Diagnostic isolation matrix (all five variations on the same job-id, gateway, and cron-config):**

| # | model / runtime | result | duration |
|---|---|---|---|
| 1 | `openai/gpt-5.5` (codex), scheduled fire | ❌ setup-timeout | ~60 s |
| 2 | `openai/gpt-5.5` (codex), manual `cron run` | ❌ setup-timeout | ~60 s |
| 3 | `openai/gpt-5.5`, `--clear-agent` (agentId removed) | ❌ setup-timeout | ~60 s |
| 4 | `openai/gpt-5.5`, `payload.timeoutSeconds: 1800` | ❌ setup-timeout | ~60 s (raising outer timeout does not help — setup hangs BEFORE runner) |
| 5 | `--model claude-cli/claude-opus-4-7` | ✅ delivered cleanly | ~26 s |

**Log excerpt** (one representative cycle, manual fire 08:10:00 CEST 2026-05-20):

2026-05-20T08:10:00 cron run enqueued runId=manual:f8d337e0-...:1779257403555:3
2026-05-20T08:11:03 cron: job run returned error status (lastError: "cron: isolated agent setup timed out before runner start")
2026-05-20T08:11:03 cron: applying error backoff


**Codex worker process at time of failure** (`ps`):

1889996  17:09  0.0  0.2  node .../codex.js app-server --listen stdio://
1890003  17:09  8.4  0.8  .../codex-linux-arm64/.../codex app-server --listen stdio://

(uptime 17 minutes, idle CPU, both processes alive — no obvious worker-side hang)

**`openclaw doctor`** reports clean before, during, and after the failure cycle.

**Symptom-onset:** First setup-timeout observed at 2026-05-19 02:10 CEST on the `2026.5.12` build (cron-server dist file `server-cron-C6BmZ1gR.js`). Persisted across the 2026.5.122026.5.18 auto-upgrade at 03:03 CEST (dist file changed to `server-cron-CKQu4czX.js`), with all subsequent ~30 scheduled fires affected over ~14 hours.
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

In OpenClaw 2026.5.18, isolated cron jobs running payload.kind: agentTurn with openai/gpt-5.5 via the bundled codex harness consistently fail with cron: isolated agent setup timed out before runner start ~60 s before the runner starts, while the same job-config completes cleanly in ~26 s when switched to claude-cli/claude-opus-4-7.

Steps to reproduce

  1. Configure an OpenClaw 2026.5.18 install with the openai-codex plugin enabled and an OAuth profile for openai-codex (no API-key fallback).
  2. Set agents.defaults.model.primary to openai/gpt-5.5 with models.providers.openai-codex routing.
  3. Create a recurring cron job with:
    {
      "schedule": { "kind": "cron", "expr": "0 20,21,22,23,0,1,2,3,4,5,6 * * *", "tz": "Europe/Berlin" },
      "sessionTarget": "isolated",
      "payload": {
        "kind": "agentTurn",
        "message": "<any agent prompt>",
        "timeoutSeconds": 1800,
        "lightContext": true
      }
    }
  4. Trigger manually: openclaw cron run <job-id>.
  5. After ~60 s, observe cron get <job-id> reports state.lastDiagnosticSummary: "cron: isolated agent setup timed out before runner start".
  6. Repeat with openclaw cron edit <job-id> --model claude-cli/claude-opus-4-7 and openclaw cron run <job-id> — completes successfully in ~26 s.

Expected behavior

The isolated cron job should complete agent setup and start the runner within seconds (reference: the same job-config with --model claude-cli/claude-opus-4-7 completes in ~26 s end-to-end). Pre-#64744 the underlying initialize hang was indefinite; #64744 added timeout-wrapping so the failure surfaces explicitly, but the expected post-fix behavior is that setup either completes or surfaces an explicit underlying error — not that every codex isolated-cron setup deterministically times out.

Actual behavior

Every fire — scheduled or manual cron run — terminates at ~60 s with:

"lastErrorReason": "timeout",
"lastDiagnosticSummary": "cron: isolated agent setup timed out before runner start",
"consecutiveErrors": 16

No HTTP error is logged. The codex app-server --listen stdio:// worker process is alive and idle (0 % CPU, 12+ min uptime) at the moment the timeout fires. Interactive sessions on the same gateway continue to work normally with codex. Only the isolated-cron + codex combination fails.

OpenClaw version

2026.5.18

Operating system

Linux 6.12.75+rpt-rpi-2712 aarch64 (host: Raspberry Pi 500+, 16 GB RAM, NVMe storage)

Install method

Global npm install: npm install -g openclaw into ~/.npm-global/lib/node_modules/openclaw. Managed as a systemd user-service (openclaw-gateway.service). Node 24.15.0.

Model

openai/gpt-5.5 (configured as agents.defaults.model.primary) — fails claude-cli/claude-opus-4-7 (manual override) — works.

Provider / routing chain

openai/gpt-5.5 resolves through models.providers.openai-codex (auth mode: oauth, no apiKey). Codex plugin enabled (plugins.entries.codex.enabled: true). Codex bundled app-server transport: stdio (default). No external proxy or gateway in front of the OpenClaw gateway. Local loopback only.

Additional provider/model setup details

Effective agents.defaults.model.fallbacks chain: claude-cli/claude-opus-4-7, claude-cli/claude-sonnet-4-6, openai/gpt-5.5, openai/gpt-5.5, claude-cli/claude-haiku-4-5. Fallback chain is not exercised by the setup-phase timeout — failure occurs before model invocation, so fallback policy does not apply.

plugins.entries.codex.config set to { codexDynamicToolsLoading: "searchable", codexDynamicToolsExclude: [] }. After observing the bug, also tried adding appServer.clearEnv: [] explicitly — no change in symptom.

agents.defaults.heartbeat is configured (separate from the cron-job) with model: "claude-cli/claude-opus-4-7" as a working bridge. Codex remains the primary main-conversation runtime; only isolated-cron-sessions are affected.

Logs, screenshots, and evidence

**Diagnostic isolation matrix (all five variations on the same job-id, gateway, and cron-config):**

| # | model / runtime | result | duration |
|---|---|---|---|
| 1 | `openai/gpt-5.5` (codex), scheduled fire | ❌ setup-timeout | ~60 s |
| 2 | `openai/gpt-5.5` (codex), manual `cron run` | ❌ setup-timeout | ~60 s |
| 3 | `openai/gpt-5.5`, `--clear-agent` (agentId removed) | ❌ setup-timeout | ~60 s |
| 4 | `openai/gpt-5.5`, `payload.timeoutSeconds: 1800` | ❌ setup-timeout | ~60 s (raising outer timeout does not help — setup hangs BEFORE runner) |
| 5 | `--model claude-cli/claude-opus-4-7` | ✅ delivered cleanly | ~26 s |

**Log excerpt** (one representative cycle, manual fire 08:10:00 CEST 2026-05-20):

2026-05-20T08:10:00 cron run enqueued runId=manual:f8d337e0-...:1779257403555:3
2026-05-20T08:11:03 cron: job run returned error status (lastError: "cron: isolated agent setup timed out before runner start")
2026-05-20T08:11:03 cron: applying error backoff


**Codex worker process at time of failure** (`ps`):

1889996  17:09  0.0  0.2  node .../codex.js app-server --listen stdio://
1890003  17:09  8.4  0.8  .../codex-linux-arm64/.../codex app-server --listen stdio://

(uptime 17 minutes, idle CPU, both processes alive — no obvious worker-side hang)

**`openclaw doctor`** reports clean before, during, and after the failure cycle.

**Symptom-onset:** First setup-timeout observed at 2026-05-19 02:10 CEST on the `2026.5.12` build (cron-server dist file `server-cron-C6BmZ1gR.js`). Persisted across the 2026.5.12 → 2026.5.18 auto-upgrade at 03:03 CEST (dist file changed to `server-cron-CKQu4czX.js`), with all subsequent ~30 scheduled fires affected over ~14 hours.

Impact and severity

Affected users/systems/channels: Any deployment running isolated agentTurn cron jobs on the bundled codex harness with openai/gpt-5.5 via OAuth.

Severity: Blocks workflow. Recurring scheduled work fails deterministically. Health/healthcheck-style automation that depends on isolated cron + codex stops running. Affected my agent's heartbeat-equivalent cron jobs which provide proactive maintenance behavior — all delivery turns into Teams-DM failure-notifications.

Frequency: Always (every cron fire that uses the codex isolated-session-setup path). Manual cron run reproduces 100 % of the time. Scheduled fires fail 100 % of the time and accumulate consecutive-error backoff until the cron-job is effectively disabled.

Consequence: ~14 h of failure-DM-spam to chat channel before mitigation. Effective workaround: route the heartbeat or affected cron jobs through claude-cli/claude-opus-4-7 (or another non-codex runtime). Primary main-conversation can remain on codex.

Additional information

Related and prior issues:

  • #64744 (closed 2026-04-27 as implemented) — "codex/gpt-5.4 via bundled Codex harness hangs indefinitely on initialize". That fix wraps initialize/startup with timeouts so the underlying initialize-hang surfaces as an explicit timeout. The post-#64744 behavior I am observing is consistent with the fix being correctly applied (timeout fires instead of indefinite hang), but the underlying initialize-hang itself appears unresolved — codex isolated-cron setup still hangs 100 % of the time, just visibly now.
  • #66251 (open umbrella) — "Track Codex harness stability work". Linking this report so maintainers can track under the umbrella.
  • #61741 (closed) — "Race condition in subagent/session cleanup causes late child stdout to hit cleared active run". Referenced by jankutschera in #64744 as potential underlying race; mentioning here in case the codex isolated-cron path triggers the same class of state-race.
  • #82662 (open) — different bug: HTTP 408 from deepseek/Xiaomi-Mimo providers + memory-core plugin auto-cron. Same diagnostic-string by coincidence but different mechanism.

Regression boundaries: Last known good version not directly observed (no logs predating 5.12 still on disk after a tmpfs cleanup event on 2026-05-18). First known bad: 2026.5.12 (build hash server-cron-C6BmZ1gR.js) — symptom-onset 2026-05-19 02:10 CEST while still on 5.12. Symptom persists in 2026.5.18 (build hash server-cron-CKQu4czX.js).

Happy to provide additional logs, run further diagnostics, or test patches against this setup.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

The isolated cron job should complete agent setup and start the runner within seconds (reference: the same job-config with --model claude-cli/claude-opus-4-7 completes in ~26 s end-to-end). Pre-#64744 the underlying initialize hang was indefinite; #64744 added timeout-wrapping so the failure surfaces explicitly, but the expected post-fix behavior is that setup either completes or surfaces an explicit underlying error — not that every codex isolated-cron setup deterministically times out.

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]: Codex bundled harness initialize still hangs in 2026.5.18 isolated cron — surfaces via #64744 timeout-wrapping as 'isolated agent setup timed out before runner start'