openclaw - 💡(How to fix) Fix [Bug]: Codex harness sessions stall with active_work_without_progress and stale running/hasActiveRun=false state on 2026.5.19

openclaw · 2026-05-21 01:10:11


On OpenClaw 2026.5.19, multiple Telegram sessions using the Codex harness logged stalled_agent_run / active_work_without_progress after embedded_run:started; session snapshots also showed stale status:"running" rows with hasActiveRun:false while the gateway and Telegram channel remained healthy.

Fix / Workaround

Current mitigation applied locally: diagnostics.stuckSessionAbortMs=720000 (12 minutes) followed by openclaw gateway restart.
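For scale, 720000 ms is 12 minutes, while the gus recovery in the diagnostic logs fired at age=388s (about 6.5 minutes). The sketch below does that arithmetic and checks each logged session age against the raised threshold. It assumes, hypothetically, that a stuck-session abort fires once a session's age reaches `diagnostics.stuckSessionAbortMs`. That rule is an illustration and is not taken from OpenClaw's source.

```python
# Sketch only. ASSUMPTION (hypothetical, not OpenClaw's code): a stuck-session
# abort fires once a session's age reaches diagnostics.stuckSessionAbortMs.

STUCK_SESSION_ABORT_MS = 720_000  # value applied locally in this report


def ms_to_minutes(ms: int) -> float:
    """Convert milliseconds to minutes."""
    return ms / 60_000


def would_abort(age_s: int, threshold_ms: int = STUCK_SESSION_ABORT_MS) -> bool:
    """Hypothetical rule: abort when the session age (in ms) reaches the threshold."""
    return age_s * 1000 >= threshold_ms


# Session ages (seconds) copied from the diagnostic log lines in this report.
OBSERVED_AGES_S = {
    "gus stalled": 298,
    "gus recovery": 388,
    "kramer stalled": 357,
}

if __name__ == "__main__":
    print(f"threshold: {ms_to_minutes(STUCK_SESSION_ABORT_MS):g} minutes")
    for label, age in OBSERVED_AGES_S.items():
        verdict = "abort" if would_abort(age) else "below threshold"
        print(f"{label}: {age}s ({age / 60:.1f} min) -> {verdict}")
```

Under this assumption every logged age falls below the 12-minute threshold. The mitigation therefore delays aborts rather than addressing why the runs stop reporting progress.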

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

Summary

Steps to reproduce

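Not enough information yet (NOT_ENOUGH_INFO): there is no deterministic minimal repro, and the evidence comes from live Telegram sessions and gateway/session diagnostics.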

Expected behavior

Codex-harness turns should either emit meaningful progress/turn completion or be aborted/released with consistent session state. Persisted session state should not remain status:"running" when OpenClaw no longer has an active run registered for that session. New Telegram messages should not be blocked by stale lane state while gateway health and Telegram connectivity are OK.
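The expected behavior amounts to an invariant. A persisted row may say `status:"running"` only while an active run is registered, and a run that stops making progress is aborted and released in the same step that updates its status. The following is a minimal in-memory sketch of that invariant. The `Session` class, the `enforce` function, the `abort_after_s` parameter, and the `"aborted"` terminal status are hypothetical illustrations, not OpenClaw APIs.

```python
from dataclasses import dataclass


# Hypothetical model of one session; field names echo the report's
# status / hasActiveRun / lastProgressAge, but this is not OpenClaw's schema.
@dataclass
class Session:
    key: str
    status: str                 # persisted status: "running", "done", "aborted", ...
    has_active_run: bool        # is an active run registered for this session?
    last_progress_age_s: float  # seconds since the last progress event


def is_consistent(s: Session) -> bool:
    """Invariant from the expected behavior: never 'running' without an active run."""
    return not (s.status == "running" and not s.has_active_run)


def enforce(s: Session, abort_after_s: float) -> Session:
    """Hypothetical watchdog step that keeps run state and persisted status in sync."""
    if s.has_active_run and s.last_progress_age_s >= abort_after_s:
        # No progress for too long: abort the run and update the status together.
        s.has_active_run = False
        s.status = "aborted"
    elif not s.has_active_run and s.status == "running":
        # Stale row (the shape seen in this report): release it instead of leaving it "running".
        s.status = "aborted"
    return s


if __name__ == "__main__":
    stale = Session("agent:gus:telegram:direct:<redacted>", "running", False, 0.0)
    stuck = Session("agent:kramer:telegram:direct:<redacted>", "running", True, 400.0)
    for s in (stale, stuck):
        before = is_consistent(s)
        print(s.key, before, "->", is_consistent(enforce(s, abort_after_s=300)))
```

The key design point is that both branches change the persisted status and the run registration in a single step. Because the two are never updated separately, a listing cannot observe the `running` / `hasActiveRun:false` combination.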

Actual behavior

Observed on several Telegram direct sessions using the Codex harness:

- Gateway health remained OK and Telegram stayed connected.
- Diagnostic logs reported classification=stalled_agent_run, reason=active_work_without_progress, activeWorkKind=embedded_run, and lastProgress=embedded_run:started for multiple agents.
- Some recoveries aborted the embedded run with action=abort_embedded_run, aborted=true, drained=false, forceCleared=true, released=1.
- A session snapshot before mitigation showed four direct Telegram sessions as status:"running" while hasActiveRun:false.
- After setting diagnostics.stuckSessionAbortMs=720000 and restarting the gateway, the same session listing showed no active running/timeout sessions.

No transport outage was observed in the same window.

OpenClaw version

OpenClaw 2026.5.19 (a185ca2)

Operating system

macOS 26.5 (25F71), arm64

Install method

Homebrew Node global package path (/opt/homebrew/bin/openclaw -> ../lib/node_modules/openclaw/openclaw.mjs), managed by macOS LaunchAgent

Model

openai/gpt-5.5

Provider / routing chain

OpenClaw Telegram -> Codex harness -> openai-codex/gpt-5.5

Additional provider/model setup details

Default model config reports primary openai/gpt-5.5, but affected session snapshots showed modelProvider:"openai-codex", model:"gpt-5.5", and agentRuntime.id:"codex".

Affected Telegram sessions had large context windows (contextTokens:272000) and mixed thinking levels (high / xhigh observed in session snapshots). No API keys or auth tokens are included here.

Logs, screenshots, and evidence

Identifiers below are redacted. Direct Telegram chat IDs, token data, local username, machine name, and full home paths are omitted.

Environment:
- OpenClaw: 2026.5.19 (a185ca2)
- Node: v24.15.0
- OS: macOS 26.5 (25F71), arm64
- Gateway: local loopback, LaunchAgent running
- Telegram: 11/11 accounts OK in `openclaw status --all`
- Current diagnostics mitigation: `{"stuckSessionAbortMs":720000}`

Representative diagnostic log lines from 2026-05-20:


2026-05-20T21:08:12-03:00 stalled session: sessionId=<redacted> sessionKey=agent:gus:telegram:direct:<redacted> state=processing age=298s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=embedded_run:started lastProgressAge=297s recovery=none

2026-05-20T21:09:57-03:00 stuck session recovery: sessionId=<redacted> sessionKey=agent:gus:telegram:direct:<redacted> age=388s action=abort_embedded_run aborted=true drained=false released=1

2026-05-20T21:09:57-03:00 stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=<redacted> sessionKey=agent:gus:telegram:direct:<redacted> activeWorkKind=embedded_run lane=session:agent:gus:telegram:direct:<redacted> aborted=true drained=false forceCleared=true released=1

2026-05-20T21:10:12-03:00 stalled session: sessionId=unknown sessionKey=agent:kramer:telegram:direct:<redacted> state=processing age=357s queueDepth=1 reason=active_work_without_progress classification=stalled_agent_run activeWorkKind=embedded_run lastProgress=embedded_run:started lastProgressAge=356s recovery=none

2026-05-20T21:10:57-03:00 stuck session recovery outcome: status=aborted action=abort_embedded_run sessionId=<redacted> sessionKey=agent:kramer:telegram:direct:<redacted> activeWorkKind=embedded_run lane=session:agent:kramer:telegram:direct:<redacted> aborted=true drained=false forceCleared=true released=1


Session snapshot evidence:


Before mitigation/restart:
- total sessions: 195
- status counts: done=134, failed=1, missing=56, running=4
- four recent Telegram direct sessions were `status:"running"` with `hasActiveRun:false`
- affected sessions used `openai/gpt-5.5`

After mitigation/restart:
- `activeRunning: []`
- status counts in the returned session list: done=138, failed=1
- gateway health OK and event loop not degraded after warm-up
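The before-restart counts are internally consistent (134 + 1 + 56 + 4 = 195), and the stale shape can be flagged mechanically from any session listing. The sketch below assumes each listed session is a dict carrying the `status` and `hasActiveRun` fields named in this report. The `key` field and the synthetic rows are hypothetical stand-ins for real snapshot data.

```python
from collections import Counter


def audit(sessions: list[dict]) -> tuple[Counter, list[str]]:
    """Count statuses and list sessions shaped like the stale rows in this report."""
    counts = Counter(s["status"] for s in sessions)
    stale = [
        s["key"]  # hypothetical identifier field
        for s in sessions
        # A missing hasActiveRun is treated as "no active run".
        if s["status"] == "running" and not s.get("hasActiveRun", False)
    ]
    return counts, stale


if __name__ == "__main__":
    # Synthetic rows rebuilt from the "before" counts; not real snapshot data.
    before = (
        [{"key": f"done-{i}", "status": "done"} for i in range(134)]
        + [{"key": "failed-0", "status": "failed"}]
        + [{"key": f"missing-{i}", "status": "missing"} for i in range(56)]
        + [{"key": f"tg-{i}", "status": "running", "hasActiveRun": False} for i in range(4)]
    )
    counts, stale = audit(before)
    print(sum(counts.values()), dict(counts), stale)
```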

Impact and severity

Affected: Telegram direct sessions using the Codex harness with openai/gpt-5.5.

Severity: High for affected sessions. The gateway and Telegram transport stayed healthy, but individual lanes stopped making progress and required diagnostic recovery or restart/mitigation.

Frequency: Intermittent, but observed across multiple agents/sessions in the same installation. The same stale-session shape (status:"running" with hasActiveRun:false) was seen repeatedly in local diagnostics.

Consequence: Replies can be delayed, aborted, or blocked behind stale active-work state. Recovery can abort embedded runs after several minutes, causing lost work and extra retries/tokens.

Additional information

I found related issues, but this report appears narrower and specific to the current version:

This report is for OpenClaw 2026.5.19 with current diagnostics and Codex harness behavior. I do not have a deterministic minimal repro path yet; the evidence is from live Telegram sessions and gateway/session diagnostics.
