openclaw: [2026.5.12-beta.3] Cron agent jobs time out after turn-accepted; Discord DM lane can be blocked by stale cron sessionKey

Summary

After upgrading a beta host to OpenClaw 2026.5.12-beta.3 (cc46ca9), two related reliability issues were observed around agent scheduling and channel responsiveness:

  1. Discord direct messages stopped receiving assistant replies even though the Discord channel and gateway both reported healthy.
  2. Multiple cron agent jobs began failing or timing out, commonly with cron: job execution timed out (last phase: turn-accepted).

This report is public-safe. Hostnames, IP addresses, user names, Discord IDs, account names, private paths, and private message contents have been replaced with generic placeholders.

Environment

  • OpenClaw: 2026.5.12-beta.3 (cc46ca9)
  • Channel: beta
  • Install/update path: pnpm-based beta install
  • OS: macOS arm64
  • Node: 25.9.0
  • Gateway: LaunchAgent, local gateway reachable
  • Channels enabled during the incident: Discord, Telegram, WhatsApp
  • Plugins loaded included: browser, codex, discord, memory-core, telegram, whatsapp
  • OpenAI auth mode: OAuth/Codex route, not raw API key

Incident A: Discord DM stopped answering

Symptom

Discord was configured and showed as healthy in openclaw status --deep, but messages in a direct-message channel were not receiving assistant replies. The gateway and channel health checks did not expose the actual blockage.

Findings

The active task/session state showed a long-running cron job using the same sessionKey shape as the Discord DM lane:

agent:<agent>:discord:direct:<redacted-discord-dm-id>

Several cron jobs had stale/persisted sessionKey values that pointed at the Discord direct-message session instead of isolated cron sessions. These were scheduled agent turns, but they were effectively pinned to a direct-message session key.
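The key above follows a colon-separated pattern. The Python sketch below classifies keys of that shape. It assumes the layout is always agent:<agent>:<channel>:<kind>:<id>, as in the redacted example. SessionKey, parse_session_key and is_direct_message_lane are hypothetical helpers, not OpenClaw APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionKey:
    agent: str
    channel: str
    kind: str
    target_id: str


def parse_session_key(raw: str) -> Optional[SessionKey]:
    """Parse agent:<agent>:<channel>:<kind>:<id>; return None for any other shape.

    The five-part layout is an assumption based on the redacted key in this report.
    """
    parts = raw.split(":", 4)
    if len(parts) != 5 or parts[0] != "agent":
        return None
    _, agent, channel, kind, target_id = parts
    return SessionKey(agent, channel, kind, target_id)


def is_direct_message_lane(raw: str) -> bool:
    """True when the key points at a channel's direct-message lane."""
    key = parse_session_key(raw)
    return key is not None and key.kind == "direct"


# Placeholder keys in the report's style; the second is a made-up non-DM key.
print(is_direct_message_lane("agent:main:discord:direct:123"))  # True
print(is_direct_message_lane("agent:main:cron:nightly"))        # False
```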

Recovery

Clearing the stale sessionKey fields from the affected cron jobs and restarting the gateway restored Discord DM responsiveness. After restart, channel status was healthy, no background tasks were running, and Discord inbound/outbound activity resumed.
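The clearing step can be sketched over an exported list of job records. The sketch assumes each job is a dict with id, sessionTarget and sessionKey fields, named after the wording in this report. The real cron store format and the helper names are assumptions. The sketch only edits data, so a gateway restart would still be needed, as in the recovery above.

```python
from typing import Any, Dict, List

Job = Dict[str, Any]  # assumed job record shape, not OpenClaw's storage format


def is_direct_message_lane(session_key: str) -> bool:
    # Assumed key shape: agent:<agent>:<channel>:direct:<id>
    parts = session_key.split(":", 4)
    return len(parts) == 5 and parts[0] == "agent" and parts[3] == "direct"


def clear_stale_session_keys(jobs: List[Job]) -> List[str]:
    """Remove sessionKey from isolated jobs that point at a DM lane.

    Mutates the job dicts in place and returns the ids of the jobs that changed.
    """
    cleared: List[str] = []
    for job in jobs:
        key = job.get("sessionKey")
        if (job.get("sessionTarget") == "isolated"
                and isinstance(key, str)
                and is_direct_message_lane(key)):
            del job["sessionKey"]
            cleared.append(job["id"])
    return cleared


# Hypothetical job ids and a placeholder DM key.
jobs = [
    {"id": "sync", "sessionTarget": "isolated", "sessionKey": "agent:main:discord:direct:123"},
    {"id": "digest", "sessionTarget": "isolated"},
]
print(clear_stale_session_keys(jobs))  # ['sync']
```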

Why this seems like an OpenClaw issue

Cron jobs with sessionTarget: isolated should not be able to retain or reuse a Discord direct-message session key in a way that blocks live channel replies. If legacy state contains a stale direct-channel sessionKey, cron normalization/doctor/gateway startup should probably detect it and either clear it or warn loudly.
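Here is one way a shared key could block replies. Suppose a runtime serializes turns per session key. That is an assumption about OpenClaw's internals, not something this report confirms. Under that assumption, a long cron turn holding the DM key forces the live reply to wait. The toy asyncio model below shows that effect; lane_locks and run_turn are illustrative names only.

```python
import asyncio
import time
from collections import defaultdict

# Toy model, not OpenClaw's scheduler: one lock per session key, so any two
# turns that share a key run one after the other.
lane_locks = defaultdict(asyncio.Lock)

DM_KEY = "agent:main:discord:direct:123"  # placeholder in the report's style


async def run_turn(session_key: str, seconds: float) -> None:
    async with lane_locks[session_key]:
        await asyncio.sleep(seconds)  # stands in for the model turn


async def main() -> float:
    # A cron job that wrongly reuses the DM key starts first and runs long.
    cron = asyncio.create_task(run_turn(DM_KEY, 0.5))
    await asyncio.sleep(0)  # let the cron task acquire the lane lock
    start = time.monotonic()
    await run_turn(DM_KEY, 0.01)  # the live DM reply must wait for the lock
    waited = time.monotonic() - start
    await cron
    return waited


waited = asyncio.run(main())
# The reply's wait includes the whole cron turn, not just its own 0.01 s.
print(f"DM reply waited {waited:.2f}s")
```

Giving the cron run its own isolated key removes the shared lock, and the reply no longer waits.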

Incident B: Cron agent jobs timing out after turn acceptance

Symptom

After the same beta update window, multiple scheduled agent jobs failed. The most common error was:

cron: job execution timed out (last phase: turn-accepted)

Other related timeout forms were also observed:

cron: job execution timed out
codex app-server client retired after timed-out turn
Profile openai-codex:<redacted> timed out. Trying next account...
embedded run failover decision ... sourceProvider openai sourceModel gpt-5.5 ... timedOut true

One high-frequency cron job accumulated several consecutive failures, then succeeded immediately after being moved off the OpenAI/Codex route.
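When triaging logs like these, it can help to bucket each line by message family. The minimal sketch below builds its regexes only from the messages quoted above. The labels are a hypothetical taxonomy, not OpenClaw diagnostics. Any other message format falls through as unclassified.

```python
import re
from collections import Counter

# Regexes come from the messages quoted in this report; labels are hypothetical.
# The first match wins, so the specific turn-accepted form is listed first.
PATTERNS = [
    ("cron-turn-accepted-timeout",
     re.compile(r"cron: job execution timed out \(last phase: turn-accepted\)")),
    ("cron-timeout", re.compile(r"cron: job execution timed out")),
    ("codex-client-retired", re.compile(r"codex app-server client retired after timed-out turn")),
    ("profile-timeout", re.compile(r"Profile \S+ timed out\. Trying next account")),
    ("failover-timeout", re.compile(r"embedded run failover decision .*timedOut true")),
]


def classify(line: str) -> str:
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "unclassified"


log = [
    "cron: job execution timed out (last phase: turn-accepted)",
    "cron: job execution timed out",
    "codex app-server client retired after timed-out turn",
    "Profile openai-codex:<redacted> timed out. Trying next account...",
]
print(Counter(classify(line) for line in log))
```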

Model/provider pattern

Before mitigation, many cron jobs were configured with models in the OpenAI/Codex family, including names such as:

openai/gpt-5.5
openai/gpt-5.4-mini
gpt-mini

The gateway's default model was still reported as gpt-5.5. Recent session rows showed some cron sessions running on the OpenAI Codex runtime, with unknown token accounting after the timed-out runs.
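The mitigation depended on recognizing OpenAI/Codex-family names in several spellings. The heuristic sketch below assumes a model reference is either provider/model or a bare model name. It is not OpenClaw's own alias resolution, and the function names are hypothetical.

```python
from typing import Optional, Tuple


def split_model_ref(ref: str) -> Tuple[Optional[str], str]:
    """Split provider/model; a bare name such as gpt-mini has no provider."""
    provider, sep, model = ref.partition("/")
    return (provider, model) if sep else (None, ref)


def is_openai_codex_family(ref: str) -> bool:
    # Heuristic only: an openai or openai-codex provider prefix, or a bare gpt-* name.
    provider, model = split_model_ref(ref)
    if provider is not None:
        return provider in {"openai", "openai-codex"}
    return model.lower().startswith("gpt")


for ref in ["openai/gpt-5.5", "openai/gpt-5.4-mini", "gpt-mini", "gpt-5.5",
            "openai-codex/gpt-5.5", "anthropic/claude-opus-4-7"]:
    print(f"{ref}: {is_openai_codex_family(ref)}")
```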

After mitigation, all visible cron agent jobs were pinned to:

anthropic/claude-opus-4-7

Post-mitigation snapshot:

{
  "visibleCronJobs": 35,
  "modelCounts": [
    { "model": "anthropic/claude-opus-4-7", "count": 35 }
  ],
  "riskyModelsMatchingOpenAIOrCodexOrGpt": [],
  "runningTasks": 0
}
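A snapshot in this shape could be computed from exported job and task lists. The sketch below assumes each job record has a model field and each task record has a status field. Those record shapes are assumptions, and this is not necessarily how the snapshot above was produced.

```python
import json
import re
from collections import Counter
from typing import Any, Dict, List

# Mirrors the snapshot's key name: any model matching openai, codex or gpt.
RISKY = re.compile(r"openai|codex|gpt", re.IGNORECASE)


def snapshot(jobs: List[Dict[str, Any]], tasks: List[Dict[str, Any]]) -> Dict[str, Any]:
    counts = Counter(job["model"] for job in jobs)
    return {
        "visibleCronJobs": len(jobs),
        "modelCounts": [{"model": m, "count": c} for m, c in counts.most_common()],
        "riskyModelsMatchingOpenAIOrCodexOrGpt": sorted(m for m in counts if RISKY.search(m)),
        "runningTasks": sum(1 for t in tasks if t.get("status") == "running"),
    }


# Hypothetical export: 35 jobs all pinned to the mitigation model, no tasks.
jobs = [{"model": "anthropic/claude-opus-4-7"} for _ in range(35)]
print(json.dumps(snapshot(jobs, tasks=[]), indent=2))
```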

The previously failing high-frequency sync job completed successfully after the reroute:

{
  "status": "ok",
  "summary": "DONE",
  "durationMs": 36916,
  "provider": "anthropic",
  "model": "claude-opus-4-7"
}

Remaining stale error states after mitigation

Three cron entries still displayed last-run errors, but they were consistent with prior state rather than active stuck work:

  1. A workspace auto-commit job had one stale turn-accepted timeout from before the reroute.
  2. A daily tracker job had one generic cron timeout from before the reroute.
  3. A nightly long-running job had cron: job interrupted by gateway restart, caused by the manual recovery restart.

At the time of the final snapshot, openclaw tasks list --status running --json returned zero running tasks.
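One way to separate stale last-run errors from active ones is to compare each run's finish time with the moment the mitigation was applied. A minimal sketch follows; the LastRun record, the job ids and the timestamps are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


@dataclass
class LastRun:
    job_id: str
    finished_at: datetime
    error: Optional[str]  # None means the last run succeeded


def split_errors(runs: List[LastRun], mitigated_at: datetime) -> Tuple[List[str], List[str]]:
    """Return (stale, active) job ids for runs whose last result was an error."""
    stale: List[str] = []
    active: List[str] = []
    for run in runs:
        if run.error is None:
            continue
        # An error that finished before the mitigation predates the fix.
        (stale if run.finished_at < mitigated_at else active).append(run.job_id)
    return stale, active


mitigated_at = datetime.now(timezone.utc)
runs = [
    LastRun("workspace-auto-commit", mitigated_at - timedelta(hours=2),
            "cron: job execution timed out (last phase: turn-accepted)"),
    LastRun("high-frequency-sync", mitigated_at + timedelta(minutes=5), None),
]
print(split_errors(runs, mitigated_at))  # (['workspace-auto-commit'], [])
```

Suppose the cutoff is placed after the recovery restart. The restart-interrupted nightly job would then also land in the stale group, assuming its error was recorded before that cutoff.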

Expected behavior

  • Cron jobs using sessionTarget: isolated should not inherit or retain live Discord DM session keys.
  • A stale cron sessionKey that points at a channel/direct-message lane should be detected by doctor, migration, cron edit, or gateway startup.
  • If an agent turn is accepted and then times out, diagnostics should clearly distinguish provider timeout, harness timeout, gateway/session lock, context maintenance delay, and fallback exhaustion.
  • Cron job state should make it easy to distinguish stale last-run failures from currently active failures.
  • If the OpenAI/Codex route is unavailable or timing out, fallback behavior and runtime/provider labels should be consistent in both run history and status output.

Actual behavior

  • Discord DM replies stopped despite healthy channel/gateway status because cron state appeared to be occupying or contaminating the DM session lane.
  • Cron jobs using OpenAI/Codex-family model names timed out after turn acceptance.
  • Some run/status views kept showing old OpenAI Codex cron sessions after job payloads were rerouted, which made it harder to distinguish stale session rows from active config.
  • Moving cron jobs to an Anthropic model cleared the repeated timeout behavior for the high-frequency job.

Mitigation used locally

  1. Cleared stale sessionKey fields from cron jobs that referenced a Discord direct-message session.
  2. Restarted the gateway to release the blocked direct-message lane.
  3. Bulk-edited scheduled agent jobs away from OpenAI/Codex-family model names.
  4. Pinned all visible cron agent jobs to an Anthropic model.
  5. Verified: scheduler enabled, gateway reachable, Discord configured, zero running tasks, and the high-frequency cron job succeeded manually after reroute.
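Steps 3 and 4 amount to a bulk rewrite of each job's model. The sketch below works over exported job records and assumes the model lives at payload.model. The report mentions rerouting job payloads, but the exact field path is an assumption. It works on a copy so the original export can be diffed against the result. Passing only_risky=False pins every job, as in step 4.

```python
import copy
import re
from typing import Any, Dict, List, Tuple

TARGET_MODEL = "anthropic/claude-opus-4-7"  # the model used in the report's mitigation
RISKY = re.compile(r"openai|codex|gpt", re.IGNORECASE)


def reroute(jobs: List[Dict[str, Any]], target: str = TARGET_MODEL,
            only_risky: bool = True) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Return (rerouted copy, ids of changed jobs); the input list is not modified."""
    new_jobs = copy.deepcopy(jobs)
    changed: List[str] = []
    for job in new_jobs:
        payload = job.get("payload") or {}
        model = payload.get("model")
        risky = isinstance(model, str) and bool(RISKY.search(model))
        if (risky or not only_risky) and model != target:
            payload["model"] = target
            job["payload"] = payload  # attach in case the job had no payload yet
            changed.append(job["id"])
    return new_jobs, changed


# Hypothetical job ids.
jobs = [
    {"id": "sync", "payload": {"model": "openai/gpt-5.5"}},
    {"id": "tracker", "payload": {"model": "gpt-mini"}},
    {"id": "commit", "payload": {"model": "anthropic/claude-opus-4-7"}},
]
new_jobs, changed = reroute(jobs)
print(changed)  # ['sync', 'tracker']
```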

Suggested areas to inspect

  1. Cron migration/normalization: reject or clear sessionKey values that point at agent:<agent>:discord:direct:* when sessionTarget is isolated.
  2. Gateway lane locking: ensure a scheduled cron run cannot block live channel replies by sharing a direct-message session key.
  3. Doctor checks: add a check for cron jobs whose sessionKey kind does not match their schedule/session target.
  4. Timeout diagnostics: preserve the exact failing layer for turn-accepted timeouts and expose whether the provider, harness, context engine, or fallback path caused the timeout.
  5. Model alias handling: clarify whether openai/gpt-5.5, openai-codex/gpt-5.5, gpt-5.5, and gpt-mini should normalize to the same provider/harness or be rejected with a targeted config warning.
  6. Status/run history: mark stale last-run errors separately from active failures so operators can tell whether a mitigation actually cleared the current problem.
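A doctor-style check for areas 1 and 3 could report every isolated cron job whose sessionKey names a channel lane, without modifying anything. Grouping the results by lane makes several jobs contending for one DM session show up together. The key layout and field names are assumptions carried over from this report, and the function name is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Channels enabled during the incident; the agent:<agent>:<channel>:... layout is assumed.
CHANNELS = {"discord", "telegram", "whatsapp"}


def doctor_cron_session_keys(jobs: List[Dict[str, Any]]) -> List[str]:
    """Return one warning per channel lane referenced by isolated cron jobs."""
    by_lane: Dict[str, List[str]] = defaultdict(list)
    for job in jobs:
        key = job.get("sessionKey")
        if job.get("sessionTarget") != "isolated" or not isinstance(key, str):
            continue
        parts = key.split(":")
        if len(parts) >= 4 and parts[0] == "agent" and parts[2] in CHANNELS:
            by_lane[key].append(job["id"])
    return [
        f"channel lane {lane} is used by {len(ids)} isolated cron job(s): {', '.join(sorted(ids))}"
        for lane, ids in sorted(by_lane.items())
    ]


# Hypothetical job ids and a placeholder DM key.
jobs = [
    {"id": "sync", "sessionTarget": "isolated", "sessionKey": "agent:main:discord:direct:1"},
    {"id": "commit", "sessionTarget": "isolated", "sessionKey": "agent:main:discord:direct:1"},
    {"id": "digest", "sessionTarget": "isolated"},
]
for warning in doctor_cron_session_keys(jobs):
    print(warning)
```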

Severity assessment

I would rank this as high severity for beta reliability because it can make a configured messaging channel appear healthy while not answering, and can cause scheduled jobs to fail repeatedly after the update. The workaround is available, but it requires manual inspection/editing of cron session keys and model overrides.
