openclaw - 💡(How to fix) Fix Gateway process exits after reasoning-only retry exhaustion on embedded agent turn (gpt-5.4-mini) [2 comments, 3 participants]

GitHub stats: openclaw/openclaw#71516 (fetched 2026-04-26 05:12:01)
Comments: 2 · Participants: 3 · Timeline events: 6 · Reactions: 0
Timeline (top): labeled ×3, commented ×2, closed ×1

On a Linux gateway running OpenClaw 2026.4.23, embedded agent turns can repeatedly produce reasoning-only responses with no visible content. OpenClaw detects this and retries with a visible-answer continuation, but once the retries are exhausted it surfaces an incomplete-turn error. In the observed incident, this was followed by cron error backoff, a cron-delivery failure to resolve a destination channel, and then a gateway-level disruption: the Discord gateway websocket closed cleanly and reconnected, and clients dropped. Earlier the same night, one gateway restart produced an unhandled promise rejection in the journal (CIAO PROBING CANCELLED).

The most suspicious sequence is a cron-triggered embedded turn using openai-codex/gpt-5.4-mini where runId === sessionId.



Confirmed crash / disruption sequence observed

  1. A cron job triggers an embedded agent turn using model openai-codex/gpt-5.4-mini.
  2. gpt-5.4-mini returns a reasoning-only response (no visible content tokens).
  3. OpenClaw detects this and retries up to 2 times with visible-answer continuation.
  4. After both retries are exhausted, the logs show: reasoning-only retries exhausted — surfacing incomplete-turn error.
  5. Immediately afterwards, cron applies error backoff, and cron-delivery fails to resolve a destination channel for the failure notice.
  6. ~60-90 seconds later: gateway-level clients see drops/reconnects; Discord gateway shows clean close 1000 and reconnect.
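The detect, retry and surface sequence in steps 2 to 4 can be sketched as follows. This is a minimal TypeScript model under stated assumptions, not OpenClaw's implementation: `AssistantTurn`, `TurnOutcome`, `runTurnWithContinuation` and the injected `callModel` function are hypothetical names, and the retry limit of 2 is taken from the `retrying 1/2` and `2/2` log lines. The design point it illustrates is that exhaustion is returned as a value rather than thrown, so a cron caller can record it without an exception escaping toward the gateway.

```typescript
// Hypothetical shapes: the issue does not show OpenClaw's real types.
interface AssistantTurn {
  reasoning: string; // hidden reasoning text
  visible: string; // user-visible answer text
}

type TurnOutcome =
  | { kind: "ok"; text: string; attempts: number }
  | { kind: "incomplete"; reason: "reasoning-only-retries-exhausted"; attempts: number };

// Retry limit taken from the "retrying 1/2" / "2/2" log lines.
const MAX_VISIBLE_RETRIES = 2;

function hasVisibleContent(turn: AssistantTurn): boolean {
  return turn.visible.trim().length > 0;
}

// Makes one initial call, then up to MAX_VISIBLE_RETRIES continuation calls.
// Exhaustion is returned as a value instead of thrown, so the caller decides
// how to report it and no exception escapes to the gateway.
async function runTurnWithContinuation(
  callModel: (continuation: boolean) => Promise<AssistantTurn>,
): Promise<TurnOutcome> {
  let turn = await callModel(false);
  let retries = 0;
  while (!hasVisibleContent(turn) && retries < MAX_VISIBLE_RETRIES) {
    retries += 1;
    turn = await callModel(true); // visible-answer continuation
  }
  if (!hasVisibleContent(turn)) {
    return { kind: "incomplete", reason: "reasoning-only-retries-exhausted", attempts: retries };
  }
  return { kind: "ok", text: turn.visible, attempts: retries };
}
```

With a model that never produces visible text, the sketch makes three calls (one initial, two continuations) and returns `attempts: 2`, mirroring `attempts=2/2` in the log.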

Exact evidence from logs

From /tmp/openclaw/openclaw-2026-04-25.log:

2026-04-25T02:27:57.767-07:00 [agent/embedded] reasoning-only assistant turn detected: runId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 sessionId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 provider=openai-codex/gpt-5.4-mini — retrying 1/2 with visible-answer continuation
2026-04-25T02:28:02.406-07:00 [lcm] bootstrap: session file rotated for session c09342c4-6329-4e1a-a38d-c4a62ea59a87: "/home/argus/.openclaw/agents/argus/sessions/4466637f-b827-4c5a-bd71-b6e1af179242.jsonl" → "/home/argus/.openclaw/agents/argus/sessions/c09342c4-6329-4e1a-a38d-c4a62ea59a87.jsonl" — resetting conversation 1906
2026-04-25T02:28:02.744-07:00 [lcm] bootstrap: purged 25 messages and all summaries for conversation 1906
2026-04-25T02:28:29.543-07:00 [agent/embedded] reasoning-only assistant turn detected: runId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 sessionId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 provider=openai-codex/gpt-5.4-mini — retrying 2/2 with visible-answer continuation
2026-04-25T02:28:45.156-07:00 [agent/embedded] reasoning-only retries exhausted: runId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 sessionId=c09342c4-6329-4e1a-a38d-c4a62ea59a87 provider=openai-codex/gpt-5.4-mini attempts=2/2 — surfacing incomplete-turn error
2026-04-25T02:28:45.629-07:00 [cron] cron: applying error backoff jobId=86e8709c-85c5-455c-81f7-17cfef97be96 consecutiveErrors=4 backoffMs=900000
2026-04-25T02:28:45.663-07:00 [cron-delivery] cron: failed to resolve failure destination target: Channel is required when multiple channels are configured: discord, telegram Set delivery.channel explicitly or use a main session with a previous channel.
2026-04-25T02:29:51.555-07:00 [discord] gateway: Gateway websocket closed: 1000
2026-04-25T02:29:51.557-07:00 [discord] gateway: Gateway reconnect scheduled in 991ms (close, resume=true)

Also observed nearby incomplete-turn errors with payloads=0:

2026-04-25T02:26:50.352-07:00 [agent/embedded] incomplete turn detected: runId=3358e6f0-1ad7-45c6-bf6a-038574a14c31 sessionId=3358e6f0-1ad7-45c6-bf6a-038574a14c31 stopReason=stop payloads=0 — surfacing error to user

Additional instances found in recent logs

Searched /tmp/openclaw/openclaw-2026-04-24.log, /tmp/openclaw/openclaw-2026-04-25.log, and ~/.openclaw/logs.

Counts in recent logs:

  • reasoning-only assistant turn detected: 26 total
  • reasoning-only retries exhausted: 1 total
  • incomplete turn detected ... payloads=0: 3 total
  • cron: applying error backoff: 18 total
  • cron: failed to resolve failure destination target: 16 total
  • Channel is required when multiple channels are configured: 18 total
  • lcm bootstrap: session file rotated: 36 total
  • lcm bootstrap: purged ... messages: 36 total
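Counts like these can be reproduced with a per-line pattern count, roughly what `grep -c` does for each string. A minimal sketch, assuming the log text is already loaded into memory (for example with `readFileSync` from `node:fs`); `countPatterns` and the pattern keys are hypothetical names, and the regexes cover only some of the strings listed above.

```typescript
// Illustrative subset of the strings counted in the report.
const PATTERNS: Record<string, RegExp> = {
  reasoningOnlyDetected: /reasoning-only assistant turn detected/,
  retriesExhausted: /reasoning-only retries exhausted/,
  incompletePayloads0: /incomplete turn detected.*payloads=0/,
  cronBackoff: /cron: applying error backoff/,
  deliveryUnresolved: /cron: failed to resolve failure destination target/,
};

// Counts matching lines per pattern (each line counts at most once per pattern).
function countPatterns(
  text: string,
  patterns: Record<string, RegExp> = PATTERNS,
): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const key of Object.keys(patterns)) counts[key] = 0;
  for (const line of text.split("\n")) {
    for (const [key, re] of Object.entries(patterns)) {
      // Non-global regexes keep no lastIndex state, so test() is safe to reuse.
      if (re.test(line)) counts[key] = (counts[key] ?? 0) + 1;
    }
  }
  return counts;
}
```

Running it once per log file and summing the results approximates the totals above.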

Model/provider variation:

  • Many earlier reasoning-only detections used openai-codex/gpt-5.5 and recovered after 1 retry.
  • Later detections used openai-codex/gpt-5.4-mini.
  • The only observed full attempts=2/2 retry exhaustion was openai-codex/gpt-5.4-mini, run/session c09342c4-6329-4e1a-a38d-c4a62ea59a87.
  • In all detected reasoning-only runs, runId === sessionId.
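The runId === sessionId observation can be checked mechanically by extracting the `key=value` tokens from each log line. A minimal sketch; `parseFields` and `runIdMatchesSession` are hypothetical helpers, and they assume values never contain spaces, which holds for the lines quoted in this report.

```typescript
// Extracts key=value tokens such as runId=..., sessionId=..., provider=...
function parseFields(line: string): Map<string, string> {
  const fields = new Map<string, string>();
  for (const m of line.matchAll(/(\w+)=(\S+)/g)) {
    const [, key, value] = m;
    if (key !== undefined && value !== undefined) fields.set(key, value);
  }
  return fields;
}

// true/false when both ids are present, undefined when the line lacks them.
function runIdMatchesSession(line: string): boolean | undefined {
  const fields = parseFields(line);
  const runId = fields.get("runId");
  const sessionId = fields.get("sessionId");
  if (runId === undefined || sessionId === undefined) return undefined;
  return runId === sessionId;
}
```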

Other gpt-5.4-mini examples:

2026-04-25T01:41:15.698-07:00 [agent/embedded] reasoning-only assistant turn detected: runId=e07660d5-46ad-4216-9865-e459ff0b7943 sessionId=e07660d5-46ad-4216-9865-e459ff0b7943 provider=openai-codex/gpt-5.4-mini — retrying 1/2 with visible-answer continuation
2026-04-25T01:41:42.249-07:00 [agent/embedded] reasoning-only assistant turn detected: runId=e07660d5-46ad-4216-9865-e459ff0b7943 sessionId=e07660d5-46ad-4216-9865-e459ff0b7943 provider=openai-codex/gpt-5.4-mini — retrying 2/2 with visible-answer continuation
2026-04-25T02:07:09.334-07:00 [agent/embedded] reasoning-only assistant turn detected: runId=4466637f-b827-4c5a-bd71-b6e1af179242 sessionId=4466637f-b827-4c5a-bd71-b6e1af179242 provider=openai-codex/gpt-5.4-mini — retrying 2/2 with visible-answer continuation
2026-04-25T02:07:54.001-07:00 [agent/embedded] incomplete turn detected: runId=4466637f-b827-4c5a-bd71-b6e1af179242 sessionId=4466637f-b827-4c5a-bd71-b6e1af179242 stopReason=stop payloads=0 — surfacing error to user

stderr / journal findings

Checked:

  • journalctl --user -u openclaw-gateway --since '2026-04-25 00:00:00'
  • /tmp/openclaw/openclaw-2026-04-25.log
  • /tmp/openclaw/openclaw-2026-04-24.log
  • ~/.openclaw/logs/*
  • ~/.pm2/pm2.log
  • nohup.out, *.err, *stderr*, *error*.log under home and /tmp/openclaw

There was no specific stack trace tied to the reasoning-only retries exhausted event. The only unhandled rejection found in the journal during the broader incident window was from a prior gateway stop/restart:

2026-04-25T02:12:13.783-07:00 [openclaw] Unhandled promise rejection: CIAO PROBING CANCELLED

Systemd recorded that prior process exit as:

openclaw-gateway.service: Main process exited, code=exited, status=1/FAILURE
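The report notes that no useful stack trace was captured. One way to get more context is a rejection handler that logs the stack together with whatever run or job identifiers are current before exiting. A minimal sketch, assuming a Node.js process; `describeRejection` and `installRejectionDiagnostics` are hypothetical names, and whether OpenClaw's existing handler (which logged `Unhandled promise rejection: CIAO PROBING CANCELLED`) already does something similar is not known from this report. The handler exits with status 1 after logging to keep fail-fast behaviour, in line with Node.js's default since v15 of treating an unhandled rejection as fatal.

```typescript
// Formats any rejection reason, keeping the stack trace when there is one.
function describeRejection(reason: unknown): string {
  if (reason instanceof Error) {
    return reason.stack ?? `${reason.name}: ${reason.message}`;
  }
  return `non-Error rejection: ${String(reason)}`;
}

// Hypothetical: getContext returns whatever identifiers the process is tracking
// (for example the last runId or cron jobId) so the log shows what was in flight.
function installRejectionDiagnostics(getContext: () => Record<string, string>): void {
  process.on("unhandledRejection", (reason: unknown) => {
    console.error(`[diagnostics] unhandled rejection: ${describeRejection(reason)}`);
    console.error(`[diagnostics] context: ${JSON.stringify(getContext())}`);
    // Keep fail-fast semantics: exit with status 1 after logging.
    process.exit(1);
  });
}
```

It would be installed once at startup, for example `installRejectionDiagnostics(() => ({ lastRunId }))`, where `lastRunId` is a hypothetical variable the gateway keeps updated.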

Expected behavior

Reasoning-only retry exhaustion in an embedded/cron agent turn should be contained to that turn/job:

  • surface a normal incomplete-turn error to the caller/job
  • apply cron backoff if needed
  • not destabilize the gateway process or channel websocket clients
  • emit enough diagnostic context if the gateway exits/restarts
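Containment at the job boundary could look like the sketch below: the turn runs inside a try/catch, failures update the job's error count and next run time, and nothing is rethrown toward the gateway. `JobState`, `BackoffPolicy` and `runJobContained` are hypothetical; the backoff policy is injected because the report shows only one data point (`consecutiveErrors=4 backoffMs=900000`), not OpenClaw's actual schedule.

```typescript
// Hypothetical job-runner shapes; not OpenClaw's real cron API.
interface JobState {
  consecutiveErrors: number;
  nextRunAt: number; // epoch milliseconds
}

type BackoffPolicy = (consecutiveErrors: number) => number; // delay in ms

async function runJobContained(
  jobId: string,
  state: JobState,
  runTurn: () => Promise<void>,
  backoff: BackoffPolicy,
  now: () => number = Date.now,
): Promise<{ ok: boolean; error?: string }> {
  try {
    await runTurn();
    state.consecutiveErrors = 0; // success resets the error streak
    return { ok: true };
  } catch (err) {
    // Contain the failure: record it, schedule the next attempt, never rethrow.
    state.consecutiveErrors += 1;
    const delayMs = backoff(state.consecutiveErrors);
    state.nextRunAt = now() + delayMs;
    const message = err instanceof Error ? err.message : String(err);
    console.error(
      `[cron] job ${jobId} failed (consecutiveErrors=${state.consecutiveErrors}, backoffMs=${delayMs}): ${message}`,
    );
    return { ok: false, error: message };
  }
}
```

Under this design an incomplete-turn error becomes an ordinary job result, which is the containment the expected behavior describes.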

Actual behavior

After reasoning-only retry exhaustion / incomplete-turn handling, the system enters a gateway-level disruption window. In the observed case, Discord gateway websocket clean-close/reconnects occurred about 66 seconds after retry exhaustion, and connected clients dropped/reconnected. No useful stack trace was emitted for that sequence.
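The 66-second figure can be checked directly from the two quoted log lines. A minimal sketch; `logTimestampMs` and `secondsBetween` are hypothetical helpers that rely on `Date.parse` accepting ISO-8601 timestamps with a UTC offset, which is the format the gateway log uses.

```typescript
// Returns the epoch milliseconds of the ISO-8601 timestamp that starts a log line.
function logTimestampMs(line: string): number {
  const end = line.indexOf(" ");
  const stamp = end === -1 ? line : line.slice(0, end);
  const ms = Date.parse(stamp);
  if (Number.isNaN(ms)) throw new Error(`no timestamp at start of: ${line}`);
  return ms;
}

function secondsBetween(earlierLine: string, laterLine: string): number {
  return (logTimestampMs(laterLine) - logTimestampMs(earlierLine)) / 1000;
}

const exhausted =
  "2026-04-25T02:28:45.156-07:00 [agent/embedded] reasoning-only retries exhausted";
const wsClosed =
  "2026-04-25T02:29:51.555-07:00 [discord] gateway: Gateway websocket closed: 1000";
console.log(secondsBetween(exhausted, wsClosed)); // prints 66.399
```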

Environment

OpenClaw 2026.4.23 (a979721)
Node.js v22.22.2
Linux typhon.promethean-dynamic.com 6.8.0-110-generic x86_64 GNU/Linux
Gateway command: /usr/bin/node /home/argus/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789
Gateway service: systemd user service openclaw-gateway.service
Gateway bind: 127.0.0.1:18789
Channels enabled: Discord, Telegram
LCM plugin enabled; compaction summarization model: openai-codex/gpt-5.4-mini
memory-lancedb-pro enabled

Notes / suspected contributing factors

  • cron-delivery has a persistent config issue when multiple channels are configured and no delivery channel is explicit. This fires on every failure, but it appears to be secondary; the suspicious trigger is the reasoning-only / incomplete-turn path.
  • LCM bootstrap often rotates session files and purges messages immediately after the first reasoning-only retry. In the fully exhausted run, it rotated 4466637f...jsonl to c09342c4...jsonl and purged 25 messages.
  • The run/session identity pattern (runId === sessionId) is consistent across these embedded turns.
  • Earlier gpt-5.5 reasoning-only turns usually recovered after retry; gpt-5.4-mini produced the only observed exhaustion.
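The cron-delivery error message itself states the resolution rule: a channel is required when several are configured, unless `delivery.channel` is set or a main session has a previous channel. The toy model below is written only from the wording of that message; the precedence order and all names (`DeliveryInput`, `resolveDeliveryChannel`) are assumptions, not OpenClaw's code. It shows why a Discord plus Telegram setup with neither fallback fails on every cron error.

```typescript
interface DeliveryInput {
  configuredChannels: string[]; // e.g. ["discord", "telegram"]
  explicitChannel?: string; // stands in for delivery.channel
  mainSessionLastChannel?: string; // previous channel of a main session, if any
}

type Resolution = { ok: true; channel: string } | { ok: false; error: string };

function resolveDeliveryChannel(input: DeliveryInput): Resolution {
  // Assumed precedence: explicit setting, then the only configured channel,
  // then the main session's previous channel.
  if (input.explicitChannel) return { ok: true, channel: input.explicitChannel };
  if (input.configuredChannels.length === 0) return { ok: false, error: "No channels configured" };
  const [only] = input.configuredChannels;
  if (input.configuredChannels.length === 1 && only !== undefined) {
    return { ok: true, channel: only };
  }
  if (input.mainSessionLastChannel) return { ok: true, channel: input.mainSessionLastChannel };
  return {
    ok: false,
    error: `Channel is required when multiple channels are configured: ${input.configuredChannels.join(", ")}`,
  };
}
```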

Extent analysis

TL;DR

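Setting an explicit delivery channel (delivery.channel) removes the recurring cron-delivery error when multiple channels are configured. The report treats that error as secondary, however, so this alone is unlikely to address the reasoning-only / incomplete-turn path suspected of triggering the gateway disruption.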

Guidance

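  • Resolve the cron-delivery configuration issue (set delivery.channel explicitly) so that failure notifications can be routed. The logs show the error backoff being applied before the delivery error, so the backoff appears to stem from the failed turn itself rather than from the delivery configuration.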
  • Review the LCM bootstrap process to understand why it rotates session files and purges messages after the first reasoning-only retry, and consider adjusting this behavior to prevent potential data loss.
  • Verify that the openai-codex/gpt-5.4-mini model is properly configured and functioning as expected, as it is the only model that has produced the observed retry exhaustion.
  • Consider adding additional logging or diagnostic context to help identify the root cause of the gateway disruption and websocket clean-close/reconnects.

Example

No specific code snippet is implicated; the issue appears to concern configuration and process flow.

Notes

The issue is complex and may have multiple contributing factors. Resolving the cron-delivery config issue and adjusting the LCM bootstrap process may help mitigate the problem, but further investigation is needed to fully understand and resolve the issue.

Recommendation

Apply workaround: Address the cron-delivery configuration issue by setting a delivery channel explicitly when multiple channels are configured. This should stop the failure-destination resolution errors; it has not been shown to prevent the error backoff or the gateway disruption, which still need investigation.
