openclaw: Session lane not released after LLM provider timeout, blocking all subsequent messages [1 comment, 2 participants]

openclaw/openclaw#81335, fetched 2026-05-14 03:33:11

Description

When an LLM provider (OpenAI Codex / gpt-5.5) returns consecutive server_error responses that are classified as timeouts during a group chat agent run, the session lane is not released. All subsequent messages to the same session queue indefinitely behind the "ghost run", freezing the session until the gateway is fully restarted.

Environment

  • OpenClaw version: 2026.5.7
  • OS: Linux 6.17.0-20-generic (x64), Ubuntu
  • Node: v24.14.1
  • Channel: Feishu (group chat, 4 bot accounts: sanqi/taiyi/shengongbao/nezha)
  • Model: openai-codex/gpt-5.5
  • Session type: agent:main:feishu:group:<chat_id>

Steps to Reproduce

  1. Configure a Feishu group chat with requireMention: true
  2. Have the agent perform a multi-step task in the group chat (involving tool calls like exec, edit)
  3. While the agent is executing, the LLM provider (Codex) returns consecutive server_error / timeout errors
  4. The agent run ends with isError: true
  5. Send a new message to the group chat (with @mention)

Expected Behavior

The session lane should be released after the failed agent run, allowing new messages to trigger a fresh agent run.
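A minimal sketch of the expected behavior, assuming the lane acts like a per-session mutex with a FIFO wait queue. `SessionLane`, `run`, and `demoFailedRun` are hypothetical names for illustration, not OpenClaw's actual implementation. The point is that the release sits in a `finally` block, so a run that throws still hands the lane to the next waiter.

```typescript
// Hypothetical sketch: SessionLane is an illustrative name, not an OpenClaw class.
class SessionLane {
  private waiters: Array<() => void> = [];
  private held = false;

  // Resolves immediately if the lane is free, otherwise queues the caller.
  private acquire(): Promise<void> {
    if (!this.held) {
      this.held = true;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  // Hands the lane to the next waiter, or marks it free if nobody is waiting.
  private release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.held = false;
  }

  // Runs the task while holding the lane; the lane is released on success AND on error.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// A run that fails like the provider timeout, followed by a fresh run on the same lane.
async function demoFailedRun(): Promise<string[]> {
  const lane = new SessionLane();
  const events: string[] = [];
  const failed = lane.run(async () => {
    events.push("run 1 started");
    throw new Error("LLM error server_error");
  });
  const fresh = lane.run(async () => {
    events.push("run 2 started");
  });
  await failed.catch(() => undefined); // the failure is surfaced to the caller, not swallowed by the lane
  await fresh;
  return events;
}
```

If the release were only on the success path, the second run in `demoFailedRun` would wait forever, which matches the symptom reported here.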

Actual Behavior

  • The session lane remains locked
  • New messages are dispatched but queued (queueAhead=1)
  • Diagnostic log shows: lane wait exceeded: lane=session:agent:main:feishu:group:... waitedMs=403570 queueAhead=1
  • dispatch complete (queuedFinal=false, replies=0) — messages are accepted but never processed
  • Only a full openclaw gateway restart recovers the session

Relevant Logs

15:21:37 WARN embedded_run_agent_end: isError=true, error="LLM error server_error", failoverReason="timeout", model="gpt-5.5", provider="openai-codex"
15:22:34 WARN embedded_run_agent_end: isError=true, error="LLM error server_error" (second timeout)
15:19:08 WARN lane wait exceeded: lane=session:agent:main:feishu:group:oc_d497bd2b11243f8c001af50d983504ba waitedMs=403570 queueAhead=1
15:22:46 INFO feishu[sanqi]: dispatching to agent (session=...)
15:22:46 INFO feishu[sanqi]: dispatch complete (queuedFinal=false, replies=0)
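
To spot a stuck lane in gateway logs, the `lane wait exceeded` line can be parsed for its lane key, wait time, and queue depth. This is a rough sketch that assumes the line format shown above is stable. `parseLaneWait` and `LaneWait` are hypothetical helpers, not part of OpenClaw.

```typescript
// Hypothetical helper: the field names mirror the log line, nothing more.
interface LaneWait {
  time: string;
  lane: string;
  waitedMs: number;
  queueAhead: number;
}

// Returns the parsed fields, or null if the line is not a "lane wait exceeded" warning.
function parseLaneWait(line: string): LaneWait | null {
  const pattern =
    /^(\d{2}:\d{2}:\d{2}) WARN lane wait exceeded: lane=(\S+) waitedMs=(\d+) queueAhead=(\d+)$/;
  const match = pattern.exec(line.trim());
  if (!match) return null;
  const [, time = "", lane = "", waitedMs = "0", queueAhead = "0"] = match;
  return { time, lane, waitedMs: Number(waitedMs), queueAhead: Number(queueAhead) };
}
```

For the line in this report, `waitedMs=403570` is roughly 6.7 minutes of waiting behind a single queued run.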

Impact

  • Group chat sessions become completely unresponsive after LLM provider errors
  • No CLI command or API endpoint exists to abort a stuck session lane (openclaw sessions only has cleanup, no abort)
  • Users must restart the entire gateway to recover, which affects all sessions

Suggested Fix

  1. Error recovery: When an agent run ends with isError: true due to provider timeout, ensure the session lane is properly released
  2. Lane timeout: Add a configurable lane lock timeout (e.g., 5 minutes) — if a run doesn't complete within the timeout, force-release the lane
  3. CLI command: Add openclaw session abort <session-key> to allow manual recovery without full gateway restart
  4. Diagnostic: Log a warning when a lane has been held for an unusually long time
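
Suggestions 2 and 4 could be combined roughly as follows. This is a sketch under stated assumptions, not a patch against OpenClaw: `TimedLane`, its `timeoutMs` option, and the `warn` callback are hypothetical names. The lane starts a timer when a run takes hold of it. If the run has not settled when the timer fires, the lane logs a warning, force-releases itself, and rejects the run with a timeout error.

```typescript
// Hypothetical sketch: TimedLane and its options are illustrative, not OpenClaw APIs.
class TimedLane {
  private waiters: Array<() => void> = [];
  private held = false;
  private readonly timeoutMs: number;
  private readonly warn: (msg: string) => void;

  constructor(timeoutMs: number, warn: (msg: string) => void = console.warn) {
    this.timeoutMs = timeoutMs;
    this.warn = warn;
  }

  private acquire(): Promise<void> {
    if (!this.held) {
      this.held = true;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  private release(): void {
    const next = this.waiters.shift();
    if (next) next();
    else this.held = false;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    // Guard so the lane is released exactly once, whether by timeout or by completion.
    let released = false;
    const releaseOnce = (): void => {
      if (released) return;
      released = true;
      this.release();
    };
    let timer: ReturnType<typeof setTimeout> | undefined;
    const timedOut = new Promise<never>((_, reject) => {
      timer = setTimeout(() => {
        this.warn(`lane held longer than ${this.timeoutMs}ms; force-releasing`);
        releaseOnce();
        reject(new Error("lane timeout"));
      }, this.timeoutMs);
    });
    try {
      return await Promise.race([task(), timedOut]);
    } finally {
      clearTimeout(timer);
      releaseOnce();
    }
  }
}
```

Note that force-releasing the lane does not cancel the underlying run; it only stops the run from blocking later messages. A real implementation would probably also need to abort the stuck run, for example through an `AbortSignal`, so that two runs do not act on the same session at once.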

Workaround

Restart the gateway: openclaw gateway restart or systemctl --user restart openclaw-gateway.service
