openclaw - 💡(How to fix) Fix Discord channel lanes intermittently held after run completes — "lane wait exceeded" with stale per-session jsonl mtime [1 participants]

openclaw • 2026-04-26 23:30:40

GitHub stats

openclaw/openclaw#72421 • Fetched 2026-04-27 05:30:18

Author: jessewunderlich
Participants: jessewunderlich

Error Message

The lane-release path appears to miss a branch when a run completes via a cron, heartbeat, or system-event entry rather than a user-initiated reply. Several paths in the bundled dist do call sessionLock.release() (e.g. compact, wait-for-idle-before-flush, transcript-rewrite, selection), but at least one terminal path leaves the in-memory lock held without throwing or aborting. New user messages then pile up behind a lane that still believes it is processing.
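The gateway source is not shown in the issue, so the following is only a minimal Python sketch of the property being asked for: the lane lock is acquired once and released in a `finally` clause, so neither an early return from a non-reply branch nor an exception can leave the lane held. `SessionLock`, `run_turn`, and the `source` values are hypothetical stand-ins, not OpenClaw's API (the real gateway runs on Node).

```python
import threading
from typing import Callable, Optional


class SessionLock:
    """Hypothetical stand-in for the gateway's per-session lane lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.held = False

    def acquire(self) -> None:
        self._lock.acquire()
        self.held = True

    def release(self) -> None:
        self.held = False
        self._lock.release()


def run_turn(lock: SessionLock, source: str,
             deliver: Callable[[str], Optional[str]]) -> Optional[str]:
    """Hypothetical turn runner: every exit path passes through `finally`."""
    lock.acquire()
    try:
        if source in ("cron", "heartbeat", "systemEvent"):
            # Non-reply delivery ends early; the lock is still released below.
            return None
        # User-initiated reply; an exception raised here also reaches `finally`.
        return deliver(source)
    finally:
        # Single release point shared by every terminal branch.
        lock.release()
```

The point is structural: rather than calling release() at each return, every terminal branch (including cron- and heartbeat-delivered ones) funnels through one release point.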

Fix Action

Workaround

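# Back up the session store first (bash/zsh brace expansion)
cp ~/.openclaw/agents/main/sessions/sessions.json{,.bak}
python3 - <<'PY'
import json
p = '<sessions.json path>'  # absolute path; Python does not expand ~
key = 'agent:main:discord:channel:<channel-id>'
with open(p) as f:
    d = json.load(f)
# Drop the stuck channel's session entry, tolerating an already-missing key
if key in d:
    del d[key]
else:
    print(f'{key} not found in {p}')
with open(p, 'w') as f:
    json.dump(d, f, indent=2)
PY
openclaw gateway restart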

I've shipped a detection-only watchdog on my side that scans the diagnostic log for lane wait exceeded and alerts to a Discord ops channel — happy to PR that as a CLI subcommand if useful.
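A detection-only scanner along these lines might look like the following sketch. It assumes each diagnostic line carries `lane=`, `waitedMs=` and `queueAhead=` fields with integer values, as in the message quoted in this issue; the 200,000 ms default threshold and the names `scan_lane_waits` and `stuck_head_lanes` are illustrative choices, not part of OpenClaw. It only reports; alerting (for example to a Discord ops channel) is left out.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumed line shape, based on the diagnostic message quoted in this issue:
# "lane wait exceeded: lane=<lane> waitedMs=<int> queueAhead=<int>"
LANE_WAIT_RE = re.compile(
    r"lane wait exceeded: lane=(?P<lane>\S+) "
    r"waitedMs=(?P<waited>\d+) queueAhead=(?P<ahead>\d+)"
)


@dataclass
class LaneWait:
    lane: str
    waited_ms: int
    queue_ahead: int


def scan_lane_waits(lines: Iterable[str], min_wait_ms: int = 200_000) -> List[LaneWait]:
    """Return lane-wait events whose waitedMs is at least min_wait_ms."""
    hits = []
    for line in lines:
        m = LANE_WAIT_RE.search(line)
        if m and int(m.group("waited")) >= min_wait_ms:
            hits.append(LaneWait(m.group("lane"), int(m.group("waited")),
                                 int(m.group("ahead"))))
    return hits


def stuck_head_lanes(hits: List[LaneWait]) -> List[str]:
    """Distinct lanes where the waiting request was at the head of the queue."""
    return sorted({h.lane for h in hits if h.queue_ahead == 0})
```

Feeding it the diagnostic log line by line and alerting on a non-empty `stuck_head_lanes` result matches the "queueAhead=0, head-of-queue stuck" pattern described in the report.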

Code Example

lane wait exceeded: lane=session:agent:main:discord:channel:<id> waitedMs=<>200000 queueAhead=0|1


Symptom

User-sent Discord messages stop being processed in one channel even though the gateway is healthy and other channels keep working. The diagnostic log repeatedly emits:

lane wait exceeded: lane=session:agent:main:discord:channel:<id> waitedMs=<>200000 queueAhead=0|1

with no concurrent LLM activity (the per-session <sessionId>.jsonl mtime is 5+ minutes stale and gateway CPU is low). Manually deleting the session entry from agents/main/sessions/sessions.json and restarting the gateway unblocks the channel; new messages then process normally.
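To check the "no concurrent LLM activity" signal, one can list per-session transcripts whose mtime is older than a threshold. This Python sketch assumes the `<sessionId>.jsonl` files sit in a single directory that you pass in (their exact location is not stated in the issue) and uses a 300 s default to match the "5+ minutes stale" observation; `stale_session_logs` is a hypothetical helper.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def stale_session_logs(sessions_dir: str, max_age_s: float = 300.0,
                       now: Optional[float] = None) -> List[Tuple[str, float]]:
    """List *.jsonl files not modified for at least max_age_s seconds.

    Returns (file name, age in seconds) pairs, oldest first.
    """
    now = time.time() if now is None else now
    stale = []
    for path in Path(sessions_dir).glob("*.jsonl"):
        age = now - path.stat().st_mtime
        if age >= max_age_s:
            stale.append((path.name, age))
    return sorted(stale, key=lambda item: item[1], reverse=True)
```

A lane that keeps reporting `lane wait exceeded` while its transcript shows up here is a strong candidate for the stuck-lock case rather than a slow model call.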

Repro hypothesis

Concrete observation: in a 24h window we saw lane wait exceeded entries for 3 distinct Discord channels with queueAhead=0 (head of queue stuck) and waits ranging from 220 s to 600 s (waitedMs 220000 to 600000), with the per-session jsonl untouched for the entire wait period.

Environment

OpenClaw 2026.4.24 (homebrew)
Node 25.9.0
macOS arm64
Discord plugin, multiple guilds/channels, several main-session crons feeding through

Repro steps (best-effort)

Configure several payload.kind=systemEvent crons targeted at sessionTarget=main plus current-bound agentTurn jobs that share Discord channel session keys.
Run for hours.
Eventually one channel goes silent: queued user messages don't trigger a reply, gateway is otherwise healthy.

I have not been able to produce a tight unit-test repro — it appears to be a race between a run terminating and the lock-release callback in certain delivery paths.

What would help

Confirm whether sessionLock.release() is invoked from every terminal path of runAgentTurn / processSystemEvent — particularly the cron-delivered and heartbeat-delivered branches.
Add a defensive watchdog inside the gateway: any lane held >N minutes with no in-flight LLM call should be auto-released (with a warning log).
Expose lane state via openclaw status --lanes for easier diagnosis.
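The proposed in-gateway watchdog could follow the rule stated above: release a lane only when it has been held longer than a limit and no LLM call is in flight. The Python sketch below models that decision with a hypothetical `LaneState` record and `sweep_lanes` function. In a real implementation the release would go through the lock's own release path rather than flipping a flag, and the in-flight count would come from the gateway's own bookkeeping.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List

log = logging.getLogger("lane-watchdog")


@dataclass
class LaneState:
    """Hypothetical in-memory view of one session lane."""
    held: bool
    held_since: float  # epoch seconds when the lane was acquired
    inflight_llm_calls: int = 0


def sweep_lanes(lanes: Dict[str, LaneState], now: float,
                max_hold_s: float) -> List[str]:
    """Force-release lanes held past max_hold_s with no in-flight LLM call."""
    released = []
    for key, lane in lanes.items():
        # Skip free lanes and lanes that are legitimately busy.
        if not lane.held or lane.inflight_llm_calls > 0:
            continue
        held_for = now - lane.held_since
        if held_for > max_hold_s:
            log.warning("auto-releasing lane %s held %.0fs with no in-flight LLM call",
                        key, held_for)
            lane.held = False
            released.append(key)
    return released
```

Requiring both conditions keeps the watchdog from cutting off a legitimately long LLM turn, while the warning log leaves a trace of every forced release for later diagnosis.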

Happy to provide gateway logs / sessions.json snapshots privately if helpful.

Extent analysis

TL;DR

Manually deleting the stuck session entry from sessions.json and restarting the gateway may temporarily unblock the channel, but a permanent fix requires ensuring sessionLock.release() is called from every terminal path of runAgentTurn and processSystemEvent.

Guidance

Review the code paths of runAgentTurn and processSystemEvent to confirm that sessionLock.release() is invoked from every terminal branch, especially those triggered by cron or heartbeat events.
Consider a watchdog that auto-releases any lane held longer than N minutes with no in-flight LLM call, logging a warning each time it does so.
Expose lane state via openclaw status --lanes to make stuck lanes easier to diagnose.
Until a permanent fix lands, use the workaround script to delete the stuck session entry and restart the gateway.

Example

No code snippet is provided as the issue requires a review of the existing codebase rather than introducing new code.

Notes

The root cause appears to be a race condition between run termination and lock-release callbacks in certain delivery paths. A thorough review of the code and the addition of a defensive watchdog are necessary to prevent regressions.

Recommendation

In the meantime, apply the workaround script to unblock stuck lanes. The permanent fix should ensure sessionLock.release() is called from every terminal path and add a watchdog so that future occurrences are caught and logged.
