openclaw: Discord channel lanes intermittently held after run completes: "lane wait exceeded" with stale per-session jsonl mtime

GitHub stats
openclaw/openclaw#72421 · Fetched 2026-04-27 05:30:18
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Symptom

User-sent Discord messages stop being processed in a channel even though the gateway is healthy and other channels work. Diagnostic log emits repeatedly:

lane wait exceeded: lane=session:agent:main:discord:channel:<id> waitedMs=<>200000 queueAhead=0|1

with no concurrent LLM activity (per-session <sessionId>.jsonl mtime is 5+ minutes stale, gateway CPU low). Manually deleting the session entry from agents/main/sessions/sessions.json and restarting unblocks the channel; new messages then process normally.
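The staleness check described above can be scripted. This is a minimal sketch: the function names are illustrative and not part of OpenClaw, the transcript path is supplied by the caller, and the 300-second default is taken from the "5+ minutes" observation rather than from any documented threshold.

```python
import os
import time


def seconds_since_write(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    now = time.time() if now is None else now
    return now - os.stat(path).st_mtime


def looks_idle(path, stale_after_s=300, now=None):
    """True when the file has not been written for at least `stale_after_s` seconds."""
    return seconds_since_write(path, now) >= stale_after_s
```

A lane that keeps logging lane wait exceeded while `looks_idle()` is true for its `<sessionId>.jsonl` matches the symptom reported here.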

Repro hypothesis

The lane release path appears to miss a branch when a run completes via a cron, heartbeat, or system-event entry rather than a user-initiated reply. Multiple paths in the bundled dist do call sessionLock.release() (e.g. compact, wait-for-idle-before-flush, transcript-rewrite, selection), but at least one terminal path seems to leave the in-memory lock held without raising an exception or abort. Fresh user messages then pile up behind a lane that thinks it is still processing.

Concrete observation: in a 24h window we saw lane wait exceeded entries for 3 distinct Discord channels with queueAhead=0 (head-of-queue stuck) and waits of 220–600 s (waitedMs 220000–600000), with the per-session jsonl untouched for the entire wait period.
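A summary like the one above can be pulled from the diagnostic log with a short script. This is a sketch under one assumption: real log lines follow the sample shown earlier with concrete integers in place of the placeholders. The regex and the name `summarize_stuck_lanes` are illustrative, not taken from OpenClaw.

```python
import re
from collections import defaultdict

# Assumed line shape, based on the sample in this issue with real numbers
# substituted for the placeholders:
#   lane wait exceeded: lane=<lane key> waitedMs=<int> queueAhead=<int>
LANE_WAIT = re.compile(
    r"lane wait exceeded: lane=(?P<lane>\S+) "
    r"waitedMs=(?P<waited>\d+) queueAhead=(?P<ahead>\d+)"
)


def summarize_stuck_lanes(lines):
    """Map each head-of-queue lane (queueAhead=0) to its (min, max) waitedMs."""
    waits = defaultdict(list)
    for line in lines:
        m = LANE_WAIT.search(line)
        if m and int(m.group("ahead")) == 0:
            waits[m.group("lane")].append(int(m.group("waited")))
    return {lane: (min(v), max(v)) for lane, v in waits.items()}
```

Filtering on queueAhead=0 keeps only lanes where the waiting message is itself at the head of the queue, which is the case that points at a held lock rather than ordinary backlog.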

Environment

  • OpenClaw 2026.4.24 (homebrew)
  • Node 25.9.0
  • macOS arm64
  • Discord plugin, multiple guilds/channels, several main-session crons feeding through

Repro steps (best-effort)

  1. Configure several payload.kind=systemEvent crons targeted at sessionTarget=main plus current-bound agentTurn jobs that share Discord channel session keys.
  2. Run for hours.
  3. Eventually one channel goes silent: queued user messages don't trigger a reply, gateway is otherwise healthy.

I have not been able to produce a tight unit-test repro — it appears to be a race between a run terminating and the lock-release callback in certain delivery paths.
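For readers unfamiliar with this class of bug, the sketch below shows one generic way a terminal path can skip a release without any exception being raised. It is not OpenClaw code and does not claim to be the actual mechanism. The gateway is a Node application, while Python is used here to match the workaround script. `Lane`, `run_leaky`, and `run_safe` are hypothetical names.

```python
import threading


class Lane:
    """Hypothetical stand-in for a per-session lane; not OpenClaw's type."""

    def __init__(self):
        self._lock = threading.Lock()

    def is_held(self):
        return self._lock.locked()

    def run_leaky(self, deliver):
        # Release sits only on the "normal" path, so an early return skips it.
        self._lock.acquire()
        result = deliver()
        if result is None:      # e.g. a delivery that produces no reply
            return None         # the lock is still held here
        self._lock.release()
        return result

    def run_safe(self, deliver):
        # `with` releases on return, on exception, and on every other exit.
        with self._lock:
            return deliver()
```

Tying the release to scope exit (try/finally in JavaScript, `with` in Python) removes the need to audit each branch by hand.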

Workaround

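# Back up sessions.json before editing it.
cp ~/.openclaw/agents/main/sessions/sessions.json{,.bak}
python3 - <<'PY'
import json
p = '<sessions.json path>'  # the file backed up above
with open(p) as f:
    d = json.load(f)
# Remove the stuck channel's entry; a KeyError here means the key is wrong.
del d['agent:main:discord:channel:<channel-id>']
with open(p, 'w') as f:
    json.dump(d, f, indent=2)
PY
# Restart so the gateway drops the in-memory lock for that lane.
openclaw gateway restart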

I've shipped a detection-only watchdog on my side that scans the diagnostic log for lane wait exceeded and alerts to a Discord ops channel — happy to PR that as a CLI subcommand if useful.

What would help

  1. Confirm whether sessionLock.release() is invoked from every terminal path of runAgentTurn / processSystemEvent — particularly the cron-delivered and heartbeat-delivered branches.
  2. Add a defensive watchdog inside the gateway: any lane held >N minutes with no in-flight LLM call should be auto-released (with a warning log).
  3. Expose lane state via openclaw status --lanes for easier diagnosis.
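The defensive watchdog in item 2 could look roughly like the sketch below. Everything in it is hypothetical: `WatchedLane`, `sweep`, and the `in_flight` flag are illustrative names, and how the gateway actually tracks lanes or in-flight LLM calls is not known from this issue.

```python
import time


class WatchedLane:
    """Hypothetical lane record: acquisition time and in-flight LLM call flag."""

    def __init__(self, key):
        self.key = key
        self.held_since = None   # monotonic timestamp, or None when free
        self.in_flight = False

    def acquire(self, now=None):
        self.held_since = time.monotonic() if now is None else now

    def release(self):
        self.held_since = None
        self.in_flight = False


def sweep(lanes, max_held_s, now=None, warn=print):
    """Release lanes held longer than `max_held_s` with no call in flight."""
    now = time.monotonic() if now is None else now
    released = []
    for lane in lanes:
        if lane.held_since is None or lane.in_flight:
            continue  # free lanes and lanes doing real work are left alone
        held_for = now - lane.held_since
        if held_for > max_held_s:
            warn(f"auto-releasing lane {lane.key} after {held_for:.0f}s")
            lane.release()
            released.append(lane.key)
    return released
```

Skipping lanes with a call in flight keeps a slow but legitimate run from being cut off; only lanes that are held and idle are released, and each release is logged as a warning.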

Happy to provide gateway logs / sessions.json snapshots privately if helpful.

Extent analysis

TL;DR

Manually deleting the stuck session entry from sessions.json and restarting the gateway may temporarily resolve the issue, but a more permanent fix requires ensuring sessionLock.release() is called from every terminal path of runAgentTurn and processSystemEvent.

Guidance

  • Review the code paths of runAgentTurn and processSystemEvent to confirm that sessionLock.release() is invoked from every terminal branch, especially those triggered by cron or heartbeat events.
  • Consider implementing a watchdog mechanism to auto-release lanes held for an extended period (>N minutes) without in-flight LLM calls, logging a warning to indicate potential issues.
  • Expose lane state via openclaw status --lanes to facilitate easier diagnosis of stuck lanes.
  • Temporarily apply the provided workaround script to delete stuck session entries and restart the gateway until a permanent fix is implemented.

Example

No code snippet is provided as the issue requires a review of the existing codebase rather than introducing new code.

Notes

The suspected root cause, not yet confirmed by a reproduction, is a race between run termination and the lock-release callback in certain delivery paths. A review of those paths plus a defensive watchdog would guard against recurrence.

Recommendation

Apply the workaround script to temporarily resolve stuck lanes until a permanent fix is implemented, which should involve ensuring sessionLock.release() is called from every terminal path and adding a watchdog mechanism to prevent future occurrences.
