openclaw - 💡(How to fix) Fix [Bug]: Gateway deadlocks all sessions and channels when ACP/opencode quota is exhausted [2 comments, 1 participants]

openclaw/openclaw#68823 (fetched 2026-04-19 15:07:02)


Summary

When an ACP (opencode) subagent call exhausts its provider quota, the gateway process acquires multiple session .jsonl.lock files but never releases them, deadlocking all sessions across all channels (Telegram, WeChat, WhatsApp), requiring manual lock deletion + gateway restart to recover.

Steps to reproduce

  1. Run an ACP task via sessions_spawn(runtime="acp", agentId="opencode") with a provider that has a quota limit (e.g., glm-5.1 via Bailian)
  2. Send messages until the provider quota is exceeded mid-API-call
  3. The opencode process reports quota exhaustion, but OpenClaw gateway never receives a response
  4. Multiple session .jsonl.lock files are created and never released
  5. Observe all sessions deadlocked on the stale locks — /reset and /new commands are unresponsive
  6. All channels (Telegram, WeChat, WhatsApp) stop processing messages

Expected behavior

  • Lock should be released on API error (try/finally or equivalent)
  • Error entry should be written to the session transcript
  • User should receive an error message (e.g., "Model API quota exceeded, try again later")
  • Session should remain functional after the error
  • At minimum, gateway restart should clean stale locks automatically
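The last expectation, cleaning stale locks on restart, could look roughly like the sketch below. It is not OpenClaw's implementation: the function names are hypothetical, and it assumes each lock file contains only the owner's PID as plain text, which the report does not confirm (the listings only show that a PID is associated with each lock). The liveness check relies on POSIX signal semantics, which apply to the macOS host described in this report.

```python
from __future__ import annotations

import os
from pathlib import Path


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX semantics)."""
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check only
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def clean_stale_locks(sessions_dir: Path) -> list[Path]:
    """Delete lock files whose recorded owner PID is no longer running."""
    removed = []
    for lock in sorted(sessions_dir.glob("*.jsonl.lock")):
        try:
            # ASSUMPTION: the lock file holds a bare PID; the real format may differ.
            pid = int(lock.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or unknown format: leave the lock alone
        if pid <= 0 or pid_alive(pid):
            continue  # owner still running (or PID not usable): keep the lock
        lock.unlink()
        removed.append(lock)
    return removed
```

Run at gateway startup, a check like this would only remove locks left behind by a previous, now-dead gateway process. It would not help while the hung gateway is still alive, which is the state shown in this report.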

Actual behavior

Gateway process (PID 78263) held three .jsonl.lock files simultaneously:

~/.openclaw/agents/main/sessions/5e488226-43fe-459b-97f3-161068e4f0e5.jsonl.lock  pid=78263 (alive) age=9m26s
~/.openclaw/agents/main/sessions/9e1150f3-680a-4fef-bc1e-efdca23d675b.jsonl.lock  pid=78263 (alive) age=22m2s
~/.openclaw/agents/main/sessions/f9e06db9-3416-4ad9-af79-e89f37eed7a1.jsonl.lock  pid=78263 (alive) age=23m54s

All sessions across all channels were deadlocked. /reset and /new commands were completely unresponsive. Recovery required manual deletion of all 3 .lock files followed by openclaw gateway restart.
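Before deleting lock files by hand, it can help to list them and see how long each has been held. The following read-only sketch does that for the sessions directory shown above. The helper names are hypothetical, and using the file's modification time as the lock age is an assumption; the report does not say how the original listing computed its age values.

```python
from __future__ import annotations

import time
from pathlib import Path


def format_age(seconds: float) -> str:
    """Render an age such as 9m26s, in the style of the listing above."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes}m{secs}s"


def list_locks(sessions_dir: Path, now: float | None = None) -> list[tuple[str, str]]:
    """Return (file name, age) for every session lock file, oldest first."""
    now = time.time() if now is None else now
    locks = sorted(sessions_dir.glob("*.jsonl.lock"), key=lambda p: p.stat().st_mtime)
    # ASSUMPTION: modification time approximates when the lock was taken.
    return [(p.name, format_age(now - p.stat().st_mtime)) for p in locks]


if __name__ == "__main__":
    sessions = Path.home() / ".openclaw" / "agents" / "main" / "sessions"
    for name, age in list_locks(sessions):
        print(f"{name}  age={age}")
```

The script only reads file metadata, so it is safe to run while the gateway is hung.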

OpenClaw version

2026.4.15 (041266a)

Operating system

macOS 15.6 (Darwin 24.6.0, x64)

Install method

npm global

Model

opencode-go/glm-5.1 (via Bailian API)

Provider / routing chain

openclaw → opencode (ACP harness) → Bailian API (glm-5.1)

Additional provider/model setup details

  • Running with runtime="acp" + agentId="opencode" for coding tasks
  • ACP session spawned via sessions_spawn with streamTo="parent"
  • Quota exhaustion reported by opencode's own UI/interface, but OpenClaw gateway never received the failure response back

Logs, screenshots, and evidence

Lock file evidence (from recovery notes, 2026-04-19):

~/.openclaw/agents/main/sessions/5e488226-43fe-459b-97f3-161068e4f0e5.jsonl.lock │ pid=78263 (alive) age=9m26s stale=no
~/.openclaw/agents/main/sessions/9e1150f3-680a-4fef-bc1e-efdca23d675b.jsonl.lock │ pid=78263 (alive) age=22m2s stale=no
~/.openclaw/agents/main/sessions/f9e06db9-3416-4ad9-af79-e89f37eed7a1.jsonl.lock │ pid=78263 (alive) age=23m54s stale=no

All three locks held by same PID (gateway process). Locks grew progressively older (9m → 23m) as subsequent sessions tried to acquire and blocked.

Impact and severity

  • Affected: All sessions and all channels simultaneously (Telegram, WeChat, WhatsApp)
  • Severity: Critical — complete gateway hang, no messaging recovery without manual intervention
  • Frequency: 2/2 observed occurrences (also happened 1–2 weeks prior)
  • Consequence: All inbound messages are blocked until restart; time-sensitive channels (investment monitoring, cron jobs) fail silently

This is related to issue #18060. That report covers a single-session lock from an Anthropic quota failure. This report extends the scope: an ACP/opencode quota failure can cascade into multiple locks held by the same gateway PID, taking down the entire gateway process across all channels. Same root cause (lock not released on API failure), but wider blast radius.

Extent analysis

TL;DR

The OpenClaw gateway fails to release .jsonl.lock files after an ACP subagent call exhausts its provider quota, which deadlocks all sessions and channels. The fix is to guarantee that locks are released on API errors. Until then, recovery requires deleting the lock files manually and restarting the gateway.

Guidance

  • Review the OpenClaw gateway's error handling mechanism to ensure that locks are released when an API error occurs, such as quota exhaustion.
  • Implement a try/finally block or equivalent to guarantee lock release, even if an exception is thrown.
  • Consider adding a timeout or retry mechanism to handle transient API errors and prevent lock accumulation.
  • Verify that the gateway process properly cleans up stale locks during restart to prevent similar deadlocks in the future.
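The timeout suggestion above can be illustrated with a small asyncio sketch. It is only an illustration under stated assumptions: `call_with_deadline` and `QuotaTimeout` are hypothetical names, and an in-memory `asyncio.Lock` stands in for the gateway's file-based session lock, whose real API is not shown in the report. The point is that a call which never answers (as in this report, where the gateway never received the failure response) is turned into an error after a deadline, and the lock is released on every path.

```python
import asyncio


class QuotaTimeout(Exception):
    """Raised when the subagent does not answer within the deadline."""


async def call_with_deadline(call, lock: asyncio.Lock, timeout_s: float):
    """Run `call()` while holding `lock`, giving up after `timeout_s` seconds."""
    async with lock:  # released on normal return, on exception, and on timeout
        try:
            return await asyncio.wait_for(call(), timeout=timeout_s)
        except asyncio.TimeoutError as exc:
            raise QuotaTimeout(f"no response within {timeout_s}s") from exc
```

A caller would catch `QuotaTimeout`, write an error entry to the transcript, and notify the user, so the session stays usable for the next message.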

Example

# Illustrative pseudocode; acquire_lock, opencode_call, APIError, etc. are placeholder names.
acquire_lock()
try:
    # ACP subagent call
    response = opencode_call()
except APIError as e:
    # Log the error and notify the user; the lock itself is released in finally
    log_error(e)
    send_error_message("Model API quota exceeded, try again later")
finally:
    # Release the lock exactly once, regardless of outcome
    release_lock()
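
A runnable variant of the same idea wraps the lock in a context manager, so release cannot be forgotten on any code path. This is a sketch, not OpenClaw's code: `session_lock`, `run_turn`, and `APIError` are hypothetical names, and writing the PID into the lock file is an assumption about the format.

```python
import os
from contextlib import contextmanager


class APIError(Exception):
    """Placeholder for a provider failure such as quota exhaustion."""


@contextmanager
def session_lock(path: str):
    """Create `path` exclusively as a lock and always remove it on exit."""
    # O_EXCL makes creation fail with FileExistsError if the lock already exists.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())  # ASSUMPTION: lock stores the PID
        yield
    finally:
        os.close(fd)
        os.remove(path)


def run_turn(lock_path: str, call):
    """Run one model call under the lock; return (reply, error_message)."""
    try:
        with session_lock(lock_path):
            return call(), None
    except APIError as exc:
        # The lock is already gone here, so the session stays usable.
        return None, f"Model API error: {exc}"
```

With this shape, a quota failure surfaces as an error message for the user while the lock file is removed before the handler runs.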

Notes

This issue is related to #18060, which reported a single-session lock issue due to an Anthropic quota failure. The current issue extends the scope to multiple locks held by the same gateway PID, caused by an ACP/opencode quota failure.

Recommendation

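Until a permanent fix ships in a future OpenClaw version, recover as described in the report: delete the held .jsonl.lock files manually, then run openclaw gateway restart. The permanent fix is to release session locks on API errors, for example with a try/finally block around the subagent call.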
