Gateway can deadlock on nested openclaw sessions tool call; diagnostics report recovery=none


Summary

A live Telegram agent turn wedged when a bash tool call ran:

openclaw sessions --agent dex --limit 10 --json

The command appears to query session state through the same gateway/session system that is executing the active tool call. The result was a blocked tool call that stalled the topic, survived normal restart drain attempts, and required systemd SIGKILL after the drain timeout.

Version

OpenClaw 2026.5.14-beta.1 (cef4145)

Gateway service:

openclaw-gateway.service - OpenClaw Gateway (v2026.5.14-beta.1)
node /home/clawadmin/.npm-global/lib/node_modules/openclaw/dist/index.js gateway --port 18789

Evidence

Diagnostics repeatedly detected the stuck state but had no recovery path:

2026-05-14T19:01:03.584-06:00 [diagnostic] stalled session: sessionId=991d536d-8ea4-4058-b02a-a0cf45ed9f14 sessionKey=agent:main:telegram:group:-1003821464158:topic:4836 state=processing age=142s queueDepth=1 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:rawResponseItem/completed lastProgressAge=142s activeTool=bash activeToolCallId=exec-7c0d240d-fc1a-44c7-b98e-0c09f0aa9061 activeToolAge=147s terminalProgressStale=true recovery=none

Restart attempted to drain active work instead of killing/reaping the stale tool call quickly:

2026-05-14T19:03:57.796-06:00 [gateway] draining 4 active task(s) and 2 active embedded run(s) before restart with timeout 300000ms
2026-05-14T19:04:27.798-06:00 [gateway] still draining 4 active task(s) and 2 active embedded run(s) before restart
openclaw-gateway.service: State 'stop-sigterm' timed out. Killing.
openclaw-gateway.service: Killing process 260558 (node) with signal SIGKILL.
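
The restart behavior asked for in this report (a short drain window followed by a forced reap) can be sketched roughly as below. This is an illustrative Python sketch, not the gateway's code: `drain_then_reap`, `drain_seconds`, and `poll_interval` are hypothetical names, and it assumes the stale tool calls are child processes the supervisor holds handles for.

```python
import subprocess
import time


def drain_then_reap(procs, drain_seconds=5.0, poll_interval=0.1):
    """Wait up to drain_seconds for child processes to exit, then kill the rest.

    Returns the list of processes that had to be force-killed.
    """
    deadline = time.monotonic() + drain_seconds
    pending = list(procs)
    # Drain phase: give well-behaved work a bounded chance to finish.
    while pending and time.monotonic() < deadline:
        pending = [p for p in pending if p.poll() is None]
        if pending:
            time.sleep(poll_interval)
    # Reap phase: anything still running after the window is killed.
    reaped = []
    for p in pending:
        if p.poll() is None:
            p.kill()   # SIGKILL on POSIX
            p.wait()   # collect the exit status so no zombie remains
            reaped.append(p)
    return reaped
```

With a window of a few seconds, a stuck child is killed by the supervisor itself rather than by systemd after `stop-sigterm` times out.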

The same pattern appeared again after restart:

2026-05-14T19:07:31.965-06:00 [diagnostic] stalled session: sessionId=991d536d-8ea4-4058-b02a-a0cf45ed9f14 sessionKey=agent:main:telegram:group:-1003821464158:topic:4836 state=processing age=134s queueDepth=1 reason=blocked_tool_call classification=blocked_tool_call activeWorkKind=tool_call lastProgress=codex_app_server:notification:item/completed lastProgressAge=134s activeTool=bash activeToolCallId=exec-7b8662e0-1940-4afd-93f4-bca1a71ca8bf activeToolAge=138s recovery=none
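
For anyone triaging similar logs, the diagnostic lines above can be split into fields with a few lines of Python. This is a sketch that assumes the format seen in this report: whitespace-separated `key=value` tokens after the `stalled session:` marker, with no spaces inside values. The function names are hypothetical.

```python
def parse_diagnostic(line):
    """Split a '[diagnostic] stalled session: k=v k=v ...' line into a dict.

    Returns None when the line is not a stalled-session diagnostic.
    """
    _, found, tail = line.partition("stalled session:")
    if not found:
        return None
    fields = {}
    for token in tail.split():
        # Split on the first '=' only; values such as sessionKey contain ':'.
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def needs_attention(fields):
    """True when a blocked tool call was classified but no recovery was chosen."""
    return (
        fields.get("classification") == "blocked_tool_call"
        and fields.get("recovery") == "none"
    )
```

Feeding either diagnostic line from this report through `parse_diagnostic` and then `needs_attention` flags it, since both carry `classification=blocked_tool_call` together with `recovery=none`.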

Expected behavior

  • Tool calls should have bounded timeout/cancel behavior.
  • Gateway restart should force-kill stale active tool calls after a short drain window.
  • If openclaw sessions is unsafe from inside an active agent turn, it should fail fast with a clear diagnostic.
  • If diagnostics can classify blocked_tool_call, recovery should not be none.

Actual behavior

  • The topic remained stuck waiting for bash.
  • Bash was waiting on openclaw sessions.
  • Gateway restart tried to drain the stuck work and only recovered after systemd killed the process.
  • The diagnostic correctly identified blocked_tool_call but had no recovery path.

Local mitigation

We added a local wrapper guard that blocks only openclaw sessions... invocations whose process ancestry includes a live agent tool call, and logs each block. This is a temporary circuit breaker, not a product fix. Normal openclaw --version still works, and openclaw sessions --help works when the local guard is disabled.
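
The guard itself is not shown in this report. An ancestry check of that kind could look roughly like the Python sketch below. Everything in it is an assumption for illustration: the function names are hypothetical, the "gateway ancestor" heuristic is inferred from the service command line quoted under Version, and the `/proc` walk works on Linux only.

```python
import os


def is_gateway_cmdline(argv):
    """Heuristic (assumption): an openclaw process run with the 'gateway' subcommand."""
    return any("openclaw" in arg for arg in argv) and "gateway" in argv


def should_block(argv, ancestors):
    """Block only 'openclaw sessions ...' when some ancestor looks like the gateway."""
    if len(argv) < 2:
        return False
    if os.path.basename(argv[0]) != "openclaw" or argv[1] != "sessions":
        return False
    return any(is_gateway_cmdline(cmd) for cmd in ancestors)


def ancestor_cmdlines(pid=None):
    """Walk parent processes via /proc (Linux only); returns argv lists, nearest first."""
    result = []
    pid = os.getppid() if pid is None else pid
    while pid > 1:
        try:
            with open(f"/proc/{pid}/cmdline", "rb") as f:
                raw = f.read()
            with open(f"/proc/{pid}/stat") as f:
                stat = f.read()
        except OSError:
            break  # process vanished, or /proc is unavailable
        result.append([p.decode(errors="replace") for p in raw.split(b"\0") if p])
        # The comm field may contain spaces, so split after the closing ')':
        # the remaining fields are "state ppid ...".
        pid = int(stat.rsplit(")", 1)[1].split()[1])
    return result
```

A wrapper placed ahead of the real binary on PATH would call `should_block(sys.argv_of_the_request, ancestor_cmdlines())`, log and exit non-zero on a match, and otherwise exec the real openclaw. Keeping the decision in a pure function makes the rule easy to test without a running gateway.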

Suggested fixes

  1. Add hard timeout/cancellation around tool calls.
  2. Make openclaw sessions safe from active turns or explicitly reject it in that context.
  3. On restart, force-reap stale tool calls after a short drain period.
  4. Add recovery behavior for classification=blocked_tool_call instead of recovery=none.
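
As a rough illustration of the first suggestion, a hard timeout around a tool call can be sketched in Python as follows. This is not OpenClaw's implementation: `run_tool_call` and its parameters are hypothetical, and the sketch is POSIX-only because it relies on process groups so that a nested command started by bash is killed together with bash.

```python
import os
import signal
import subprocess


def run_tool_call(argv, timeout_seconds):
    """Run a command; kill its whole process group if it exceeds the timeout.

    Returns (returncode, output_bytes, timed_out).
    """
    proc = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        start_new_session=True,  # child leads its own process group
    )
    try:
        out, _ = proc.communicate(timeout=timeout_seconds)
        return proc.returncode, out, False
    except subprocess.TimeoutExpired:
        try:
            # The child is a session leader, so its PID is also its group ID.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group already exited on its own
        out, _ = proc.communicate()  # reap and collect any partial output
        return proc.returncode, out, True
```

Returning a `timed_out` flag rather than raising lets the caller record a concrete recovery action for the session instead of leaving it in `state=processing`.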
