openclaw - [Bug]: Session stuck after gateway restart during tool-use loop — stale lock not recovered

GitHub stats
openclaw/openclaw#70555 · Fetched 2026-04-24 05:56:34
Comments: 1 · Participants: 2 · Timeline events: 4 (labeled ×2, commented ×1, cross-referenced ×1) · Reactions: 0


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

After a gateway restart triggered by a config change, a session in an active tool-use loop is not resumed — the session remains permanently stuck in "running" state with tool results persisted but no follow-up assistant message generated.

Steps to reproduce

  1. Agent is in a tool-use loop with parallel web_search calls
  2. While tool results are returning, a config change triggers gateway restart (plugins.entries.tavily.enabled)
  3. The gateway restarts and removes the stale session lock as dead-pid; the log shows: removed stale session lock: ...jsonl.lock (dead-pid)
  4. Tool results are persisted to the session transcript
  5. No follow-up assistant message is generated; session remains status: "running" indefinitely

Expected behavior

After gateway restart, the session should:

  1. Detect that tool results are available but no assistant follow-up exists
  2. Resume the tool-use loop by sending the tool results + context to the model
  3. Generate the assistant's follow-up response
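
The detection in step 1 can be sketched as a scan over the session transcript. This is a minimal sketch, not OpenClaw's implementation: the `TranscriptEntry` shape, its `toolCallIds` / `toolCallId` fields, and the `needsResume` function are hypothetical names chosen for illustration, and the real .jsonl transcript schema may differ.

```typescript
// Hypothetical transcript entry shape; the real OpenClaw schema may differ.
type Role = "user" | "assistant" | "tool";

interface TranscriptEntry {
  role: Role;
  toolCallIds?: string[]; // on assistant entries that request tool calls
  toolCallId?: string;    // on tool entries that answer one call
}

// Returns true when the last assistant turn requested tool calls, every
// requested call has a result, and no assistant message follows the results.
function needsResume(entries: TranscriptEntry[]): boolean {
  let pending: Set<string> | null = null; // call ids still waiting for a result
  let resultsSeen = 0;

  for (const entry of entries) {
    if (entry.role === "assistant") {
      // A new assistant turn replaces whatever was pending before it.
      pending = new Set(entry.toolCallIds ?? []);
      resultsSeen = 0;
    } else if (entry.role === "tool" && pending !== null) {
      if (entry.toolCallId !== undefined) pending.delete(entry.toolCallId);
      resultsSeen++;
    } else if (entry.role === "user") {
      // Assumption: a user entry starts a fresh turn, so nothing is pending.
      pending = null;
      resultsSeen = 0;
    }
  }
  return pending !== null && resultsSeen > 0 && pending.size === 0;
}

// The situation from this issue: two web_search calls, both results stored.
const transcript: TranscriptEntry[] = [
  { role: "user" },
  { role: "assistant", toolCallIds: ["search-1", "search-2"] },
  { role: "tool", toolCallId: "search-1" },
  { role: "tool", toolCallId: "search-2" },
];
console.log(needsResume(transcript)); // true
```

A transcript where only one of the two results has arrived returns false, so a resume would not fire while a tool call is still legitimately outstanding.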

Actual behavior

The session remains in "running" state with stale data. No recovery occurs, and the session is effectively dead until it is manually reset. The user sees the message "当前还在忙,你的新消息已经排队,上一条完成后我马上继续。" (in English: "I'm currently busy. Your new message is already in the queue. I'll get back to you right after finishing the previous one.") and cannot get a response.

OpenClaw version

2026.4.14 (323493f)

Operating system

macOS 25.3.0 (arm64)

Install method

npm global

Model

bailian/qwen3.6-plus

Provider / routing chain

openclaw -> DashScope compatible-mode API -> bailian/qwen3.6-plus

Additional provider/model setup details

Channel: dingtalk-connector (DM session)
Model accessed via DashScope compatible-mode API.

Logs, screenshots, and evidence

Gateway log snippet:
16:37:51.889 [reload] config change requires gateway restart (plugins.entries.tavily.enabled) — deferring until 4 operation(s), 2 reply(ies), 2 embedded run(s) complete
16:38:42.780 [gateway] removed stale session lock: /Users/maidou/.openclaw/agents/main/sessions/0a7cc9e9-...jsonl.lock (dead-pid)

Session state after restart:
{
  "status": "running",
  "abortedLastRun": false,
  "updatedAt": 1776846975307,
  "totalTokens": 89694
}

Timeline:
- 16:36:50 — Assistant sends last tool calls (2 × web_search)
- 16:37:50 — Both tool results returned
- 16:37:51 — Config change detected → gateway restart queued
- 16:38:42 — Gateway completes restart, stale lock removed
- After that — No further assistant messages (stuck 18+ hours)
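
To relate the session state to the timeline, the `updatedAt` value can be decoded. The sketch below assumes `updatedAt` is a Unix epoch timestamp in milliseconds, which the issue does not state; `describeIdle` is a hypothetical helper. If the gateway logs are in UTC+8 (also an assumption), the decoded time is 16:36:15 local, slightly before the last tool calls at 16:36:50, which would suggest the field was not refreshed when the tool results were persisted.

```typescript
// Session state fields copied from the issue.
const session = {
  status: "running",
  abortedLastRun: false,
  updatedAt: 1776846975307, // assumed to be epoch milliseconds
  totalTokens: 89694,
};

// Hypothetical helper: formats the last update time and the idle duration.
function describeIdle(updatedAtMs: number, nowMs: number): string {
  const idleMinutes = Math.floor((nowMs - updatedAtMs) / 60_000);
  return `last update ${new Date(updatedAtMs).toISOString()}, idle ${idleMinutes} min`;
}

// Example: evaluate the session 18 hours after its last recorded update.
const eighteenHoursMs = 18 * 60 * 60 * 1000;
console.log(describeIdle(session.updatedAt, session.updatedAt + eighteenHoursMs));
// prints "last update 2026-04-22T08:36:15.307Z, idle 1080 min"
```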

Impact and severity

Affected: DingTalk DM users on OpenClaw 2026.4.14
Severity: High. Blocks all interaction in the affected session; the agent appears permanently busy.
Frequency: Reproduced in this instance; occurs when a gateway restart happens during an active tool-use loop.
Consequence: Users see "当前还在忙,你的新消息已经排队,上一条完成后我马上继续。" (in English: "I'm currently busy. Your new message is already in the queue. I'll get back to you right after finishing the previous one."), the session context accumulates tokens without progress, and work is lost unless the session is manually reset.

Additional information

Suggested fix:

  1. On startup, scan all "running" sessions for incomplete tool-use loops
  2. If tool results exist but no follow-up assistant message, resume the loop
  3. Add a timeout mechanism: if a session has been "running" for > N minutes without progress, flag it for recovery or notify the user
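
The three steps above could be combined into a single startup pass. The following is a rough sketch under stated assumptions rather than a proposal for the actual gateway code: `SessionSummary`, `hasUnansweredToolResults`, `planRecovery`, `scanOnStartup`, and the action names are all hypothetical, and `updatedAt` is assumed to be epoch milliseconds.

```typescript
type RecoveryAction = "resume" | "flag-stalled" | "leave";

// Hypothetical per-session summary built at gateway startup.
interface SessionSummary {
  id: string;
  status: string;                    // e.g. "running", as in the issue
  updatedAt: number;                 // epoch milliseconds (assumed)
  hasUnansweredToolResults: boolean; // hypothetical: derived from the transcript
}

// Decides what to do with one session (suggested fix steps 1 to 3).
function planRecovery(s: SessionSummary, nowMs: number, maxIdleMs: number): RecoveryAction {
  if (s.status !== "running") return "leave";
  // Steps 1 and 2: tool results exist with no assistant follow-up.
  if (s.hasUnansweredToolResults) return "resume";
  // Step 3: "running" for longer than the timeout without progress.
  if (nowMs - s.updatedAt > maxIdleMs) return "flag-stalled";
  return "leave";
}

// Builds a recovery plan for every session found at startup.
function scanOnStartup(
  sessions: SessionSummary[],
  nowMs: number,
  maxIdleMs: number,
): Map<string, RecoveryAction> {
  const plan = new Map<string, RecoveryAction>();
  for (const s of sessions) {
    plan.set(s.id, planRecovery(s, nowMs, maxIdleMs));
  }
  return plan;
}
```

Running such a pass only after stale lock cleanup has finished would avoid resuming a session that another live process still owns; whether the gateway's startup order allows this is not covered by the issue.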

Current workaround: Manually reset the session or send a new message to trigger recovery. Neither is ideal for production use.

extent analysis

TL;DR

Implement a mechanism to scan and resume incomplete tool-use loops after a gateway restart to prevent sessions from getting stuck in a "running" state.

Guidance

  • Identify sessions in a "running" state after a gateway restart and check for incomplete tool-use loops by verifying the presence of tool results without a follow-up assistant message.
  • Resume the tool-use loop by sending the tool results and context to the model to generate the assistant's follow-up response.
  • Consider implementing a timeout mechanism to flag sessions for recovery or notify the user if a session has been "running" for an extended period without progress.
  • Review the suggested fix provided in the issue for a potential implementation approach.

Example

No code snippet is provided as the issue does not contain sufficient implementation details.

Notes

The provided guidance is based on the information given in the issue and may require adjustments based on the specific implementation and requirements of the OpenClaw system.

Recommendation

Until a permanent fix is available, use the workaround from the issue: manually reset the session or send a new message to trigger recovery. The issue does not mention a fixed version to upgrade to.
