[Bug]: QQBot session stuck in running state after subagent spawn failure/timeout (openclaw)

GitHub stats
openclaw/openclaw#63478 · Fetched 2026-04-09 07:53:16
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Bug type

Regression / Unhandled edge case

Summary

QQBot channel sessions get permanently stuck in running state when spawned subagents fail or timeout. The parent session never transitions back to done/idle, making it completely unresponsive to new messages. This requires manual intervention (deleting transcript or editing sessions.json) to recover.

This is the same root cause as #63440 but observed specifically on the QQBot channel with subagent spawn failures rather than direct LLM timeouts.

Environment

  • OpenClaw Version: 2026.4.5 → 2026.4.8 (9ece252) — issue persists across both versions
  • OS: Windows 10 (x64)
  • Node: v24.14.0
  • Channel: QQBot (DM, appId configured, regular messages work)
  • Model: agent.cc/claude-opus-4-6 (primary), agent/MiniMax-M2.7-highspeed (fallback)

Steps to Reproduce

  1. Configure QQBot channel with a working bot
  2. Send complex tasks that trigger subagent spawns (e.g., code generation, multi-step analysis)
  3. Wait for a subagent to fail or timeout (e.g., model timeout, context overflow)
  4. Observe the parent QQBot session status remains running permanently
  5. Send new messages — no response

Evidence from sessions.json

{
  "status": "running",
  "outputTokens": 37,
  "abortedLastRun": false,
  "contextTokens": 200000
}

19 subagents spawned from the QQBot session:

  • 14 done
  • 4 failed
  • 1 timeout

The parent session never recovered after the failed/timed-out subagents.
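The stuck state shown above can be spotted mechanically. The following is a minimal sketch, assuming sessions.json is a JSON object that maps a session key to a record with the fields from the report; that layout and the sample keys are assumptions, not something the issue confirms.

```python
import json


def find_running_sessions(sessions: dict) -> list[str]:
    """Return the keys of session records whose status is still "running"."""
    return [
        key
        for key, record in sessions.items()
        if isinstance(record, dict) and record.get("status") == "running"
    ]


# Sample data reusing the field values from the report.
# The session keys are hypothetical.
sample = json.loads("""
{
  "qqbot:dm:example": {"status": "running", "outputTokens": 37,
                       "abortedLastRun": false, "contextTokens": 200000},
  "other:example":    {"status": "done", "outputTokens": 512,
                       "abortedLastRun": false, "contextTokens": 200000}
}
""")

print(find_running_sessions(sample))
```

A running status alone does not prove a session is stuck, since a healthy session is also running while it works. Combine this check with a progress signal such as an unchanged outputTokens value over time.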

Expected Behavior

When a subagent fails or times out, the parent session should:

  1. Transition back to done/idle state
  2. Set abortedLastRun: true
  3. Be ready to accept new inbound messages
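The three expectations above amount to a guaranteed state transition around the subagent call. The sketch below illustrates that idea with a try/except/finally block. `ParentSession`, `SubagentError`, and `run_with_subagent` are hypothetical names for illustration and are not OpenClaw's actual internals.

```python
from dataclasses import dataclass


class SubagentError(Exception):
    """Hypothetical stand-in for a subagent failure."""


@dataclass
class ParentSession:
    status: str = "idle"
    aborted_last_run: bool = False  # mirrors abortedLastRun in sessions.json


def run_with_subagent(session: ParentSession, spawn):
    """Run a subagent and always leave the parent in a non-running state."""
    session.status = "running"
    session.aborted_last_run = False
    try:
        return spawn()
    except (SubagentError, TimeoutError):
        # Failure or timeout: record the aborted run instead of hanging.
        session.aborted_last_run = True
        return None
    finally:
        # Runs on success, failure, and timeout alike.
        session.status = "done"


def failing_spawn():
    raise TimeoutError("subagent timed out")


session = ParentSession()
run_with_subagent(session, failing_spawn)
print(session.status, session.aborted_last_run)
```

Placing the status reset in `finally` means the parent leaves running even when an unexpected exception propagates, which is the property the report says is missing.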

Workarounds

Workaround 1: Edit sessions.json (from #63440)

Manually set status: "done" and abortedLastRun: true in sessions.json, then restart gateway. Preserves conversation history.
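The manual edit can be scripted. This is a minimal sketch, assuming sessions.json is a JSON object keyed by session key; the file layout, the `.bak` backup name, and the function name are assumptions for illustration. The script does not restart the gateway, which is still required afterwards.

```python
import json
import shutil
from pathlib import Path


def reset_stuck_session(path: Path, session_key: str) -> bool:
    """Set one running session to done/aborted. Return True if it was changed."""
    sessions = json.loads(path.read_text(encoding="utf-8"))
    record = sessions.get(session_key)
    if not isinstance(record, dict) or record.get("status") != "running":
        return False  # nothing to reset

    # Keep a copy of the original file before modifying it.
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    record["status"] = "done"
    record["abortedLastRun"] = True
    path.write_text(json.dumps(sessions, indent=2), encoding="utf-8")
    return True
```

A call might look like `reset_stuck_session(Path("sessions.json"), "<session key>")`, with the real path and key taken from your installation.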

Workaround 2: Delete transcript

Delete the session .jsonl transcript file and restart gateway. Loses conversation history.

Workaround 3: Automated watchdog (what we use)

A cron job that checks every 3 minutes for sessions stuck in running with no token progress, and automatically resets the status via sessions.json edit + gateway restart.
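The detection half of such a watchdog could look like the sketch below. It is not the reporter's script. It assumes sessions.json maps session keys to records, and it treats an unchanged outputTokens value between two consecutive checks as "no token progress"; the function name and sample keys are hypothetical.

```python
def stuck_sessions(previous_tokens: dict, sessions: dict) -> tuple[list, dict]:
    """Compare running sessions against the previous check.

    Returns (keys that made no token progress, snapshot for the next check).
    """
    stuck = []
    snapshot = {}
    for key, record in sessions.items():
        if not isinstance(record, dict) or record.get("status") != "running":
            continue  # only running sessions can be stuck
        tokens = record.get("outputTokens", 0)
        snapshot[key] = tokens
        # Flag sessions that were running last time with the same count.
        if previous_tokens.get(key) == tokens:
            stuck.append(key)
    return stuck, snapshot


# Two consecutive checks over hypothetical data.
first = {"a": {"status": "running", "outputTokens": 37},
         "b": {"status": "running", "outputTokens": 10}}
second = {"a": {"status": "running", "outputTokens": 37},
          "b": {"status": "running", "outputTokens": 25}}

stuck, snapshot = stuck_sessions({}, first)
print(stuck)
stuck, snapshot = stuck_sessions(snapshot, second)
print(stuck)
```

A single unchanged interval can also occur during a long but healthy model call, so a cautious watchdog might require several unchanged checks in a row before resetting a session and restarting the gateway.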

Impact

This makes QQBot (and likely any channel) unusable for complex tasks that involve subagent spawning. The session becomes permanently dead until manual intervention. For QQBot users in China where this is a primary messaging channel, this is a significant reliability issue.

Frequency

Occurs multiple times per day with moderate usage. In our case, 5 of 19 subagent spawns (4 failed, 1 timed out) ended in a way that could trigger this bug.

Related Issues

  • #63440 — Same root cause (session stuck in running), reported for Telegram + MiniMax timeout
  • #60265 — sessions_spawn fails with "pairing required" (related subagent spawn failure mode)

extent analysis

TL;DR

Setting status to "done" and abortedLastRun to true in sessions.json, then restarting the gateway, unsticks a QQBot session that is stuck in the running state. This is a workaround, not a fix: the session can get stuck again the next time a subagent fails or times out.

Guidance

  • Identify and isolate the stuck sessions by checking the sessions.json file for sessions with a status of "running" and no recent token progress.
  • Apply one of the provided workarounds: manually editing sessions.json, deleting the transcript file, or implementing an automated watchdog script to periodically check for and reset stuck sessions.
  • Track how often subagents fail or time out, and whether each failure leaves the parent session in running, to confirm the trigger and to measure how often the workaround is needed.
  • Review related issues, such as #63440 and #60265, to understand the broader context and potential connections to the current problem.

Example

Session entry in sessions.json after the manual edit. Only status and abortedLastRun change; JSON does not allow comments, so keep notes like this one out of the file:

{
  "status": "done",
  "outputTokens": 37,
  "abortedLastRun": true,
  "contextTokens": 200000
}

Notes

The workarounds only recover a session after it is already stuck. A permanent fix needs the parent session to leave the running state whenever a subagent fails or times out, as described under Expected Behavior. Reducing subagent failures alone would make the bug rarer but would not remove it.

Recommendation

Apply Workaround 1 (edit sessions.json): it preserves conversation history and can be applied immediately. Treat it as a stopgap until the parent session's handling of subagent failures and timeouts is fixed.
