openclaw: Fix Subagent Session Timeout and Unresponsiveness in Production Use

GitHub stats
openclaw/openclaw#58649 · Fetched 2026-04-08 01:59:45
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0


Summary

Subagent sessions (mode: "session" + thread: true + timeoutSeconds: 0) consistently become unresponsive after periods of inactivity, making them unsuitable for long-running agent workflows. Sessions that should persist indefinitely instead time out or enter the done state, requiring repeated respawning.

Environment

  • OpenClaw Version: 2026.3.x (latest)
  • Node.js: v22.22.0
  • OS: macOS Darwin 25.3.0 (arm64)
  • Install Method: npm global
  • Configuration: agents.defaults.subagents.maxConcurrent: 2

Configuration Attempted

{
  "mode": "session",
  "thread": true,
  "timeoutSeconds": 0,
  "idleTimeoutMs": 0
}

Expected: Session persists indefinitely and accepts new messages via the thread.
Actual: Session becomes unresponsive, its status shows done, and messages time out.

Reproduction Steps

  1. Spawn subagent with persistent session configuration:
sessions_spawn({
  task: "...",
  mode: "session",
  thread: true,
  timeoutSeconds: 0,
  agentId: "..."
})
  2. Subagent completes its initial task successfully
  3. Wait 2-24 hours (in some cases the failure is immediate)
  4. Send a message to the subagent session via sessions_send
  5. Result: Timeout, no response
  6. Check sessions_list: Status shows done despite timeoutSeconds: 0
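
The final check in the steps above can be written as a small predicate, which may help when scripting the reproduction. This is a minimal sketch: the shape of a sessions_list entry (`{ id, status }`) and the helper name `findUnexpectedlyDone` are assumptions for illustration, not OpenClaw's documented API.

```javascript
// Hypothetical helper: flag sessions that report "done" even though they were
// spawned with timeoutSeconds: 0 (that is, expected to persist).
// ASSUMPTION: each listed session looks like { id: string, status: string }.
function findUnexpectedlyDone(spawned, listed) {
  const statusById = new Map(listed.map((s) => [s.id, s.status]));
  return spawned
    .filter((s) => s.mode === "session" && s.timeoutSeconds === 0)
    .filter((s) => statusById.get(s.id) === "done")
    .map((s) => s.id);
}

// Example with made-up data: both sessions are "done", but only "a" was
// spawned with timeoutSeconds: 0, so only "a" is unexpected.
const exampleSpawned = [
  { id: "a", mode: "session", thread: true, timeoutSeconds: 0 },
  { id: "b", mode: "session", thread: true, timeoutSeconds: 300 },
];
const exampleListed = [
  { id: "a", status: "done" },
  { id: "b", status: "done" },
];
console.log(findUnexpectedlyDone(exampleSpawned, exampleListed)); // [ 'a' ]
```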

Symptoms Observed

| Symptom | Frequency | Context |
| --- | --- | --- |
| sessions_send timeout | 100% | After subagent idle period |
| Session status done | 100% | Despite timeoutSeconds: 0 |
| Spawn fails after multiple attempts | 50% | Gateway cache issues |
| "terminated" error | 30% | Concurrent subagent operations |

Related Issues

This appears to be related to or a regression of:

  • #41155 - Subagent announce timeout regression in 2026.3.8
  • #4355 - Sub-agents terminate prematurely with 'terminated' error
  • #7666 - Sub-agent announce fails with gateway timeout

Root Cause Analysis (Suspected)

Based on community research and logs:

  1. Session Write Lock Timeout (10s default) - Multiple subagents block each other while waiting for the lock
  2. Gateway RPC Timeout (10s) - Spawn calls time out
  3. Command Lane Contention - Double-nested serialization through global + session lanes
  4. Idle Timeout Not Honored - idleTimeoutMs: 0 does not prevent session termination
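
Item 4 is only a suspicion, but one common way for a 0 ("never time out") setting to be ignored in JavaScript is a falsy-value default. The sketch below illustrates that class of bug. It is not taken from the OpenClaw codebase, and `DEFAULT_IDLE_TIMEOUT_MS` and both function names are hypothetical.

```javascript
// Hypothetical default, chosen only for illustration.
const DEFAULT_IDLE_TIMEOUT_MS = 600000;

// Buggy pattern: `||` treats 0 as "not set" and silently applies the default.
function resolveIdleTimeoutBuggy(opts) {
  return opts.idleTimeoutMs || DEFAULT_IDLE_TIMEOUT_MS;
}

// Safer pattern: `??` falls back only when the value is null or undefined,
// so an explicit 0 survives and can be interpreted as "no timeout".
function resolveIdleTimeout(opts) {
  return opts.idleTimeoutMs ?? DEFAULT_IDLE_TIMEOUT_MS;
}

console.log(resolveIdleTimeoutBuggy({ idleTimeoutMs: 0 })); // 600000
console.log(resolveIdleTimeout({ idleTimeoutMs: 0 })); // 0
console.log(resolveIdleTimeout({})); // 600000
```

If something like the buggy pattern were present anywhere between the spawn call and the timer that ends a session, it would be consistent with the "No improvement" results reported for both zero-valued settings.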

Workarounds Attempted

| Workaround | Result |
| --- | --- |
| Reduce maxConcurrent to 2 | No improvement |
| Set timeoutSeconds: 0 | No improvement |
| Set idleTimeoutMs: 0 | No improvement |
| Gateway restart | Temporary fix, issue recurs |
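
Since none of the configuration workarounds helped, the remaining option mentioned in the summary is to respawn the subagent whenever it stops responding. A minimal sketch of automating that follows. The tool functions are passed in as parameters because their real signatures and return values are not given in this issue; the `{ id }` return shape, the rejection on timeout, and the name `sendWithRespawn` are all assumptions.

```javascript
// Hypothetical wrapper: try to send; if the send fails (for example, it times
// out), respawn the session once with the same options and resend.
// ASSUMPTIONS: tools.spawn(opts) resolves to { id }, and
// tools.send(id, message) rejects when the session does not answer.
async function sendWithRespawn(tools, session, message) {
  try {
    const reply = await tools.send(session.id, message);
    return { reply, session, respawned: false };
  } catch (err) {
    // The old session is treated as dead; its context is lost on respawn.
    const fresh = await tools.spawn(session.spawnOptions);
    const next = { id: fresh.id, spawnOptions: session.spawnOptions };
    const reply = await tools.send(next.id, message);
    return { reply, session: next, respawned: true };
  }
}
```

Callers would keep the returned `session` for later sends. Because a respawned subagent starts without the earlier conversation, this is a stopgap rather than a substitute for real session persistence.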

Impact

The subagent architecture is currently unusable for production multi-agent workflows. We are forced to fall back to single-agent (manager-direct) mode, losing the benefits of:

  • Parallel task execution
  • Specialized agent roles
  • Session persistence
  • Async background processing

Suggested Fixes

  1. Honor timeoutSeconds: 0 and idleTimeoutMs: 0 - Sessions should truly persist
  2. Increase session lock timeout - Make configurable, default 60s+
  3. Isolate subagent session files - Prevent lock contention
  4. Improve error messages - Surface actual lock timeout vs generic "terminated"
  5. State synchronization - Ensure sessions_list reflects actual session state
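
To see why fixes 2 and 3 might matter, the following back-of-the-envelope model counts how many subagents give up while waiting for a shared write lock. It is a simplified sketch under stated assumptions (all writers request the lock at the same moment, are served one at a time, and each holds the lock for a fixed duration). The function name and the 12s hold time are hypothetical; only the 10s and 60s timeouts come from this report.

```javascript
// Model: `writers` subagents queue for one lock; each successful writer holds
// it for `holdMs`. Returns how many writers give up because their wait exceeds
// `lockTimeoutMs`.
function countLockTimeouts(writers, holdMs, lockTimeoutMs) {
  let timedOut = 0;
  let waitMs = 0; // how long the next writer in the queue has to wait
  for (let i = 0; i < writers; i++) {
    if (waitMs > lockTimeoutMs) {
      timedOut++; // gives up and never holds the lock
    } else {
      waitMs += holdMs; // acquires the lock; later writers wait longer
    }
  }
  return timedOut;
}

// Two subagents sharing one lock, each holding it for a made-up 12s:
console.log(countLockTimeouts(2, 12000, 10000)); // 1 (second writer times out)
console.log(countLockTimeouts(2, 12000, 60000)); // 0 (longer lock timeout)
// Isolated session files mean one writer per lock:
console.log(countLockTimeouts(1, 12000, 10000)); // 0
```

Under this model a longer timeout only hides contention, while isolating session files removes the queue entirely, which is one reason the two fixes are listed separately.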

Labels

bug, subagent, session, timeout, regression

Extent analysis

TL;DR

  • Implementing a fix to honor timeoutSeconds: 0 and idleTimeoutMs: 0 may resolve the subagent session unresponsiveness issue.

Guidance

  • Verify that the timeoutSeconds: 0 and idleTimeoutMs: 0 configuration is correctly applied to the subagent sessions.
  • Investigate the session write lock timeout and consider increasing it to a higher value (e.g., 60s) to prevent multiple subagents from blocking each other.
  • Check for any gateway RPC timeout issues and consider adjusting the timeout value to prevent spawn calls from timing out.
  • Review the command lane contention and consider implementing a solution to prevent double-nested serialization through global and session lanes.

Example

  • No code snippet is provided as the issue does not explicitly mention a specific code-related fix.

Notes

  • The issue appears to be related to a regression in the OpenClaw version 2026.3.x, and the suggested fixes may need to be implemented in the underlying codebase.
  • The workarounds attempted so far have not shown significant improvement, and a more thorough investigation of the root cause is necessary.

Recommendation

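  • Recommended fix: Have OpenClaw honor timeoutSeconds: 0 and idleTimeoutMs: 0 so that subagent sessions do not become unresponsive. Setting these values in configuration alone did not help (see Workarounds Attempted), so this likely requires a change in the codebase rather than a user-side workaround.
  • Reason: This targets one of the suspected root causes (idle timeout not honored). The lock timeout and lane contention hypotheses may still need separate investigation.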
