openclaw - 💡 How to fix [Bug]: Subagent killed by LLM provider streaming idle timeout (~60s) - no retry or graceful handling [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#59946 (fetched 2026-04-08 02:38:26)
Comments: 1 · Participants: 2 · Timeline events: 1 (commented ×1) · Reactions: 0

Problem

When a subagent is processing a task that requires extended thinking (e.g., reading large codebases, complex analysis), the LLM provider's streaming connection may have an idle timeout (e.g., ~60s for GitHub Copilot API). If the model is thinking and produces no tokens within this window, the connection is silently severed.

The subagent dies without any error message, retry, or notification to the parent session. From the parent's perspective, the subagent simply "timed out" even though runTimeoutSeconds was set much higher (e.g., 900s).

Observed Behavior

  • Subagent starts processing a complex task
  • Model enters extended thinking/reasoning phase (>60s without token output)
  • LLM provider drops the streaming connection due to idle timeout
  • Subagent session ends with a "timed out" status
  • No retry attempt is made
  • Parent session receives a completion event with truncated output

Evidence

Measured across multiple runs:

  • Failed runs: max gap between streaming events = 60.9s (connection dropped)
  • Successful runs: events are continuous with no gaps >30s
  • runTimeoutSeconds: 900 has no effect — the kill comes from the provider, not OpenClaw
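
The gap measurement above can be reproduced from any log of streaming-event timestamps. The following is a minimal sketch, assuming the timestamps are available as seconds since the request started. The function name and the sample values are hypothetical and are not taken from OpenClaw logs.

```python
from typing import Sequence


def max_event_gap(timestamps: Sequence[float]) -> float:
    """Return the largest gap, in seconds, between consecutive streaming events."""
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    # Pair each event with its successor and keep the widest interval.
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical event times (seconds since the request started).
healthy_run = [0.0, 4.2, 11.0, 29.5, 41.0]
stalled_run = [0.0, 3.1, 9.8, 70.7]

print(round(max_event_gap(healthy_run), 1))
print(round(max_event_gap(stalled_run), 1))
```

A run whose largest gap approaches the provider's idle window (about 60s in this report) is a candidate for a provider-side disconnect rather than a runTimeoutSeconds expiry.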

Expected Behavior

  1. Detection: OpenClaw should distinguish between runTimeoutSeconds expiry and provider-level connection drops
  2. Retry: On provider connection drop, retry the request (at least once) before declaring failure
  3. Transparency: The completion event should indicate the failure reason ("provider connection dropped" vs "run timeout exceeded")
  4. Workaround config: Allow configuring provider-level keepalive or heartbeat tokens to prevent idle disconnection
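
The detection and transparency items above could look roughly like the sketch below. It is not OpenClaw's implementation: the names EndReason, CompletionEvent and consume_stream are hypothetical, and it assumes the provider drop surfaces as a ConnectionError. A real client library may raise its own exception type, or may end the stream without raising at all, in which case a different signal would be needed.

```python
import asyncio
import time
from dataclasses import dataclass
from enum import Enum
from typing import AsyncIterator, List


class EndReason(Enum):
    COMPLETED = "completed"
    RUN_TIMEOUT = "run timeout exceeded"
    PROVIDER_DROP = "provider connection dropped"


@dataclass
class CompletionEvent:
    output: str
    reason: EndReason


async def consume_stream(stream: AsyncIterator[str], run_timeout_s: float) -> CompletionEvent:
    """Collect streamed chunks and record why the stream ended."""
    chunks: List[str] = []
    deadline = time.monotonic() + run_timeout_s
    iterator = stream.__aiter__()
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return CompletionEvent("".join(chunks), EndReason.RUN_TIMEOUT)
        try:
            # Wait for the next chunk, but never past the overall run deadline.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=remaining)
        except StopAsyncIteration:
            return CompletionEvent("".join(chunks), EndReason.COMPLETED)
        except asyncio.TimeoutError:
            return CompletionEvent("".join(chunks), EndReason.RUN_TIMEOUT)
        except ConnectionError:
            # Assumption: the provider's idle disconnect raises ConnectionError.
            return CompletionEvent("".join(chunks), EndReason.PROVIDER_DROP)
        chunks.append(chunk)


async def dropped_stream() -> AsyncIterator[str]:
    """Simulated stream that is severed by the provider after one chunk."""
    yield "partial "
    raise ConnectionResetError("simulated provider idle disconnect")


async def slow_stream() -> AsyncIterator[str]:
    """Simulated stream that stalls longer than the run timeout."""
    yield "partial "
    await asyncio.sleep(1.0)
    yield "never reached"


dropped = asyncio.run(consume_stream(dropped_stream(), run_timeout_s=5.0))
timed_out = asyncio.run(consume_stream(slow_stream(), run_timeout_s=0.05))
print(dropped.reason.value)
print(timed_out.reason.value)
```

Carrying the reason alongside the truncated output lets the parent session decide whether to retry (provider drop) or give up (run timeout), instead of seeing both cases as the same "timed out" status.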

Environment

  • OpenClaw: v2026.4.2
  • Provider: GitHub Copilot API (but likely affects any provider with streaming idle timeouts)
  • Model: claude-opus-4.6 via custom-localhost proxy

Related Issues

  • #44925 — Subagent completion silently lost
  • #58786 — Subagent announce timeout destabilize gateway
  • #53202 — Subagent announce hits repeated gateway timeouts in cron

Workaround

Currently splitting tasks into smaller chunks to ensure continuous token output within the idle window. This works but limits the complexity of tasks that can be delegated to subagents.

extent analysis

TL;DR

Implement a retry mechanism for subagent requests when the LLM provider's streaming connection is dropped due to idle timeout.

Guidance

  • Investigate the LLM provider's API to see if there's an option to configure a keepalive or heartbeat mechanism to prevent idle disconnections.
  • Modify the subagent to detect when a connection drop occurs and retry the request at least once before declaring failure.
  • Consider implementing a more sophisticated retry strategy, such as exponential backoff, to handle repeated connection drops.
  • Confirm that the runTimeoutSeconds setting reaches the subagent and is not overridden by a default value. This only rules out an OpenClaw-side cause: the measurements in the issue (failures at a ~60s gap, well below the 900s setting) point to the provider.
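
The retry and backoff guidance above can be sketched generically as follows. This is an illustration under stated assumptions, not OpenClaw code: retry_on_connection_drop and flaky_request are hypothetical names, the connection drop is assumed to raise ConnectionError, and each retry restarts the request from scratch, discarding any partial output.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_on_connection_drop(
    request: Callable[[], T],
    max_attempts: int = 3,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff when the connection drops."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the provider failure to the caller.
            # The delay doubles on each attempt; a little jitter avoids synchronized retries.
            delay = base_delay_s * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))
    raise AssertionError("unreachable")


# Simulated request that is dropped twice before succeeding.
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionResetError("simulated idle disconnect")
    return "full subagent output"


# Sleeping is stubbed out so the demo finishes immediately.
result = retry_on_connection_drop(flaky_request, sleep=lambda _seconds: None)
print(result, calls["count"])
```

A retry alone may not be enough here: if the same task makes the model think silently for more than the idle window again, the retried request can be dropped in the same way, so retries are best combined with a keepalive option or smaller tasks.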

Example

The issue does not include implementation details for OpenClaw's subagent runner, so no OpenClaw-specific code example can be given.

Notes

The provided workaround of splitting tasks into smaller chunks can help mitigate the issue, but it may not be a scalable solution for complex tasks. A more robust solution would involve implementing a retry mechanism and configuring the LLM provider to prevent idle disconnections.

Recommendation

Until OpenClaw handles provider disconnects itself, work around the problem by retrying subagent requests that fail this way. The root cause is the LLM provider's idle timeout, so a real fix requires changes to the OpenClaw implementation or to the LLM provider's API.
