openclaw: Gateway should enforce runTimeoutSeconds and emit terminal child.timeout event



Problem

When sessions_spawn is called with runTimeoutSeconds, the value is accepted but does not appear to be enforced as a hard release signal. Today (2026-05-27) two subagent runs in the same workspace stalled past their declared timeouts:

  • TSK-20260527-0006 tester subagent: 1800s timeout, still showed status='active (waiting on 1 child)' over an hour past the deadline.
  • TSK-20260527-0010 v1 coder subagent: 1800s timeout, no fs activity for 90 minutes, parent orchestrator never received any completion event.

The parent was parked via sessions_yield expecting a push-based completion event. None arrived. Recovery required a full gateway restart.

Expected behavior

At exactly runTimeoutSeconds after spawn, the gateway should:

  1. Mark the child run as failed (reason=timeout).
  2. Emit a synthetic completion event to the parent session so sessions_yield unparks.
  3. Free the slot in the active subagents list.

This should happen regardless of whether the underlying agent process is still alive — runTimeoutSeconds is a contract with the parent, not a hint to the child.
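The three steps above can be sketched as a small deadline tracker. This is a minimal illustration, not the gateway's actual implementation: the `TimeoutEnforcer` class, its `spawn` and `tick` methods, the `emit` callback, and the event fields are all hypothetical names chosen for the example. Only `runTimeoutSeconds`, `reason=timeout`, and the `child.timeout` event name come from this issue. The clock is passed in explicitly so the behavior is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ChildRun:
    run_id: str
    parent_session: str
    deadline: float          # absolute time (seconds) at which the run must be released
    status: str = "active"


@dataclass
class TimeoutEnforcer:
    # emit(parent_session, event) stands in for whatever delivers events to a
    # parked parent; it is an assumption, not a real gateway API.
    emit: Callable[[str, dict], None]
    active: Dict[str, ChildRun] = field(default_factory=dict)

    def spawn(self, run_id: str, parent_session: str,
              run_timeout_seconds: float, now: float) -> None:
        # Record the deadline at spawn time, independent of the child process.
        self.active[run_id] = ChildRun(run_id, parent_session, now + run_timeout_seconds)

    def tick(self, now: float) -> List[str]:
        """Release every run whose deadline has passed; return the released run ids."""
        expired = [run for run in self.active.values() if now >= run.deadline]
        for run in expired:
            run.status = "failed"                    # 1. mark the run as failed
            self.emit(run.parent_session, {          # 2. synthetic completion event
                "type": "child.timeout",
                "run_id": run.run_id,
                "status": run.status,
                "reason": "timeout",
            })
            del self.active[run.run_id]              # 3. free the slot
        return [run.run_id for run in expired]


# Usage: the child never reports back, yet the parent still gets an event.
events = []
enforcer = TimeoutEnforcer(emit=lambda parent, event: events.append((parent, event)))
enforcer.spawn("child-1", "parent-1", run_timeout_seconds=60, now=0.0)
enforcer.tick(now=59.0)   # before the deadline: nothing is released
enforcer.tick(now=60.0)   # at the deadline: child-1 is released
```

Because `tick` only looks at recorded deadlines, the release does not depend on the child process cooperating or even being alive, which is the contract described above.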

Repro sketch

  1. Spawn a child via sessions_spawn with runTimeoutSeconds: 60.
  2. Have the child intentionally hang (e.g. infinite sleep loop on a tool call) or simulate a model-side drop.
  3. Parent calls sessions_yield.
  4. Observe: parent remains parked indefinitely; subagents action=list still reports the child as active well past 60s.
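Step 4 can be turned into a mechanical check. The sketch below assumes the list output has been converted into records with `id`, `status`, `spawned_at`, and `run_timeout_seconds` fields. Those field names are hypothetical, since this issue does not show the real output format of subagents action=list.

```python
from typing import Iterable, List, Mapping


def overdue_children(children: Iterable[Mapping], now: float,
                     grace_seconds: float = 0.0) -> List[str]:
    """Return ids of children still reported active after their declared timeout."""
    overdue = []
    for child in children:
        deadline = child["spawned_at"] + child["run_timeout_seconds"] + grace_seconds
        # startswith also matches statuses such as 'active (waiting on 1 child)'.
        if child["status"].startswith("active") and now > deadline:
            overdue.append(child["id"])
    return overdue


# Hypothetical snapshot taken 120 seconds after the first spawn.
snapshot = [
    {"id": "hung-child", "status": "active", "spawned_at": 0.0, "run_timeout_seconds": 60},
    {"id": "fresh-child", "status": "active", "spawned_at": 100.0, "run_timeout_seconds": 60},
]
print(overdue_children(snapshot, now=120.0))  # prints ['hung-child']
```

A non-empty result well past the 60s deadline reproduces the bug; with enforcement in place the list should be empty once the deadline has passed.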

Workaround

Today the only recovery is gateway restart. A subagents action='kill' would be the manual escape hatch (filed separately).

Impact

Any multi-lane orchestration is fragile under transient failures. Parent agents cannot make progress decisions because the contract they were given (timeout) is not honored.
