openclaw - 💡(How to fix) Fix Aborted subagent with empty content silently marked `done`, never auto-announces — parent black-holes [1 comments, 2 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
openclaw/openclaw#72293Fetched 2026-04-27 05:31:58
View on GitHub
Comments
1
Participants
2
Timeline
1
Reactions
0
Author
Timeline (top)
commented ×1

When a subagent's underlying inference call gets aborted mid-response (Codex AbortError, OpenAI Responses API timeout, network drop, etc.), the subagent emits an empty assistant message (content: []) with stopReason: "aborted" and errorMessage: "This operation was aborted". The gateway then marks the run as status: done with abortedLastRun: false, and the auto-announce queue silently drops the empty result. The parent agent waits indefinitely for an announcement that never comes.

Error Message

20:28:01 AST Spawn subagent (Codex GPT-5.5, mode=run) 20:44:21 AST Last successful tool call (exec) 20:44:22.570 Codex emits openclaw:prompt-error error: "This operation was aborted | This operation was aborted" 20:44:22.614 Final assistant message: { content: [], // <-- empty stopReason: "aborted", errorMessage: "This operation was aborted", usage: { totalTokens: 0 } } 20:44:24.876 Gateway: status=done, abortedLastRun=false Auto-announce: silently swallowed (no content to relay) Parent: waits forever, no completion event arrives

Root Cause

When a subagent's underlying inference call gets aborted mid-response (Codex AbortError, OpenAI Responses API timeout, network drop, etc.), the subagent emits an empty assistant message (content: []) with stopReason: "aborted" and errorMessage: "This operation was aborted". The gateway then marks the run as status: done with abortedLastRun: false, and the auto-announce queue silently drops the empty result. The parent agent waits indefinitely for an announcement that never comes.

Fix Action

Workaround

Parent must poll sessions_list with kind=subagent and check for stale "done" entries with empty output, instead of trusting that auto-announce will fire. We're adding a heartbeat-time watchdog on our side, but this defeats the purpose of the announce mechanism.

Code Example

20:28:01 AST  Spawn subagent (Codex GPT-5.5, mode=run)
20:44:21 AST  Last successful tool call (exec)
20:44:22.570  Codex emits openclaw:prompt-error
              error: "This operation was aborted | This operation was aborted"
20:44:22.614  Final assistant message:
              {
                content: [],            // <-- empty
                stopReason: "aborted",
                errorMessage: "This operation was aborted",
                usage: { totalTokens: 0 }
              }
20:44:24.876  Gateway: status=done, abortedLastRun=false
              Auto-announce: silently swallowed (no content to relay)
              Parent: waits forever, no completion event arrives
RAW_BUFFERClick to expand / collapse

Summary

When a subagent's underlying inference call gets aborted mid-response (Codex AbortError, OpenAI Responses API timeout, network drop, etc.), the subagent emits an empty assistant message (content: []) with stopReason: "aborted" and errorMessage: "This operation was aborted". The gateway then marks the run as status: done with abortedLastRun: false, and the auto-announce queue silently drops the empty result. The parent agent waits indefinitely for an announcement that never comes.

Reproduction

  1. Spawn a long-running subagent with mode: "run" and runtime: "subagent" that does heavy reasoning over many tool calls (in our case Codex GPT-5.5 doing 16 minutes of upstream-source archaeology)
  2. Wait for the underlying inference call to hit its provider-internal timeout (Codex's per-response timeout, distinct from runTimeoutSeconds)
  3. Observe the subagent's transcript: final entry is an openclaw:prompt-error custom event followed by an empty content: [] assistant message
  4. The gateway flags status: done, but no completion event is delivered to the parent

Forensic timeline (real example, 2026-04-25)

20:28:01 AST  Spawn subagent (Codex GPT-5.5, mode=run)
20:44:21 AST  Last successful tool call (exec)
20:44:22.570  Codex emits openclaw:prompt-error
              error: "This operation was aborted | This operation was aborted"
20:44:22.614  Final assistant message:
              {
                content: [],            // <-- empty
                stopReason: "aborted",
                errorMessage: "This operation was aborted",
                usage: { totalTokens: 0 }
              }
20:44:24.876  Gateway: status=done, abortedLastRun=false
              Auto-announce: silently swallowed (no content to relay)
              Parent: waits forever, no completion event arrives

Why it's a bug

Two distinct gateway invariants are violated:

  1. abortedLastRun: false is wrong when the assistant message has stopReason: "aborted" and errorMessage: "This operation was aborted". The assistant message itself reports the abort; the gateway should propagate that into abortedLastRun: true (or a richer status like aborted).
  2. Silent announce drop: the auto-announce queue should not produce a no-op for an aborted-with-empty-content terminal state. Either deliver a failure announcement ("subagent X aborted at step Y, no result available") or surface as a parent-visible error.

Either fix would prevent the parent from black-holing.

Suggested fix sketch

In the subagent runtime path (something like subagent-announce.ts / pi-embedded-runner/run/attempt.ts), when the final assistant message has both stopReason: "aborted" and zero output tokens:

  • Set runRecord.status = "aborted" (not "done")
  • Set abortedLastRun = true
  • Either:
    • (a) Auto-retry once with a fresh inference call, or
    • (b) Synthesize a completion event with status=failed and the error message, so the parent can act on it

Workaround

Parent must poll sessions_list with kind=subagent and check for stale "done" entries with empty output, instead of trusting that auto-announce will fire. We're adding a heartbeat-time watchdog on our side, but this defeats the purpose of the announce mechanism.

Environment

  • OpenClaw 2026.4.24
  • Subagent: Codex openai-codex/gpt-5.5 (free via Pro subscription)
  • Mode: run, runtime: subagent
  • runTimeoutSeconds: 18000 global default (so OpenClaw timeout was NOT the trigger — the abort came from inside the inference layer)

extent analysis

TL;DR

The most likely fix involves updating the subagent runtime to correctly handle aborted inference calls by setting runRecord.status to "aborted" and abortedLastRun to true, and either auto-retrying or synthesizing a completion event with a failure status.

Guidance

  • Review the subagent runtime code (e.g., subagent-announce.ts or pi-embedded-runner/run/attempt.ts) to ensure it correctly handles cases where the final assistant message has stopReason: "aborted" and zero output tokens.
  • Consider implementing a retry mechanism or synthesizing a completion event with a failure status to notify the parent agent of the abortion.
  • Verify that the gateway's abortedLastRun flag is being set correctly based on the subagent's final message.
  • Evaluate the current workaround of polling sessions_list with kind=subagent and checking for stale "done" entries, and consider replacing it with a more reliable solution.

Example

if (finalMessage.stopReason === "aborted" && finalMessage.content.length === 0) {
  runRecord.status = "aborted";
  abortedLastRun = true;
  // Either auto-retry or synthesize a completion event with failure status
  // ...
}

Notes

The provided fix sketch and example code assume that the subagent runtime code has access to the necessary variables and functions to update the runRecord and handle the abortion correctly. Additional modifications may be required to integrate this fix into the existing codebase.

Recommendation

Apply the suggested fix to update the subagent runtime to correctly handle aborted inference calls, as it addresses the root cause of the issue and provides a more reliable solution than the current workaround.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix Aborted subagent with empty content silently marked `done`, never auto-announces — parent black-holes [1 comments, 2 participants]