openclaw: `openclaw agent` CLI exits before parallel sub-agents finish; reported `finalAssistantVisibleText` only captures the parent's first turn [2 comments, 3 participants]

openclaw/openclaw#76145 (fetched 2026-05-03 04:41:51)


Problem

When the parent agent uses sessions_spawn to engage sub-agents in parallel, the CLI's foreground process exits as soon as the parent emits its first text turn (the "announce" turn — e.g., "I've engaged telemetry, backend, and db. Waiting on findings…"), even though:

  1. Sub-agents are still mid-LLM-call when the CLI returns
  2. The parent agent itself goes on to produce detailed synthesis turns after each sub-agent reply (visible in the agent's session jsonl)
  3. The reported meta.livenessState is "working" and replayInvalid: true

The reported meta.finalAssistantVisibleText field captures the first assistant text turn (the announce), not the last one. Anyone reading the CLI's JSON output sees only the placeholder "Waiting on findings now…" message and never the actual synthesis.

Tested against

  • openclaw 2026.4.12 (1c0672b)
  • Plugin pinned at two heads (28c6e3f and a revert to f1e03aa) — same behavior on both, confirming this is not plugin-related
  • Scenario: SRE-MAS triage with 3 sub-agents (telemetry / backend / db) spawned via sessions_spawn, db_api up and reachable
  • 5 separate runs over ~2 hours, all reproduced

Repro

# Move sessions aside for a clean run
docker exec <container> sh -c '
for a in sre telemetry backend db; do
  mv /data/.openclaw/agents/$a/sessions /data/.openclaw/agents/$a/sessions.bak.$(date -u +%Y%m%dT%H%M%SZ) 2>/dev/null
done
'

# Launch SRE-MAS triage
NEW_SID=$(uuidgen)
docker exec <container> openclaw agent \
  --agent sre \
  --session-id "$NEW_SID" \
  --json \
  --message "Customers report checkout failures... <full incident scenario>" \
  > /tmp/sre.out 2> /tmp/sre.err

# Result: CLI returns at ~30s with finalAssistantVisibleText:
#   "I've engaged telemetry, backend, and db. Waiting on their findings now…"
#
# But the agent's session jsonl contains 3-4 follow-up assistant turns with real RCA:
#   - "DB findings are in. Pool saturated at 200/200, 47 deadlocks in 5 min..."
#   - "Backend findings are in. order-service v2.4.1 deployed 14min before incident..."
#   - (telemetry's reply often gets cut off — never lands)

# Inspect what really happened. The glob runs inside sh -c so it expands in the container, not on the host:
SESSION=$(docker exec <container> sh -c 'ls -t /data/.openclaw/agents/sre/sessions/*.jsonl | head -1')
docker exec <container> jq -r 'select(.message.role=="assistant") | .message.content[]? | select(.type=="text") | .text' "$SESSION"

Why this matters

  • The CLI output misrepresents the agent. Operators see "Waiting on findings now…" and conclude the agent is fire-and-forget. The actual synthesis is high-quality (matches a senior SRE writeup), but it never surfaces in JSON.
  • Sub-agents get killed mid-turn. Telemetry consistently doesn't get its reply in. The parent never has the full picture, even though it could.
  • JSON consumers cannot tell partial from complete runs. livenessState: "working" + replayInvalid: true is the only signal, and neither is documented as "this output is incomplete."
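Until these signals are documented, a consumer can at least refuse to treat a run as final when either one is present. A minimal sketch, assuming the two keys appear literally as `"livenessState"` and `"replayInvalid"` somewhere in the `--json` output (the issue reports them under `meta`). The function name `output_looks_incomplete` is hypothetical, and the check is a string match rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper, not part of openclaw: returns 0 (true) when an
# `openclaw agent --json` result file carries either partial-run signal.
# ASSUMPTION: keys appear literally as "livenessState" / "replayInvalid";
# nesting under meta is not checked, so this is a string match, not a parser.
output_looks_incomplete() {
  # Strip whitespace so '"key": "value"' and '"key":"value"' both match.
  flat=$(tr -d ' \t\r\n' < "$1")
  case $flat in
    *'"livenessState":"working"'*) return 0 ;;
    *'"replayInvalid":true'*) return 0 ;;
  esac
  return 1
}

# Usage against the repro's output file:
# if output_looks_incomplete /tmp/sre.out; then
#   echo "partial run: finalAssistantVisibleText is probably the announce" >&2
# fi
```

Flagging on either signal, rather than requiring both, errs toward treating a run as partial.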

Comparison to a known-good run

On 2026-04-29 (same scenario, same plugin head f1e03aa, same openclaw 2026.4.12, same workarounds), Stage 9 took several minutes of wall-clock time and produced 7 LLM calls (4 from SRE plus 1 per sub-agent) and 62 total spans, and finalAssistantVisibleText contained the full triage report. Today, the same invocation across 5 attempts produces 5–7 LLM calls and 28–44 spans, and finalAssistantVisibleText is always the announce.

The only change in our environment between 2026-04-29 and 2026-05-01 was a docker restart of the openclaw container, which re-ran openclaw's startup path (including doctor). Nothing else materially changed, so one of the following is likely:

  • The 2026.4.12 npm package on disk has been updated in place since 04-29
  • A different setTimeout / lifecycle path now fires earlier
  • Some local state (sessions, locks) is influencing the agent runtime exit

Suspected root cause

The CLI's "agent done" signal is likely tied to the parent's first turn emitting stopReason: "stop". Under push-based auto-announce, that stop is expected (the parent says "engaged, will synthesize on push"), but the CLI shouldn't treat it as terminal — the run continues async. The runtime may be tearing down the agent process when the parent returns from its first turn, which kills sub-agents in flight.
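The suspected failure mode can be illustrated with a toy shell model. This is an analogy, not openclaw's code: a "parent" writes an announce line, starts three background "sub-agent" jobs, and then either tears them down at its first stop or waits for all of them. The function `toy_run` and its mode names are hypothetical.

```shell
#!/bin/sh
# Toy analogy only: models "return at first stop" vs "wait for completion".
# Prints the last transcript line and how many sub-agent findings arrived.
toy_run() {
  mode=$1                        # "first-stop" or "wait-all"
  log=$(mktemp)
  echo "announce: engaged telemetry, backend, db" >> "$log"
  pids=""
  for agent in telemetry backend db; do
    # Each background job stands in for a sub-agent mid-LLM-call.
    ( sleep 1; echo "findings: $agent" >> "$log" ) >/dev/null 2>&1 &
    pids="$pids $!"
  done
  if [ "$mode" = "wait-all" ]; then
    wait $pids                   # hold the foreground until every job exits
  else
    kill $pids 2>/dev/null || true   # teardown kills in-flight jobs
    wait $pids 2>/dev/null || true   # reap the killed jobs
  fi
  echo "last: $(tail -n 1 "$log")"
  echo "findings: $(grep -c '^findings:' "$log")"
  rm -f "$log"
}
```

In first-stop mode the last line is the announce and the findings count is 0; in wait-all mode the last line is one of the findings and the count is 3, mirroring the gap between finalAssistantVisibleText and the session jsonl in the repro.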

Proposed fix shape

Three paths worth maintainer comment:

  1. Wait for livenessState: "completed". The CLI should hold the foreground process until the runtime reports completion, not just stop, especially when sub-agents are pending.

  2. Capture finalAssistantVisibleText from the LAST assistant turn, not the first. This is independently a valid fix even if (1) is rejected.

  3. Document that replayInvalid: true means "output is incomplete." Many CLI consumers parse the JSON and expect a complete result.
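Until something like (1) lands, a caller can approximate it from outside. A minimal workaround sketch, not a fix and not an openclaw feature: after the CLI returns, poll the parent's session jsonl (located as in the repro) and read it only once it has stopped growing for a quiet period. The function name `wait_for_quiet_file` and all timing values are hypothetical. It only recovers turns the runtime still writes; it cannot revive sub-agents that were already killed.

```shell
#!/bin/sh
# Workaround sketch, NOT an openclaw feature: wait until a file (e.g. the
# parent's session .jsonl) has not changed size for QUIET seconds.
# Returns 0 once settled, 1 if MAX seconds pass first.
# ASSUMPTION: "stopped growing" approximates "run finished"; choose QUIET
# longer than the slowest expected sub-agent turn.
wait_for_quiet_file() {
  file=$1
  quiet=${2:-60}       # seconds of no growth required
  max=${3:-900}        # overall give-up limit in seconds
  interval=${4:-5}     # polling interval in seconds
  last_size=-1
  stable=0
  waited=0
  while [ "$waited" -lt "$max" ]; do
    if [ -f "$file" ]; then
      size=$(wc -c < "$file" | tr -d ' ')
    else
      size=0
    fi
    if [ "$size" -eq "$last_size" ]; then
      stable=$((stable + interval))
      if [ "$stable" -ge "$quiet" ]; then
        return 0
      fi
    else
      stable=0
      last_size=$size
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
  return 1
}

# Usage (hypothetical values), run wherever the session file is readable,
# e.g. inside the container or through a bind mount:
# wait_for_quiet_file "$SESSION" 120 1800 10 && echo "session settled"
```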

Severity

High for anyone using openclaw to evaluate MAS behavior. The CLI output systematically understates what the agent is doing.

Extended analysis

TL;DR

The CLI's foreground process exits prematurely when the parent agent emits its first text turn, causing sub-agents to be killed mid-turn and resulting in incomplete output.

Guidance

  • Verify that livenessState is "working" and replayInvalid is true when the CLI exits; together these indicate that the output is incomplete.
  • Investigate the setTimeout and lifecycle paths in the openclaw code to determine whether a different path now causes the agent runtime to exit earlier than expected.
  • Consider modifying the CLI to wait for livenessState: "completed" before exiting, to ensure that all sub-agents have finished their turns.
  • Capture finalAssistantVisibleText from the last assistant turn, rather than the first, to provide a complete and accurate representation of the agent's output.

Example

No code example is provided: the issue concerns the overall workflow and lifecycle of the openclaw agent rather than a specific code snippet.

Notes

The issue seems to be related to a change in the openclaw environment, possibly due to a docker restart or an update to the npm package. Further investigation is needed to determine the root cause and implement a fix.

Recommendation

Make the CLI wait for livenessState: "completed" before exiting, so that all sub-agents finish their turns and the JSON output is complete. This gives a more accurate picture of the agent's behavior and prevents sub-agents from being killed mid-turn.
