openclaw - [Bug]: sessions_yield lacks fallback resume after subagent announce failure or timeout


sessions_yield can leave the parent session stuck when a subagent completion announce fails, retries are exhausted, or a child run times out.

Error Message

Observed evidence from gateway log on 2026-05-26:

  • 10:44 CST: Parent workflow spawned 3 QC subagents and called sessions_yield.
  • 10:47 CST: One child exceeded runTimeoutSeconds=180 and timed out after 184s.
  • 10:47:34 CST: Subagent completion direct announce failed with SessionWriteLockTimeoutError: session file locked (timeout 60000ms).
  • 10:47:40 CST: Subagent announce gave up after retry-limit, retries=3.
  • 10:48:37-10:50:01 CST: Additional subagent completion direct announce attempts failed with SessionWriteLockTimeoutError.
  • 10:51:00 CST: Parent session still failed before reply with session file locked (timeout 60000ms).
  • Gateway restart was required to recover the usable session.

Representative log lines:

[2026-05-26T10:47:34.753+08:00] [INFO] [] [warn] Subagent completion direct announce failed for run 2fc98c2d-16d4-434b-a234-e6533ff9f530: SessionWriteLockTimeoutError: session file locked (timeout 60000ms)

[2026-05-26T10:47:40.561+08:00] [INFO] [] [warn] Subagent announce give up (retry-limit) run=2fc98c2d-16d4-434b-a234-e6533ff9f530 child=agent:main:subagent:6351dfbc-a71e-4582-9899-c90eac9aaf2c requester=agent:main:dashboard:main retries=3 end

[2026-05-26T10:48:37.891+08:00] [INFO] [] [warn] Subagent completion direct announce failed for run 50c2a78c-890f-47f8-b313-c02104db884a: SessionWriteLockTimeoutError: session file locked (timeout 60000ms)

[2026-05-26T10:48:43.280+08:00] [INFO] [] [warn] Subagent announce give up (retry-limit) run=50c2a78c-890f-47f8-b313-c02104db884a child=agent:main:subagent:6094839c-8680-4fbb-891d-60b807067603 requester=agent:main:dashboard:main retries=3 end

[2026-05-26T10:50:01.326+08:00] [INFO] [] [warn] Subagent announce give up (retry-limit) run=9ca22528-3eb2-4e6e-afbc-637ca2bea6f7 child=agent:main:subagent:af34ca37-ca83-4524-8a6e-487073f82a0b requester=agent:main:dashboard:main retries=3 end

[2026-05-26T10:51:00.229+08:00] [ERROR] [] Embedded agent failed before reply: session file locked (timeout 60000ms): pid=62821 /root/.openclaw/agents/main/sessions/07db13d3-5af2-40a8-8c03-84b222e175f5.jsonl.lock
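
To triage a gateway log like the one above, it can help to pull out every run whose announce hit the retry limit. The following is a minimal sketch that assumes the line layout shown in the representative log lines; `findAnnounceGiveUps` and `AnnounceGiveUp` are hypothetical names for illustration, not part of OpenClaw.

```typescript
// Hypothetical helper: extracts "announce give up (retry-limit)" events from gateway log text.
// Assumes each line starts with a bracketed timestamp, as in the log lines quoted in this issue.
interface AnnounceGiveUp {
  timestamp: string;
  run: string;
  child: string;
  requester: string;
  retries: number;
}

const GIVE_UP_PATTERN =
  /^\[([^\]]+)\].*Subagent announce give up \(retry-limit\) run=(\S+) child=(\S+) requester=(\S+) retries=(\d+)/;

function findAnnounceGiveUps(log: string): AnnounceGiveUp[] {
  const events: AnnounceGiveUp[] = [];
  for (const line of log.split("\n")) {
    const match = GIVE_UP_PATTERN.exec(line);
    if (match) {
      events.push({
        timestamp: match[1],
        run: match[2],
        child: match[3],
        requester: match[4],
        retries: Number(match[5]),
      });
    }
  }
  return events;
}
```

Running it over the log above would list the three runs that gave up, which are the children whose results the parent never received.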

Root Cause

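sessions_yield appears to have no fallback-resume path. Its execute path awaits opts.onYield(message) with no wait limit, so when a subagent completion announce fails, retries are exhausted, or a child run times out, nothing wakes the parent and the session stays stuck.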


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

sessions_yield can leave the parent session stuck when a subagent completion announce fails, retries are exhausted, or a child run times out.

Steps to reproduce

Observed path, not a minimal synthetic repro:

  1. Run OpenClaw 2026.5.22 in WebChat / dashboard.
  2. Start a parent workflow that spawns multiple subagents with sessions_spawn.
  3. Have the parent call sessions_yield after spawning the children.
  4. In the observed run, one child exceeded runTimeoutSeconds=180 and finished as status=timeout after 184s.
  5. Subagent completion announce attempts then failed with SessionWriteLockTimeoutError.
  6. After 3 retries, the announce path gave up.
  7. The parent session did not resume and the dashboard session became unusable until gateway restart.

Expected behavior

The parent session should not remain indefinitely dependent on a single completion/announce delivery path.

Specifically:

  • If a child finishes, errors, times out, or its announce retries are exhausted, the parent should receive a consumable wake event.
  • sessions_yield should resume the parent with structured partial/failure status.
  • The result should include completed, timed-out, failed, announce-failed, and unknown children.
  • A child timeout or announce failure should not leave the parent session stuck or require a gateway restart.
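
One way to picture the structured partial/failure status described in the list above is a result that groups child run ids by outcome. This is only a sketch under assumptions: `ChildStatus`, `ChildReport`, `YieldResumeResult`, and `summarizeChildren` are hypothetical names, and the issue does not specify what shape OpenClaw would actually return.

```typescript
// Hypothetical shapes for illustration; not an OpenClaw API.
type ChildStatus = "completed" | "timed-out" | "failed" | "announce-failed" | "unknown";

interface ChildReport {
  runId: string;
  status: ChildStatus;
}

interface YieldResumeResult {
  complete: boolean; // true only when every spawned child completed
  byStatus: Record<ChildStatus, string[]>; // run ids grouped by outcome
}

function summarizeChildren(spawned: string[], reports: ChildReport[]): YieldResumeResult {
  const byStatus: Record<ChildStatus, string[]> = {
    "completed": [],
    "timed-out": [],
    "failed": [],
    "announce-failed": [],
    "unknown": [],
  };
  const reported = new Map(reports.map((r) => [r.runId, r.status] as const));
  for (const runId of spawned) {
    // A child with no report at all is surfaced as "unknown" rather than dropped.
    byStatus[reported.get(runId) ?? "unknown"].push(runId);
  }
  return { complete: byStatus.completed.length === spawned.length, byStatus };
}
```

The point of the "unknown" bucket is that the parent can still resume and decide what to do when a child's outcome never arrived.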

Actual behavior

The parent session remained stuck after subagent completion announce failures.

Observed results:

  • The parent session stopped writing normally after the sessions_yield run.
  • A child run timed out after 184s while runTimeoutSeconds was 180s.
  • Completion direct announce failed repeatedly with SessionWriteLockTimeoutError.
  • Announce retried 3 times and then gave up.
  • Subsequent requests failed with session file locked (timeout 60000ms).
  • Refreshing / creating a new session did not recover the workflow.
  • Gateway restart was required.

OpenClaw version

2026.5.22

Operating system

Kali Linux

Install method

npm global

Model

NOT_ENOUGH_INFO

Provider / routing chain

NOT_ENOUGH_INFO

Additional provider/model setup details

This does not appear to be model-specific based on the available evidence. The failure is in the session/write-lock/subagent completion announce path.

Impact and severity

P1 candidate.

Affected area:

  • WebChat / dashboard parent sessions using sessions_spawn + sessions_yield
  • Multi-subagent workflows
  • Gateway-backed session transcript writing

Severity:

  • Blocks the parent workflow.
  • Parent session becomes unusable.
  • Subagent completion can be lost after retry exhaustion.
  • Gateway restart may be required to recover.

Frequency:

  • Observed once on OpenClaw 2026.5.22.
  • Reproduction appears conditional on child timeout or completion announce failure.

Consequence:

  • The parent session does not resume with partial or failure status.
  • The user cannot reliably recover from the same session.
  • Long-running multi-agent workflows can hang or lose results.

Additional information

This appears to be a sessions_yield fallback-resume contract gap, not only an announce-delivery bug.

Observed source-level behavior:

  • sessions_yield only accepts message.
  • Its execute path calls await opts.onYield(message).
  • There is no maxWaitMs, timeoutSeconds, onTimeout, or unresolved-child summary.
  • SESSIONS_YIELD_ABORT_SETTLE_TIMEOUT_MS is used for abort-settle handling, not for waiting on child completion.
  • sessions_spawn.runTimeoutSeconds controls child execution timeout, not parent yield recovery.
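
The gap listed above could in principle be closed by racing the yield against a wait limit. The sketch below is an assumption-laden illustration, not OpenClaw's implementation: `yieldWithFallback`, `YieldOutcome`, and `maxWaitMs` are hypothetical, and `onYield` only stands in for the `opts.onYield(message)` call mentioned in the issue.

```typescript
// Hypothetical wrapper: resume the parent either when the yield resolves or when maxWaitMs elapses.
type YieldOutcome<T> =
  | { kind: "resumed"; value: T }
  | { kind: "timeout"; waitedMs: number };

async function yieldWithFallback<T>(
  onYield: () => Promise<T>,
  maxWaitMs: number,
): Promise<YieldOutcome<T>> {
  let cancel = () => {};
  const fallback = new Promise<YieldOutcome<T>>((resolve) => {
    const timer = setTimeout(() => resolve({ kind: "timeout", waitedMs: maxWaitMs }), maxWaitMs);
    cancel = () => clearTimeout(timer);
  });
  const normal = onYield().then((value): YieldOutcome<T> => ({ kind: "resumed", value }));
  try {
    // Whichever settles first decides how the parent resumes.
    return await Promise.race([normal, fallback]);
  } finally {
    cancel(); // avoid leaving a pending timer once the race is decided
  }
}
```

On the "timeout" branch, a caller could report unresolved children instead of waiting indefinitely. Note that this only bounds the wait; it would not by itself release a held session write lock.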

Related issues / PRs:

  • #44925: completion announce loss and timeout without notification
  • #82140: specific announce path failure leaving parent in sessions_yield wait state
  • #79051: removed sessions_yield guidance from normal sessions_spawn accepted note, but did not change runtime behavior
  • #19506: requested spawn-and-wait semantics with timeout, closed as not planned

This is related to #44925 but not a duplicate. #44925 focuses on lost completion delivery. This issue focuses on the missing sessions_yield fallback-resume contract when delivery fails.
