openclaw - Runtime: parallel sessions_send can orphan a faster callee's pingback when a sibling call is slow [1 pull request]

Fix Action

Fixed

Summary

When a caller agent fires multiple sessions_send tool calls in parallel in the same assistant message, and one callee is slow, a faster callee's [TASK-COMPLETE] pingback can be orphaned.

Specifically: the faster callee's pingback is queued to the caller's run-scoped session key with delivery: pending, but that queued message is never surfaced into the caller's LLM context before the run ends.

This breaks reliable cron-driven multi-specialist workflows.

Impact

The April 2026 utility breakdown cron never produced billing output because the caller waited for a re-portal occupancy pingback that had already been accepted as delivery: pending but was never delivered into context. Tenants were not billed on time.

Real incident / repro

  • Cron: monthly-utility-breakdown (8babbec2-92e3-4f96-b7d2-e793d19aa71e), fired 2026-05-24 09:00 ET
  • Caller session: agent:leasing-ops:cron:8babbec2-…, run/session e08e19ff-e18e-43ac-aa67-830d41d9d03d
  • At 13:01:16 UTC the caller fired two parallel sessions_send tool calls:
    • call_LMQ3jr1lW7idzAnsBHCwUmzj → agent:re-portal:telegram:direct:8035547811
    • call_mgVHgrIkzXs3iyGsZUTIG5GH → agent:web-research:telegram:direct:8035547811
  • Re-portal completed and sent [TASK-COMPLETE] pingback at ~13:04:17 UTC, targeting agent:leasing-ops:cron:…:run:e08e19ff-…. Runtime accepted it with delivery: pending.
  • Web-research's web_fetch against the Atlanta municode site stalled until 13:11:21 UTC.
  • Gateway diagnostics repeatedly logged the caller as blocked_tool_call. After re-portal had finished, diagnostics showed lastProgress=tool:sessions_send:ended while activeTool=sessions_send activeToolCallId=call_mgVH... was still running, which suggests the completed re-portal tool result was held behind the still-running sibling parallel call.
  • At 13:11:21 UTC both tool results surfaced together. Re-portal's result had delivery: pending and no inline reply; web-research's result had an inline reply.
  • At 13:11:42 UTC the caller yielded with the message: "Waiting on re-portal occupancy result for April 2026 utility billing."
  • The already-queued re-portal pingback was never delivered into the caller context. The run ended at 13:11:51 UTC. The pingback was orphaned.
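To make the timing concrete, the intervals in the timeline above can be worked out directly. This is only arithmetic on the timestamps already quoted in this report (the re-portal pingback time is approximate, as noted); the variable names are illustrative.

```python
from datetime import datetime

# Timestamps (UTC) copied from the incident timeline above.
FMT = "%H:%M:%S"
calls_fired = datetime.strptime("13:01:16", FMT)
reportal_pingback = datetime.strptime("13:04:17", FMT)  # reported as "~13:04:17"
results_surfaced = datetime.strptime("13:11:21", FMT)
run_ended = datetime.strptime("13:11:51", FMT)

# How long the finished re-portal result sat behind the slow sibling call.
held_behind_sibling = results_surfaced - reportal_pingback
# How long the slow web-research call kept the batch open.
sibling_duration = results_surfaced - calls_fired
# How long the caller had between seeing the results and the run ending.
window_after_surface = run_ended - results_surfaced

print("re-portal result held behind sibling:", held_behind_sibling)
print("slow sibling call duration:          ", sibling_duration)
print("time left before the run ended:      ", window_after_surface)
```

In other words, the re-portal result was ready roughly seven minutes before the caller saw it, and the run ended 30 seconds after both results surfaced.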

Evidence

Local evidence paths from the affected host:

  • Gateway log: /tmp/openclaw/openclaw-2026-05-24.log lines 454–650
  • Caller transcript: /home/josephpogue/.openclaw/agents/leasing-ops/sessions/e08e19ff-e18e-43ac-aa67-830d41d9d03d.jsonl
  • Re-portal transcript: /home/josephpogue/.openclaw/agents/re-portal/sessions/01284b11-54e7-4529-9d24-9cbc5039622f.jsonl

Notable caller transcript entries:

  • line 24: assistant emits both parallel sessions_send calls
  • line 25: re-portal sessions_send result returns only at 13:11:21 with delivery.status=pending and no inline reply
  • line 26: web-research sessions_send result returns at 13:11:21 with inline reply
  • lines 29–31: caller yields waiting for the re-portal result that was already queued but never surfaced

Notable gateway log entries:

  • lines 595, 602, 608, 611, 619, 625, 630, 636 show repeated stalled-session warnings with:
    • lastProgress=tool:sessions_send:ended
    • activeTool=sessions_send
    • activeToolCallId=call_mgVHgrIkzXs3iyGsZUTIG5GH...
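For anyone scanning their own gateway logs for the same signature, a rough filter might look like the sketch below. It assumes the diagnostics appear as space-separated key=value pairs on one line; that layout is an assumption, and the sample line is reassembled from the three fields listed above rather than copied verbatim from the log. The helper names are hypothetical.

```python
import re

# Assumption: diagnostics are emitted as space-separated key=value pairs.
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line: str) -> dict[str, str]:
    """Extract key=value pairs from one diagnostic line."""
    return dict(FIELD_RE.findall(line))


def ended_while_sibling_active(fields: dict[str, str]) -> bool:
    """True when the last progress event says a tool ended while a call
    to that same tool is still reported as active."""
    active = fields.get("activeTool")
    return bool(active) and fields.get("lastProgress") == f"tool:{active}:ended"


# Hypothetical line built from the fields reported above (not a verbatim log line).
sample = (
    "lastProgress=tool:sessions_send:ended "
    "activeTool=sessions_send "
    "activeToolCallId=call_mgVHgrIkzXs3iyGsZUTIG5GH..."
)
fields = parse_fields(sample)
print(fields["activeToolCallId"], ended_while_sibling_active(fields))
```

A match does not prove a pingback was orphaned; it only flags the window in which a finished parallel call may be waiting on a slower sibling.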

Expected behavior

A faster sessions_send result/pingback should not be effectively lost because another parallel sessions_send call is still waiting.

One of the following should hold:

  1. Each sessions_send tool result resolves independently as soon as it completes.
  2. Queued inter-session messages are delivered into the caller's LLM context as soon as the caller resumes from the tool-result wait.
  3. Queued delivery: announce pingbacks are persisted or rerouted so that they surface on the next run if the run-scoped target has already ended.
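Options 2 and 3 can be illustrated with a toy mailbox. This is a minimal sketch, not openclaw's actual queueing code: the Mailbox class, its method names, and the key strings are all hypothetical. It shows run-scoped messages being drained whenever the caller resumes, and any leftovers being moved to the session scope when the run ends so the next run picks them up.

```python
from collections import defaultdict


class Mailbox:
    """Toy inter-session queue (hypothetical; not the openclaw runtime)."""

    def __init__(self) -> None:
        self._queues = defaultdict(list)  # keyed by session or (session, run)
        self._ended_runs = set()

    def enqueue(self, session_key: str, run_id: str, message: str) -> None:
        # Option 3: a message aimed at an ended run goes to the session scope.
        if run_id in self._ended_runs:
            self._queues[session_key].append(message)
        else:
            self._queues[(session_key, run_id)].append(message)

    def drain_on_resume(self, session_key: str, run_id: str) -> list[str]:
        # Option 2: call this whenever the run resumes from a tool-result wait.
        carried_over = self._queues.pop(session_key, [])
        current = self._queues.pop((session_key, run_id), [])
        return carried_over + current

    def end_run(self, session_key: str, run_id: str) -> None:
        # Option 3: undelivered run-scoped messages survive the run ending.
        self._ended_runs.add(run_id)
        leftovers = self._queues.pop((session_key, run_id), [])
        self._queues[session_key].extend(leftovers)


box = Mailbox()
session = "agent:caller:cron"  # hypothetical session key
box.enqueue(session, "run-1", "[TASK-COMPLETE] occupancy ready")
box.end_run(session, "run-1")                     # run ends without draining
box.enqueue(session, "run-1", "[TASK-COMPLETE] late pingback")
next_run_messages = box.drain_on_resume(session, "run-2")
print(next_run_messages)
```

With this shape, a pingback that arrives during or after a run is never silently dropped; at worst it is delayed until the session's next run.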

Suggested fix areas

Priority order:

  1. Stop holding parallel sessions_send results as a batch. Each sessions_send should resolve independently once its callee replies or its inline-reply window expires, even if sibling parallel tool calls are still pending.
  2. Deliver queued inter-session messages into LLM context when the caller run resumes from a tool-result wait, not only when the session is fully idle.
  3. Make delivery: announce pingbacks resilient to run termination. If the target run has ended, reroute the queued message to the session's next run or surface it as a system/session message on resume rather than dropping/orphaning it.
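The difference between batched and independent resolution (item 1) can be sketched with plain asyncio. This is a simulation under stated assumptions, not the runtime's real scheduler: fake_sessions_send is a hypothetical stand-in that just sleeps, and the delays are scaled down from minutes to fractions of a second.

```python
import asyncio


async def fake_sessions_send(name: str, delay: float) -> str:
    """Hypothetical stand-in for a sessions_send call that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return name


async def batched(calls):
    """Current behavior as described: nothing surfaces until every call is done."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    names = await asyncio.gather(*(fake_sessions_send(n, d) for n, d in calls))
    elapsed = loop.time() - start
    return [(name, elapsed) for name in names]


async def independent(calls):
    """Suggested behavior: each result surfaces as soon as its own call finishes."""
    loop = asyncio.get_running_loop()
    start = loop.time()
    tasks = [asyncio.create_task(fake_sessions_send(n, d)) for n, d in calls]
    surfaced = []
    for finished in asyncio.as_completed(tasks):
        name = await finished
        surfaced.append((name, loop.time() - start))
    return surfaced


CALLS = [("re-portal", 0.05), ("web-research", 0.3)]
batched_result = asyncio.run(batched(CALLS))
independent_result = asyncio.run(independent(CALLS))

for label, result in (("batched", batched_result), ("independent", independent_result)):
    for name, seconds in result:
        print(f"{label:12s} {name:13s} surfaced after {seconds:.2f}s")
```

In the batched variant the fast result is only visible once the slow call returns; in the independent variant the caller could act on the fast result while the slow one is still in flight.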
