openclaw: Gateway crashes with 'Agent listener invoked outside active run' during multi-story orchestration loops

GitHub stats
openclaw/openclaw#61817 · Fetched 2026-04-08 02:54:07
Comments: 0 · Participants: 1 · Timeline: 1 (closed ×1) · Reactions: 0

Bug Description

The gateway consistently crashes with an unhandled promise rejection (Agent listener invoked outside active run) when an orchestration agent runs a multi-story execution loop involving subagent spawning. The crash occurs at the boundary between completing one story's PR/CI/merge cycle and spawning the next story's subagent.

Environment

  • OpenClaw: 2026.4.5 (3e72c03)
  • Node.js: v22.22.1
  • OS: Windows 10 Pro 10.0.26200 (MINGW64)
  • Agent model: anthropic/claude-opus-4-6
  • Gateway mode: local, loopback binding

Reproduction

  1. Configure an orchestration agent that runs a multi-step workflow (story creation → validation → implementation → code review → PR/CI/merge)
  2. Each step is delegated to a subagent (one step per subagent, fresh context each time)
  3. Step 8 (the PR/CI/merge step of the reporter's workflow) creates a PR, polls CI every 60s, and merges when CI passes
  4. After merge, the orchestration agent attempts to spawn the next story's subagent
  5. Gateway crashes consistently at this transition point

The crash has been reproduced 3 times across different runs:

  • Run 1: Crashed after R-1 merge, attempting R-2 spawn
  • Run 2: Crashed after R-10 merge, attempting R-3 spawn
  • Run 3: Crashed after R-3/R-4/R-5 batch merge, attempting R-6 spawn
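
The reproduction hinges on a CI polling loop followed immediately by a spawn. A minimal TypeScript sketch of that control flow is shown below to make the transition point easy to locate. Every name in it (pollUntilDone, runStory, checkCi, spawnNext) is hypothetical and not taken from OpenClaw, and it calls no OpenClaw API.

```typescript
// Hypothetical sketch of the workflow shape described above.
// None of these names come from OpenClaw; they only mirror the steps.

type CiStatus = "pending" | "passed" | "failed";

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Polls `checkCi` every `intervalMs` until it leaves "pending"
// or `maxPolls` is reached. Returns the last status seen.
async function pollUntilDone(
  checkCi: () => Promise<CiStatus>,
  intervalMs = 60_000,
  maxPolls = 30,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<CiStatus> {
  let status: CiStatus = "pending";
  for (let i = 0; i < maxPolls; i++) {
    status = await checkCi();
    if (status !== "pending") return status;
    await sleep(intervalMs);
  }
  return status;
}

// One story: wait for CI, then hand over to the next spawn.
// The reported crash happens around the call to `spawnNext`.
async function runStory(
  checkCi: () => Promise<CiStatus>,
  spawnNext: () => Promise<void>,
  sleep?: (ms: number) => Promise<void>,
): Promise<CiStatus> {
  const status = await pollUntilDone(checkCi, 60_000, 30, sleep);
  if (status === "passed") await spawnNext(); // transition point
  return status;
}
```

Injecting the sleep function keeps the loop testable without waiting a real 60 seconds per poll, which may help when trying shorter polling periods during diagnosis.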

Stack Trace

[openclaw] Unhandled promise rejection: Error: Agent listener invoked outside active run
    at Agent.processEvents (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/node_modules/@mariozechner/pi-agent-core/src/agent.ts:533:10)
    at file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/node_modules/@mariozechner/pi-agent-core/src/agent.ts:380:21
    at Object.onUpdate (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/node_modules/@mariozechner/pi-agent-core/src/agent-loop.ts:539:7)
    at emitUpdate (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/dist/exec-defaults-uj0McX2k.js:1524:8)
    at handleStdout (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/dist/exec-defaults-uj0McX2k.js:1546:4)
    at Object.onSupervisorStdout [as onStdout] (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/dist/exec-defaults-uj0McX2k.js:1610:3)
    at file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/dist/exec-defaults-uj0McX2k.js:1007:21
    at Socket.<anonymous> (file:///C:/Users/lanna/AppData/Local/nvm/v22.22.1/node_modules/openclaw/dist/exec-defaults-uj0McX2k.js:568:4)
    at Socket.emit (node:events:519:28)
    at addChunk (node:internal/streams/readable:561:12)

Preceding Gateway Warnings

Before each crash, the subagent announce completion times out repeatedly:

Subagent announce completion direct announce agent call transient failure, retrying 2/4 in 5s: gateway timeout after 120000ms
Subagent announce completion direct announce agent call transient failure, retrying 3/4 in 10s: gateway timeout after 120000ms
Subagent announce completion direct announce agent call transient failure, retrying 4/4 in 20s: gateway timeout after 120000ms
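
The three log lines show the retry delay doubling (5s, 10s, 20s) and a 120000 ms timeout per attempt. As a rough worked estimate, the sketch below computes how long one announce could block if every attempt runs to the full timeout. The schedule is inferred from the log lines only, not from OpenClaw's source, and the function names are hypothetical.

```typescript
// Retry schedule inferred from the log lines above (an assumption,
// not read from OpenClaw's source): delays of 5s, 10s, 20s before
// attempts 2, 3 and 4, and a 120000 ms timeout per attempt.

// Delay before each retry, doubling from `baseDelayMs`.
function retryDelaysMs(maxAttempts: number, baseDelayMs: number): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < maxAttempts - 1; retry++) {
    delays.push(baseDelayMs * 2 ** retry);
  }
  return delays;
}

// Total wait if every attempt times out: all timeouts plus all delays.
function worstCaseWaitMs(
  maxAttempts: number,
  baseDelayMs: number,
  timeoutMs: number,
): number {
  const totalDelay = retryDelaysMs(maxAttempts, baseDelayMs)
    .reduce((sum, d) => sum + d, 0);
  return maxAttempts * timeoutMs + totalDelay;
}

const delays = retryDelaysMs(4, 5_000);               // [5000, 10000, 20000]
const worstCase = worstCaseWaitMs(4, 5_000, 120_000); // 515000
```

Under these assumptions a single announce could occupy its caller for about 515 seconds, much longer than the 15s and 30s cooldowns listed under Attempted Mitigations.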

Also observed before the crash:

[diagnostic] lane wait exceeded: lane=session:agent:agent-3:main waitedMs=485113 queueAhead=0

Attempted Mitigations (none resolved the issue)

  1. Added 15s then 30s cooldown between PR merge and next subagent spawn — crash still occurs
  2. Delegated PR/CI/merge to a dedicated subagent to keep the orchestration agent's connection clean — crash still occurs
  3. Started every run from a fresh gateway --force startup; the crash still occurs

Analysis

The crash appears to be a race condition in pi-agent-core:

  1. The subagent completion announce saturates or blocks the gateway WebSocket during CI polling (60s poll loops that run for several minutes)
  2. When the orchestration agent's turn resumes and it tries to spawn a new subagent, the Agent.processEvents listener fires after the agent run context has already been invalidated. In the stack trace the call originates from a stdout handler (handleStdout → emitUpdate → onUpdate), which suggests a child-process output chunk arriving after the run ended
  3. The unhandled rejection kills the gateway process

In the reporter's runs the crash reproduced every time a long-running orchestration loop spawned a subagent after a CI polling step (3 runs are listed above).
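
The suspected mechanism can be modelled in a few lines. The sketch below is a hypothetical stand-in (MiniAgent is not pi-agent-core's code, and its methods are invented for illustration). It shows an update callback outliving the run that created it, plus one possible guard that drops late updates instead of throwing.

```typescript
// Hypothetical model of the failure mode; this is NOT pi-agent-core's code.

class MiniAgent {
  private activeRun: number | null = null;
  readonly events: string[] = [];

  startRun(id: number): void {
    this.activeRun = id;
  }

  endRun(): void {
    this.activeRun = null;
  }

  // Mirrors the reported behaviour: events outside a run throw.
  processEvent(event: string): void {
    if (this.activeRun === null) {
      throw new Error("Agent listener invoked outside active run");
    }
    this.events.push(event);
  }

  // One possible guard: bind the callback to the run that created it
  // and drop updates that arrive after that run has ended.
  makeUpdateHandler(): (chunk: string) => boolean {
    const runId = this.activeRun;
    return (chunk) => {
      if (runId === null || this.activeRun !== runId) {
        return false; // late update, dropped instead of thrown
      }
      this.processEvent(chunk);
      return true;
    };
  }
}

const agent = new MiniAgent();
agent.startRun(1);
const onStdout = agent.makeUpdateHandler();
const deliveredDuringRun = onStdout("CI: pending"); // true
agent.endRun();
const deliveredAfterRun = onStdout("CI: passed");   // false, no throw
```

Dropping late output silently is a trade-off: it keeps the process alive but can hide output that arrives after the run. Whether that is acceptable depends on how pi-agent-core is meant to treat such updates, which the issue does not say.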

Expected Behavior

The gateway should gracefully handle subagent announce timeouts without corrupting the listener state, and allow the orchestration agent to continue spawning new subagents after PR/merge operations.
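
One way to keep a listener failure from escaping as an unhandled rejection is to contain it at the call site. The sketch below is a generic TypeScript pattern under that assumption. containErrors and its parameters are hypothetical names, not part of OpenClaw or pi-agent-core.

```typescript
// Hypothetical wrapper; the names are illustrative, not OpenClaw's API.

type Listener<T> = (event: T) => void | Promise<void>;

// Wraps a listener so that neither a synchronous throw nor a rejected
// promise escapes to the caller. Failures are passed to `onError`.
function containErrors<T>(
  listener: Listener<T>,
  onError: (err: unknown) => void,
): (event: T) => Promise<void> {
  return async (event) => {
    try {
      await listener(event);
    } catch (err) {
      onError(err);
    }
  };
}

// Usage: a listener that fails the same way as in the stack trace.
const failures: string[] = [];
const safeListener = containErrors<string>(
  () => {
    throw new Error("Agent listener invoked outside active run");
  },
  (err) => {
    failures.push(err instanceof Error ? err.message : String(err));
  },
);
```

Calling safeListener("chunk") resolves normally and records the message in failures. This contains the symptom only. The listener state would still need a proper fix upstream.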

extent analysis

TL;DR

The gateway crash is likely a race condition in pi-agent-core that surfaces after subagent completion announces time out during CI polling. Adjusting the gateway's WebSocket timeout or the announce retry behaviour might reduce it, but neither is confirmed, and the cooldown workarounds tried in the issue did not help.

Guidance

  • Investigate increasing the WebSocket timeout in the gateway configuration to allow for longer CI polling periods without saturating the gateway.
  • Review the subagent announce completion retry mechanism. The logged delays already double (5s, 10s, 20s), so backoff is in place; the open question is why each attempt runs into the 120000 ms gateway timeout.
  • Review the pi-agent-core code to identify potential issues with the Agent.processEvents listener and its interaction with the agent run context.
  • Test the orchestration loop with a shorter CI polling period to see if the crash still occurs, which can help determine if the issue is indeed related to the polling duration.
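
When comparing runs with different polling periods, it may help to extract the lane wait figures from the gateway log. The following TypeScript sketch parses the diagnostic line quoted in the issue. The format is inferred from that single sample, and parseLaneWait is a hypothetical helper, not an OpenClaw utility.

```typescript
// Parses the "lane wait exceeded" diagnostic line quoted in this report.
// The format is inferred from that single sample and may not cover
// other variants the gateway can emit.

interface LaneWait {
  lane: string;
  waitedMs: number;
  queueAhead: number;
}

function parseLaneWait(line: string): LaneWait | null {
  const m =
    /lane wait exceeded: lane=(\S+) waitedMs=(\d+) queueAhead=(\d+)/.exec(line);
  if (m === null) return null; // not a lane-wait diagnostic
  return {
    lane: m[1],
    waitedMs: Number(m[2]),
    queueAhead: Number(m[3]),
  };
}

const sample =
  "[diagnostic] lane wait exceeded: lane=session:agent:agent-3:main waitedMs=485113 queueAhead=0";
const parsed = parseLaneWait(sample);
```

Collecting waitedMs per run gives a number to compare before and after changing the polling period, instead of relying only on whether the crash occurred.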

Example

The issue itself includes no reproduction script or code-level fix; the problem appears to lie in configuration and in the pi-agent-core library.

Notes

The provided analysis suggests a race condition in pi-agent-core, but without access to the library's code, it's difficult to provide a definitive solution. The suggested mitigations are based on the provided information and may require further investigation to fully resolve the issue.

Recommendation

The issue does not mention a fixed version of pi-agent-core, so only workarounds are available: adjusting the gateway's WebSocket timeout or the subagent announce retry behaviour, if the installed version exposes such settings. Neither has been verified against this crash.
