openclaw - ✅(Solved) Fix Gateway crash: background exec output after subagent run completes triggers unhandled rejection in pi-agent-core (exec-after-run race) [3 pull requests, 1 participant]

openclaw/openclaw#62520 · Fetched 2026-04-08 03:03:08

When a subagent's exec tool spawns a background process (e.g. tsc --noEmit, npm run build) and the agent run completes, any late stdout/stderr from the still-running background process triggers Agent.processEvents() in pi-agent-core. At that point activeRun is undefined (cleared by finishRun()), so processEvents() throws — this becomes an unhandled promise rejection and crashes the gateway.

Observed: OpenClaw 2026.4.5, macOS (Darwin 25.4.0 arm64), crash at 2026-04-07 20:14:43 GMT+7.


Error Message

[openclaw] Unhandled promise rejection: Error: Agent listener invoked outside active run
 at Agent.processEvents (@mariozechner/pi-agent-core/dist/agent.js:388)
 at ...onUpdate callback
 at emitUpdate (exec-defaults-*.js:1524)
 at handleStdout (exec-defaults-*.js:1546)
 at Object.onSupervisorStdout
 at Socket.<anonymous>

Root Cause

The emitUpdate() guard checks session.backgrounded || session.exited but this does not cover the case where:

  • The exec session is not marked backgrounded (foreground exec mode), and
  • The process is still running (not yet exited), but
  • The parent agent run has already finished (activeRun is undefined)

This is a race between the subprocess lifecycle and the agent run lifecycle, specifically triggered in the subagent + background-spawning exec pattern.
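The gap in the guard can be illustrated with a small, self-contained simulation. This is only a sketch: StubAgent, execSession, and pendingUpdates are hypothetical stand-ins, not the real pi-agent-core or OpenClaw types, and the listener is assumed to be async as the PR notes further down describe.

```typescript
// Hypothetical stand-ins for the exec session and the agent; not the real types.
type ExecSession = { backgrounded: boolean; exited: boolean };

class StubAgent {
  private activeRun: { id: number } | undefined = { id: 1 };

  finishRun(): void {
    this.activeRun = undefined; // mirrors finishRun() clearing activeRun
  }

  // Async on purpose: a throw in the body surfaces as a rejected promise.
  async processEvents(_chunk: string): Promise<void> {
    if (!this.activeRun) {
      throw new Error("Agent listener invoked outside active run");
    }
  }
}

const stubAgent = new StubAgent();
const execSession: ExecSession = { backgrounded: false, exited: false };
const pendingUpdates: Promise<string>[] = [];

function emitUpdate(chunk: string): void {
  // The guard only looks at the exec session, not at the agent run.
  if (execSession.backgrounded || execSession.exited) return;
  const outcome = stubAgent.processEvents(chunk).then(
    () => "delivered",
    (e: Error) => `rejected: ${e.message}`, // recorded here so the demo itself does not crash
  );
  pendingUpdates.push(outcome);
}

emitUpdate("during run");
stubAgent.finishRun();     // the agent run ends first
emitUpdate("late stdout"); // foreground process still running, so the guard passes

Promise.all(pendingUpdates).then((outcomes) => console.log(outcomes));
```

Both calls pass the guard, and the second one records a rejection. In the real gateway nothing observes that rejected promise, which is what turns it into an unhandled rejection.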


PR fix notes

PR #62815: fix(exec): prevent gateway crash from agent listener error

Description (problem / solution / changelog)

Summary

Fixes critical gateway crash when subprocess stdout arrives after agent run ends, causing unhandled promise rejection "Agent listener invoked outside active run".

Changes:

  • Add try-catch wrapper around opts.onUpdate() in emitUpdate()
  • Add suppression flag to prevent repeated errors after first failure
  • Proactively disable updates when exec process completes
  • Add comprehensive test coverage (5 new tests, all passing)
  • Add detailed inline documentation explaining the race condition

Fixes

  • #62746 - Gateway crash: Unhandled promise rejection
  • #62520 - Background exec output after subagent run completes
  • #62477 - Subprocess stdout arrives after agent run ends
  • #61741 - Race condition in subagent/session cleanup
  • #62435 - Gateway crash: Agent listener invoked outside active run

Root Cause

Race condition between subprocess lifecycle (exec-runtime) and agent run lifecycle (pi-agent-core). When an agent run completes and clears activeRun, any subprocess still producing stdout triggers opts.onUpdate() which calls Agent.processEvents(). Since activeRun is undefined, it throws an error that becomes an unhandled promise rejection, crashing the gateway.

Solution

Defensive guard with smart suppression:

  1. Try-catch wrapper prevents crash
  2. Suppression flag stops repeated errors
  3. Proactive disabling in promise handlers
  4. Smart logging (warn first, debug subsequent)

Test Plan

  • ✅ All 5 new tests pass (race condition, suppression, PTY, normal operation, backgrounded)
  • ✅ All 164 existing bash-tools tests pass
  • ✅ Verified exec process continues normally after error suppression
  • ✅ Verified logging provides clear context

Impact

  • Prevents gateway crashes occurring every 20-30 minutes under multi-lane operation
  • Affects all platforms (Windows, macOS, Linux)
  • Affects all channels (Telegram, Discord, Slack, WhatsApp, CLI)
  • Resolves 20+ duplicate issues

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/bash-tools.exec-runtime.ts (modified, +58/-11)
  • src/agents/bash-tools.exec.on-update-lifecycle.test.ts (added, +106/-0)
  • src/agents/bash-tools.exec.pty-cleanup.test.ts (modified, +6/-6)

PR #62821: fix(exec): disable onUpdate after run settlement to prevent gateway crash

Description (problem / solution / changelog)

Summary

  • Problem: When a subagent exec tool (tsc --noEmit, npm run build, etc.) produces stdout/stderr output after the agent run has already ended (via normal completion, abort, or timeout), emitUpdate() in bash-tools.exec-runtime.ts (lines 583–603) calls opts.onUpdate(), which propagates to pi-agent-core's processEvents(). Since finishRun() has already cleared activeRun to undefined, processEvents() throws "Agent listener invoked outside active run", resulting in an unhandled Promise rejection that crashes the gateway process.

  • Root Cause: emitUpdate() guards only check session.backgrounded || session.exited — both reflect the exec session's own lifecycle, not the agent run's lifecycle. These two lifecycles are independent and have a race window:

    i. Post-exit race: managedRun.wait() resolves (process exits) → .then() callback starts executing → but markExited() hasn't flipped session.exited yet → a queued stdout data event fires in the same event-loop tick → emitUpdate() passes the guard → calls onUpdate() into a potentially disposed agent run.
    ii. Subagent abort/timeout race: The parent agent run ends (finishRun() clears activeRun) while the exec process is still running (session.exited = false, session.backgrounded = false) → stdout arrives → emitUpdate() passes all guards → onUpdate() produces a rejected Promise that becomes an unhandled rejection.

    The existing fix, #61627 (fix(exec): stop emitting tool updates after session is backgrounded, merged Apr 6), only covered the backgrounded || exited path and cannot reach these two race windows.

  • Fix: Two-layer approach targeting the root cause at different lifecycle boundaries:

    i. Layer 1 (deterministic cutoff on process exit): Set updatesDisabled = true at the top of .then() and .catch() in the managedRun.wait() promise chain, before markExited(). This closes race window 1 by ensuring no emitUpdate() can fire during the micro-tick gap between promise settlement and session.exited flipping.
    ii. Layer 2 (abort-signal cutoff): Expose disableUpdates() on ExecProcessHandle and call it from onAbortSignal() in bash-tools.exec.ts. When the agent run ends (abort/timeout), the signal fires and immediately suppresses all future onUpdate calls, preventing any late stdout/stderr from producing a rejected Promise inside pi-agent-core's updateEvents array.

    Why not try-catch? pi-agent-core's onUpdate closure internally calls updateEvents.push(Promise.resolve(emit(event))), where emit → processEvents is an async function. When activeRun is undefined, processEvents throws synchronously inside the async body, but per ES2017 semantics this is converted to a rejected Promise, not a synchronous exception. A try-catch around opts.onUpdate() would never catch anything: the rejected Promise escapes into updateEvents and becomes an Unhandled Promise Rejection that crashes the Gateway. The correct fix is to prevent the call from ever happening, which is what updatesDisabled achieves.

  • What changed:

    • src/agents/bash-tools.exec-runtime.ts:
      • Added updatesDisabled flag and || updatesDisabled guard in emitUpdate()
      • Added disableUpdates() method to ExecProcessHandle type and return value, exposing the internal updatesDisabled flag to callers
      • Set updatesDisabled = true in .then() and .catch() of the promise chain before markExited()
    • src/agents/bash-tools.exec.ts:
      • Added run.disableUpdates() as first statement in onAbortSignal() to suppress updates immediately when abort signal fires
    • src/agents/bash-tools.test.ts:
      • Added two regression tests in the existing "exec backgrounded onUpdate suppression" describe block: (1) foreground exec does not invoke onUpdate after process exits, (2) onUpdate is suppressed after abort signal fires
  • What did NOT change (scope boundary):

    • handleStdout / handleStderr / onSupervisorStdout output accumulation logic — unaffected
    • bash-process-registry.ts: markExited() and markBackgrounded() remain unchanged
    • supervisor.ts, child.ts adapter lifecycle — not touched
    • pi-agent-core (agent.js, agent-loop.js) — external dependency, not modified; processEvents() contract preserved
    • No changes to PTY-specific code paths (PTY adapter already disposes listeners on exit)
    • Backgrounded session process poll/log output retrieval — disableUpdates only affects onUpdate callback, not output buffers
    • onAbortSignal kill/background logic — preserved exactly, disableUpdates() is additive
    • No CHANGELOG entry (left for maintainers per project convention)
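The "Why not try-catch?" argument above relies on general JavaScript behavior: a throw inside an async function becomes a rejected promise, so a surrounding synchronous try-catch does not see it. The sketch below shows only that behavior. lateListener, onUpdateSketch, and storedUpdates are hypothetical names modeled on the description above, not the actual pi-agent-core code.

```typescript
type LateUpdate = { chunk: string };

// Stand-in for an async listener that throws when no run is active.
async function lateListener(_update: LateUpdate): Promise<void> {
  throw new Error("Agent listener invoked outside active run");
}

const storedUpdates: Promise<void>[] = [];

// Stand-in for a callback that stores the promise without returning it,
// as the PR description says the onUpdate closure does.
function onUpdateSketch(update: LateUpdate): void {
  storedUpdates.push(Promise.resolve(lateListener(update)));
}

let caughtSynchronously = false;
try {
  onUpdateSketch({ chunk: "late stdout" });
} catch (error) {
  caughtSynchronously = true; // not reached: nothing is thrown synchronously
}

// The failure is only visible by observing the stored promise.
const observedOutcome: Promise<string> = storedUpdates[0].then(
  () => "fulfilled",
  () => "rejected",
);

observedOutcome.then((outcome) => console.log({ caughtSynchronously, outcome }));
```

The catch block never runs, while the stored promise settles as rejected. If nothing attached a handler to that promise, Node.js would report it as an unhandled rejection.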

Reproduction

  1. Configure a subagent with an exec tool
  2. Run a command that produces output over time (e.g., tsc --noEmit on a large project, or npm run build)
  3. Have the subagent complete its run while the process is still producing output
  4. Observe gateway crash with: Error: Agent listener invoked outside active run

Alternatively, simulate with a long-running command and abort the agent run mid-execution.

Risk / Mitigation

  • Risk: updatesDisabled could suppress legitimate updates if set too early.
  • Mitigation: In Layer 1, the flag is only set in .then()/.catch() after the process has exited and the promise is settling. In Layer 2, the flag is set when the abort signal fires, meaning the agent run is ending and no consumer exists for tool_execution_update events. No legitimate update path is affected.
  • Risk: disableUpdates() in onAbortSignal fires unconditionally (including for backgrounded sessions that survive abort). Could this suppress legitimate updates?
  • Mitigation: After abort, the agent run is ending — no consumer exists for tool_execution_update events. Backgrounded sessions use process poll/log for output retrieval, which reads from session.tail/session.aggregated (unaffected by updatesDisabled). The flag only suppresses the onUpdate callback path.
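A minimal sketch of the flag-based cutoff, assuming a simplified handle: createExecHandle, ExecHandleSketch, and the aggregated() accessor are illustrative names, and only updatesDisabled and disableUpdates() come from the PR description. It shows that disabling updates stops the callback while output keeps accumulating in the session buffer.

```typescript
type ExecUpdate = { text: string };

// Hypothetical, simplified shape; the real ExecProcessHandle has more members.
interface ExecHandleSketch {
  handleStdout(text: string): void;
  disableUpdates(): void;
  aggregated(): string;
}

function createExecHandle(onUpdate: (update: ExecUpdate) => void): ExecHandleSketch {
  const sessionState = { backgrounded: false, exited: false, aggregated: "" };
  let updatesDisabled = false;

  const emitUpdate = (): void => {
    // Existing guard plus the new flag.
    if (sessionState.backgrounded || sessionState.exited || updatesDisabled) return;
    onUpdate({ text: sessionState.aggregated });
  };

  return {
    handleStdout(text: string): void {
      sessionState.aggregated += text; // buffer accumulation is independent of the flag
      emitUpdate();
    },
    disableUpdates(): void {
      updatesDisabled = true;
    },
    aggregated: () => sessionState.aggregated,
  };
}

const received: string[] = [];
const execHandle = createExecHandle((update) => {
  received.push(update.text);
});

execHandle.handleStdout("compiling...\n");
execHandle.disableUpdates(); // e.g. from an abort handler, or at the top of wait().then()/.catch()
execHandle.handleStdout("late output\n");
```

After the second write, the callback has fired once, yet aggregated() still contains both chunks, which matches the mitigation claim that poll/log style retrieval is unaffected.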

Change Type (select all)

  • Bug fix

Scope (select all touched areas)

  • Gateway
  • Agents
  • Exec runtime
  • Exec tool

Linked Issue/PR

Fixes #62520

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/bash-tools.exec-runtime.ts (modified, +28/-1)
  • src/agents/bash-tools.exec.ts (modified, +8/-0)
  • src/agents/bash-tools.test.ts (modified, +45/-0)


Verified Code Path

Background exec process emits stdout
  → handleStdout() [exec-defaults-*.js:1546]
    → emitUpdate() [line 1524, opts.onUpdate callback]
      → Agent.processEvents() [@mariozechner/pi-agent-core/dist/agent.js:388]
        → checks this.activeRun?.abortController.signal
        → activeRun is undefined → throws Error("Agent listener invoked outside active run")
          → Unhandled promise rejection
            → Node.js fatal → gateway crash

File: node_modules/@mariozechner/pi-agent-core/dist/agent.js line 388


Steps to Reproduce

  1. Spawn a subagent that uses the exec tool to run a background process (e.g. tsc --noEmit or any compile/lint step that may take a few seconds)
  2. The subagent's agent run completes (finishRun() clears activeRun)
  3. The background process is still running and emits stdout output
  4. Late stdout arrives → handleStdout → emitUpdate → opts.onUpdate
  5. pi-agent-core throws "Agent listener invoked outside active run"
  6. Unhandled promise rejection → gateway crashes, must be restarted manually


Suggested Fix

Either approach would resolve this:

Option A (in pi-agent-core): In processEvents(), return early (graceful no-op) when this.activeRun is undefined instead of throwing. Late events after run completion are benign and should be silently discarded.

Option B (in OpenClaw exec layer): Wrap opts.onUpdate() calls in a try-catch. On first failure, set an updateSuppressed flag to skip subsequent updates for that exec session. Log a debug/warn message rather than propagating the exception.

Option C (lifecycle fix): Ensure exec sessions are fully drained and session.exited is set before finishRun() clears activeRun. This is the most correct but also most complex fix.
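As a rough illustration of Option A, the sketch below uses a hypothetical RunScopedListener class. It is not the pi-agent-core Agent, which is an external dependency whose internals may differ. The handled and dropped arrays exist only to make the behavior observable.

```typescript
// Hypothetical class illustrating the early-return idea from Option A.
class RunScopedListener {
  private activeRun: { aborted: boolean } | undefined;
  readonly handled: string[] = [];
  readonly dropped: string[] = [];

  startRun(): void {
    this.activeRun = { aborted: false };
  }

  finishRun(): void {
    this.activeRun = undefined;
  }

  async processEvents(chunk: string): Promise<void> {
    if (!this.activeRun) {
      // Graceful no-op: late events are discarded instead of throwing.
      this.dropped.push(chunk);
      return;
    }
    this.handled.push(chunk);
  }
}

const runListener = new RunScopedListener();
runListener.startRun();
void runListener.processEvents("during run");
runListener.finishRun();
void runListener.processEvents("late stdout"); // resolves quietly, no rejection
```

The trade-off is that a silent no-op can hide genuine lifecycle bugs, so a debug-level log at the early return may be worth considering.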


Related Issues

This is a specific variant (background exec in subagent) of a class of bugs that many users are hitting on 2026.4.5:

  • #62477 — Closest match: subprocess stdout after agent run ends (same stack trace, confirmed root cause)
  • #61741 — Race condition in subagent/session cleanup → late child stdout → missing-session-entry, orphaned processes
  • #62256 — Same unhandled promise rejection (labeled regression)
  • #62137 — Same crash in exec/PTY flows
  • #62301 — Same with openai-codex/gpt-5.4 provider
  • #62378 — Gateway crashes when background PTY output arrives after run inactive
  • #61912 — 2026.4.5 regression: crash in exec/PTY flows with same error

Open PRs with Proposed Fixes

  • #62265 — fix(exec): catch onUpdate errors when agent run ends while exec still produces output — try-catch + updateSuppressed flag (Option B above)
  • #62340 — fix(exec): ignore late onUpdate errors after run exit — similar guard approach with regression test

Both PRs are open and appear to implement the correct fix. Merging either (or both) should resolve this crash class.


Environment

  • OpenClaw: 2026.4.5
  • OS: macOS Darwin 25.4.0 (arm64)
  • Node: v25.8.2
  • Model at time of crash: anthropic/claude-sonnet-4-6 (subagent)
  • Trigger: subagent exec tool running background TypeScript compilation (tsc --noEmit)

Extended analysis

TL;DR

The most likely fix is either to wrap the opts.onUpdate() calls in a try-catch and suppress further updates for that exec session after the first failure, or to modify processEvents() in pi-agent-core to return early when this.activeRun is undefined.

Guidance

  • Implement Option B (in OpenClaw exec layer): Wrap opts.onUpdate() calls in a try-catch block, setting an updateSuppressed flag to skip subsequent updates for that exec session after the first failure.
  • Consider Option A (in pi-agent-core): Modify processEvents() to return early when this.activeRun is undefined, silently discarding late events after run completion.
  • Review the open PRs (#62265 and #62340) that propose fixes for this issue, as merging either of them may resolve the crash.

Example

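// Example of guarding opts.onUpdate() (Option B). Names are illustrative.
let updateSuppressed = false;
const originalOnUpdate = opts.onUpdate;

const suppressLateUpdate = (error) => {
  if (error?.message !== "Agent listener invoked outside active run") throw error;
  // set updateSuppressed flag to skip subsequent updates
  updateSuppressed = true;
  console.debug("Suppressing subsequent updates for this exec session");
};

opts.onUpdate = (update) => {
  if (updateSuppressed) return;
  try {
    // existing update handling code
    const result = originalOnUpdate?.(update);
    // A rejected promise bypasses try-catch, so handle it explicitly. If the
    // callback does not return its promise (as PR #62821 describes), neither
    // path sees the rejection.
    if (typeof result?.then === "function") result.then(undefined, suppressLateUpdate);
  } catch (error) {
    suppressLateUpdate(error);
  }
};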

Notes

The root cause of the issue is a race condition between the subprocess lifecycle and the agent run lifecycle, specifically triggered in the subagent + background-spawning exec pattern. The proposed fixes aim to mitigate this issue by either suppressing late updates or modifying the processEvents() behavior.

Recommendation

Apply Option B (in OpenClaw exec layer), as it is a more straightforward and less invasive fix compared to modifying pi-agent-core. This approach will prevent the unhandled promise rejection and gateway crash, while also providing a clear debug message for late updates.
