openclaw: How to fix "Gateway: unhandled rejection from pi-agent-core Agent.processEvents after run abort corrupts in-memory run state"

GitHub stats
openclaw/openclaw#63220 · Fetched 2026-04-09 07:56:46
Comments: 0 · Participants: 1 · Timeline: 1 (cross-referenced ×1) · Reactions: 0

Upstream Bug 1 — pi-agent-core lifecycle race: "Agent listener invoked outside active run"

  • Repo: github.com/openclaw/openclaw
  • Suggested labels: bug, gateway, pi-agent-core, stability
  • OpenClaw version: 2026.4.5
  • Node: 22.22.1
  • OS: macOS (Apple Silicon)


Title

Gateway: unhandled promise rejection from pi-agent-core Agent.processEvents after run abort/timeout corrupts in-memory run state

Summary

When a model request times out or is aborted mid-stream, the embedded @mariozechner/pi-agent-core library can fire listener callbacks after the run has already been closed. The check inside Agent.processEvents (agent.js:388) raises Error: Agent listener invoked outside active run, which surfaces as an unhandled promise rejection in the gateway. This corrupts the gateway's in-memory run state and cascades failures down any remaining fallback chain — even models that would otherwise succeed.

We hit this repeatedly during a 24-hour cascade incident on 2026-04-06/07; the gateway only fully recovered after a restart.
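
The race described above can be modeled with a toy agent. This is only a sketch of the mechanism, not pi-agent-core's actual code: ToyAgent, startRun, endRun and simulateLateEvent are hypothetical names, and the only detail taken from the report is the error message.

```javascript
// Toy model of the lifecycle race. NOT pi-agent-core's real implementation;
// every name here is hypothetical except the error message from the log.
class ToyAgent {
  constructor() {
    this.activeRun = false;
    this.listeners = [];
  }

  subscribe(listener) {
    this.listeners.push(listener);
  }

  startRun() {
    this.activeRun = true;
  }

  endRun() {
    this.activeRun = false;
  }

  // Mirrors the reported check: dispatching outside a run throws, and because
  // the function is async the throw becomes a rejected promise.
  async processEvents(event) {
    if (!this.activeRun) {
      throw new Error("Agent listener invoked outside active run");
    }
    for (const listener of this.listeners) listener(event);
  }
}

function simulateLateEvent() {
  const agent = new ToyAgent();
  const seen = [];
  agent.subscribe((event) => seen.push(event));

  agent.startRun();
  const inFlight = agent.processEvents("chunk-1"); // delivered during the run
  agent.endRun(); // timeout or abort closes the run
  const late = agent.processEvents("chunk-2"); // stream still flushing
  return { seen, inFlight, late };
}
```

In this toy, `late` is a rejected promise. If nothing attaches a `.catch()` to it, Node reports it as an unhandled rejection, which has the same shape as the log line in the stack trace section.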

Environment

  • openclaw@2026.4.5
  • Embedded dep: @mariozechner/pi-agent-core (located at /opt/homebrew/lib/node_modules/openclaw/node_modules/@mariozechner/pi-agent-core/dist/agent.js)
  • macOS 14, Node 22.22.1
  • LaunchAgent-managed gateway daemon

Reproduction

  1. Configure an agent with a multi-tier fallback chain (e.g. anthropic/claude-opus-4-6 → openai-codex/gpt-5.4 → zai/glm-5.1).
  2. Run a heavy-context turn that triggers compaction (~92% prompt usage works reliably).
  3. Force the primary model to fail or stall (we hit this naturally with a 60s idle timeout under load).
  4. Observe the gateway log: a fallback handoff fires, and shortly after — sometimes after the next model has already started — the late listener event from the aborted primary run fires.
  5. Result: unhandled rejection bubbles up, in-memory run state is corrupted, and subsequent models in the same chain fail spuriously.

We saw this 5+ times in a 24-hour window, always immediately after timeout/abort sequences during compaction or handoff transitions.

Stack trace (from ~/.openclaw/logs/gateway.err.log)

[2026-04-07T07:31:36.765-05:00] [openclaw] Unhandled promise rejection: Error: Agent listener invoked outside active run
    at Agent.processEvents (file:///opt/homebrew/lib/node_modules/openclaw/node_modules/@mariozechner/pi-agent-core/dist/agent.js:388:19)

Identical stack appeared at:

  • 2026-04-06 14:36 CT
  • 2026-04-06 15:20 CT
  • 2026-04-06 15:43 CT
  • 2026-04-06 15:55 CT
  • 2026-04-06 16:06 CT
  • 2026-04-07 07:31 CT …etc.

Expected behavior

Late listener events fired after a run is closed should be dropped silently (or logged at debug level) — they should not throw an unhandled rejection that crashes/corrupts the outer gateway run state.

Actual behavior

  • Unhandled rejection bubbles up to the gateway process
  • Subsequent fallback attempts in the same chain fail (we believe due to corrupted in-memory run state — the symptom is "all models failed" even when the next chain entry would normally be healthy)
  • Only a full gateway restart clears it

Cascade evidence

After the lifecycle race fires once, we routinely see:

2026-04-07T08:04:38.138-05:00 Embedded agent failed before reply: All models failed (2):
  openai-codex/gpt-5.4: LLM error api_error: Internal server error (timeout)
  zai/glm-5.1: LLM error api_error: Internal server error (timeout)

…even when the underlying providers are independently healthy when curled directly. Gateway restart immediately restores the chain.

Suggested fix

Either of the following would address it:

  1. In pi-agent-core's Agent.processEvents(): add a guard at the top of the function so that, if the run is already closed, it returns early instead of throwing.
  2. In the gateway: attach an 'error' handler that drops these specific errors at the run-controller level.
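
The second option might look roughly like the wrapper below. It is a sketch under assumptions: it presumes the gateway's run controller holds the promise for the agent work, and dropLateListenerErrors and its log parameter are hypothetical names rather than OpenClaw APIs. Errors are matched on the message text from the log because the report does not mention a dedicated error class.

```javascript
// Hypothetical run-controller helper: swallow only the known late-listener
// error and let every other failure propagate unchanged.
const LATE_LISTENER_MESSAGE = "Agent listener invoked outside active run";

async function dropLateListenerErrors(work, log = () => {}) {
  try {
    return await work();
  } catch (err) {
    const isLate =
      err instanceof Error && err.message.includes(LATE_LISTENER_MESSAGE);
    if (!isLate) throw err; // unrelated failures must still surface
    log(`dropped late listener event: ${err.message}`);
    return undefined;
  }
}
```

A caller would wrap the agent call, for example `await dropLateListenerErrors(() => agent.prompt(input), logger.debug)`, where `agent.prompt` and `logger` stand in for whatever the gateway actually uses. This only helps when the rejection is reachable from the gateway. If the promise is created and abandoned inside the library, the guard in option 1 is still needed.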

A defense-in-depth option in OpenClaw itself: register a process.on('unhandledRejection') handler that logs but does not propagate Agent listener invoked outside active run errors, since they are known-safe to ignore.
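
A minimal sketch of that handler follows. `process.on('unhandledRejection')` is a standard Node API. classifyRejection, installRejectionFilter and the logger shape are hypothetical, and matching on the message text is an assumption, since the report does not say whether the error carries a code or class.

```javascript
// Assumption: the error is recognized by its message, copied from the log.
const KNOWN_SAFE = /Agent listener invoked outside active run/;

function classifyRejection(reason) {
  const message = reason instanceof Error ? reason.message : String(reason);
  return KNOWN_SAFE.test(message) ? "ignore" : "report";
}

// Installs the filter and returns a function that removes it again.
function installRejectionFilter(logger = console) {
  const handler = (reason) => {
    if (classifyRejection(reason) === "ignore") {
      logger.debug("ignored late agent listener event");
      return;
    }
    // Everything else is still reported, as the gateway log already does.
    logger.error("Unhandled promise rejection:", reason);
  };
  process.on("unhandledRejection", handler);
  return () => process.off("unhandledRejection", handler);
}
```

A filter like this only keeps the rejection from propagating. It does not repair any run state that is already corrupted, so it complements the guard in processEvents() rather than replacing it.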

Workaround we are using

  • Manual gateway restart when the cascade is detected (clears in-memory corruption).
  • A plugin-level "recovery clock" in our fallback-router that forces chainIndex = 0 after 15 min of being stuck on a fallback (helps the next run recover even if the current one is corrupted).
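
The "recovery clock" idea can be sketched as follows. This is not the reporter's plugin code: FallbackRouter, advance, currentModel and the injectable clock are hypothetical, and only the 15-minute limit and the reset of chainIndex to 0 come from the description above.

```javascript
// Hypothetical sketch of a recovery clock for a fallback chain.
const STUCK_LIMIT_MS = 15 * 60 * 1000; // 15 minutes, per the workaround

class FallbackRouter {
  constructor(chain, now = Date.now) {
    this.chain = chain;
    this.now = now; // injectable clock, which makes the logic testable
    this.chainIndex = 0;
    this.fallbackSince = null; // time at which we first left the primary
  }

  // Called when the current model fails: move to the next entry, if any.
  advance() {
    if (this.chainIndex < this.chain.length - 1) {
      this.chainIndex += 1;
      if (this.fallbackSince === null) this.fallbackSince = this.now();
    }
  }

  // Called at the start of each run: reset to the primary once the chain
  // has been off it for STUCK_LIMIT_MS or longer.
  currentModel() {
    const stuck =
      this.fallbackSince !== null &&
      this.now() - this.fallbackSince >= STUCK_LIMIT_MS;
    if (stuck) {
      this.chainIndex = 0;
      this.fallbackSince = null;
    }
    return this.chain[this.chainIndex];
  }
}
```

The clock starts when the router first leaves the primary and is not restarted by later failures, so a chain that keeps falling further back still returns to index 0 after 15 minutes.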

Related

  • Sister issue: gateway session-resume returns modelApplied: true even when the actual inference runs on a stale resumed model (filed separately).
  • Full incident report: ~/.openclaw/workspace/output/post-restart-fallback-cascade-incident-report.md (local; happy to share excerpts on request).

extent analysis

TL;DR

Implement a guard in pi-agent-core's Agent.processEvents() to return early when the run is already closed, or attach an error handler in the gateway to drop specific errors.

Guidance

  • Review the Agent.processEvents() function in pi-agent-core to understand the current error handling mechanism.
  • Consider adding a guard at the top of Agent.processEvents() to check if the run is already closed, and return early if so.
  • Alternatively, attach an 'error' handler in the gateway to catch and drop Agent listener invoked outside active run errors.
  • Evaluate the defense-in-depth option of registering a process.on('unhandledRejection') handler in OpenClaw to log but not propagate these specific errors.

Example

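// Example guard at the top of Agent.processEvents().
// `isActiveRun` is a placeholder name; use whichever field pi-agent-core
// actually uses to track whether a run is active.
if (!this.isActiveRun) {
  return; // Run already closed: drop the late event instead of throwing
}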

Notes

The suggested fix and workaround are based on the provided issue description and may require further testing and validation to ensure correctness.

Recommendation

Apply the suggested fix by adding a guard in pi-agent-core's Agent.processEvents() to return early when the run is already closed, as this addresses the root cause of the issue and prevents unhandled promise rejections.
