openclaw - 💡(How to fix) Fix [Bug]:[Bug] preserveGatewayHookRunner causes handler stacking + N-fold delivery after subagent hot-reload cycles (2026.5.22 regression)

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

OpenClaw session .jsonl files use a truncate-first write pattern. If SIGTERM arrives mid-write during rapid restart sequences (such as auto-updates), the file is left truncated and unrecoverable on next load.

Error Message

  1. On next OpenClaw load, session reports a JSON parse error — last entry truncated mid-JSON.

Root Cause

Current pattern: truncate file → write content → (SIGTERM here) → partial unrecoverable file

Correct pattern: write to .tmp → fsync → rename() to target

rename() is atomic on POSIX same-filesystem. A crash during write-to-tmp leaves the previous file intact.

Fix Action

Workaround

Manual recovery only: identify corrupted .jsonl by JSON parse failure on load, delete it, let agent restart with a fresh session. No data recovery path — partial write is unrecoverable.


Environment

  • macOS 26.3
  • OpenClaw 2026.5.22 (regression vs. 2026.5.19)
  • Node.js runtime
  • Plugin: api.on('before_prompt_build', ...)
  • Trigger: overnight auto-update to 2026.5.22 → LaunchAgent restart cascade → rapid plugin re-registration cycles

Code Example

Root cause identified in loader-CZB9kQVT.js (~line 4949):

  const preserveGatewayHookRunner = runtimeSubagentMode === "default"
    && getActivePluginRuntimeSubagentMode() === "gateway-bindable"
    && getGlobalHookRunner() !== null;

  if (!preserveGatewayHookRunner) initializeGlobalHookRunner(registry);

After subagent completes, all three conditions are true → initializeGlobalHookRunner skipped → handlers stack.

Customer impact: 4-6x duplicate message delivery confirmed on connected mobile clients after overnight restart cascade on 2026.5.22. Two agent session .jsonl files corrupted by mid-write SIGTERM. Both agents required manual session reset.

Workaround applied: pinned to 2026.5.19, added globalThis[BEFORE_PROMPT_KEY] teardown guard in plugin.

Related: #72536, #72176, #25859, #12571
RAW_BUFFERClick to expand / collapse

Bug type

Regression (worked before, now fails)

Beta release blocker

No

Summary

Bug 1 — preserveGatewayHookRunner incorrectly suppresses initializeGlobalHookRunner causing handler stacking on hot-reload cycles

Severity: Critical OC version where regression introduced: 2026.5.22 OC version where behavior is correct: 2026.5.19

Summary

In loader-CZB9kQVT.js, the preserveGatewayHookRunner condition introduced in 2026.5.22 incorrectly evaluates to true after subagent hot-reload cycles, causing initializeGlobalHookRunner(registry) to be skipped. Each restart cycle stacks another handler rather than replacing it. After N restarts, every message delivers N times.

Steps to Reproduce

  1. Run OpenClaw 2026.5.22 with a plugin that registers a before_prompt_build handler via api.on().
  2. Trigger a subagent (causes a hot-reload cycle).
  3. Restart the gateway or trigger another plugin registration cycle.
  4. Observe: before_prompt_build fires N times per prompt build, where N = number of restart cycles since last clean boot.

Root Cause

In loader-CZB9kQVT.js (~line 4949):

const preserveGatewayHookRunner = runtimeSubagentMode === "default" && getActivePluginRuntimeSubagentMode() === "gateway-bindable" && getGlobalHookRunner() !== null;

if (!preserveGatewayHookRunner) initializeGlobalHookRunner(registry);

After a subagent completes, all three conditions evaluate to true → preserveGatewayHookRunner = true → initializeGlobalHookRunner is skipped. The activePluginHookRegistrations map accumulates handler registrations. Every api.on() call adds a handler without removing the previous one.

Impact

  • Every plugin using api.on('before_prompt_build', ...) stacks handlers on restart
  • After N restart cycles, every prompt triggers N context injection passes
  • Subagent announce delivery fires N times — N duplicate messages delivered to clients

Customer-Facing Impact

After auto-update to 2026.5.22 triggered a rapid restart sequence overnight, our relay delivered 4–6 duplicate copies of every agent message to connected mobile clients. Invisible server-side — only manifested on end-user devices. We reverted to 2026.5.19 and added a plugin-side workaround guard.

Related known issues: #72536 (duplicate WebChat messages, 100% reproducible macOS arm64), #72176 (all-channel duplicate delivery), #25859 (hook registration accumulates on reconnect), #12571 (isolated sessions bleed into main session).

Expected Behavior

initializeGlobalHookRunner(registry) should run on every plugin registration cycle after a subagent completes. The preserveGatewayHookRunner guard should only apply during an active subagent transition, not after runtimeSubagentMode settles back to "default".

Workaround

Pinned to 2026.5.19. Added a globalThis[BEFORE_PROMPT_KEY] guard to our plugin that manually tears down the previous before_prompt_build handler before each registration.

Bug 2 — Agent session .jsonl files are not written atomically, causing corruption on rapid restart

Severity: Medium

Summary

OpenClaw session .jsonl files use a truncate-first write pattern. If SIGTERM arrives mid-write during rapid restart sequences (such as auto-updates), the file is left truncated and unrecoverable on next load.

Steps to Reproduce

  1. Run an agent with an active session.
  2. Trigger rapid restarts during a write-heavy phase (e.g. long subagent completion flushing to session file).
  3. On next OpenClaw load, session reports a JSON parse error — last entry truncated mid-JSON.

Root Cause

Current pattern: truncate file → write content → (SIGTERM here) → partial unrecoverable file

Correct pattern: write to .tmp → fsync → rename() to target

rename() is atomic on POSIX same-filesystem. A crash during write-to-tmp leaves the previous file intact.

Customer-Facing Impact

Two named agents corrupted during the 2026.5.22 auto-update restart cascade. Both required manual session reset and loss of in-progress context. Any OpenClaw update that triggers restarts is a potential data-loss event for production operators.

Workaround

Manual recovery only: identify corrupted .jsonl by JSON parse failure on load, delete it, let agent restart with a fresh session. No data recovery path — partial write is unrecoverable.


Environment

  • macOS 26.3
  • OpenClaw 2026.5.22 (regression vs. 2026.5.19)
  • Node.js runtime
  • Plugin: api.on('before_prompt_build', ...)
  • Trigger: overnight auto-update to 2026.5.22 → LaunchAgent restart cascade → rapid plugin re-registration cycles

Steps to reproduce

  1. Run OpenClaw 2026.5.22 with a plugin that registers a before_prompt_build handler via api.on().
  2. Trigger a subagent (causes a hot-reload cycle — runtimeSubagentMode transitions through states).
  3. Restart the gateway or trigger another plugin registration cycle.
  4. Observe: before_prompt_build fires N times per prompt build, where N = number of restart cycles since last clean boot. Connected clients receive N duplicate messages.

Confirmed trigger: overnight auto-update to 2026.5.22 while host was idle → LaunchAgent restart cascade → rapid plugin re-registration cycles.

Regression: worked correctly on 2026.5.19.

Expected behavior

initializeGlobalHookRunner(registry) should run on every plugin registration cycle after a subagent has completed, resetting the hook runner to the new registry. The preserveGatewayHookRunner guard should only apply during an active subagent transition, not after runtimeSubagentMode settles back to "default". Handlers should replace, not stack.

Related: #72536, #72176, #25859, #12571

Actual behavior

After upgrading to 2026.5.22 and triggering subagents overnight, the before_prompt_build handler stacked on each restart cycle. Connected mobile clients received 4–6 duplicate copies of every agent message. Two named agent sessions (.jsonl files) were corrupted by mid-write SIGTERM during the restart cascade and required manual recovery. The duplication was invisible in server-side logs — only visible on end-user devices.

OpenClaw version

2026.5.22 (regression introduced) — confirmed working on 2026.5.19

Operating system

macOS 26.3 arm64 (Apple Silicon)

Install method

npm global

Model

anthropic/claude-sonnet-4-6

Provider / routing chain

openclaw -> anthropic

Additional provider/model setup details

Bug is not model-specific. The handler stacking occurs in the OpenClaw plugin system regardless of which model is in use. Plugin registers api.on('before_prompt_build', ...) — this handler stacks on each gateway restart cycle due to the preserveGatewayHookRunner regression.

Logs, screenshots, and evidence

Root cause identified in loader-CZB9kQVT.js (~line 4949):

  const preserveGatewayHookRunner = runtimeSubagentMode === "default"
    && getActivePluginRuntimeSubagentMode() === "gateway-bindable"
    && getGlobalHookRunner() !== null;

  if (!preserveGatewayHookRunner) initializeGlobalHookRunner(registry);

After subagent completes, all three conditions are true → initializeGlobalHookRunner skipped → handlers stack.

Customer impact: 4-6x duplicate message delivery confirmed on connected mobile clients after overnight restart cascade on 2026.5.22. Two agent session .jsonl files corrupted by mid-write SIGTERM. Both agents required manual session reset.

Workaround applied: pinned to 2026.5.19, added globalThis[BEFORE_PROMPT_KEY] teardown guard in plugin.

Related: #72536, #72176, #25859, #12571

Impact and severity

Affected: All operators running plugins with api.on('before_prompt_build', ...) on 2026.5.22 Severity: Critical (customer-facing duplicate message delivery + session data loss) Frequency: Always — every gateway restart cycle after a subagent has run Consequence: N-fold duplicate message delivery to end-user devices, agent session file corruption requiring manual recovery, forced rollback to prior version

Additional information

Last known good version: 2026.5.19 First known bad version: 2026.5.22

Two separate bugs in this report:

  1. preserveGatewayHookRunner handler stacking (Critical) — causes duplicate delivery
  2. Non-atomic session .jsonl writes (Medium) — causes session corruption on SIGTERM

Both triggered by the same root event: 2026.5.22 auto-update → LaunchAgent restart cascade. Workaround for Bug 1: globalThis teardown guard in plugin + pinned to 2026.5.19. Workaround for Bug 2: atomic write implemented in our own gateway bridge layer.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

FAQ

Expected behavior

initializeGlobalHookRunner(registry) should run on every plugin registration cycle after a subagent has completed, resetting the hook runner to the new registry. The preserveGatewayHookRunner guard should only apply during an active subagent transition, not after runtimeSubagentMode settles back to "default". Handlers should replace, not stack.

Related: #72536, #72176, #25859, #12571

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING

openclaw - 💡(How to fix) Fix [Bug]:[Bug] preserveGatewayHookRunner causes handler stacking + N-fold delivery after subagent hot-reload cycles (2026.5.22 regression)