openclaw - 💡(How to fix) Fix Bug: update.run SIGUSR1 restart can be ignored, then future gateway.restart coalesces as already in-flight [1 pull requests]


Fix Action

Fixed


Summary

During an attended gateway.update.run from OpenClaw 2026.5.6 to 2026.5.7, the package swap succeeded but the live gateway did not restart into the new runtime. The first SIGUSR1 emitted by the update runner was ignored as unauthorized, and a subsequent first-class gateway.restart request was coalesced as already in-flight, leaving the gateway restart state stuck.

This appears to be a restart authorization/state-machine bug: an ignored SIGUSR1 can leave an unconsumed in-flight restart token, causing later valid restart requests to be dropped/coalesced indefinitely.

Environment

  • Host: macOS Darwin 25.4.0 arm64
  • Node: v25.9.0
  • Install kind: package/global npm under /opt/homebrew/lib/node_modules/openclaw
  • Upgrade path: 2026.5.6 (c97b9f7) -> 2026.5.7 (eeef486)
  • Gateway port: 127.0.0.1:18789
  • Config has commands.restart=true and gateway.reload.mode="hot"

Relevant sanitized config excerpt:

{
  "commands.restart": true,
  "gateway.reload": {
    "mode": "hot"
  }
}

Timeline (Asia/Shanghai, UTC+8)

Preflight succeeded:

OpenClaw 2026.5.6 (c97b9f7)
openclaw update status --json: registry latestVersion=2026.5.7, availability.available=true
openclaw config validate: Config valid
openclaw update --dry-run: planned package update, plugin sync, completion refresh, gateway restart, doctor checks

gateway.update.run succeeded and the package was swapped:

before.version=2026.5.6
after.version=2026.5.7
global update exitCode=0
global install swap exitCode=0 stdoutTail="replaced openclaw"
openclaw doctor --non-interactive --fix exitCode=0

Gateway logs:

2026-05-09T07:51:30.426+08:00 [gateway] update.run completed actor=gateway-client device=unknown-device ip=unknown-ip conn=73e4998e-2c1e-470c-897e-b2878899dc3b changedPaths=<n/a> restartReason=update.run status=ok
2026-05-09T07:51:30.426+08:00 [ws] ⇄ res ✓ update.run 64019ms conn=73e4998e…dc3b id=8763310c…0c1c
2026-05-09T07:51:30.429+08:00 [gateway] signal SIGUSR1 received

Gateway error log:

2026-05-09T07:51:30.566+08:00 [gateway] SIGUSR1 restart ignored (not authorized; commands.restart=false or use gateway tool).

Post-update verification showed mixed runtime/package state:

{
  "runtimeVersion": "2026.5.7",
  "gateway_reachable": true,
  "gateway_self_version": "2026.5.6",
  "gateway_pid": 5309,
  "gateway_state": "active"
}
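The mixed state in this report can be flagged mechanically, along the lines of the post-update check proposed under Suggested fix. This is a minimal TypeScript sketch that assumes the report has exactly the fields shown above. VerificationReport and checkLiveGatewayVersion are hypothetical names, not OpenClaw APIs.

```typescript
// Hypothetical shape of the post-update verification report shown above.
interface VerificationReport {
  runtimeVersion: string;
  gateway_reachable: boolean;
  gateway_self_version: string;
  gateway_pid: number;
  gateway_state: string;
}

type VersionCheck = { ok: true } | { ok: false; reason: string };

// Fails when the live gateway still reports a different version than the installed runtime.
function checkLiveGatewayVersion(report: VerificationReport): VersionCheck {
  if (!report.gateway_reachable) {
    return { ok: false, reason: "gateway unreachable" };
  }
  if (report.gateway_self_version !== report.runtimeVersion) {
    return {
      ok: false,
      reason:
        `live gateway pid ${report.gateway_pid} reports ${report.gateway_self_version}, ` +
        `but the installed runtime is ${report.runtimeVersion}; restart was not applied`,
    };
  }
  return { ok: true };
}

// Values copied from the verification output above.
const observedReport: VerificationReport = {
  runtimeVersion: "2026.5.7",
  gateway_reachable: true,
  gateway_self_version: "2026.5.6",
  gateway_pid: 5309,
  gateway_state: "active",
};

console.log(checkLiveGatewayVersion(observedReport));
```

With the observed values it returns ok: false and a reason naming pid 5309. That is the condition update.run would need to surface instead of status=ok.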

A subsequent first-class gateway restart request was issued via the OpenClaw gateway tool, not the raw CLI:

2026-05-09T07:52:27.891+08:00 [gateway-tool] gateway tool: restart requested (delayMs=default, reason=Post-update verification found CLI/package at 2026.5.7 but live gateway self.version still 2026.5.6 on old pid 5309; controlled restart needed to load upgraded runtime.)
2026-05-09T07:52:27.893+08:00 [restart] request coalesced (already in-flight) reason=Post-update verification found CLI/package at 2026.5.7 but live gateway self.version still 2026.5.6 on old pid 5309; controlled restart needed to load upgraded runtime. actor=<unknown>

At that point the package/CLI was at 2026.5.7, but the live gateway process was still the old one (pid 5309, self version 2026.5.6).

Expected behavior

After a successful gateway.update.run package swap:

  1. The gateway restart should be authorized and consumed by the running gateway process; or
  2. If the emitted restart signal is ignored/rejected, the restart state should roll back/clear so a later valid gateway.restart can retry; and
  3. The live gateway should eventually report gateway.self.version=2026.5.7 without requiring an external/manual restart.

Actual behavior

  • The update-runner restart SIGUSR1 was ignored as unauthorized.
  • The ignored signal appears to have left restart state marked as in-flight.
  • Later valid first-class restart requests were coalesced and did not retry.
  • The live gateway stayed on the old runtime (gateway.self.version=2026.5.6, pid 5309) while package/CLI status showed 2026.5.7.

Suspected root cause

Based on the current dist code paths:

  • emitGatewayRestart() marks an emitted restart token and authorizes SIGUSR1 before emitting SIGUSR1 (or delivering it via kill).
  • The run-loop SIGUSR1 handler later calls consumeGatewaySigusr1RestartAuthorization() and calls markGatewaySigusr1RestartHandled() only after the authorization is consumed.
  • In this case, the handler logged "SIGUSR1 restart ignored (not authorized...)", but the restart module still treated a restart as in-flight, so scheduleGatewaySigusr1Restart() coalesced later restart requests.
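The sequence in these bullets can be modeled in a few lines. This is a minimal TypeScript sketch that assumes a single in-process state object. The method names mirror the functions named above, but their bodies are guesses, not OpenClaw's dist code. The `authorized = false` line stands in for whatever caused the handler to see no authorization.

```typescript
// Hypothetical, simplified model of the restart flow described above.
// Method names mirror the functions named in the issue; the bodies are assumptions.
class RestartModel {
  inFlight = false;   // emitted restart token not yet handled
  authorized = false; // pending SIGUSR1 restart authorization
  log: string[] = [];

  // Marks the token and authorizes SIGUSR1 before the signal is delivered.
  emitGatewayRestart(): void {
    this.inFlight = true;
    this.authorized = true;
  }

  // One-shot read of the authorization.
  consumeGatewaySigusr1RestartAuthorization(): boolean {
    const ok = this.authorized;
    this.authorized = false;
    return ok;
  }

  markGatewaySigusr1RestartHandled(): void {
    this.inFlight = false;
  }

  // New requests are coalesced while a token is still in flight.
  scheduleGatewaySigusr1Restart(reason: string): "scheduled" | "coalesced" {
    if (this.inFlight) {
      this.log.push(`request coalesced (already in-flight) reason=${reason}`);
      return "coalesced";
    }
    this.emitGatewayRestart();
    return "scheduled";
  }
}

// SIGUSR1 handler: only marks the restart handled after authorization is consumed.
function handleSigusr1(model: RestartModel): void {
  if (!model.consumeGatewaySigusr1RestartAuthorization()) {
    model.log.push("SIGUSR1 restart ignored (not authorized)");
    return; // inFlight is left set on this path
  }
  model.markGatewaySigusr1RestartHandled();
  // A real gateway would restart here.
}

const model = new RestartModel();
model.emitGatewayRestart(); // update.run emits its restart
model.authorized = false;   // stand-in for the authorization the handler never saw
handleSigusr1(model);       // logs "ignored", token stays in flight
const outcome = model.scheduleGatewaySigusr1Restart("post-update verification");
console.log(outcome, model.inFlight, model.log);
```

The model logs the ignored signal and leaves inFlight set. It then returns "coalesced" for the follow-up request, which is the same pattern as the gateway log at 07:52:27.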

So either:

  • the authorization state is not shared/visible across the update-runner emission and the live SIGUSR1 handler in this upgrade window, or
  • the handler/restart module does not roll back/clear the emitted token when a SIGUSR1 is ignored.
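The first hypothesis is easy to reproduce in miniature whenever authorization lives in module-scoped state. Node caches ES modules per resolved URL, so two different file paths give two independent module instances. This minimal TypeScript sketch assumes, without confirmation from the issue, that the update runner and the live handler could end up talking to two separately loaded copies of the restart module, for example after the package swap replaces files on disk. loadRestartModuleCopy is a hypothetical stand-in for such a load.

```typescript
// Hypothetical: each load of the restart module gets its own private state,
// e.g. if an old and a new copy of the module end up loaded side by side.
function loadRestartModuleCopy() {
  let authorized = false; // private to this copy
  return {
    authorize(): void {
      authorized = true;
    },
    consume(): boolean {
      const ok = authorized;
      authorized = false;
      return ok;
    },
  };
}

const emitterCopy = loadRestartModuleCopy(); // copy the update runner writes to
const handlerCopy = loadRestartModuleCopy(); // copy the live SIGUSR1 handler reads from

emitterCopy.authorize();
const handlerSawAuthorization = handlerCopy.consume();
console.log(handlerSawAuthorization); // false: the authorization went to the other copy
```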

The commands.restart=false hint in the log message is also misleading: it obscures diagnosis when the config actually has commands.restart=true.
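One way to split the message is to log the values that actually drove the decision instead of a fixed hint. This is a minimal TypeScript sketch. The Sigusr1State fields and formatIgnoredSigusr1 are hypothetical and do not describe OpenClaw's handler.

```typescript
// Hypothetical inputs the SIGUSR1 handler would have at decision time.
interface Sigusr1State {
  commandsRestart: boolean;      // commands.restart as read from the loaded config
  authorizationPending: boolean; // whether an emitted restart authorization was found
}

// Reports the actual values rather than a fixed "commands.restart=false" hint.
function formatIgnoredSigusr1(state: Sigusr1State): string {
  const authorization = state.authorizationPending ? "pending" : "missing";
  return `SIGUSR1 restart ignored (authorization=${authorization}; commands.restart=${state.commandsRestart})`;
}

console.log(formatIgnoredSigusr1({ commandsRestart: true, authorizationPending: false }));
// SIGUSR1 restart ignored (authorization=missing; commands.restart=true)
```

For this incident, such a message would point at the missing authorization rather than at the config.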

Impact

An attended upgrade can end in a split-brain-ish state:

  • installed package/CLI reports the new version,
  • live gateway keeps running the old code,
  • subsequent first-class restart attempts are dropped as already in-flight,
  • operator must perform an external/supervisor restart to recover.

Suggested fix

  • Ensure the SIGUSR1 restart authorization initiated by the update runner is consumed by the live gateway handler after the package swap.
  • If a SIGUSR1 restart is ignored as unauthorized, clear/roll back the in-flight restart token so future gateway.restart calls can retry.
  • Consider splitting the log message into accurate branches: not authorized vs commands.restart=false vs use gateway tool.
  • Add a post-update self-version check or retry path: if after.version != live gateway.self.version, surface restart failure explicitly instead of returning status=ok.
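The rollback in the second bullet can be sketched against a simplified in-process model. This is a minimal TypeScript sketch that assumes a single state object with an in-flight flag and a one-shot authorization. RestartState and its methods are hypothetical and are not OpenClaw's restart module.

```typescript
// Hypothetical restart state with rollback on an ignored SIGUSR1.
class RestartState {
  private inFlight = false;
  private authorized = false;

  // Marks the restart token and authorizes the next SIGUSR1.
  emit(): void {
    this.inFlight = true;
    this.authorized = true;
  }

  // New requests are coalesced only while a token is genuinely in flight.
  schedule(): "scheduled" | "coalesced" {
    if (this.inFlight) return "coalesced";
    this.emit();
    return "scheduled";
  }

  // SIGUSR1 handler: the token is cleared on both paths, so an ignored
  // signal cannot leave a restart stuck in flight.
  onSigusr1(): "restarting" | "ignored" {
    const wasAuthorized = this.authorized;
    this.authorized = false;
    this.inFlight = false;
    return wasAuthorized ? "restarting" : "ignored";
  }

  // Demo hook only: simulates the authorization going missing, as in the logs.
  dropAuthorization(): void {
    this.authorized = false;
  }
}

const restart = new RestartState();
restart.emit();                           // update.run emits its restart
restart.dropAuthorization();              // the handler will not see an authorization
const firstSignal = restart.onSigusr1();  // "ignored", token rolled back
const retry = restart.schedule();         // "scheduled" instead of "coalesced"
const secondSignal = restart.onSigusr1(); // "restarting"
console.log(firstSignal, retry, secondSignal);
```

The design choice here ties the in-flight flag to a single delivery attempt. Once a signal has been processed, whether authorized or not, there is nothing left to coalesce against, so the later gateway.restart goes through.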
