openclaw - ✅(Solved) Fix Feature request: openclaw update should detect when invoked from inside the gateway process tree (PPID-ancestry guard) [2 pull requests, 3 comments, 4 participants]

GitHub stats
openclaw/openclaw#75691 · Fetched 2026-05-02 05:31:39
Comments: 3 · Participants: 4 · Timeline: 20 · Reactions: 2
Timeline (top): mentioned ×6, subscribed ×6, commented ×3, cross-referenced ×3

When openclaw update --tag <ver> --yes is invoked from a process whose parent process tree includes the active gateway PID (e.g., from inside an agent's exec subprocess), the package install and doctor migrations complete successfully, but the supervised-mode restart silently fails to take effect. The binary on disk is the new version while the gateway PID and in-memory code remain the old version, leaving the operator with no signal that anything went wrong.

Error Message

Error: openclaw update detected it is running inside the gateway process tree (gateway PID <X> is an ancestor of this process). The supervised-mode restart cannot fire from this context. Run openclaw update from an external shell (SSH, cron, or other detached session). To install without restart from this context, pass --no-restart and manually invoke sudo systemctl restart <unit> from outside.

Root Cause

In dist/cli/gateway-lifecycle.runtime.js::restartGatewayProcessWithFreshPid(), when systemd is detected as the supervisor, the function returns { mode: "supervised" } and the caller is expected to exit. The architectural assumption is that the caller IS the gateway process — exiting triggers systemd's Restart=always to spawn the fresh process.

When the caller is a CHILD of the gateway (an agent's exec subprocess), child exit doesn't propagate to the gateway parent, systemd sees the gateway alive, and no restart fires.

Fix Action

PR fix notes

PR #75729: fix(cli): block package updates from inside running gateway service

Description (problem / solution / changelog)

Summary

  • Problem: Running openclaw update from inside a managed gateway service process could replace the active OpenClaw dist tree while the live gateway lazy-loads old chunks, leading to runtime errors or inconsistent state.
  • Why it matters: Package updates from within the gateway service are unsafe because the update mutates files that the running process may still reference.
  • What changed:
    • Added detection for when the current process is running inside a gateway service (isRunningInsideGatewayService).
    • Added shouldBlockPackageUpdateFromGatewayServiceEnv to block package updates if the gateway is still running (with conservative fallback when service state cannot be inspected).
    • Extended PrePackageServiceStop to track inspected and running state for accurate guard decisions.
    • Added stripGatewayServiceMarkerEnv to prevent gateway service markers from leaking into the post-core update child process.
    • Added tests covering all three scenarios: blocking when running, allowing when not running, and stripping env markers.
  • What did NOT change: Git-based updates are unaffected. The block only applies to package manager updates (npm/pnpm/bun).
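A minimal sketch of how the marker check and the stripping step could look. The two function names come from the PR description above; their bodies, the `Env` type, and the exact matching rule are assumptions inferred from the repro steps (`OPENCLAW_SERVICE_MARKER=openclaw OPENCLAW_SERVICE_KIND=gateway`), not the merged implementation.

```typescript
// Hypothetical sketch: the env var names and values are taken from the PR's
// repro steps; the matching rule and the Env type are assumptions.
type Env = Record<string, string | undefined>;

const SERVICE_MARKER_KEYS = ["OPENCLAW_SERVICE_MARKER", "OPENCLAW_SERVICE_KIND"] as const;

// True when both markers identify the current process as the gateway service.
function isRunningInsideGatewayService(env: Env): boolean {
  return (
    env["OPENCLAW_SERVICE_MARKER"] === "openclaw" &&
    env["OPENCLAW_SERVICE_KIND"] === "gateway"
  );
}

// Returns a copy of env without the gateway markers, so a child process
// spawned for the post-core update does not inherit the gateway identity.
// The input object is left untouched.
function stripGatewayServiceMarkerEnv(env: Env): Env {
  const copy: Env = { ...env };
  for (const key of SERVICE_MARKER_KEYS) {
    delete copy[key];
  }
  return copy;
}

const serviceEnv: Env = {
  PATH: "/usr/bin",
  OPENCLAW_SERVICE_MARKER: "openclaw",
  OPENCLAW_SERVICE_KIND: "gateway",
};
const childEnv = stripGatewayServiceMarkerEnv(serviceEnv);

console.log(isRunningInsideGatewayService(serviceEnv)); // true
console.log(isRunningInsideGatewayService(childEnv)); // false
```

Taking the environment as a parameter rather than reading `process.env` directly keeps both functions easy to unit test with mocked values, which matches the PR's stated test approach.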

Change Type

  • Bug fix

Scope

  • Gateway / orchestration
  • UI / DX

Linked Issue/PR

  • N/A (discovered during internal review)

Root Cause

  • Root cause: The update command did not distinguish between running inside a gateway service process versus a normal shell. Package updates from within the service process mutate the dist tree in-place, which races with lazy chunk loading.
  • Missing detection / guardrail: No check for OPENCLAW_SERVICE_MARKER / OPENCLAW_SERVICE_KIND env vars before allowing package replacement.

Regression Test Plan

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: src/cli/update-cli.test.ts
  • Scenario the test should lock in: Package update from inside gateway service with --no-restart must be blocked; package update when gateway is not running must be allowed.
  • Why this is the smallest reliable guardrail: The guard is a conditional early-exit in the update command flow; unit tests with mocked service state are the cheapest way to verify all branch combinations.

User-visible / Behavior Changes

  • openclaw update now exits with an error and a helpful message when run from inside a running gateway service process (unless the service is stopped first or --no-restart is used with the service already stopped).

Security Impact

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Linux
  • Runtime: Node 22

Steps

  1. Start the gateway service (openclaw gateway start).
  2. From within the service context (or with OPENCLAW_SERVICE_MARKER=openclaw OPENCLAW_SERVICE_KIND=gateway), run openclaw update.

Expected

  • Update is blocked with a clear error message explaining the conflict.

Actual

  • Update proceeds, potentially corrupting the running gateway's loaded modules.

Human Verification

  • Verified scenarios:
    • All 70 existing update-cli.test.ts tests pass.
    • New tests cover blocking when running, allowing when stopped, and env stripping.
  • Edge cases checked:
    • Service state inspection failure → conservative block.
    • Service not installed → allow (not inside a managed service).
    • Post-core update process strips markers so child does not inherit gateway identity.
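The verified scenarios and edge cases above amount to a small decision table. The sketch below is one way to encode it. The `ServiceState` shape and the order of the checks are assumptions; the PR text only says that `PrePackageServiceStop` tracks inspected and running state.

```typescript
// Hypothetical shape; field names are assumptions for illustration.
interface ServiceState {
  installed: boolean; // is a managed gateway service installed at all?
  inspected: boolean; // did the service state query succeed?
  running: boolean; // only meaningful when inspected is true
}

// Decides whether a package update must be refused.
function shouldBlockPackageUpdateFromGatewayServiceEnv(
  insideGatewayService: boolean,
  state: ServiceState,
): boolean {
  if (!insideGatewayService) return false; // normal shell: this guard never applies
  if (!state.installed) return false; // no managed service to conflict with
  if (!state.inspected) return true; // state unknown: block conservatively
  return state.running; // block only while the gateway is live
}

const cases: Array<[string, ServiceState]> = [
  ["running", { installed: true, inspected: true, running: true }],
  ["stopped", { installed: true, inspected: true, running: false }],
  ["inspection failed", { installed: true, inspected: false, running: false }],
  ["not installed", { installed: false, inspected: true, running: false }],
];
for (const [label, state] of cases) {
  const verdict = shouldBlockPackageUpdateFromGatewayServiceEnv(true, state) ? "block" : "allow";
  console.log(`${label}: ${verdict}`);
}
```

Under these assumptions the loop prints block, allow, block, allow, in that order, which mirrors the listed cases: running, stopped, inspection failure, and service not installed.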

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (No)
  • Migration needed? (No)

Risks and Mitigations

  • Risk: Overly aggressive blocking when service state cannot be read.
    • Mitigation: This is intentional conservative behavior; users can run the update from a normal shell or stop the service first.

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/cli/update-cli.test.ts (modified, +134/-3)
  • src/cli/update-cli/update-command.ts (modified, +83/-18)

PR #75819: fix(cli): block gateway-owned package updates

Description (problem / solution / changelog)

Summary

  • Closes #75691 by blocking package-manager updates when openclaw update is running inside the active gateway process tree.
  • Reuses the existing bounded PID ancestry walker from stale-gateway cleanup to compare the managed gateway runtime PID with the caller ancestry.
  • Returns a clear operator-facing error before stopping the managed service or mutating the installed package tree.
  • Keeps ordinary external-shell package updates working, including the existing managed-service stop/restart path.
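For readers unfamiliar with the technique, a bounded PID ancestry walk can be sketched as follows. This is not the walker from `src/infra/restart-stale-pids.ts`; the function name, the injected `ParentLookup`, and the depth limit of 32 are all assumptions chosen to keep the example testable without touching the real process table.

```typescript
// Looks up the parent PID of a process; returns undefined when it is unknown.
type ParentLookup = (pid: number) => number | undefined;

// Walks from startPid towards the root and reports whether targetPid is one
// of its ancestors. The walk stops after maxDepth hops, on an unknown parent,
// or if the lookup ever loops back to a PID it has already visited.
function isPidInAncestry(
  targetPid: number,
  startPid: number,
  parentOf: ParentLookup,
  maxDepth = 32,
): boolean {
  const seen = new Set<number>();
  let pid: number | undefined = parentOf(startPid);
  for (let depth = 0; depth < maxDepth && pid !== undefined && pid > 0; depth++) {
    if (pid === targetPid) return true;
    if (seen.has(pid)) return false; // cycle guard
    seen.add(pid);
    pid = parentOf(pid);
  }
  return false;
}

// Toy process table (child -> parent); all PIDs except the gateway's are made up.
const table = new Map<number, number>([
  [700, 650], // openclaw update -> agent exec shell
  [650, 598350], // agent exec shell -> gateway
  [598350, 1], // gateway -> systemd
  [900, 800], // openclaw update -> external SSH shell
  [800, 1], // SSH shell -> systemd
]);
const parentOf: ParentLookup = (pid) => table.get(pid);

console.log(isPidInAncestry(598350, 700, parentOf)); // true: invoked inside the gateway tree
console.log(isPidInAncestry(598350, 900, parentOf)); // false: invoked from an external shell
```

Injecting the parent lookup keeps the walk itself pure. A real caller on Linux would typically back it with `/proc/<pid>/stat` or a `ps` call.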

Validation

  • pnpm exec oxfmt --check --threads=1 src/cli/update-cli/update-command.ts src/cli/update-cli.test.ts src/infra/restart-stale-pids.ts
  • git diff --check
  • pnpm test src/cli/update-cli.test.ts src/infra/restart-stale-pids.test.ts -- --reporter=verbose
  • Live VPS smoke: simulated gateway service context with OPENCLAW_SERVICE_MARKER=openclaw OPENCLAW_SERVICE_KIND=gateway pnpm openclaw update --channel stable --tag latest --yes and confirmed the update aborts before package install/service stop.

Changed files

  • src/cli/update-cli.test.ts (modified, +26/-0)
  • src/cli/update-cli/update-command.ts (modified, +38/-0)
  • src/infra/restart-stale-pids.ts (modified, +1/-1)

Code Example

openclaw update --tag <newer-version> --yes

---

Error: openclaw update detected it is running inside the gateway process tree
(gateway PID <X> is an ancestor of this process). The supervised-mode restart
cannot fire from this context. Run `openclaw update` from an external shell
(SSH, cron, or other detached session). To install without restart from this
context, pass --no-restart and manually invoke `sudo systemctl restart <unit>`
from outside.

Reproduction

  1. Run OpenClaw under systemd (system-level unit at /etc/systemd/system/openclaw.service, Restart=always)
  2. From inside an agent's exec tool (gateway is the agent's parent process), run:
    openclaw update --tag <newer-version> --yes
  3. Observe: install logs report success, doctor migrations apply, command exits 0.
  4. Verify in-memory state — the gateway PID is unchanged. The new code is on disk but not loaded.

Proposal

Make openclaw update walk its own ppid chain at startup. If the active gateway PID appears in the caller's ancestor PIDs:

Option A (defensive, smallest change): Refuse to perform the restart phase. Surface a clear error:

Error: openclaw update detected it is running inside the gateway process tree
(gateway PID <X> is an ancestor of this process). The supervised-mode restart
cannot fire from this context. Run `openclaw update` from an external shell
(SSH, cron, or other detached session). To install without restart from this
context, pass --no-restart and manually invoke `sudo systemctl restart <unit>`
from outside.

Option B (helpful, slightly larger change): Auto-defer the restart via a detached helper. Spawn nohup bash -c 'sleep 2 && systemctl restart <detected-unit>' &, then exit. Document that this only works if the active unit is system-level and the user has appropriate sudo.

Either solves the silent-failure mode. Option A is simpler and more conservative.
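Option B could be sketched in Node roughly as below. The helper names, the unit-name allow-list, and the two-second default are assumptions. `spawn` with `detached: true`, `stdio: "ignore"`, and `unref()` is the standard Node way to let a child outlive its parent. Whether the helper has permission to restart a system-level unit is outside the scope of this sketch.

```typescript
import { spawn } from "node:child_process";

// Conservative allow-list for unit names, so nothing unexpected is
// interpolated into the shell string (an assumption, not systemd's full grammar).
const UNIT_NAME = /^[A-Za-z0-9@_.:-]+$/;

// Builds the shell script from the proposal: wait, then restart the unit.
function buildDeferredRestartScript(unit: string, delaySeconds = 2): string {
  if (!UNIT_NAME.test(unit)) {
    throw new Error(`refusing suspicious unit name: ${unit}`);
  }
  if (!Number.isInteger(delaySeconds) || delaySeconds < 0) {
    throw new Error("delaySeconds must be a non-negative integer");
  }
  return `sleep ${delaySeconds} && systemctl restart ${unit}`;
}

// Starts the script in its own session and stops tracking it, so the
// caller can exit before the restart fires.
function spawnDetachedRestartHelper(unit: string, delaySeconds = 2): void {
  const child = spawn("bash", ["-c", buildDeferredRestartScript(unit, delaySeconds)], {
    detached: true, // new session and process group on POSIX
    stdio: "ignore", // do not keep the parent's pipes open
  });
  child.unref(); // let the parent exit without waiting for the child
}

// Only the script is printed here; nothing is spawned by the example itself.
console.log(buildDeferredRestartScript("openclaw.service"));
// prints "sleep 2 && systemctl restart openclaw.service"
```

Separating script construction from spawning keeps the part that handles untrusted input (the unit name) testable without restarting anything.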

Why this matters

Operators running OpenClaw under any agent-orchestration pattern (HZL, Lobster, COO agents, etc.) will frequently invoke updates from inside the gateway-managed lane. The current behavior is a footgun: the update appears to succeed but the gateway keeps running old code. Without external monitoring, this can persist for hours or days.

Empirical evidence

From a 4.29 cutover at 2026-05-01 04:16-04:21 UTC:

  • An agent (Trent) ran openclaw update --tag 2026.4.29 --yes from its gateway-owned exec lane
  • Binary on disk: 2026.4.29 (openclaw --version confirmed)
  • systemctl show openclaw -p MainPID --value: 598350 (the OLD 4.27 process, unchanged)
  • Recovery: sudo systemctl restart openclaw.service from a fresh subprocess → MainPID changed to 605944

Same pattern observed on 4.26 cutover ~2026-04-28.
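The MainPID comparison used as evidence above can be turned into a post-update check. The sketch below is an assumption about how an operator script might do that; it is not part of OpenClaw. It relies only on the `systemctl show <unit> -p MainPID --value` command quoted in the evidence, and treats a missing or zero PID as "not running".

```typescript
import { execFileSync } from "node:child_process";

// Parses the output of `systemctl show <unit> -p MainPID --value`.
// Returns undefined for empty, non-numeric, or zero values.
function parseMainPid(output: string): number | undefined {
  const text = output.trim();
  if (!/^\d+$/.test(text)) return undefined;
  const pid = Number(text);
  return pid > 0 ? pid : undefined;
}

// Reads the current MainPID of a unit (requires systemd on the host).
function readMainPid(unit: string): number | undefined {
  const out = execFileSync("systemctl", ["show", unit, "-p", "MainPID", "--value"], {
    encoding: "utf8",
  });
  return parseMainPid(out);
}

// A restart took effect only if the unit is running under a different PID.
function restartTookEffect(before: number | undefined, after: number | undefined): boolean {
  return before !== undefined && after !== undefined && before !== after;
}

// Values from the 4.29 cutover: unchanged PID, then the PID after manual recovery.
console.log(restartTookEffect(parseMainPid("598350\n"), parseMainPid("598350\n"))); // false
console.log(restartTookEffect(parseMainPid("598350\n"), parseMainPid("605944\n"))); // true
```

In practice a script would call `readMainPid("openclaw")` before and after the update and alert when `restartTookEffect` returns false.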

Workaround in place

Operating SOP "openclaw-update-native-v1" v3 now mandates external-lane invocation as primary, with in-lane B2a/B2b split (--no-restart install + external sudo systemctl restart openclaw.service) as fallback only.

Sister filing

Companion to a separate request for openclaw update to support system-level systemd units explicitly (currently the detached restart-script uses systemctl --user restart which doesn't control system-level units).

extent analysis

TL;DR

To fix the supervised-mode restart failure when running openclaw update from inside an agent's exec subprocess, modify the openclaw update command to detect if it's running inside the gateway process tree and either refuse to perform the restart phase or auto-defer the restart via a detached helper.

Guidance

  • Modify openclaw update to walk its own ppid chain at startup and check if the active gateway PID appears in the caller's ancestor PIDs.
  • If the gateway PID is found, either:
    • Refuse to perform the restart phase and surface a clear error message (Option A).
    • Auto-defer the restart via a detached helper, such as spawning nohup bash -c 'sleep 2 && systemctl restart <detected-unit>' & (Option B).
  • Document the need to run openclaw update from an external shell (SSH, cron, or other detached session) for the supervised-mode restart to work correctly.

Example

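// dist/cli/gateway-lifecycle.runtime.js
// Sketch only: the gatewayPid parameter, the ES module import, and the Linux /proc lookup are assumptions.
import { readFileSync } from "node:fs";

function restartGatewayProcessWithFreshPid(gatewayPid) {
  // ...
  if (isRunningInsideGatewayProcessTree(gatewayPid)) {
    // Option A: refuse to perform the restart phase
    throw new Error("openclaw update detected it is running inside the gateway process tree");
    // Option B (instead of throwing): auto-defer the restart via a detached helper
    // spawnDetachedRestartHelper();
  }
  // ...
}

function isRunningInsideGatewayProcessTree(gatewayPid, maxDepth = 32) {
  let pid = process.ppid;
  for (let depth = 0; depth < maxDepth && pid > 0; depth++) {
    if (pid === gatewayPid) return true;
    try {
      // /proc/<pid>/stat reads "pid (comm) state ppid ..."; parse from the last ")" because comm may contain spaces.
      const stat = readFileSync(`/proc/${pid}/stat`, "utf8");
      pid = Number(stat.slice(stat.lastIndexOf(")") + 2).split(" ")[1]);
    } catch {
      return false; // ancestor exited or /proc is unavailable
    }
  }
  return false;
}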

Notes

The chosen solution should consider the trade-off between simplicity and helpfulness. Option A is more conservative, while Option B provides a more seamless experience but requires additional documentation and potential sudo permissions.

Recommendation

Apply Option A (defensive, smallest change) to refuse the restart phase when running inside the gateway process tree, as it is the simpler and more conservative of the two options.
