openclaw - 💡(How to fix) Fix [Feature]: Operator-grant gate for auto-upgrades

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

Add a configuration option that requires explicit operator approval before OpenClaw applies any platform upgrade, instead of upgrades being applied automatically without a human gate.

Error Message

Regression #1 (still open): Telegram forum thread dispatch fails to deliver messages to forum threads, falling back to DM-only behavior with no surface error. Tracked internally as SL-057. Workaround in place but architecturally unsatisfying. No fix in v2026.5.7. Regression #2 (under investigation): At-kind cron jobs silently disappear from the scheduler with no error logged. Six instances observed and documented over the past week, all sharing similar registration patterns. Forensics ticket open at P1. v2026.5.7 ships a relevant fix (cron CLI computed status in --json output) which would significantly reduce the diagnostic burden, but I cannot adopt v2026.5.7 without exactly the operator-grant gate this feature request is asking for, because a second uncontrolled upgrade during the v2026.5.7 evaluation window would compound the existing instability.

Root Cause

Consequence: When an auto-upgrade introduces a regression, the operator absorbs full diagnostic burden after the fact, often discovers the issue only when downstream symptoms surface (failed dispatches, missing crons, broken delivery), and cannot roll back cleanly because the upgrade has already been applied. In my case, one auto-upgrade event has cost approximately two weeks of operator time across multiple sessions diagnosing the regression family it introduced, plus ongoing workaround maintenance until upstream fixes ship.

Fix Action

Fix / Workaround

On 2026-05-03 at approximately 20:05 PT, my production OpenClaw instance auto-upgraded from 2026.4.24 to 2026.5.2 without prior notification or operator confirmation. The version jump correlates with the onset of a Telegram forum thread dispatch regression that has held DM-only-permanent status since (workaround documented internally as SL-057), and additional regressions in the at-kind cron family that I am still actively diagnosing.

Affected: Production OpenClaw operators running multi-agent deployments with customer-facing automation (cron jobs, scheduled emails, active dispatch).

Consequence: When an auto-upgrade introduces a regression, the operator absorbs full diagnostic burden after the fact, often discovers the issue only when downstream symptoms surface (failed dispatches, missing crons, broken delivery), and cannot roll back cleanly because the upgrade has already been applied. In my case, one auto-upgrade event has cost approximately two weeks of operator time across multiple sessions diagnosing the regression family it introduced, plus ongoing workaround maintenance until upstream fixes ship.

RAW_BUFFERClick to expand / collapse

Summary

Add a configuration option that requires explicit operator approval before OpenClaw applies any platform upgrade, instead of upgrades being applied automatically without a human gate.

Problem to solve

On 2026-05-03 at approximately 20:05 PT, my production OpenClaw instance auto-upgraded from 2026.4.24 to 2026.5.2 without prior notification or operator confirmation. The version jump correlates with the onset of a Telegram forum thread dispatch regression that has held DM-only-permanent status since (workaround documented internally as SL-057), and additional regressions in the at-kind cron family that I am still actively diagnosing.

As an operator running OpenClaw in a multi-agent production environment (four agents: Kaizen, Alex, Harold, Ray, on a Telegram hub-and-spoke configuration) with active customer-facing email cron jobs, I need to control when platform changes are applied so I can pre-stage backups, schedule a rollback window, and verify each release against my own substrate before adoption.

The current auto-upgrade behavior creates unacceptable production risk. It removes the discipline I would apply to any other infrastructure dependency, and when a regression slips through, the diagnostic burden falls entirely on the operator with no opportunity for staged rollout. With v2026.5.7 just shipping (which contains fixes I want, but also change surface area I have not yet evaluated), this gap is now blocking me from adopting upstream improvements on a sensible timeline.

Proposed solution

Add a configuration option (suggested: openclaw.upgrade.requireOperatorGrant: true) that:

  1. Detects available upgrades and surfaces them via a notification channel of the operator's choosing (Telegram DM, CEO email, dedicated webhook).
  2. Holds the upgrade in a pending state until the operator runs an explicit command (e.g., openclaw upgrade --apply <version>) or approves via a callback action.
  3. Logs both the detection event and the operator approval event to the audit trail with timestamp, version delta, and operator identity.
  4. Defaults to enabled for production deployments, or at minimum prompts for the choice during initial setup so operators are aware the gate exists.

This mirrors the standard discipline pattern for production infrastructure (Kubernetes operators, database engines, CI/CD pipelines) where unattended upgrades are opt-in rather than the default behavior.

Alternatives considered

  • Manual version pinning via configuration: works but does not surface available upgrades, creating awareness drift over time.
  • Disabling the auto-upgrade service entirely: works but loses the value of being notified about new releases and security fixes.
  • Containerizing OpenClaw with image pinning: heavier infrastructure burden than a single config flag and changes the deployment model significantly.
  • Operator-side cron job that detects upgrades and DMs me: places the responsibility on every operator to build the same harness, when the platform should provide it once. Filed by a production operator running OpenClaw 2026.5.2 in a multi-agent Telegram hub-and-spoke configuration with active customer-facing email cron jobs. Happy to provide additional detail on the regression family observed post-upgrade if useful for upstream triage. The use case here is not unique to my deployment — any operator running OpenClaw in production with real users will hit the same risk profile the moment an auto-upgrade introduces a regression.

Impact

Affected: Production OpenClaw operators running multi-agent deployments with customer-facing automation (cron jobs, scheduled emails, active dispatch).

Severity: High. Auto-upgrades that introduce regressions create reactive production incidents the operator did not consent to and cannot pre-stage rollback for.

Frequency: Per upgrade event. Auto-upgrades are infrequent in absolute terms (a handful per month based on the release cadence I have observed), but the consequence per event is severe and the operator has zero forward control.

Consequence: When an auto-upgrade introduces a regression, the operator absorbs full diagnostic burden after the fact, often discovers the issue only when downstream symptoms surface (failed dispatches, missing crons, broken delivery), and cannot roll back cleanly because the upgrade has already been applied. In my case, one auto-upgrade event has cost approximately two weeks of operator time across multiple sessions diagnosing the regression family it introduced, plus ongoing workaround maintenance until upstream fixes ship.

Evidence/examples

Specific event: 2026-05-03 at approximately 20:05 PT, my OpenClaw production instance auto-upgraded from version 2026.4.24 to 2026.5.2 with no prior notification and no operator confirmation. I discovered the upgrade only after observing downstream regression symptoms.

Regression #1 (still open): Telegram forum thread dispatch fails to deliver messages to forum threads, falling back to DM-only behavior with no surface error. Tracked internally as SL-057. Workaround in place but architecturally unsatisfying. No fix in v2026.5.7.

Regression #2 (under investigation): At-kind cron jobs silently disappear from the scheduler with no error logged. Six instances observed and documented over the past week, all sharing similar registration patterns. Forensics ticket open at P1. v2026.5.7 ships a relevant fix (cron CLI computed status in --json output) which would significantly reduce the diagnostic burden, but I cannot adopt v2026.5.7 without exactly the operator-grant gate this feature request is asking for, because a second uncontrolled upgrade during the v2026.5.7 evaluation window would compound the existing instability.

Operator workflow today (without this feature):

  1. Discover auto-upgrade happened (usually via downstream symptom, not via notification).
  2. Diagnose what changed by reading release notes after the fact.
  3. Triage any regressions reactively.
  4. Maintain workarounds until upstream fixes ship.
  5. Hesitate to adopt new releases because the same auto-upgrade behavior could repeat at any time.

Operator workflow with this feature:

  1. Receive notification that v2026.5.x is available.
  2. Read release notes, evaluate against my substrate.
  3. Pre-stage backup, schedule upgrade window, run apply command.
  4. Verify post-upgrade health.
  5. Roll back cleanly if anything regresses, since I controlled the timing.

The difference between those two workflows is the difference between operating OpenClaw as a serious production dependency and operating it as a hobby installation.

Additional information

No response

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING