openclaw - 💡(How to fix) Fix [Bug]: Gateway auto-restore fails on critical config corruption (missing-meta-vs-last-good) [1 comments, 2 participants]

GitHub stats
openclaw/openclaw#70336 (fetched 2026-04-23 07:26:08)
Comments: 1 · Participants: 2 · Timeline: 4 · Reactions: 0
Timeline (top): labeled ×2, closed ×1, commented ×1

The gateway's config auto-restore mechanism only triggers for update-channel-only-root errors but ignores more critical errors like missing-meta-vs-last-good and gateway-mode-missing-vs-last-good. This causes infinite crash loops when config is corrupted, requiring manual intervention despite having a valid last-known-good backup available.

Root Cause

  • Crash loop logs show restoredFromBackup: false despite a valid backup existing.
  • Workaround: a pre-flight validation script (ExecStartPre) catches corruption before the gateway starts.
  • Root cause: resolveConfigReadRecoveryContext() only checks for update-channel-only-root.
  • Proposed fix: check for ALL critical error types, not just one.

Fix / Workaround

  • Workaround: a pre-flight validation script (ExecStartPre) catches corruption before the gateway starts.
  • Proposed fix: check for ALL critical error types in resolveConfigReadRecoveryContext(), not just update-channel-only-root.

Bug type

Crash (process/app exits or hangs)

Beta release blocker

No

Summary

The gateway's config auto-restore mechanism triggers only for update-channel-only-root errors and ignores more critical ones such as missing-meta-vs-last-good and gateway-mode-missing-vs-last-good. When the config is corrupted, the gateway crash-loops until systemd's start limit stops the service, and manual intervention is required even though a valid last-known-good backup is available.

Steps to reproduce

  1. Have a working OpenClaw gateway with valid config
  2. Corrupt the config file by removing metadata (or during a bad edit/write):
    • Remove gateway section, OR
    • Corrupt config structure to trigger "missing-meta-vs-last-good"
  3. Restart the gateway: systemctl --user restart openclaw-gateway
  4. Observe:
    • Gateway detects corruption: valid: false
    • Gateway logs: suspicious: ["missing-meta-vs-last-good", "gateway-mode-missing-vs-last-good"]
    • Gateway does NOT auto-restore: restoredFromBackup: false
    • Gateway crashes
    • Systemd restarts it (Restart=always)
    • Loop repeats 5 times in 60 seconds
    • Service stops due to StartLimitBurst=5
  5. Gateway remains down until manual config restore

Expected behavior

Gateway should auto-restore from last-known-good config for ALL critical config errors, including:

  • missing-meta-vs-last-good
  • gateway-mode-missing-vs-last-good
  • size-drop-vs-last-good:*
  • update-channel-only-root (currently the only one that works)

The gateway has a valid backup (last-known-good) and should use it to recover automatically, preventing crash loops. The current behavior, where it restores for less critical errors but not for critical corruption, is a safety feature failure.

Actual behavior

{
  "valid": false,
  "suspicious": ["missing-meta-vs-last-good", "gateway-mode-missing-vs-last-good"],
  "restoredFromBackup": false,
  "restoredBackupPath": null
}
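The stuck state can be recognized from this status object alone. The following is a minimal sketch, assuming the status is available as JSON text with exactly the fields shown above; the function name `needs_manual_restore` is hypothetical and not part of OpenClaw.

```python
import json


def needs_manual_restore(status_json: str) -> bool:
    """True when the config is invalid, flagged as suspicious, and was not restored."""
    status = json.loads(status_json)
    return (
        status.get("valid") is False
        and bool(status.get("suspicious"))
        and not status.get("restoredFromBackup", False)
    )


# The status object reported in the issue.
report = """
{
  "valid": false,
  "suspicious": ["missing-meta-vs-last-good", "gateway-mode-missing-vs-last-good"],
  "restoredFromBackup": false,
  "restoredBackupPath": null
}
"""

print(needs_manual_restore(report))
```

A check like this could feed a log alert, so that the unrecovered state is noticed before the service hits its start limit.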

OpenClaw version

2026.4.21

Operating system

Ubuntu 24.04.4 LTS

Install method

No response

Model

qwen3.5-397b-a17b

Provider / routing chain

openclaw -> qwen

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact

  • User Impact: Gateway enters crash loop, requires manual intervention
  • Frequency: Every time config is corrupted during manual edit or failed write

Additional information

  • Crash loop logs show restoredFromBackup: false despite a valid backup existing.
  • Workaround: a pre-flight validation script (ExecStartPre) catches corruption before the gateway starts.
  • Root cause: resolveConfigReadRecoveryContext() only checks for update-channel-only-root.
  • Proposed fix: check for ALL critical error types, not just one.

extent analysis

TL;DR

The gateway's auto-restore mechanism should be updated to trigger on all critical config errors, not just update-channel-only-root, to prevent infinite crash loops.

Guidance

  • Review the resolveConfigReadRecoveryContext() function to ensure it checks for all critical error types, including missing-meta-vs-last-good, gateway-mode-missing-vs-last-good, and size-drop-vs-last-good.
  • Update the auto-restore mechanism to use the last-known-good backup for all critical config errors.
  • Consider implementing a pre-flight validation script (e.g., ExecStartPre) to catch config corruption before the gateway starts.
  • Verify that the restoredFromBackup flag is set to true when the gateway auto-restores from a valid backup.
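The pre-flight workaround mentioned in the guidance could look roughly like the following. This is a sketch under stated assumptions, not OpenClaw's actual tooling: it assumes the config is plain JSON (a config with comments or JSON5 syntax would be rejected), that a healthy config has top-level `meta` and `gateway` keys (inferred from the reason-code names in the issue, not confirmed), and that the caller supplies the paths to the config and to a last-known-good copy. All function names here are hypothetical.

```python
import json
import shutil
import sys
from pathlib import Path

# Assumed top-level keys of a healthy config (inferred from the reason codes, not confirmed).
REQUIRED_KEYS = ("meta", "gateway")


def config_problems(path: Path) -> list[str]:
    """Return the problems found in a config file; an empty list means it looks sane."""
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:  # missing file, bad encoding, or invalid JSON
        return [f"unreadable: {exc}"]
    if not isinstance(data, dict):
        return ["top level is not an object"]
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in data]


def preflight(config: Path, last_good: Path) -> int:
    """Restore last_good over config when config is broken; return a process exit code."""
    problems = config_problems(config)
    if not problems:
        return 0
    if config_problems(last_good):
        print(f"config broken {problems} and backup unusable", file=sys.stderr)
        return 1
    if config.exists():
        # Keep the broken file next to the config for later inspection.
        shutil.copy2(config, config.with_name(config.name + ".corrupt"))
    shutil.copy2(last_good, config)
    print(f"restored {config} from {last_good}: {problems}", file=sys.stderr)
    return 0
```

A wrapper script would call `sys.exit(preflight(config_path, backup_path))` and be referenced from the unit's `ExecStartPre=` line, so that a non-zero exit prevents the gateway from starting on a config that cannot be repaired.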

Example

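# Illustrative Python only; the real function's signature is not shown in the issue.

def restore_from_backup():
    """Placeholder for the gateway's restore logic, which the issue does not show."""
    return True

# Reason codes the issue asks to treat as critical.
CRITICAL_ERRORS = (
    "missing-meta-vs-last-good",
    "gateway-mode-missing-vs-last-good",
    "size-drop-vs-last-good",  # reported with a suffix: "size-drop-vs-last-good:*"
    "update-channel-only-root",
)

def resolveConfigReadRecoveryContext(errors):
    for error in errors:
        # Prefix match, so suffixed codes such as "size-drop-vs-last-good:*" also qualify.
        if error.startswith(CRITICAL_ERRORS):
            # Trigger auto-restore from last-known-good backup
            return restore_from_backup()
    return False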

Notes

The proposed fix assumes that the resolveConfigReadRecoveryContext() function is the root cause of the issue. However, without the actual code, this is an educated guess. Additionally, the ExecStartPre script may need to be modified to work with the updated auto-restore mechanism.

Recommendation

Apply workaround: Implement a pre-flight validation script (e.g., ExecStartPre) to catch config corruption before the gateway starts, and update the auto-restore mechanism to trigger on all critical config errors. This will prevent infinite crash loops and ensure the gateway can recover automatically from config corruption.
