openclaw - ✅(Solved) Fix [Bug]: Config Migration Order Causes Service Failure on Schema Changes [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#68664 (fetched 2026-04-19 15:08:54)
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): referenced ×3, cross-referenced ×1

Error Message

// CURRENT (BROKEN) FLOW:

  1. Validate config → ❌ ERROR: "Unrecognized keys: chunkSize, chunkOverlap, maxResults"
  2. Apply migrations → 🚫 NEVER EXECUTES (validation fails first)
  3. Service crashes → 🔄 Restart loop begins

Fix / Workaround

🔧 Workaround for Users

Until the fix is released, run these commands manually after each update:

openclaw doctor --fix
systemctl --user restart openclaw-gateway

PR fix notes

PR #68685: fix(config): strip obsolete memorySearch keys before schema validation (#68664)

Description (problem / solution / changelog)

Summary

  • Problem: Obsolete memorySearch keys (chunkSize, chunkOverlap, maxResults) in agents.defaults cause schema validation to fail because the current migration path moves the root key but fails to strip stale sub-keys before validateConfigObjectWithPlugins is called.
  • Why it matters: This blocks gateway startup for users with legacy configs; in one production environment it caused a restart loop of roughly 39,000 attempts over 42 hours.
  • What changed: Implemented agents.defaults.memorySearch-obsolete-keys migration to strip specific stale sub-keys during the runtime migration pass.
  • What did NOT change (scope boundary): Internal logic of memory search and per-agent overrides in agents.list remain untouched.

Change Type (select all)

  • Bug fix
  • Refactor required for the fix

Scope (select all touched areas)

  • Gateway / orchestration
  • API / contracts

Linked Issue/PR

  • Closes #68664
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: The memorySearch sub-keys were deprecated/removed from the Zod schema, but the migration logic only performed a move operation on the top-level memorySearch key. If the keys were already present in agents.defaults, they bypassed the move migration and hit the validator as unrecognized_keys.
  • Missing detection / guardrail: Config migration unit tests lacked coverage for partial/obsolete sub-key state within already-migrated root paths.
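
The strip step described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: MEMORY_SEARCH_OBSOLETE_KEYS is the constant name mentioned in the PR's risk notes, while the function name, the helper, and the shape of the agents.list entry are assumptions made for the example.

```typescript
// Hypothetical sketch of a strip migration; only MEMORY_SEARCH_OBSOLETE_KEYS
// is a name taken from the PR text, everything else is illustrative.
type JsonObject = Record<string, unknown>;

const MEMORY_SEARCH_OBSOLETE_KEYS = ["chunkSize", "chunkOverlap", "maxResults"] as const;

function isObject(value: unknown): value is JsonObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Removes obsolete sub-keys from agents.defaults.memorySearch only and
// returns the names of the keys that were removed (empty if nothing changed).
function stripObsoleteMemorySearchKeys(config: JsonObject): string[] {
  const agents = config["agents"];
  if (!isObject(agents)) return [];
  const defaults = agents["defaults"];
  if (!isObject(defaults)) return [];
  const memorySearch = defaults["memorySearch"];
  if (!isObject(memorySearch)) return [];

  const removed: string[] = [];
  for (const key of MEMORY_SEARCH_OBSOLETE_KEYS) {
    if (key in memorySearch) {
      delete memorySearch[key];
      removed.push(key);
    }
  }
  return removed;
}

// Partial key presence in defaults; the per-agent entry shape is assumed.
const legacy: JsonObject = {
  agents: {
    defaults: { memorySearch: { enabled: true, chunkSize: 800, maxResults: 5 } },
    list: [{ id: "a", memorySearch: { chunkSize: 400 } }],
  },
};
const removed = stripObsoleteMemorySearchKeys(legacy);
console.log(removed, JSON.stringify(legacy));
```

Walking the path explicitly keeps the strip scoped to agents.defaults.memorySearch, so the per-agent entry under agents.list is left as it was, which matches the scope boundary stated in the summary.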

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
  • Target test or file: src/commands/doctor/shared/legacy-config-migrate.test.ts
  • Scenario the test should lock in: A config whose agents.defaults.memorySearch contains obsolete keys must have those keys stripped and then pass validation, all in a single pass.
  • Why this is the smallest reliable guardrail: Direct unit testing of the migration registry ensures the transformation logic is isolated from IO/Runtime state.

User-visible / Behavior Changes

None (internal config cleanup).

Diagram (if applicable)

Before:
[Load] -> [Move Migration (moves root only)] -> [Validation (Fails on sub-keys)] -> [Crash]

After:
[Load] -> [Move + Strip Migration] -> [Validation (Pass)] -> [Gateway Ready]

Security Impact (required)

  • New permissions/capabilities? (No)
  • Secrets/tokens handling changed? (No)
  • New/changed network calls? (No)
  • Command/tool execution surface changed? (No)
  • Data access scope changed? (No)

Repro + Verification

Environment

  • OS: Windows 11 (PowerShell 7)
  • Runtime/container: Node.js v20.x
  • Model/provider: N/A (Config layer)

Steps

  1. Create config.json5 with agents: { defaults: { memorySearch: { chunkSize: 800 } } }.
  2. Run openclaw gateway.
  3. Observe InvalidConfigError: Unrecognized keys: "chunkSize".
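
To see why step 3 fails, here is a small stand-in for a strict object check. It is not the project's Zod schema; the function names are hypothetical, and treating "enabled" as the only allowed key is an assumption made for the example.

```typescript
// Stand-in for a strict schema: any key outside the allow-list is an error.
function findUnrecognizedKeys(
  value: Record<string, unknown>,
  allowed: readonly string[],
): string[] {
  return Object.keys(value).filter((key) => !allowed.includes(key));
}

// Returns a problem message, or null when the object is acceptable.
function checkMemorySearch(value: Record<string, unknown>): string | null {
  const allowedKeys = ["enabled"]; // assumption for this sketch only
  const extra = findUnrecognizedKeys(value, allowedKeys);
  if (extra.length === 0) return null;
  return `Unrecognized keys: ${extra.map((key) => `"${key}"`).join(", ")}`;
}

const problem = checkMemorySearch({ chunkSize: 800 });
console.log(problem); // Unrecognized keys: "chunkSize"
```

A strict check like this rejects the whole object as soon as one stale key is present, which is why a single leftover sub-key is enough to block startup.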

Expected

Gateway initializes successfully after migrating keys.

Actual

Gateway crashes with schema validation error.

Evidence

  • Trace/log snippets: test_results.log (3/3 repro tests passing).

Human Verification (required)

  • Verified scenarios: Baseline strip, no-op on clean config, combined move+strip.
  • Edge cases checked: Partial key presence (e.g., only maxResults present).
  • What you did not verify: Upstream CI/CD integration.

Compatibility / Migration

  • Backward compatible? (Yes)
  • Config/env changes? (Yes - automatic cleanup)
  • Migration needed? (Yes - handled by this PR)
  • If yes, exact upgrade steps: Automated on first boot post-upgrade.

Risks and Mitigations

  • Risk: The migration could strip valid keys if MEMORY_SEARCH_OBSOLETE_KEYS ever overlaps with a future schema.
    • Mitigation: Keys are stripped only under the agents.defaults.memorySearch path.

Changed files

  • src/commands/doctor/shared/legacy-config-migrate.test.ts (modified, +106/-0)
  • src/commands/doctor/shared/legacy-config-migrations.runtime.agents.ts (modified, +71/-0)


🐛 Bug: Config Migration Order Causes Service Failure on Schema Changes

📋 Summary

OpenClaw's configuration validation fails before legacy migrations can run, causing service crashes when configuration schema changes between versions. The service attempted to restart 39,231 times over 42 hours before manual intervention.

🚨 Impact

  • Service downtime: 42 hours (17/04/2026 00:00 - 18/04/2026 18:07)
  • Failed restarts: 39,231 attempts
  • CPU waste: ~3.5 seconds per failed attempt
  • User impact: Complete service unavailability

🔍 Root Cause Analysis

The Problem: When OpenClaw's configuration schema changes (e.g., removing deprecated keys from memorySearch), the current flow is:

// CURRENT (BROKEN) FLOW:
1. Validate config → ❌ ERROR: "Unrecognized keys: chunkSize, chunkOverlap, maxResults"
2. Apply migrations → 🚫 NEVER EXECUTES (validation fails first)
3. Service crashes → 🔄 Restart loop begins

The Code: OpenClaw DOES have a migration system (applyLegacyMigrations in io-CHHRUM9X.js), but it's called AFTER validation in config-guard-BVU7K-aq.js.

Specific Migration That Should Have Run: There's a migration for memorySearch in io-CHHRUM9X.js, but it doesn't remove obsolete keys (chunkSize, chunkOverlap, maxResults) from the legacy object before validation.

📊 Evidence from Production

Error Logs:

Config invalid
File: ~/.openclaw/openclaw.json
Problem:
  - agents.defaults.memorySearch: Unrecognized keys: "chunkSize", "chunkOverlap", "maxResults"
Run: openclaw doctor --fix

Configuration Before/After:

// Before (obsolete):
"memorySearch": {
  "enabled": true,
  "chunkSize": 800,        // ❌ No longer recognized
  "chunkOverlap": 100,     // ❌ No longer recognized  
  "maxResults": 5          // ❌ No longer recognized
}

// After (`openclaw doctor --fix`):
"memorySearch": {
  "enabled": true          // ✅ Only valid key remains
}

🛠️ Proposed Solution

1. Fix the Order (Primary Solution): Change the validation flow to apply migrations BEFORE validation in config-guard-BVU7K-aq.js.

2. Enhance Migration Logic (Secondary): Update the memorySearch migration to also remove obsolete keys.

3. Automatic Migration on Update: Add migration execution to the openclaw update command.
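
The primary proposal, migrating before validating, could look roughly like this. It is a generic sketch rather than the contents of config-guard-BVU7K-aq.js: the Migration and Validator types, loadConfigMigrateFirst, and the demo migration and validator are all hypothetical, and the config is flattened to a single object to keep the example short.

```typescript
type Config = Record<string, unknown>;
interface Migration {
  id: string;
  // Mutates the config and reports whether anything changed.
  apply(config: Config): boolean;
}
type Validator = (config: Config) => string[];

// Runs every migration on a copy of the raw config, then validates the result.
function loadConfigMigrateFirst(
  raw: Config,
  migrations: Migration[],
  validate: Validator,
): { config: Config; applied: string[] } {
  const config = JSON.parse(JSON.stringify(raw)) as Config; // leave the input untouched
  const applied: string[] = [];
  for (const migration of migrations) {
    if (migration.apply(config)) applied.push(migration.id);
  }
  const problems = validate(config);
  if (problems.length > 0) {
    throw new Error(`Config invalid: ${problems.join("; ")}`);
  }
  return { config, applied };
}

// Demo pieces (hypothetical): one strip migration and one strict validator.
const stripChunkSize: Migration = {
  id: "demo-strip-chunkSize",
  apply(config) {
    if (!("chunkSize" in config)) return false;
    delete config["chunkSize"];
    return true;
  },
};
const allowOnlyEnabled: Validator = (config) =>
  Object.keys(config)
    .filter((key) => key !== "enabled")
    .map((key) => `Unrecognized key: "${key}"`);

const rawConfig: Config = { enabled: true, chunkSize: 800 };
const problemsIfValidatedFirst = allowOnlyEnabled(rawConfig);
const loaded = loadConfigMigrateFirst(rawConfig, [stripChunkSize], allowOnlyEnabled);
console.log(problemsIfValidatedFirst, loaded.applied, loaded.config);
```

Validating the raw config first reports the stale key, while the migrate-first path returns a clean config and records which migrations ran. Recording the applied ids also gives a natural hook for the migration logging listed under the implementation steps.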

🚀 Implementation Steps

  1. Priority: High - Fix validation order in config-guard-BVU7K-aq.js
  2. Priority: Medium - Add cleanup migrations for common obsolete keys
  3. Priority: Low - Add migration logging for debugging
  4. Priority: Low - Integrate doctor --fix into update process

🔧 Workaround for Users

Until the fix is released, run these commands manually after each update:

openclaw doctor --fix
systemctl --user restart openclaw-gateway

🎯 Related Files

  • /usr/lib/node_modules/openclaw/dist/config-guard-BVU7K-aq.js - Validation logic
  • /usr/lib/node_modules/openclaw/dist/io-CHHRUM9X.js - Migration logic
  • /usr/lib/node_modules/openclaw/dist/legacy.shared-i8CHhuVb.js - Migration utilities

📈 Severity Assessment

  • Impact: Critical (service completely unavailable)
  • Frequency: High (affects all users with config changes)
  • Fix complexity: Low (simple order change)
  • Risk of fix: Low (migrations already exist, just order change)

extent analysis

TL;DR

Change the validation flow to apply migrations before validation in config-guard-BVU7K-aq.js to prevent service crashes due to configuration schema changes.

Guidance

  • Update the config-guard-BVU7K-aq.js file to call applyLegacyMigrations from io-CHHRUM9X.js before performing configuration validation.
  • Verify the fix by introducing a configuration schema change and checking if the service restarts successfully without manual intervention.
  • Consider enhancing the migration logic to remove obsolete keys, such as chunkSize, chunkOverlap, and maxResults, from the memorySearch object.
  • Users can work around the issue by running openclaw doctor --fix and then restarting the openclaw-gateway service after each update.

Example

// Updated flow:
1. Apply migrations (applyLegacyMigrations in io-CHHRUM9X.js)
2. Validate config → ✅ SUCCESS (migrations remove obsolete keys)
3. Service starts → 🔄 No restart loop

Notes

The proposed solution assumes that the migration system is correctly implemented and only the order of operations needs to be changed. Additional testing may be required to ensure the migrations are correctly removing obsolete keys.

Recommendation

Apply the workaround by running openclaw doctor --fix and restarting the openclaw-gateway service until the fix is implemented and verified.
