openclaw - 💡(How to fix) Fix RFC: config.patch safety guardrails — dry-run validation, auto-backup, and post-apply doctor [1 comments, 1 participants]

GitHub stats: openclaw/openclaw#55556 (fetched 2026-04-08 01:38:04) · Comments: 1 · Participants: 1 · Reactions: 0 · Timeline events: 3 (closed ×1, commented ×1, locked ×1)

Error Message

  1. LLM hallucination feedback loop: When an agent's config.patch is rejected, the error message may not clearly indicate what went wrong. The agent may retry with a different (also wrong) approach, wasting cycles.
  • Warn if running ACP/subagent sessions would be affected by the restart

Root Cause

In multi-agent setups:

  • Agent dispatches a config.patch to enable a feature
  • The patch is schema-valid, gets written, triggers SIGUSR1
  • A running ACP session (Codex doing a 5-minute coding task) gets killed
  • The agent doesn't know the ACP session was killed because there's no coordination

Fix Action

Fix / Workaround

config.patch is a high-stakes operation in OpenClaw. While the current implementation does validate the merged config against the Zod schema (with .strict() mode rejecting unknown keys) before writing to disk, there are still operational gaps that make config management risky in agent-driven environments:

  1. No dry-run/preview mode: Agents cannot preview what a patch would change before committing. They must apply the patch to discover if it's valid, which triggers a restart even for exploratory changes.
  2. No automatic backup: If a patch passes schema validation but introduces a runtime issue (valid schema, bad runtime behavior), there's no built-in rollback path.
  3. Every successful patch triggers a restart: Even for hot-reloadable paths, config.patch schedules SIGUSR1 (#43803, #46310), which can kill running ACP sessions (#52440).
  4. LLM hallucination feedback loop: When an agent's config.patch is rejected, the error message may not clearly indicate what went wrong. The agent may retry with a different (also wrong) approach, wasting cycles.

These gaps are documented across several open issues:

  • #43803: config.patch sends SIGUSR1 for hot-reloadable paths
  • #46310: config.patch unconditionally schedules restart
  • #52440: ACP sessions killed by gateway restart
  • #43150: Config write race condition causes lost updates

Code Example

Agent: config.patch({ "agents.defaults.timeoutSeconds": 600 }, { dryRun: true })
Gateway: { 
  ok: true, 
  dryRun: true,
  diff: { "agents.defaults.timeoutSeconds": [172800, 600] },
  requiresRestart: true,
  activeAcpSessions: 2,
  warning: "2 active ACP sessions will be terminated by restart"
}
// Agent decides to wait until ACP sessions complete before applying
RAW_BUFFERClick to expand / collapse

Problem

config.patch is a high-stakes operation in OpenClaw. While the current implementation does validate the merged config against the Zod schema (with .strict() mode rejecting unknown keys) before writing to disk, there are still operational gaps that make config management risky in agent-driven environments:

  1. No dry-run/preview mode: Agents cannot preview what a patch would change before committing. They must apply the patch to discover if it's valid, which triggers a restart even for exploratory changes.
  2. No automatic backup: If a patch passes schema validation but introduces a runtime issue (valid schema, bad runtime behavior), there's no built-in rollback path.
  3. Every successful patch triggers a restart: Even for hot-reloadable paths, config.patch schedules SIGUSR1 (#43803, #46310), which can kill running ACP sessions (#52440).
  4. LLM hallucination feedback loop: When an agent's config.patch is rejected, the error message may not clearly indicate what went wrong. The agent may retry with a different (also wrong) approach, wasting cycles.

These gaps are documented across several open issues:

  • #43803: config.patch sends SIGUSR1 for hot-reloadable paths
  • #46310: config.patch unconditionally schedules restart
  • #52440: ACP sessions killed by gateway restart
  • #43150: Config write race condition causes lost updates

Real-world scenario

In multi-agent setups:

  • Agent dispatches a config.patch to enable a feature
  • The patch is schema-valid, gets written, triggers SIGUSR1
  • A running ACP session (Codex doing a 5-minute coding task) gets killed
  • The agent doesn't know the ACP session was killed because there's no coordination

Proposed Improvements

1. Dry-run / preview mode

Add a dryRun: true option to config.patch that:

  • Validates the patch against the schema (same path as current validation)
  • Returns a diff preview showing what paths would change
  • Does not write to disk or schedule restart
  • Returns the full validation result so agents can make informed decisions

This is especially valuable for LLM agents that want to "plan" a config change before executing it.
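Here is a minimal sketch of the dry-run path. It assumes, as in the Example Flow, that a patch maps dotted paths to new values. `previewPatch`, `getPath`, `setPathInPlace` and the caller-supplied `validate` callback are hypothetical names, and `validate` stands in for the existing Zod `.strict()` check. The config is deep-copied before merging, so nothing is written and no restart is scheduled.

```javascript
// Hypothetical helpers: the patch format and function names are assumptions,
// not OpenClaw's actual config.patch internals.

// Read the value at a dotted path, or undefined if any segment is missing.
function getPath(obj, dotted) {
  return dotted
    .split(".")
    .reduce((node, key) => (node == null ? undefined : node[key]), obj);
}

// Set a value at a dotted path, creating intermediate objects as needed.
function setPathInPlace(obj, dotted, value) {
  const keys = dotted.split(".");
  let node = obj;
  for (const key of keys.slice(0, -1)) {
    if (typeof node[key] !== "object" || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[keys[keys.length - 1]] = value;
}

// Validate the merged config and report [old, new] for every changed path.
// `current` is never mutated: config is plain JSON, so a JSON round-trip
// gives a deep copy to merge into.
function previewPatch(current, patch, validate) {
  const merged = JSON.parse(JSON.stringify(current));
  const diff = {};
  for (const [path, next] of Object.entries(patch)) {
    const prev = getPath(current, path);
    setPathInPlace(merged, path, next);
    if (JSON.stringify(prev) !== JSON.stringify(next)) diff[path] = [prev, next];
  }
  const validation = validate(merged);
  if (!validation.ok) return { ok: false, dryRun: true, error: validation.error };
  return { ok: true, dryRun: true, diff };
}

// Usage, mirroring the Example Flow
const current = { agents: { defaults: { timeoutSeconds: 172800 } } };
const preview = previewPatch(
  current,
  { "agents.defaults.timeoutSeconds": 600 },
  () => ({ ok: true }) // stand-in for schema validation
);
console.log(JSON.stringify(preview.diff)); // {"agents.defaults.timeoutSeconds":[172800,600]}
```

A no-op patch yields an empty diff, and a validation failure is returned as-is. An agent can therefore tell "rejected" apart from "nothing to do" without triggering a restart.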

2. Auto-backup with rollback

Before every config.patch write:

  • Snapshot the current config to a backup (e.g., openclaw.json.bak or timestamped)
  • Keep the last N backups (configurable, default 3)
  • Provide a config.rollback command to restore the previous snapshot

This provides a safety net for valid-but-problematic changes.
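The backup-and-rollback behaviour can be modeled roughly as follows. This sketch keeps snapshots in memory rather than as `openclaw.json.bak` files, which makes the retention logic easy to see. `ConfigStore`, `write` and `rollback` are hypothetical names, and `maxBackups` plays the role of the configurable N (default 3).

```javascript
// Hypothetical in-memory model; a real gateway would persist snapshots to disk.
class ConfigStore {
  constructor(initialConfig, maxBackups = 3) {
    this.config = JSON.parse(JSON.stringify(initialConfig));
    this.maxBackups = maxBackups;
    this.backups = []; // oldest snapshot first
  }

  // Snapshot the current config, prune old snapshots, then write the new one.
  write(nextConfig) {
    this.backups.push(JSON.parse(JSON.stringify(this.config)));
    while (this.backups.length > this.maxBackups) this.backups.shift();
    this.config = JSON.parse(JSON.stringify(nextConfig));
  }

  // Restore the most recent snapshot (the config as it was before the last write).
  rollback() {
    if (this.backups.length === 0) return { ok: false, error: "no backup available" };
    this.config = this.backups.pop();
    return { ok: true };
  }
}

// Usage: four writes with the default N = 3
const store = new ConfigStore({ version: 0 });
for (let v = 1; v <= 4; v++) store.write({ version: v });
console.log(store.backups.length); // 3 (the version 0 snapshot was pruned)
store.rollback();
console.log(store.config.version); // 3
```

`rollback` pops the snapshot it restores, so repeated calls walk further back. This sketch leaves open whether a rollback should itself be snapshotted so that it can be undone.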

3. Restart-aware patching

When config.patch detects that changed paths are hot-reloadable:

  • Skip SIGUSR1 and apply hot-reload instead
  • Only schedule restart for paths that truly require it
  • Warn if running ACP/subagent sessions would be affected by the restart
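The restart decision above can be sketched as a partition of the changed paths. The set of hot-reloadable paths is defined by OpenClaw, so `planReload`, the prefix-matching rule and the `example.hot` prefix used below are hypothetical placeholders.

```javascript
// Hypothetical: which paths are hot-reloadable is OpenClaw's decision; prefixes
// passed in here are placeholders for illustration only.
function planReload(changedPaths, hotReloadablePrefixes, activeAcpSessions) {
  // A path is hot-reloadable if it equals a prefix or sits underneath one
  const isHot = (path) =>
    hotReloadablePrefixes.some((prefix) => path === prefix || path.startsWith(prefix + "."));
  const hotReload = changedPaths.filter(isHot);
  const restartPaths = changedPaths.filter((path) => !isHot(path));
  const requiresRestart = restartPaths.length > 0;
  const plan = { hotReload, restartPaths, requiresRestart, activeAcpSessions };
  // Only warn when a restart will actually happen
  if (requiresRestart && activeAcpSessions > 0) {
    plan.warning = `${activeAcpSessions} active ACP sessions will be terminated by restart`;
  }
  return plan;
}

// Usage with placeholder prefixes
const hotOnly = planReload(["example.hot.level"], ["example.hot"], 2);
console.log(hotOnly.requiresRestart); // false
const mixed = planReload(["example.hot.level", "agents.defaults.timeoutSeconds"], ["example.hot"], 2);
console.log(mixed.warning); // 2 active ACP sessions will be terminated by restart
```

The sketch partitions the paths instead of branching on whether any path is hot-reloadable. As a result, a single restart-requiring path still forces a restart, and the ACP warning is emitted only when a restart will really happen.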

Example Flow

Agent: config.patch({ "agents.defaults.timeoutSeconds": 600 }, { dryRun: true })
Gateway: { 
  ok: true, 
  dryRun: true,
  diff: { "agents.defaults.timeoutSeconds": [172800, 600] },
  requiresRestart: true,
  activeAcpSessions: 2,
  warning: "2 active ACP sessions will be terminated by restart"
}
// Agent decides to wait until ACP sessions complete before applying
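The agent-side half of this flow can be sketched as a polling loop. The agent previews with `dryRun: true` and applies the patch only once the preview reports that no restart is needed or that no ACP sessions are active. `gateway.configPatch` is a hypothetical async client for the proposed API. `pollMs` and `maxAttempts` are illustrative defaults, not OpenClaw settings.

```javascript
// Hypothetical client: gateway.configPatch(patch, { dryRun }) returns the
// response shape shown in the Example Flow.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function applyWhenSafe(gateway, patch, { pollMs = 5000, maxAttempts = 12 } = {}) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const preview = await gateway.configPatch(patch, { dryRun: true });
    // A rejected patch is returned unchanged so the agent sees the real error
    if (!preview.ok) return preview;
    if (!preview.requiresRestart || preview.activeAcpSessions === 0) {
      return gateway.configPatch(patch, { dryRun: false });
    }
    await sleep(pollMs); // ACP sessions still running: wait and re-check
  }
  return { ok: false, error: "timed out waiting for active ACP sessions to finish" };
}
```

A window remains between the last preview and the real apply, during which a new ACP session could start. Closing that window fully would need the gateway-side coordination this RFC asks for.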

Note on current validation

Credit where due: the current config.patch implementation already validates against the Zod schema with .strict() before writing, so unknown keys are rejected at the gateway level. This RFC focuses on the remaining gaps (preview, backup, restart coordination) rather than schema validation.

Extended analysis

Fix Plan

To address the operational gaps in the config.patch operation, we will implement the following changes:

  • Add a dryRun option to config.patch to preview changes without writing to disk
  • Implement automatic backup and rollback for config.patch operations
  • Modify config.patch to skip restart for hot-reloadable paths and warn about affected ACP sessions

Step-by-Step Solution

  1. Add dryRun option:
    • Update the config.patch function to accept a dryRun option
    • Always validate the merged config against the schema; if dryRun is true, return the validation result and a diff preview without writing to disk or scheduling a restart
  2. Implement auto-backup and rollback:
    • Before writing to disk, create a backup of the current config
    • Store the last N backups (configurable, default 3)
    • Add a config.rollback command to restore the previous snapshot
  3. Modify restart behavior:
    • Split the changed paths into hot-reloadable paths and paths that require a restart
    • Hot-reload the former; schedule a restart only if the latter set is non-empty
    • Warn if running ACP sessions would be terminated by that restart

Example Code

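// Sketch only. getConfig, mergePatch, validateAgainstSchema, computeDiff,
// writeConfig, readConfig, listBackups, deleteFile, getHotReloadablePaths,
// applyHotReload, scheduleRestart and getActiveAcpSessions are hypothetical
// helpers, not confirmed OpenClaw internals.
const MAX_BACKUPS = 3; // "keep the last N backups", default 3

// config.patch with a dryRun option
function configPatch(patch, options = {}) {
  const { dryRun = false } = options;
  const current = getConfig();
  // Validate the merged config, as the current Zod .strict() path does
  const merged = mergePatch(current, patch);
  const validation = validateAgainstSchema(merged);
  if (validation.error) {
    return { ok: false, error: validation.error };
  }
  const diff = computeDiff(current, merged);
  const plan = planReload(Object.keys(diff));
  if (dryRun) {
    // Preview only: nothing is written and no restart is scheduled
    return { ok: true, dryRun: true, diff, ...plan };
  }
  backupConfig(current); // snapshot before every write
  writeConfig("openclaw.json", merged);
  if (plan.hotReloadable.length > 0) applyHotReload(plan.hotReloadable);
  if (plan.requiresRestart) scheduleRestart();
  return { ok: true, dryRun: false, diff, ...plan };
}

// Auto-backup: write a timestamped snapshot, then prune to the last MAX_BACKUPS
function backupConfig(config) {
  writeConfig(`openclaw.json.bak.${Date.now()}`, config);
  const backups = listBackups(); // assumed sorted oldest first
  for (const path of backups.slice(0, Math.max(0, backups.length - MAX_BACKUPS))) {
    deleteFile(path);
  }
}

// Rollback: restore the most recent snapshot
function rollbackConfig() {
  const backups = listBackups();
  if (backups.length === 0) {
    return { ok: false, error: "no backup available" };
  }
  const latest = backups[backups.length - 1];
  writeConfig("openclaw.json", readConfig(latest));
  return { ok: true, restoredFrom: latest };
}

// Restart-aware planning: hot-reload what can be hot-reloaded and
// restart only if some changed path truly requires it
function planReload(changedPaths) {
  const hotReloadable = getHotReloadablePaths(changedPaths);
  const requiresRestart = changedPaths.some((p) => !hotReloadable.includes(p));
  const activeAcpSessions = getActiveAcpSessions();
  const plan = { hotReloadable, requiresRestart, activeAcpSessions };
  // Warn only when a restart will actually terminate running ACP sessions
  if (requiresRestart && activeAcpSessions > 0) {
    plan.warning = `${activeAcpSessions} active ACP sessions will be terminated by restart`;
  }
  return plan;
}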

Verification

To verify the fix, test the following scenarios:

  • config.patch with dryRun: true returns a diff preview without writing to disk
  • `config.patch
