openclaw - ✅(Solved) Fix config: skip writeConfigFile when nextHash === previousHash (no-op short-circuit) [3 pull requests, 1 comment, 2 participants]

GitHub stats
openclaw/openclaw#75534 · Fetched 2026-05-02 05:33:30
Comments: 1 · Participants: 2 · Timeline: 6 · Reactions: 2
Timeline (top): cross-referenced ×5, commented ×1

writeConfigFile in src/config/io.ts computes nextHash and previousHash, sees they're equal (no-op write), but still proceeds with the atomic rename + audit log + suspicious-reason check + file-watcher trigger. On a deployment that calls config.patch frequently with semantically-equivalent payloads, this causes a thrash loop:

  1. config.patch writes byte-identical JSON
  2. File-watcher fires
  3. [reload] config change detected lists 25+ "changed" plugin paths (false positives — bytes-different but semantics-same)
  4. SIGUSR1 → gateway full-restart
  5. In-flight Service Bus / WebSocket messages can't settle → dropped
  6. Goto 1 on the next inbound message

Symptom signature in container logs:

Config overwrite: ... (sha256 X -> Y, backup=...)
[reload] config change detected; evaluating reload (plugins.entries.X.config, plugins.entries.Y.config, ...)
[reload] config change requires gateway restart (...)
[gateway] signal SIGUSR1 received

The reload "changed paths" all turn out to be no-ops on inspection.

Root Cause

In a multi-tenant SaaS deployment where the admin layer calls config.patch on every inbound channel message (e.g. to register a per-message agent in allowAgents), the patch is semantically a no-op when the agent ID is already present, but writeConfigFile still rewrites the file because the deserialize → modify → JSON.stringify round-trip can produce different key ordering or formatting than the bytes already on disk.

Fix Action

PR fix notes

PR #75543: fix(config): skip writeConfigFile disk I/O when config is semantically unchanged

Description (problem / solution / changelog)

Summary

When config.patch is called with a payload that results in no substantive change (only meta.lastTouchedAt / meta.lastTouchedVersion differ), the writeConfigFile function now returns early without touching disk.

Problem

On deployments where an admin layer calls config.patch on every inbound message (e.g., to register a per-message agent in allowAgents), a thrash loop occurs:

  1. config.patch writes byte-different but semantically-identical JSON (only meta.lastTouchedAt changes)
  2. File-watcher fires
  3. [reload] config change detected lists 25+ false-positive "changed" plugin paths
  4. SIGUSR1 → full gateway restart
  5. In-flight Service Bus / WebSocket messages dropped
  6. Repeat on next inbound message

Fix

After computing nextHash and previousHash, the function now:

  1. Fast path: If byte-identical (nextHash === previousHash), return immediately
  2. Semantic path: If hashes differ, strip volatile meta fields (lastTouchedAt, lastTouchedVersion) from both sides and re-compare. If the stable content is identical, short-circuit without disk I/O

When the short-circuit fires, no atomic rename, backup rotation, audit log, or suspicious-reason check is performed — there is nothing to record.
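The two-stage check described above can be sketched as follows. This is a minimal illustration, not the code from PR #75543: `isNoOpWrite`, `stripVolatileMeta`, and the `ConfigLike` shape are hypothetical names, and only the field names `meta.lastTouchedAt` and `meta.lastTouchedVersion` come from the PR description.

```typescript
import { createHash } from "node:crypto";

// Hypothetical config shape for illustration; the real OpenClaw type is richer.
type ConfigLike = { meta?: Record<string, unknown>; [key: string]: unknown };

// Serialize with 2-space indentation and exactly one trailing newline.
function serialize(config: ConfigLike): string {
  return JSON.stringify(config, null, 2).trimEnd().concat("\n");
}

function sha256(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Return a copy without the volatile meta fields named in the PR description.
function stripVolatileMeta(config: ConfigLike): ConfigLike {
  if (config.meta === undefined) return config;
  const stableMeta: Record<string, unknown> = { ...config.meta };
  delete stableMeta["lastTouchedAt"];
  delete stableMeta["lastTouchedVersion"];
  return { ...config, meta: stableMeta };
}

function isNoOpWrite(previous: ConfigLike, next: ConfigLike): boolean {
  // Fast path: the serialized bytes hash to the same value.
  if (sha256(serialize(previous)) === sha256(serialize(next))) return true;
  // Semantic path: compare again with the volatile meta fields removed.
  return serialize(stripVolatileMeta(previous)) === serialize(stripVolatileMeta(next));
}

const onDisk: ConfigLike = { allowAgents: ["a1"], meta: { lastTouchedAt: "2026-05-01T00:00:00Z" } };
const restamped: ConfigLike = { allowAgents: ["a1"], meta: { lastTouchedAt: "2026-05-02T00:00:00Z" } };
const changed: ConfigLike = { allowAgents: ["a1", "a2"], meta: { lastTouchedAt: "2026-05-02T00:00:00Z" } };

console.log(isNoOpWrite(onDisk, restamped)); // true: only the timestamp differs
console.log(isNoOpWrite(onDisk, changed)); // false: allowAgents gained an entry
```

Note that this sketch still treats two configs with different key order as different, because it compares serialized strings; whether the real fix normalizes key order is not stated in the PR description.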

Test

Added regression test in io.write-config.test.ts that verifies:

  • Second write of identical config does not touch disk (mtime unchanged)
  • No "Config overwrite" log is emitted
  • Backup rotation is not triggered
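A regression test along these lines could look roughly like the sketch below. It does not reproduce io.write-config.test.ts: `writeConfigSketch` is a hypothetical stand-in for the real writer, the in-memory `logLines` array replaces the real logger, and backup rotation is not modelled. The file's mtime is backdated before the second write so that any disk write would visibly move it forward.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdtempSync, readFileSync, statSync, utimesSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

const logLines: string[] = [];

// Hypothetical stand-in for writeConfigFile: skips the write when the hash matches.
function writeConfigSketch(path: string, json: string): void {
  const sha = (s: string): string => createHash("sha256").update(s).digest("hex");
  if (existsSync(path) && sha(readFileSync(path, "utf8")) === sha(json)) return;
  writeFileSync(path, json);
  logLines.push("Config overwrite: " + path);
}

const dir = mkdtempSync(join(tmpdir(), "config-noop-"));
const file = join(dir, "openclaw.json");
const json = JSON.stringify({ allowAgents: ["a1"] }, null, 2) + "\n";

writeConfigSketch(file, json); // first write creates the file and logs once

// Backdate mtime so that a later write could not go unnoticed.
const backdated = new Date("2020-01-01T00:00:00Z");
utimesSync(file, backdated, backdated);
const mtimeBefore = statSync(file).mtimeMs;
const logsBefore = logLines.length;

writeConfigSketch(file, json); // identical content: expected to be a no-op

const mtimeUnchanged = statSync(file).mtimeMs === mtimeBefore;
const noNewLog = logLines.length === logsBefore;
console.log({ mtimeUnchanged, noNewLog });
```

Both flags are expected to be true when the short-circuit works; removing the early return in the stand-in makes both flip to false.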

Fixes #75534

Changed files

  • src/config/io.ts (modified, +33/-0)
  • src/config/io.write-config.test.ts (modified, +48/-0)

PR #219: fix(openclaw-config): break agent-create restart cascade (#193)

Description (problem / solution / changelog)

Summary

Closes the dominant remaining cause of #193: every settings save and agent-create triggered 15-30 s of "Agent runtime is not available" downtime, even after the #200 PR cluster (#201, #203, #205, #206) eliminated the original SECRETS_RELOADER_DEGRADED cause. Reproduced on staging 2026-05-01 and locally against the production-image stack.

Root causes addressed (in order of discovery)

  1. channels.telegram.enabled ping-pong. OpenClaw writes enabled: true back on every gateway start during auto-enable. Pinchy's allow-list for telegram-block fields didn't include enabled, so the next regenerateOpenClawConfig stripped it → diff → restart → repeat.
  2. plugins.entries.<provider> ping-pong. Same bug class, same trigger, just for OpenClaw-managed provider plugins (anthropic, openai, google, ollama-cloud). Fixed generically: preserve any plugins.entries.* entry that doesn't start with pinchy-.
  3. meta.lastTouchedAt slipping past the byte-equal early-return. OpenClaw stamps a fresh timestamp on every write, so back-to-back regenerates without DB changes still produced different bytes → unnecessary config.apply RPC. Added a normalize-compare second early-return that ignores OpenClaw-managed metadata.
  4. (workaround) env.* false-positive restart trigger — the dominant cause once 1–3 were fixed. OpenClaw's diffConfigPaths(snapshot, parsed) compares snapshot's runtime-resolved env (sk-ant-...) against Pinchy's template (${ANTHROPIC_API_KEY}); they never match, env.* always lands in changedPaths, and env.* has no rule in BASE_RELOAD_RULES → falls through to default full-restart trigger. Verified in OpenClaw 2026.4.12 (current pin) and 2026.4.29 (latest stable). Tracked upstream in openclaw#75534.
    • Workaround: replace unchanged env values with OpenClaw's __OPENCLAW_REDACTED__ sentinel before config.apply. OpenClaw's restoreRedactedValues runs before the diff and substitutes the sentinel with snapshot's resolved value → no env diff → no restart. Plain templates kept for genuinely new env keys (legit restart). Cleanup tracked in #215.
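The env workaround in item 4 can be sketched like this. It is an illustration under assumptions, not the code in openclaw-config.ts: the sentinel string is quoted from the description, but `redactUnchangedEnv` and the rule used to decide that a value is "unchanged" (same template string as in the existing config) are guesses made for the example.

```typescript
// Sentinel string quoted in the PR description.
const OPENCLAW_REDACTED_SENTINEL = "__OPENCLAW_REDACTED__";

type EnvMap = Record<string, string>;

// Hypothetical helper: keys whose template value matches the existing config
// are replaced by the sentinel; new or edited keys keep their plain template,
// so a legitimate restart still happens for them.
function redactUnchangedEnv(existingEnv: EnvMap, nextEnv: EnvMap): EnvMap {
  const result: EnvMap = {};
  for (const [key, value] of Object.entries(nextEnv)) {
    result[key] = existingEnv[key] === value ? OPENCLAW_REDACTED_SENTINEL : value;
  }
  return result;
}

// Double-quoted strings, so "${...}" stays a literal template placeholder.
const existing: EnvMap = { ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}" };
const next: EnvMap = {
  ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}",
  OPENAI_API_KEY: "${OPENAI_API_KEY}",
};

const redacted = redactUnchangedEnv(existing, next);
console.log(redacted);
```

In this example the unchanged ANTHROPIC_API_KEY entry becomes the sentinel, while the new OPENAI_API_KEY entry keeps its template and would still show up as a real change.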

Layered guardrails

Per the project rule for production bugs (Unit + Integration + Runtime-Validation or E2E):

  • 5 unit tests in openclaw-config.test.ts exercising each fix and the workaround.
  • 1 E2E test in e2e/telegram/agent-create-no-restart.spec.ts against the production-image stack (docker-compose.e2e.yml). Waits for the cold-start cascade to settle (log-scan based, more robust than the WS-connectivity check that caused #203's E2E to be dropped), then asserts no requires gateway restart / received SIGUSR1 / full process restart events appear in the OpenClaw log window after a POST /api/agents.
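The log-scan assertion used by the E2E test can be pictured with a small sketch. The three marker strings are taken from the description above; `findRestartEvents`, the sample log, and the "[gateway] ready" line are made up for illustration and are not part of the actual spec file.

```typescript
// Marker substrings listed in the PR description.
const RESTART_MARKERS = ["requires gateway restart", "received SIGUSR1", "full process restart"];

// Hypothetical helper: return log lines at or after `fromIndex` that contain a restart marker.
function findRestartEvents(logLines: string[], fromIndex: number): string[] {
  return logLines
    .slice(fromIndex)
    .filter((line) => RESTART_MARKERS.some((marker) => line.includes(marker)));
}

const log = [
  // Cold-start cascade, before the window under test.
  "[reload] config change requires gateway restart (env.ANTHROPIC_API_KEY)",
  "[gateway] ready", // illustrative line, not a real OpenClaw log message
  // Window starts here: POST /api/agents was issued.
  "[reload] config change detected; evaluating reload (agents.list)",
];

const windowStart = 2;
const offenders = findRestartEvents(log, windowStart);
console.log(offenders.length === 0 ? "no restart after agent-create" : offenders);
```

Scanning only the window after the request is what lets the cold-start restarts pass while a restart caused by agent-create would fail the test.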

Validation

  • 3344/3344 unit tests green
  • E2E agent-create-no-restart.spec.ts passes against the local production-image stack
  • RED ↔ GREEN toggle: temporarily reverted the env-redact workaround → E2E goes RED with the expected requires gateway restart (env.ANTHROPIC_API_KEY) log line; restored → GREEN
  • Other 13 Telegram E2E specs still pass; the 1 known-flaky @channel-restart test (already skipped in CI) is unrelated

Test plan

  • CI green
  • Manual click-through on staging after merge: create custom agent → immediately chat → no "Agent runtime is not available" banner, agent responds
  • Inspect staging OpenClaw logs after agent-create: only agents.list reload event, no SIGUSR1/restart

Related

  • #193 — original cascade issue (this PR closes it)
  • #200 — original unknown agent id symptom (closed earlier; this finishes the broader story)
  • #215 — cleanup task once openclaw#75534 lands upstream
  • openclaw#75534 — upstream tracking for the env-diff bug

Changed files

  • docker-compose.test.yml (modified, +10/-0)
  • packages/web/e2e/telegram/agent-create-no-restart.spec.ts (added, +213/-0)
  • packages/web/src/__tests__/lib/openclaw-config.test.ts (modified, +390/-2)
  • packages/web/src/lib/openclaw-config.ts (modified, +159/-12)

PR #254: refactor(openclaw-config): split 1097-line module into focused sub-files

Description (problem / solution / changelog)

Closes #233.

Summary

Splits packages/web/src/lib/openclaw-config.ts (1097 lines, 7 distinct concerns interleaved) into a directory of focused sub-modules. The import path @/lib/openclaw-config is unchanged — all 10 production importers and 14 test files keep working without edits.

File map

| File | Concern |
| --- | --- |
| build.ts | regenerateOpenClawConfig: DB → openclaw.json |
| targeted.ts | sanitizeOpenClawConfig, updateIdentityLinks, updateTelegramChannelConfig (narrow-scope writes that bypass full regenerate) |
| write.ts | writeConfigAtomic, readExistingConfig (with the EACCES retry against start-openclaw.sh's chmod loop), pushConfigInBackground |
| normalize.ts | configsAreEquivalentUpToOpenClawMetadata, redactUnchangedEnvForApply, OPENCLAW_REDACTED_SENTINEL: the openclaw#75534 workarounds, isolated so #215's removal becomes a one-file delete |
| secrets-bundle.ts | buildSecretsBundle helper + central home for the Pattern A/B/C secret-handling matrix from CLAUDE.md |
| paths.ts | CONFIG_PATH constant (separate file to avoid a cycle between write.ts and normalize.ts via redactUnchangedEnvForApply and pushConfigInBackground) |
| index.ts | Re-exports the four public symbols |

Behaviour preservation (#193 constraints)

This is a pure code-organisation change — same runtime behaviour, same public API.

  • Trailing-newline format (JSON.stringify(...).trimEnd() + "\n") preserved so SHA256 hashes align with OpenClaw's writeConfigFile output (chokidar dedupe via lastAppliedWriteHash).
  • Byte-equal early-return + configsAreEquivalentUpToOpenClawMetadata semantics unchanged.
  • Sentinel redaction (OPENCLAW_REDACTED_SENTINEL / redactUnchangedEnvForApply) unchanged.
  • Atomic write + EACCES retry timing unchanged.
  • The byte-idempotency test from #193 still passes.
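Why the trailing-newline format matters for hash alignment can be shown with a short sketch. The helper names are hypothetical, and the 2-space indentation is an assumption borrowed from the snippet quoted in the upstream issue, since the PR text elides the JSON.stringify arguments.

```typescript
import { createHash } from "node:crypto";

// Serialization format quoted in the PR: trimmed JSON plus exactly one newline.
// The indentation argument is assumed, see the note above.
function serializeConfig(config: unknown): string {
  return JSON.stringify(config, null, 2).trimEnd() + "\n";
}

function sha256Hex(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

const config = { channels: { telegram: { enabled: true } } };
const canonical = serializeConfig(config);
const withoutNewline = JSON.stringify(config, null, 2);

const deterministic = sha256Hex(canonical) === sha256Hex(serializeConfig(config));
const newlineMatters = sha256Hex(canonical) !== sha256Hex(withoutNewline);

console.log(deterministic); // true: same input, same bytes, same hash
console.log(newlineMatters); // true: a single missing newline changes the hash
```

Because the dedupe compares hashes of raw bytes, both writers have to agree on the exact serialization, including the final newline.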

Test plan

  • tsc --noEmit — exit 0, no new errors
  • vitest run — 3407 passed, 12 skipped, 3 todo, 0 failed across 259 test files
  • Verified all @/lib/openclaw-config importers still resolve via the new index.ts
  • CI green
  • Manual smoke on staging: settings save → no spurious gateway restart (the #193 invariant)

Notes for reviewers

  • secrets-bundle.ts introduces a tiny buildSecretsBundle({...}) helper that wraps the four-field SecretsBundle literal previously inlined in regenerateOpenClawConfig. The bundle's resulting bytes are identical; the helper exists to give the secret-handling pattern matrix a stable home for future audit/rotation/validation logic.
  • paths.ts exists solely to break a cycle. If you'd prefer the constant inlined into write.ts and redactUnchangedEnvForApply taking existingContent as a parameter instead, happy to re-shape — but the cycle-avoidance via dedicated module felt cleaner.
  • One pre-existing flake in openclaw-config.test.ts ("skips file write and config.apply RPC when only meta.lastTouchedAt differs" / "keeps templates for new env keys not in existing config") was observed during verification. Reproduced ~30% on a clean main checkout (without this refactor). Flagged as a separate issue; not introduced by this PR.

Related

  • #193 — original cascade issue (constraint: don't regress)
  • #215 — removal of the normalize-compare workaround when openclaw#75534 lands; once resolved, normalize.ts is deleted wholesale
  • openclaw/openclaw#75534 — upstream fix being tracked

Changed files

  • packages/web/src/lib/openclaw-config/build.ts (renamed, +9/-381)
  • packages/web/src/lib/openclaw-config/index.ts (added, +17/-0)
  • packages/web/src/lib/openclaw-config/normalize.ts (added, +94/-0)
  • packages/web/src/lib/openclaw-config/paths.ts (added, +5/-0)
  • packages/web/src/lib/openclaw-config/secrets-bundle.ts (added, +24/-0)
  • packages/web/src/lib/openclaw-config/targeted.ts (added, +189/-0)
  • packages/web/src/lib/openclaw-config/write.ts (added, +100/-0)

Reproduction

In a multi-tenant SaaS deployment where the admin layer calls config.patch on every inbound channel message (e.g. to register a per-message agent in allowAgents), the patch is semantically a no-op when the agent ID is already present, but writeConfigFile still rewrites the file because:

  • The deserialize → modify → JSON.stringify round-trip can produce different byte ordering / formatting than what's on disk
  • Any field that has a different default initialization order between Zod's parse and the legacy on-disk representation will differ on byte
  • Even map iteration order can affect output JSON

Result: every "no-op" patch hits disk, triggering the gateway file-watcher cascade.
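The key-ordering point can be demonstrated in a few lines. This is only an illustration of the effect: `gateway.port` is an invented field, and `sortKeys` is a hypothetical helper used here to show that the two objects are semantically equal, not something the issue proposes adding to OpenClaw.

```typescript
import { createHash } from "node:crypto";

const sha256Hex = (text: string): string => createHash("sha256").update(text).digest("hex");

// Same content, different property insertion order (field names are illustrative).
const onDisk = { allowAgents: ["a1"], gateway: { port: 8080 } };
const roundTripped = { gateway: { port: 8080 }, allowAgents: ["a1"] };

// Recursively rebuild objects with sorted keys so serialization becomes order-independent.
function sortKeys(value: unknown): unknown {
  if (Array.isArray(value)) return value.map((item) => sortKeys(item));
  if (value !== null && typeof value === "object") {
    const record = value as Record<string, unknown>;
    const sorted: Record<string, unknown> = {};
    for (const key of Object.keys(record).sort()) sorted[key] = sortKeys(record[key]);
    return sorted;
  }
  return value;
}

// JSON.stringify follows insertion order for string keys, so the bytes differ.
const bytesDiffer =
  sha256Hex(JSON.stringify(onDisk, null, 2)) !== sha256Hex(JSON.stringify(roundTripped, null, 2));

// After sorting keys, both serialize to the same string.
const semanticallyEqual =
  JSON.stringify(sortKeys(onDisk)) === JSON.stringify(sortKeys(roundTripped));

console.log(bytesDiffer, semanticallyEqual); // true true
```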

Proposed fix

src/config/io.ts already computes both hashes around lines 1735-1739:

const json = JSON.stringify(stampedOutputConfig, null, 2).trimEnd().concat("\n");
const nextHash = hashConfigRaw(json);
const previousHash = resolveConfigSnapshotHash(snapshot);

Add a no-op short-circuit immediately after:

// No-op short-circuit: bytes are semantically identical to disk. Skip the
// disk write to avoid the file-watcher → SIGUSR1 cascade. The audit
// trail, "Config overwrite" log, and suspicious-reasons check all become
// noise on hash-equal writes.
if (snapshot.exists && previousHash !== null && nextHash === previousHash) {
  return { persistedHash: nextHash, persistedConfig: stampedOutputConfig };
}

The audit appender + logger are intentionally skipped in this branch — there's nothing to audit (no rename, no permission tighten, no metadata change). Tests can guard against accidental skipping by exercising OPENCLAW_TEST_CONFIG_OVERWRITE_LOG=1 with a hash-different payload.
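To make the two branches concrete, here is a self-contained sketch that exercises both a hash-equal and a hash-different payload. It is a model, not the real function: `writeConfigSketch`, the `Snapshot` shape, the injected `persist` callback, and the extra `wrote` flag are hypothetical, and `hashRaw` / `snapshotHash` are local stand-ins for hashConfigRaw and resolveConfigSnapshotHash.

```typescript
import { createHash } from "node:crypto";

type Snapshot = { exists: boolean; raw: string | null };
type WriteResult = { persistedHash: string; persistedConfig: unknown; wrote: boolean };

// Local stand-ins for the helpers named in the issue.
const hashRaw = (raw: string): string => createHash("sha256").update(raw).digest("hex");
const snapshotHash = (s: Snapshot): string | null => (s.raw === null ? null : hashRaw(s.raw));

function writeConfigSketch(
  snapshot: Snapshot,
  config: unknown,
  persist: (json: string) => void,
): WriteResult {
  const json = JSON.stringify(config, null, 2).trimEnd().concat("\n");
  const nextHash = hashRaw(json);
  const previousHash = snapshotHash(snapshot);
  // No-op short-circuit: nothing is persisted, logged, or audited.
  if (snapshot.exists && previousHash !== null && nextHash === previousHash) {
    return { persistedHash: nextHash, persistedConfig: config, wrote: false };
  }
  // Stands in for the atomic rename, audit log, and suspicious-reason check.
  persist(json);
  return { persistedHash: nextHash, persistedConfig: config, wrote: true };
}

const writes: string[] = [];
const persist = (json: string): void => {
  writes.push(json);
};

const snapshot: Snapshot = {
  exists: true,
  raw: JSON.stringify({ allowAgents: ["a1"] }, null, 2) + "\n",
};

const same = writeConfigSketch(snapshot, { allowAgents: ["a1"] }, persist);
const different = writeConfigSketch(snapshot, { allowAgents: ["a1", "a2"] }, persist);

console.log(same.wrote, different.wrote, writes.length); // false true 1
```

The hash-different call is the guard against accidental skipping: if the short-circuit condition were too broad, `writes` would stay empty.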

Why this is a safe change

  • The function's existing return contract is { persistedHash, persistedConfig }. Both values are well-defined in the no-op branch (hash is the existing one; config is what we computed).
  • Backup rotation, permissions tightening, atomic rename — none of those have side effects worth running when the bytes don't change.
  • Audit log is also a no-op semantically (nothing changed). Skipping it is correct.
  • File-watcher will not fire — which is the entire point.

Downstream impact

In a multi-tenant deployment where ~10 inbound messages/hour arrive per tenant, the current behavior causes ~10 gateway restarts/hour, dropping a fraction of in-flight messages each time. With this fix, restarts drop to ~0/hour during normal operation (only real config changes trigger reloads).

References

  • Affected file: src/config/io.ts (inside writeConfigFile, in its internal callback equivalent to applyConfigToConfigPath, around lines 1700-1900)
  • Hash helpers: hashConfigRaw, resolveConfigSnapshotHash (already exist in same module)
  • File-watcher: [reload] config change detected log emitted by the gateway's config reloader (separate codepath, doesn't need changing — fixing the writer is sufficient)

I have a downstream consumer (https://github.com/sidekykai/openclawmtsaas) hitting this in production. Happy to draft a PR + tests if the maintainers agree on the approach.

extent analysis

TL;DR

The proposed fix involves adding a no-op short-circuit in src/config/io.ts to skip disk writes when the computed hashes are equal, preventing unnecessary file-watcher triggers and gateway restarts.

Guidance

  • Verify that the nextHash and previousHash computations are correct and consistent with the expected behavior.
  • Implement the proposed no-op short-circuit in src/config/io.ts to skip disk writes when the hashes are equal.
  • Test the change with the OPENCLAW_TEST_CONFIG_OVERWRITE_LOG=1 environment variable to ensure the audit log is not skipped unnecessarily.
  • Monitor the downstream impact on gateway restarts and in-flight message drops to confirm the fix is effective.

Example

if (snapshot.exists && previousHash !== null && nextHash === previousHash) {
  return { persistedHash: nextHash, persistedConfig: stampedOutputConfig };
}

Notes

The proposed fix assumes that the hashConfigRaw and resolveConfigSnapshotHash functions are correctly implemented and produce consistent results. Additionally, the fix may not apply to all scenarios, such as when the config file is modified externally.

Recommendation

Apply the proposed fix by adding the no-op short-circuit in src/config/io.ts. It is a targeted change that addresses the unnecessary gateway restarts and the resulting in-flight message drops.
