openclaw - ✅(Solved) Fix config: skip writeConfigFile when nextHash === previousHash (no-op short-circuit) [3 pull requests, 1 comments, 2 participants]

pksidekyk · 2026-05-01T08:13:43Z


openclaw/openclaw#75534•Fetched 2026-05-02 05:33:30

Author

pksidekyk

Participants

clawsweeper[bot]

pksidekyk

Timeline (top)

cross-referenced ×5, commented ×1

writeConfigFile in src/config/io.ts computes nextHash and previousHash, sees they're equal (no-op write), but still proceeds with the atomic rename + audit log + suspicious-reason check + file-watcher trigger. On a deployment that calls config.patch frequently with semantically-equivalent payloads, this causes a thrash loop:

1. config.patch writes byte-identical JSON
2. File-watcher fires
3. [reload] config change detected lists 25+ "changed" plugin paths (false positives: bytes-different but semantics-same)
4. SIGUSR1 → gateway full-restart
5. In-flight Service Bus / WebSocket messages can't settle → dropped
6. Goto 1 on the next inbound message

Symptom signature in container logs:

Config overwrite: ... (sha256 X -> Y, backup=...)
[reload] config change detected; evaluating reload (plugins.entries.X.config, plugins.entries.Y.config, ...)
[reload] config change requires gateway restart (...)
[gateway] signal SIGUSR1 received

The reload "changed paths" all turn out to be no-ops on inspection.

Root Cause

In a multi-tenant SaaS deployment where the admin layer calls config.patch on every inbound channel message (e.g. to register a per-message agent in allowAgents), the patch is semantically a no-op when the agent ID is already present, but writeConfigFile still rewrites the file because the re-serialized JSON can differ from the bytes already on disk. The Reproduction section lists the specific causes.


PR fix notes

PR #75543: fix(config): skip writeConfigFile disk I/O when config is semantically unchanged

Repository: openclaw/openclaw
Author: Bartok9
State: open | merged: False
Link: https://github.com/openclaw/openclaw/pull/75543

Description (problem / solution / changelog)

Summary

When config.patch is called with a payload that results in no substantive change (only meta.lastTouchedAt / meta.lastTouchedVersion differ), the writeConfigFile function now returns early without touching disk.

Problem

On deployments where an admin layer calls config.patch on every inbound message (e.g., to register a per-message agent in allowAgents), a thrash loop occurs:

1. config.patch writes byte-different but semantically-identical JSON (only meta.lastTouchedAt changes)
2. File-watcher fires
3. [reload] config change detected lists 25+ false-positive "changed" plugin paths
4. SIGUSR1 → full gateway restart
5. In-flight Service Bus / WebSocket messages dropped
6. Repeat on next inbound message

Fix

After computing nextHash and previousHash, the function now:

1. Fast path: If byte-identical (nextHash === previousHash), return immediately
2. Semantic path: If hashes differ, strip volatile meta fields (lastTouchedAt, lastTouchedVersion) from both sides and re-compare. If the stable content is identical, short-circuit without disk I/O

When the short-circuit fires, no atomic rename, backup rotation, audit log, or suspicious-reason check is performed — there is nothing to record.
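The semantic path can be illustrated with a small sketch. This is not the PR's code: `stripVolatileMeta`, `canonicalize`, and `isSemanticNoOp` are hypothetical names, and only the two field names come from the PR description. Sorting object keys before comparing is an extra assumption added here so that key order alone cannot cause a mismatch.

```typescript
// Hypothetical sketch; helper names are illustrative, not from the OpenClaw source.
type Json = null | boolean | number | string | Json[] | JsonObject;
type JsonObject = { [key: string]: Json };

// Field names quoted in the PR description.
const VOLATILE_META_FIELDS = ["lastTouchedAt", "lastTouchedVersion"];

// Return a copy of the config whose meta block lacks the volatile fields.
function stripVolatileMeta(config: JsonObject): JsonObject {
  const meta = config["meta"];
  if (meta === undefined || meta === null || typeof meta !== "object" || Array.isArray(meta)) {
    return config;
  }
  const stableMeta: JsonObject = {};
  for (const [key, value] of Object.entries(meta)) {
    if (!VOLATILE_META_FIELDS.includes(key)) stableMeta[key] = value;
  }
  return { ...config, meta: stableMeta };
}

// Serialize with sorted object keys (an assumption of this sketch).
function canonicalize(value: Json): string {
  if (value === null || typeof value !== "object") return JSON.stringify(value);
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  const parts = Object.entries(value)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([key, child]) => `${JSON.stringify(key)}:${canonicalize(child)}`);
  return `{${parts.join(",")}}`;
}

// True when the two configs differ only in the volatile meta fields.
function isSemanticNoOp(previous: JsonObject, next: JsonObject): boolean {
  return canonicalize(stripVolatileMeta(previous)) === canonicalize(stripVolatileMeta(next));
}
```

A writer could call `isSemanticNoOp(previousConfig, nextConfig)` only after the byte-level hash comparison fails, which keeps the cheap check first.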

Test

Added regression test in io.write-config.test.ts that verifies:

Second write of identical config does not touch disk (mtime unchanged)
No "Config overwrite" log is emitted
Backup rotation is not triggered

Fixes #75534

Changed files

src/config/io.ts (modified, +33/-0)
src/config/io.write-config.test.ts (modified, +48/-0)

PR #219: fix(openclaw-config): break agent-create restart cascade (#193)

Repository: heypinchy/pinchy
Author: clemenshelm
State: closed | merged: True
Link: https://github.com/heypinchy/pinchy/pull/219

Description (problem / solution / changelog)

Summary

Closes the dominant remaining cause of #193: every settings save and agent-create triggered 15-30 s of "Agent runtime is not available" downtime, even after the #200 PR cluster (#201, #203, #205, #206) eliminated the original SECRETS_RELOADER_DEGRADED cause. Reproduced on staging 2026-05-01 and locally against the production-image stack.

Root causes addressed (in order of discovery)

1. channels.telegram.enabled ping-pong. OpenClaw writes enabled: true back on every gateway start during auto-enable. Pinchy's allow-list for telegram-block fields didn't include enabled, so the next regenerateOpenClawConfig stripped it → diff → restart → repeat.
2. plugins.entries.<provider> ping-pong. Same bug class, same trigger, just for OpenClaw-managed provider plugins (anthropic, openai, google, ollama-cloud). Fixed generically: preserve any plugins.entries.* entry that doesn't start with pinchy-.
3. meta.lastTouchedAt slipping past the byte-equal early-return. OpenClaw stamps a fresh timestamp on every write, so back-to-back regenerates without DB changes still produced different bytes → unnecessary config.apply RPC. Added a normalize-compare second early-return that ignores OpenClaw-managed metadata.
4. (workaround) env.* false-positive restart trigger: the dominant cause once 1–3 were fixed. OpenClaw's diffConfigPaths(snapshot, parsed) compares snapshot's runtime-resolved env (sk-ant-...) against Pinchy's template (${ANTHROPIC_API_KEY}); they never match, env.* always lands in changedPaths, and env.* has no rule in BASE_RELOAD_RULES → falls through to default full-restart trigger. Verified in OpenClaw 2026.4.12 (current pin) and 2026.4.29 (latest stable). Tracked upstream in openclaw#75534.
   - Workaround: replace unchanged env values with OpenClaw's __OPENCLAW_REDACTED__ sentinel before config.apply. OpenClaw's restoreRedactedValues runs before the diff and substitutes the sentinel with snapshot's resolved value → no env diff → no restart. Plain templates kept for genuinely new env keys (legit restart). Cleanup tracked in #215.
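The env workaround can be sketched roughly as follows. This is not Pinchy's implementation: `redactUnchangedEnv` is a hypothetical name, and the sketch assumes "unchanged" means the key already exists in the previously written config with the same template string. Only the sentinel value is taken from the PR description.

```typescript
// Sentinel string quoted in the PR description.
const OPENCLAW_REDACTED_SENTINEL = "__OPENCLAW_REDACTED__";

type EnvMap = Record<string, string>;

// Hypothetical sketch: swap env values that did not change for the sentinel,
// and keep the plain template for keys that are new or edited.
function redactUnchangedEnv(existingEnv: EnvMap, nextEnv: EnvMap): EnvMap {
  const result: EnvMap = {};
  for (const [key, value] of Object.entries(nextEnv)) {
    const unchanged =
      Object.prototype.hasOwnProperty.call(existingEnv, key) && existingEnv[key] === value;
    result[key] = unchanged ? OPENCLAW_REDACTED_SENTINEL : value;
  }
  return result;
}

// Example: one key was already present, one is new.
const existingEnv: EnvMap = { ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}" };
const nextEnv: EnvMap = {
  ANTHROPIC_API_KEY: "${ANTHROPIC_API_KEY}",
  OPENAI_API_KEY: "${OPENAI_API_KEY}",
};
const redactedEnv = redactUnchangedEnv(existingEnv, nextEnv);
console.log(redactedEnv);
```

In this example the existing key is sent as the sentinel, so per the PR description it would be restored before the diff, while the new key keeps its template and legitimately triggers a restart.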

Layered guardrails

Per the project rule for production bugs (Unit + Integration + Runtime-Validation or E2E):

5 unit tests in openclaw-config.test.ts exercising each fix and the workaround.
1 E2E test in e2e/telegram/agent-create-no-restart.spec.ts against the production-image stack (docker-compose.e2e.yml). Waits for the cold-start cascade to settle (log-scan based, more robust than the WS-connectivity check that caused #203's E2E to be dropped), then asserts no requires gateway restart / received SIGUSR1 / full process restart events appear in the OpenClaw log window after a POST /api/agents.
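The log-scan assertion in that E2E test could look something like the sketch below. `findRestartEvents` is a hypothetical helper and the sample log lines are made up for illustration; only the three marker strings are taken from the PR description.

```typescript
// Marker strings quoted in the PR description.
const RESTART_MARKERS = [
  "requires gateway restart",
  "received SIGUSR1",
  "full process restart",
];

// Hypothetical helper: return every line in the log window that mentions a restart marker.
function findRestartEvents(logWindow: string): string[] {
  return logWindow
    .split("\n")
    .filter((line) => RESTART_MARKERS.some((marker) => line.includes(marker)));
}

// Made-up sample windows, not real OpenClaw output.
const cleanWindow = "[reload] config change detected; evaluating reload (agents.list)";
const noisyWindow = [
  "[reload] config change requires gateway restart (env.ANTHROPIC_API_KEY)",
  "[gateway] received SIGUSR1",
].join("\n");

console.log(findRestartEvents(cleanWindow).length); // 0
console.log(findRestartEvents(noisyWindow).length); // 2
```

A test would capture the log window that follows the POST /api/agents call and assert that the returned array is empty.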

Validation

3344/3344 unit tests green
E2E agent-create-no-restart.spec.ts passes against the local production-image stack
RED ↔ GREEN toggle: temporarily reverted the env-redact workaround → E2E goes RED with the expected requires gateway restart (env.ANTHROPIC_API_KEY) log line; restored → GREEN
Other 13 Telegram E2E specs still pass; the 1 known-flaky @channel-restart test (already skipped in CI) is unrelated

Test plan

CI green
Manual click-through on staging after merge: create custom agent → immediately chat → no "Agent runtime is not available" banner, agent responds
Inspect staging OpenClaw logs after agent-create: only agents.list reload event, no SIGUSR1/restart

Related

#193 — original cascade issue (this PR closes it)
#200 — original unknown agent id symptom (closed earlier; this finishes the broader story)
#215 — cleanup task once openclaw#75534 lands upstream
openclaw#75534 — upstream tracking for the env-diff bug

Changed files

docker-compose.test.yml (modified, +10/-0)
packages/web/e2e/telegram/agent-create-no-restart.spec.ts (added, +213/-0)
packages/web/src/__tests__/lib/openclaw-config.test.ts (modified, +390/-2)
packages/web/src/lib/openclaw-config.ts (modified, +159/-12)

PR #254: refactor(openclaw-config): split 1097-line module into focused sub-files

Repository: heypinchy/pinchy
Author: clemenshelm
State: open | merged: False
Link: https://github.com/heypinchy/pinchy/pull/254

Description (problem / solution / changelog)

Closes #233.

Summary

Splits packages/web/src/lib/openclaw-config.ts (1097 lines, 7 distinct concerns interleaved) into a directory of focused sub-modules. The import path @/lib/openclaw-config is unchanged — all 10 production importers and 14 test files keep working without edits.

File map

File	Concern
`build.ts`	`regenerateOpenClawConfig` — DB → openclaw.json
`targeted.ts`	`sanitizeOpenClawConfig`, `updateIdentityLinks`, `updateTelegramChannelConfig` (narrow-scope writes that bypass full regenerate)
`write.ts`	`writeConfigAtomic`, `readExistingConfig` (with the EACCES retry against `start-openclaw.sh`'s chmod loop), `pushConfigInBackground`
`normalize.ts`	`configsAreEquivalentUpToOpenClawMetadata`, `redactUnchangedEnvForApply`, `OPENCLAW_REDACTED_SENTINEL` — the openclaw#75534 workarounds, isolated so #215's removal becomes a one-file delete
`secrets-bundle.ts`	`buildSecretsBundle` helper + central home for the Pattern A/B/C secret-handling matrix from `CLAUDE.md`
`paths.ts`	`CONFIG_PATH` constant (separate file to avoid `write.ts` ↔ `normalize.ts` cycle on `redactUnchangedEnvForApply` ↔ `pushConfigInBackground`)
`index.ts`	Re-exports the four public symbols

Behaviour preservation (#193 constraints)

This is a pure code-organisation change — same runtime behaviour, same public API.

Trailing-newline format (JSON.stringify(...).trimEnd() + "\n") preserved so SHA256 hashes align with OpenClaw's writeConfigFile output (chokidar dedupe via lastAppliedWriteHash).
Byte-equal early-return + configsAreEquivalentUpToOpenClawMetadata semantics unchanged.
Sentinel redaction (OPENCLAW_REDACTED_SENTINEL / redactUnchangedEnvForApply) unchanged.
Atomic write + EACCES retry timing unchanged.
The byte-idempotency test from #193 still passes.

Test plan

tsc --noEmit — exit 0, no new errors
vitest run — 3407 passed, 12 skipped, 3 todo, 0 failed across 259 test files
Verified all @/lib/openclaw-config importers still resolve via the new index.ts
CI green
Manual smoke on staging: settings save → no spurious gateway restart (the #193 invariant)

Notes for reviewers

secrets-bundle.ts introduces a tiny buildSecretsBundle({...}) helper that wraps the four-field SecretsBundle literal previously inlined in regenerateOpenClawConfig. The bundle's resulting bytes are identical; the helper exists to give the secret-handling pattern matrix a stable home for future audit/rotation/validation logic.
paths.ts exists solely to break a cycle. If you'd prefer the constant inlined into write.ts and redactUnchangedEnvForApply taking existingContent as a parameter instead, happy to re-shape — but the cycle-avoidance via dedicated module felt cleaner.
One pre-existing flake in openclaw-config.test.ts ("skips file write and config.apply RPC when only meta.lastTouchedAt differs" / "keeps templates for new env keys not in existing config") was observed during verification. Reproduced ~30% on a clean main checkout (without this refactor). Flagged as a separate issue; not introduced by this PR.

Related

#193 — original cascade issue (constraint: don't regress)
#215 — removal of the normalize-compare workaround when openclaw#75534 lands; once resolved, normalize.ts is deleted wholesale
openclaw/openclaw#75534 — upstream fix being tracked

Changed files

packages/web/src/lib/openclaw-config/build.ts (renamed, +9/-381)
packages/web/src/lib/openclaw-config/index.ts (added, +17/-0)
packages/web/src/lib/openclaw-config/normalize.ts (added, +94/-0)
packages/web/src/lib/openclaw-config/paths.ts (added, +5/-0)
packages/web/src/lib/openclaw-config/secrets-bundle.ts (added, +24/-0)
packages/web/src/lib/openclaw-config/targeted.ts (added, +189/-0)
packages/web/src/lib/openclaw-config/write.ts (added, +100/-0)

Reproduction

The deserialize → modify → JSON.stringify round-trip can produce different byte ordering / formatting than what's on disk
Any field whose default initialization order differs between Zod's parse and the legacy on-disk representation will differ at the byte level
Even map iteration order can affect output JSON

Result: every "no-op" patch hits disk, triggering the gateway file-watcher cascade.
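The key-order effect is easy to demonstrate in isolation. The snippet below is a standalone illustration with made-up sample objects, not OpenClaw code: two objects with the same content but different key insertion order serialize to different strings, so any byte-level hash of them would differ as well.

```typescript
// Same content, different key insertion order (sample data invented for this sketch).
const onDiskShape = { plugins: { entries: {} }, allowAgents: ["agent-1"] };
const roundTrippedShape = { allowAgents: ["agent-1"], plugins: { entries: {} } };

// JSON.stringify emits string keys in insertion order, so the outputs differ.
const onDiskJson = JSON.stringify(onDiskShape, null, 2);
const roundTrippedJson = JSON.stringify(roundTrippedShape, null, 2);

console.log(onDiskJson === roundTrippedJson); // false
console.log(onDiskJson.length === roundTrippedJson.length); // true
```

The equal lengths show that nothing was added or removed; only the order of the bytes changed.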

Proposed fix

src/config/io.ts already computes both hashes around lines 1735-1739:

const json = JSON.stringify(stampedOutputConfig, null, 2).trimEnd().concat("\n");
const nextHash = hashConfigRaw(json);
const previousHash = resolveConfigSnapshotHash(snapshot);

Add a no-op short-circuit immediately after:

// No-op short-circuit: bytes are semantically identical to disk. Skip the
// disk write to avoid the file-watcher → SIGUSR1 cascade. The audit
// trail, "Config overwrite" log, and suspicious-reasons check all become
// noise on hash-equal writes.
if (snapshot.exists && previousHash !== null && nextHash === previousHash) {
  return { persistedHash: nextHash, persistedConfig: stampedOutputConfig };
}

The audit appender + logger are intentionally skipped in this branch — there's nothing to audit (no rename, no permission tighten, no metadata change). Tests can guard against accidental skipping by exercising OPENCLAW_TEST_CONFIG_OVERWRITE_LOG=1 with a hash-different payload.
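A standalone sketch of the same idea, outside OpenClaw, is shown below. `writeIfChanged` and `sha256Hex` are hypothetical helpers; SHA-256 is assumed because the log line above mentions sha256, and the real `hashConfigRaw` and `resolveConfigSnapshotHash` may behave differently. The usage at the end checks the property the regression test cares about: a second identical write leaves the file's mtime untouched.

```typescript
import { createHash } from "node:crypto";
import { existsSync, mkdtempSync, readFileSync, renameSync, statSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hash the serialized config (SHA-256 is an assumption of this sketch).
function sha256Hex(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Hypothetical helper: skip the write entirely when the bytes would not change.
function writeIfChanged(path: string, config: object): { persistedHash: string; wrote: boolean } {
  const json = JSON.stringify(config, null, 2).trimEnd().concat("\n");
  const nextHash = sha256Hex(json);
  const previousHash = existsSync(path) ? sha256Hex(readFileSync(path, "utf8")) : null;

  // No-op short-circuit: nothing to rename, back up, or log.
  if (previousHash !== null && nextHash === previousHash) {
    return { persistedHash: nextHash, wrote: false };
  }

  // Write to a temp file, then rename it over the target.
  const tempPath = `${path}.tmp`;
  writeFileSync(tempPath, json, "utf8");
  renameSync(tempPath, path);
  return { persistedHash: nextHash, wrote: true };
}

// Usage: the second identical write must not touch the file.
const dir = mkdtempSync(join(tmpdir(), "config-noop-"));
const configPath = join(dir, "config.json");
const first = writeIfChanged(configPath, { allowAgents: ["agent-1"] });
const mtimeAfterFirst = statSync(configPath).mtimeMs;
const second = writeIfChanged(configPath, { allowAgents: ["agent-1"] });

console.log(first.wrote, second.wrote); // true false
console.log(statSync(configPath).mtimeMs === mtimeAfterFirst); // true
```

Because the short-circuit returns before any file operation, a watcher on the config path has nothing to react to.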

Why this is a safe change

The function's existing return contract is { persistedHash, persistedConfig }. Both values are well-defined in the no-op branch (hash is the existing one; config is what we computed).
Backup rotation, permissions tightening, atomic rename — none of those have side effects worth running when the bytes don't change.
Audit log is also a no-op semantically (nothing changed). Skipping it is correct.
File-watcher will not fire — which is the entire point.

Downstream impact

In a multi-tenant deployment where ~10 inbound messages/hour arrive per tenant, the current behavior causes ~10 gateway restarts/hour, dropping a fraction of in-flight messages each time. With this fix, restarts drop to ~0/hour during normal operation (only real config changes trigger reloads).

References

Affected file: src/config/io.ts (function writeConfigFile and its internal applyConfigToConfigPath-equivalent callback, around lines 1700-1900)
Hash helpers: hashConfigRaw, resolveConfigSnapshotHash (already exist in same module)
File-watcher: [reload] config change detected log emitted by the gateway's config reloader (separate codepath, doesn't need changing — fixing the writer is sufficient)

I have a downstream consumer (https://github.com/sidekykai/openclawmtsaas) hitting this in production. Happy to draft a PR + tests if the maintainers agree on the approach.

extent analysis

TL;DR

The proposed fix involves adding a no-op short-circuit in src/config/io.ts to skip disk writes when the computed hashes are equal, preventing unnecessary file-watcher triggers and gateway restarts.

Guidance

Verify that the nextHash and previousHash computations are correct and consistent with the expected behavior.
Implement the proposed no-op short-circuit in src/config/io.ts to skip disk writes when the hashes are equal.
Test the change with the OPENCLAW_TEST_CONFIG_OVERWRITE_LOG=1 environment variable to ensure the audit log is not skipped unnecessarily.
Monitor the downstream impact on gateway restarts and in-flight message drops to confirm the fix is effective.

Example

if (snapshot.exists && previousHash !== null && nextHash === previousHash) {
  return { persistedHash: nextHash, persistedConfig: stampedOutputConfig };
}

Notes

The proposed fix assumes that the hashConfigRaw and resolveConfigSnapshotHash functions are correctly implemented and produce consistent results. Additionally, the fix may not apply to all scenarios, such as when the config file is modified externally.

Recommendation

Apply the proposed workaround by adding the no-op short-circuit in src/config/io.ts, as it is a targeted fix that addresses the specific issue of unnecessary gateway restarts and in-flight message drops.
