openclaw - ✅(Solved) Fix [Bug]: Skill toggle triggers full gateway restart with plugins.entries.*.config in reload diff [1 pull requests, 2 comments, 3 participants]

GitHub stats
openclaw/openclaw#72061 (fetched 2026-04-27 05:35:29)
Comments: 2 · Participants: 3 · Timeline events: 7 · Reactions: 0
Timeline (top): commented ×2, referenced ×2, closed ×1, cross-referenced ×1

On 2026.4.24, toggling a single skill in WebUI causes a full gateway process restart. The reload evaluator's diff includes plugins.entries.{browser,xai,openai,openrouter}.config even though the user only changed a skill enabled flag. Result: every skill toggle disconnects all webchat clients (code=1012 reason=service restart) and incurs a ~7–10s gateway restart cycle.

Root Cause

Runtime-side normalization injects an empty config: {} into plugins.entries.<id> blocks whose on-disk source contains only enabled: true. diffConfigPaths treated a missing field and an empty container as different values, so every skill-toggle write reported phantom changes at those sibling paths, and buildGatewayReloadPlan's default-restart fallback classified plugins.entries.<id>.config as restart-requiring.

Fix Action

Workaround

Avoid toggling skills from the WebUI during an active session.

PR fix notes

PR #72137: fix(gateway): collapse phantom diff entries from empty-container normalization (#72061)

Description (problem / solution / changelog)

Summary

  • Problem: every skill toggle on 2026.4.24 forces a full gateway restart, with plugins.entries.{browser,xai,openai,openrouter}.config (and other unedited paths) appearing in the reload diff. All webchat clients disconnect with code=1012 reason=service restart.
  • Why it matters: the on-disk config blocks for those plugin entries contain only enabled: true. Runtime-side normalization injects an empty config: {} into the post-write source snapshot, so a no-op edit looks like a write to many sibling paths and the reload plan classifies them as restart-requiring (per buildGatewayReloadPlan's default-restart fallback).
  • What changed: diffConfigPaths now treats undefined, {}, and [] as equivalent "no value" shapes via a new isEmptyConfigBranch helper. Skill toggles only report skills.entries.<id>.enabled and stay on the hot-reload path.
  • What did NOT change (scope boundary): no edits to buildGatewayReloadPlan, the in-process write protocol, the source/runtime snapshot split, the skills-snapshot invalidation prefixes, or any caller of diffConfigPaths. requireApiKey, secrets/auth surfaces, and CODEOWNERS-restricted paths are untouched.
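The comparator change described above can be sketched as follows. This is a minimal illustration, assuming plain JSON-like config snapshots: the names diffConfigPaths and isEmptyConfigBranch come from the PR text, but the bodies are hypothetical and are not copied from src/gateway/config-reload.ts (for example, this sketch compares arrays as whole values rather than diffing arrays of objects).

```typescript
// Hypothetical sketch of the PR #72137 comparator fix; not the openclaw source.
import { isDeepStrictEqual } from "node:util";

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// True for the three "no value" shapes the PR treats as equivalent:
// a missing field (undefined), an empty plain object, and an empty array.
function isEmptyConfigBranch(value: unknown): boolean {
  if (value === undefined) return true;
  if (Array.isArray(value)) return value.length === 0;
  return isPlainObject(value) && Object.keys(value).length === 0;
}

// Returns the dotted paths whose values differ between two config snapshots.
function diffConfigPaths(prev: unknown, next: unknown, prefix = ""): string[] {
  // Short-circuit: undefined ≈ {} ≈ [] is not reported as a change.
  if (isEmptyConfigBranch(prev) && isEmptyConfigBranch(next)) return [];

  // Recurse through two plain objects, visiting the union of their keys.
  if (isPlainObject(prev) && isPlainObject(next)) {
    const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
    const paths: string[] = [];
    for (const key of keys) {
      const childPath = prefix ? `${prefix}.${key}` : key;
      paths.push(...diffConfigPaths(prev[key], next[key], childPath));
    }
    return paths;
  }

  // Leaves (and, in this simplified sketch, arrays) are compared as whole values.
  return isDeepStrictEqual(prev, next) ? [] : [prefix];
}

// A skill toggle plus a normalization-injected empty config block.
const prevSource = {
  skills: { entries: { "coding-agent": { enabled: true } } },
  plugins: { entries: { xai: { enabled: true } } },
};
const nextSource = {
  skills: { entries: { "coding-agent": { enabled: false } } },
  plugins: { entries: { xai: { enabled: true, config: {} } } },
  messages: {},
};

console.log(diffConfigPaths(prevSource, nextSource)); // only skills.entries.coding-agent.enabled
```

Checking the empty-branch case before anything else keeps the missing-to-populated case intact: undefined versus a non-empty object is not "both empty", so it still falls through to [prefix] at the parent path.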

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #72061
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: diffConfigPaths recurses into plain objects and falls through to return [prefix] when one side is undefined and the other is a container, whether empty or not. Equivalent "no value" shapes (a missing field, {}, []) therefore produce phantom path entries. Write-time config normalization in the gateway can introduce empty config: {} entries for plugins.entries.<id> (and similar siblings) that the on-disk source omits, so every skill-toggle write produces phantom diffs at those siblings. buildGatewayReloadPlan then treats plugins.entries.<id>.config as an unknown, restart-requiring path, so even a no-op write triggers a gateway restart and drops all webchat sockets.
  • Missing detection / guardrail: no test in src/gateway/config-reload.test.ts covered the missing-vs-empty-container case for diffConfigPaths, and the skill-toggle integration path never asserted that only skills.entries.<id>.enabled should appear in the diff when sibling plugin entries are normalized.
  • Contributing context (if known): the existing array-equality short-circuit (isDeepStrictEqual for two arrays) covered the symmetrical-arrays case but not the asymmetric undefined / {} / [] cases that runtime-side normalization introduces.
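To make the root cause concrete, here is an illustrative reconstruction of the pre-fix behavior: recurse only when both sides are plain objects, otherwise report the prefix unless the values are deep-equal. The function name diffConfigPathsBeforeFix and its body are hypothetical, written from the description above rather than from the openclaw source.

```typescript
// Hypothetical reconstruction of the PRE-fix comparator; not the openclaw source.
import { isDeepStrictEqual } from "node:util";

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function diffConfigPathsBeforeFix(prev: unknown, next: unknown, prefix = ""): string[] {
  if (isPlainObject(prev) && isPlainObject(next)) {
    const keys = new Set([...Object.keys(prev), ...Object.keys(next)]);
    const paths: string[] = [];
    for (const key of keys) {
      const childPath = prefix ? `${prefix}.${key}` : key;
      paths.push(...diffConfigPathsBeforeFix(prev[key], next[key], childPath));
    }
    return paths;
  }
  // undefined vs {} lands here: the values are not deep-equal, so the
  // whole branch is reported as changed (a phantom path).
  return isDeepStrictEqual(prev, next) ? [] : [prefix];
}

// On-disk source: plugin entries carry only `enabled: true`.
const source = { plugins: { entries: { xai: { enabled: true }, openai: { enabled: true } } } };
// Post-write snapshot after normalization injected empty `config: {}` blocks.
const normalized = {
  plugins: { entries: { xai: { enabled: true, config: {} }, openai: { enabled: true, config: {} } } },
};

// Reports plugins.entries.xai.config and plugins.entries.openai.config
// even though the user edited neither entry.
console.log(diffConfigPathsBeforeFix(source, normalized));
```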

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/gateway/config-reload.test.ts (describe("diffConfigPaths"))
  • Scenario the test should lock in: five cases under the existing diffConfigPaths block:
    • undefined field vs empty plain object on the same path returns no paths,
    • undefined field vs empty array returns no paths,
    • empty plain object vs empty array returns no paths,
    • a missing-to-populated container still returns a change at the parent path,
    • the full #72061 scenario (skill toggle plus four sibling plugins.entries.<id>.config: {} injections plus an empty messages: {} branch) returns exactly ["skills.entries.coding-agent.enabled"].
  • Why this is the smallest reliable guardrail: diffConfigPaths is the single source of truth for the changed-paths list that drives buildGatewayReloadPlan. Asserting equivalence at the comparator boundary covers every caller (in-process writes, watcher reads, and the recovery path) without needing integration plumbing.
  • Existing test that already covers this (if any): N/A.
  • If no new test is added, why not: N/A; 5 new test cases were added (4 micro-cases + 1 full-issue reproduction).

User-visible / Behavior Changes

  • Toggling a skill no longer triggers a full gateway restart and no longer drops active webchat clients with code=1012 reason=service restart. Hot-reload still fires for the skills snapshot. No config schema, contract, or default changes.

Diagram (if applicable)

Before:
[skill toggle] -> writeConfigFile -> notifyRuntimeConfigWriteListeners
  -> diffConfigPaths(prev_source, next_source)
       -> reports skills.entries.<id>.enabled
       -> phantom-reports plugins.entries.{browser,xai,openai,openrouter}.config
       -> phantom-reports messages, logging.redactSensitive, agents.defaults.*, ...
  -> buildGatewayReloadPlan -> restartGateway=true (unknown plugin paths)
  -> SIGUSR1 -> all webchat clients disconnect (code=1012)

After:
[skill toggle] -> writeConfigFile -> notifyRuntimeConfigWriteListeners
  -> diffConfigPaths(prev_source, next_source)
       -> isEmptyConfigBranch short-circuits undefined ≈ {} ≈ []
       -> reports only skills.entries.<id>.enabled
  -> buildGatewayReloadPlan -> restartGateway=false, restartHeartbeat=false
  -> hot reload only; webchat clients stay connected
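The diagram hinges on a default-restart fallback: any changed path without a known hot-reload rule forces a restart. A minimal sketch of that classification follows. The planReload function, the ReloadPlan shape, and the HOT_RELOAD_PREFIXES table are all hypothetical; the prefixes are an assumption chosen only so the sketch reproduces the restart list printed in the issue's log, and they are not openclaw's rule set.

```typescript
// Hypothetical reload planner with a default-restart fallback.
interface ReloadPlan {
  restartGateway: boolean;
  hotReloadPaths: string[];
  restartPaths: string[];
}

// Assumed hot-reloadable top-level sections (illustrative only).
const HOT_RELOAD_PREFIXES = ["skills", "logging", "models", "agents", "messages"];

function matchesPrefix(path: string, prefix: string): boolean {
  return path === prefix || path.startsWith(`${prefix}.`);
}

function planReload(changedPaths: readonly string[]): ReloadPlan {
  const hotReloadPaths: string[] = [];
  const restartPaths: string[] = [];
  for (const path of changedPaths) {
    if (HOT_RELOAD_PREFIXES.some((prefix) => matchesPrefix(path, prefix))) {
      hotReloadPaths.push(path);
    } else {
      // Default-restart fallback: unknown paths force a full restart.
      restartPaths.push(path);
    }
  }
  return { restartGateway: restartPaths.length > 0, hotReloadPaths, restartPaths };
}

// Changed paths from the issue's first log entry (before the fix).
const before = planReload([
  "logging.redactSensitive",
  "models.providers.github-copilot.models",
  "agents.defaults.maxConcurrent",
  "agents.defaults.subagents",
  "skills.entries.coding-agent.enabled",
  "plugins.entries.browser.config",
  "plugins.entries.xai.config",
  "plugins.entries.openai.config",
  "plugins.entries.openrouter.config",
  "messages",
]);
// Changed paths once phantom entries are collapsed at the comparator.
const after = planReload(["skills.entries.coding-agent.enabled"]);
// A real, non-empty change still takes the restart branch in this sketch.
const realChange = planReload(["gateway.port"]);

console.log(before.restartGateway, before.restartPaths); // restart, four plugins.entries.*.config paths
console.log(after.restartGateway); // false: hot reload only
console.log(realChange.restartGateway); // true
```

Because the planner itself is untouched by the fix, removing phantom paths at the comparator is enough to move a skill toggle from the restart branch to the hot-reload branch, while genuine changes to unknown paths still restart.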

Security Impact (required)

  • New permissions/capabilities? No
  • Secrets/tokens handling changed? No
  • New/changed network calls? No
  • Command/tool execution surface changed? No
  • Data access scope changed? No

The fix is contained to the in-memory comparator that drives reload classification. All existing restart triggers (gateway.auth.token, gateway.auth.mode, gateway.port, plugin enabled/disabled, etc.) continue to fire, because those produce real (non-empty) value diffs and bypass the new short-circuit.

Repro + Verification

Environment

  • OS: Linux 6.x (Debian/Ubuntu) inside Docker, also reproduced on Windows 11 dev host
  • Runtime/container: Node 24.11.0, pnpm 10.20.0
  • Model/provider: irrelevant; the bug is in config reload, before any model call
  • Integration/channel (if any): WebUI Skills page → toggle any skill
  • Relevant config (redacted): plugins.entries.{browser,xai,openai,openrouter} containing only enabled: true (no nested config block)

Steps

  1. pnpm install
  2. Reproduce on main (pre-fix): add the new test cases and run pnpm test src/gateway/config-reload.test.ts → the "treats missing fields and empty plain objects as equivalent" and "full #72061 scenario" cases fail.
  3. Apply this branch: pnpm test src/gateway/config-reload.test.ts → 68/68 pass.
  4. Targeted gates:
    • pnpm tsgo:core → clean
    • pnpm tsgo:core:test → clean
    • pnpm lint → 0 warnings, 0 errors
    • pnpm format → no changes
    • pnpm check:changed → exit 0

Expected

  • The skill-toggle reload diff lists skills.entries.<id>.enabled only.
  • buildGatewayReloadPlan produces a hot-reload plan, not a restart plan.

Actual

  • Matches expected on the new tests.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

The new full-#72061 test reconstructs the issue's reload-evaluator log verbatim and asserts the changed-paths list collapses to ["skills.entries.coding-agent.enabled"]. Without the fix the test reports the four phantom plugins.entries.<id>.config paths and the unrelated logging/models/agents/messages paths, exactly matching the issue body.

Human Verification (required)

  • Verified scenarios:
    • Pre-fix: reproduced phantom-diff classification by running the new test against unfixed source — fails with the four plugins.entries.<id>.config phantom paths.
    • Post-fix: full pnpm test src/gateway/config-reload.test.ts (68 tests) passes locally.
    • Lint/format clean across the full lint shards (pnpm lint).
    • Type-checks pass for both tsgo:core and tsgo:core:test.
  • Edge cases checked:
    • Missing-to-populated container (undefined → { trustedHosts: ["..."] }) still reports the change at the parent path.
    • Existing duplicate-path emission for plugin install timestamps is unchanged.
    • Existing array-equality short-circuit and array-of-objects diff continue to behave identically.
    • The skills-snapshot invalidation prefixes still fire for skills.* (not affected).
  • What I did not verify: live gateway restart behavior end-to-end (no Docker harness available locally). The behavior is asserted at the seam that drives the reload plan, which is the same boundary the issue's logs print from.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? Yes — all existing classification behavior is preserved; only equivalent "no value" shapes are collapsed.
  • Config/env changes? No
  • Migration needed? No

Risks and Mitigations

  • Risk: a caller of diffConfigPaths elsewhere in the codebase relies on the previous behavior of reporting undefined → {} transitions in empty containers as changes.
    • Mitigation: searched callers; the function is only used to drive the gateway reload plan and the skills-invalidation check. Both consume the result as "did anything meaningful change?", which is preserved. The added test asserts that a populated container is still reported when transitioning from missing.
  • Risk: a future config field uses {} and undefined as semantically distinct values.
    • Mitigation: today no schema in src/config/types.openclaw.js distinguishes them, and the equivalence is conventional across the runtime-snapshot/source-config split. If that ever changes, the helper is the single place to refine.

🤖 AI-assisted (Claude Code). Test level: fully tested via pnpm test src/gateway/config-reload.test.ts. I understand the change.

Changed files

  • src/gateway/config-reload.test.ts (modified, +72/-0)
  • src/gateway/config-reload.ts (modified, +25/-0)

Original issue report

Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

On 2026.4.24, toggling a single skill in WebUI causes a full gateway process restart. The reload evaluator's diff includes plugins.entries.{browser,xai,openai,openrouter}.config even though the user only changed a skill enabled flag. Result: every skill toggle disconnects all webchat clients (code=1012 reason=service restart) and incurs a ~7–10s gateway restart cycle.

Steps to reproduce

  1. Start gateway 2026.4.24
  2. Open WebUI → Skills page
  3. Toggle any skill on/off
  4. Observe gateway log

Expected behavior

A skill enabled toggle should only invalidate the skills snapshot. No full gateway restart, no webchat disconnect.

Actual behavior

Each skill toggle produces logs like the following (verbatim from docker logs, two occurrences captured in the same session):

08:08:37 [reload] skills snapshot invalidated by config change (skills.entries.coding-agent.enabled)
08:08:37 [reload] config change detected; evaluating reload (logging.redactSensitive, models.providers.github-copilot.models, agents.defaults.maxConcurrent, agents.defaults.subagents, skills.entries.coding-agent.enabled, plugins.entries.browser.config, plugins.entries.xai.config, plugins.entries.openai.config, plugins.entries.openrouter.config, messages)
08:08:37 [reload] config change requires gateway restart (plugins.entries.browser.config, plugins.entries.xai.config, plugins.entries.openai.config, plugins.entries.openrouter.config)
08:08:37 [gateway] signal SIGUSR1 received
08:08:37 [gateway] received SIGUSR1; restarting
08:08:38 [ws] webchat disconnected code=1012 reason=service restart conn=c1689ec7-7a60-4345-8b39-fb6cfdd4dcba
08:08:38 [gateway] restart mode: full process restart (spawned pid 3160)
08:12:02 [reload] skills snapshot invalidated by config change (skills.entries.session-logs.enabled)
08:12:02 [reload] config change detected; evaluating reload (logging.redactSensitive, models.providers.github-copilot.models, agents.defaults.maxConcurrent, agents.defaults.subagents, skills.entries.session-logs.enabled, plugins.entries.browser.config, plugins.entries.xai.config, plugins.entries.openai.config, plugins.entries.openrouter.config, messages)
08:12:02 [reload] config change requires gateway restart (plugins.entries.browser.config, plugins.entries.xai.config, plugins.entries.openai.config, plugins.entries.openrouter.config)
08:12:02 [gateway] signal SIGUSR1 received
08:12:02 [gateway] received SIGUSR1; restarting
08:12:02 [ws] webchat disconnected code=1012 reason=service restart conn=3b418e1e-a7ef-4213-ac98-e60ea8a5e53f
08:12:02 [gateway] restart mode: full process restart (spawned pid 103)

In both cases the user only modified skills.entries.<name>.enabled. The four plugins.entries.*.config paths were not changed in the on-disk config but still appeared in the reload diff and forced a full restart.

Config note

The on-disk plugins.entries.{browser,xai,openai,openrouter} blocks contain only enabled: true (with no nested config object for xai, openai, openrouter). Whatever runtime value the reload evaluator is comparing against, it differs from the on-disk source even when no edits were made to those paths.
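The drift described in this note can be reproduced in isolation. The sketch below assumes a hypothetical normalizePluginEntry that fills in an empty config object when the source omits one; the linked PR attributes the injected config: {} to runtime-side normalization but does not show that code, so both the function and the PluginEntry type are illustrative.

```typescript
import { isDeepStrictEqual } from "node:util";

// Hypothetical shape of a plugin entry; only the fields the issue mentions.
interface PluginEntry {
  enabled?: boolean;
  config?: Record<string, unknown>;
}

// Hypothetical normalizer: adds an empty config object when the source omits one.
function normalizePluginEntry(entry: PluginEntry): PluginEntry {
  return { ...entry, config: entry.config ?? {} };
}

// On-disk source from the issue: each entry carries only `enabled: true`.
const onDisk: Record<string, PluginEntry> = {
  browser: { enabled: true },
  xai: { enabled: true },
  openai: { enabled: true },
  openrouter: { enabled: true },
};

const normalized: Record<string, PluginEntry> = Object.fromEntries(
  Object.entries(onDisk).map(([id, entry]): [string, PluginEntry] => [id, normalizePluginEntry(entry)]),
);

// Entries whose normalized form no longer deep-equals the on-disk form,
// even though the user edited none of them.
const driftedIds = Object.keys(onDisk).filter((id) => !isDeepStrictEqual(onDisk[id], normalized[id]));
console.log(driftedIds); // all four ids: browser, xai, openai, openrouter
```

Any comparator that treats a missing config and an empty config as different values will see these four entries as changed on every write, which matches the four plugins.entries.*.config paths in the log.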

OpenClaw version

2026.4.24

Operating system

Debian GNU/Linux 12 (bookworm) inside Docker on Ubuntu 24.04 host. Base image: ghcr.io/openclaw/openclaw:2026.4.24 (digest sha256:7c4370ff8777555d4c9fe5ab821aaaad7c87188d389a6cf761270725d96ec3e9).

Install method

Docker (ghcr.io/openclaw/openclaw)

Model

openai-codex/gpt-5.4

Provider / routing chain

openai-codex

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Impact and severity

Every skill toggle in the UI triggers a full gateway restart, dropping all WebSocket connections. Users perceive instability even though no crash occurs (exitCode=0, OOMKilled=false). Programmatic skill enable/disable workflows compound the latency.

Additional information

Workaround

Not toggling skills from the UI during an active session.

Extent analysis

TL;DR

The issue can be fixed by ensuring that the reload evaluator correctly handles changes to skill enabled flags without triggering a full gateway restart.

Guidance

  • Investigate the reload evaluator's diff calculation to determine why plugins.entries.{browser,xai,openai,openrouter}.config are included even when only the skills.entries.<name>.enabled flag is changed.
  • Verify that the on-disk config for plugins.entries.{browser,xai,openai,openrouter} only contains enabled: true and no nested config object, as stated in the config note.
  • Check the runtime values used by the reload evaluator to compare against the on-disk config, as they seem to differ even when no edits are made to those paths.
  • Consider implementing a workaround to prevent full gateway restarts when toggling skills, such as only invalidating the skills snapshot when the enabled flag is changed.

Example

No code snippet is provided as the issue seems to be related to the internal workings of the reload evaluator and the gateway restart mechanism.

Notes

The issue is specific to the 2026.4.24 version of OpenClaw, and the fix may involve changes to the reload evaluator or the gateway restart logic.

Recommendation

Apply a workaround to prevent full gateway restarts when toggling skills, as the root cause of the issue is not immediately clear and may require further investigation.
