openclaw - ✅(Solved) Fix Skills snapshot not invalidated on /restart or gateway restart [3 pull requests, 1 participant]

jensen-srp · 2026-03-26T05:48:46Z

# PR #54969: fix: invalidate stale skillsSnapshot on gateway restart

- Repository: openclaw/openclaw
- Author: arkyu2077
- State: closed | merged: False
- Link: https://github.com/openclaw/openclaw/pull/54969

## Description (problem / solution / changelog)

## Summary

The `skillsSnapshot` cached in `sessions.json` was never invalidated on gateway restart because the staleness check only looked at whether a snapshot existed, not whether it was outdated.

## Changes

- Moved `skillsSnapshotVersion` computation before the `needsSkillsSnapshot` check
- Compare the cached `snapshotVersion` against the current `getSkillsSnapshotVersion()` to detect stale snapshots and rebuild them

## Testing

- Adding a new skill and restarting now correctly picks up the new skill in existing sessions.

Fixes openclaw/openclaw#54938

## Changed files

- `src/agents/agent-command.ts` (modified, +3/-1)

---

# PR #55021: fix: invalidate skills snapshot on gateway restart

- Repository: openclaw/openclaw
- Author: arkyu2077
- State: closed | merged: False
- Link: https://github.com/openclaw/openclaw/pull/55021

## Description (problem / solution / changelog)

Fixes #54938

Skills snapshot cached in sessions was never refreshed on `/restart` or gateway restart. Now: (1) bump the global skills snapshot version before SIGUSR1, and (2) compare snapshot version in agent-command.ts so stale snapshots get rebuilt.

## Changed files

- `src/agents/agent-command.ts` (modified, +5/-1)
- `src/infra/restart.ts` (modified, +2/-0)

---

# PR #67401: fix(stability): session skills snapshot, tool-loop guard, TUI watchdog, LM Studio preload backoff

- Repository: openclaw/openclaw
- Author: xantorres
- State: closed | merged: True
- Link: https://github.com/openclaw/openclaw/pull/67401

## Description (problem / solution / changelog)

## Summary

Four stability fixes for issues hit during a single self-hosted OpenClaw + LM Studio debugging session. All are reproducible, low-surface, and include unit tests.

- Problem: disabling a bundled skill in config still left the model calling it, producing infinite `Tool X not found` loops until the embedded-run timeout. Root cause: the `skillsSnapshot` persisted in `sessions.json` was never invalidated when `skills.*` config changed.
- Problem: the `unknownToolThreshold` stream guard was gated behind `tools.loopDetection.enabled`, which defaults to `false`. The protection against hallucinated / removed tool calls was effectively off in the stock config.
- Problem: the TUI `streaming · Xm Ys` indicator never reset when the gateway's `state: "final"` event was lost (WS reconnect, gateway restart, etc.), leaving the TUI stuck indefinitely until killed.
- Problem: LM Studio's memory guardrail rejecting `POST /v1/models/load` caused OpenClaw to re-hit the endpoint on every chat request (~every 2s), producing hundreds of WARN log lines per hour without useful retry semantics.
- Why it matters: each of these failure modes amplifies any other local-model hiccup into a session-long stuck state that users have to recover manually.
- What changed:
  - `src/gateway/config-reload.ts`: bump `skillsSnapshotVersion` when a config diff touches `skills.*`, via a new `shouldInvalidateSkillsSnapshotForPaths` helper wired into the single `applySnapshot` code path (covers both watcher writes and in-process `config.apply`).
  - `src/agents/pi-embedded-runner/run/attempt.ts`: make `resolveUnknownToolGuardThreshold` always return a positive threshold (default 10) regardless of `tools.loopDetection.enabled`. The guard is a pure safety net with no false-positive surface.
  - `src/tui/tui-event-handlers.ts`: add a 30s delta-silence watchdog that resets `activityStatus` to `idle` on timeout and surfaces a short system-log note; exposes `dispose()` + configurable `streamingWatchdogMs` context option.
  - `extensions/lmstudio/src/stream.ts`: add per-`(baseUrl, modelKey, contextLength)` cooldown (5s → 10s → 20s → … → 5min cap) after preload failures; during cooldown the wrapper skips preload entirely and runs the inference stream directly (the model is often already loaded via LM Studio's UI). The log line now carries consecutive-failure count and remaining cooldown.
- What did NOT change (scope boundary): no schema additions, no new config keys (except the TUI `streamingWatchdogMs` developer option), no changes to the public Plugin SDK, no touches to session persistence format, no changes to the `detectToolCallLoop` / `before-tool-call` dispatcher path.

## Change Type (select all)

- [x] Bug fix
- [ ] Feature
- [ ] Refactor required for the fix
- [ ] Docs
- [ ] Security hardening
- [ ] Chore/infra

## Scope (select all touched areas)

- [x] Gateway / orchestration
- [x] Skills / tool execution
- [ ] Auth / tokens
- [ ] Memory / storage
- [x] Integrations
-
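The preload cooldown described in PR #67401 (5s → 10s → 20s → … → 5min cap, tracked per `(baseUrl, modelKey, contextLength)`) can be illustrated with a small sketch. This is not the code from `extensions/lmstudio/src/stream.ts`; the class and function names are hypothetical, and resetting the counter after a successful preload is an assumption the PR description does not confirm.

```python
import time

BASE_COOLDOWN_S = 5.0    # first cooldown, per the PR description
MAX_COOLDOWN_S = 300.0   # 5 minute cap, per the PR description


def cooldown_seconds(consecutive_failures):
    """Cooldown after the n-th consecutive failure: 5s, 10s, 20s, ... capped at 300s."""
    if consecutive_failures <= 0:
        return 0.0
    # Bound the exponent so very long failure streaks cannot overflow a float.
    exponent = min(consecutive_failures - 1, 16)
    return min(BASE_COOLDOWN_S * (2 ** exponent), MAX_COOLDOWN_S)


class PreloadBackoff:
    """Hypothetical tracker keyed by (base_url, model_key, context_length)."""

    def __init__(self):
        # key -> (consecutive failure count, time at which the cooldown ends)
        self._state = {}

    def record_failure(self, key, now=None):
        now = time.monotonic() if now is None else now
        failures = self._state.get(key, (0, 0.0))[0] + 1
        self._state[key] = (failures, now + cooldown_seconds(failures))
        return failures

    def record_success(self, key):
        # Assumption: a successful preload clears the failure streak.
        self._state.pop(key, None)

    def should_skip_preload(self, key, now=None):
        now = time.monotonic() if now is None else now
        entry = self._state.get(key)
        return entry is not None and now < entry[1]
```

While `should_skip_preload` returns true, a caller would skip the preload request and run the inference stream directly, which is the behavior the PR describes for the cooldown window.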

openclaw/openclaw#54938•Fetched 2026-04-08 01:34:18

Bug

When a new skill is added (either via config allowlist or by placing it in a workspace/managed skills directory), the skillsSnapshot cached in sessions.json is not invalidated by /restart or gateway restart. The stale snapshot persists and the new skill never appears in the session prompt.

Steps to Reproduce

1. Have an existing session with a cached skillsSnapshot in sessions.json.
2. Add a new skill to the workspace skills directory (e.g. <workspace>/skills/my-skill/SKILL.md).
3. Add the skill name to the agent's skills allowlist in config (if applicable).
4. Restart the gateway (SIGUSR1) or run the /restart command.
5. Send a message in the session.

Expected: The skill appears in the session's available skills list
Actual: The skill does not appear. The stale skillsSnapshot from sessions.json is reused.

Evidence

- Gateway logs confirm the skill is loaded (Discord slash command count increased from 133 to 134)
- The skillsSnapshot field in sessions.json for the session does not contain the new skill
- Multiple gateway restarts and /restart commands did not clear the snapshot
- Manually deleting the skillsSnapshot key from sessions.json and restarting fixed the issue

Environment

OpenClaw 2026.3.24
macOS (arm64)
Multi-agent setup with per-agent skills allowlist

Workaround

Manually delete the skillsSnapshot field from the affected session in <agentDir>/sessions/sessions.json and restart the gateway.

Suggested Fix

/restart (and gateway restart) should invalidate the skillsSnapshot so it rebuilds on the next turn. Alternatively, compare a hash of the current eligible skills against the cached snapshot and refresh if stale.
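The first option can be sketched as a version counter, which is roughly the approach the linked PRs describe (bump a global skills snapshot version, then compare the cached version against the current one). The sketch below is illustrative only: the function names mirror identifiers mentioned in the PRs, but the field names `version` and `skills` and the overall shape are assumptions, not OpenClaw's actual session format.

```python
# Hypothetical in-process version counter; a restart handler would bump it.
_skills_snapshot_version = 0


def bump_skills_snapshot_version():
    """Mark every cached snapshot as stale, e.g. right before a restart."""
    global _skills_snapshot_version
    _skills_snapshot_version += 1
    return _skills_snapshot_version


def get_skills_snapshot_version():
    return _skills_snapshot_version


def needs_skills_snapshot(session_entry):
    """True when no snapshot is cached OR the cached one has an old version."""
    snapshot = session_entry.get("skillsSnapshot")
    if snapshot is None:
        return True
    # The reported bug: only the "is None" test existed, so an outdated
    # snapshot was reused forever.
    return snapshot.get("version") != get_skills_snapshot_version()


def ensure_skills_snapshot(session_entry, build_snapshot):
    """Rebuild the snapshot on the next turn if it is missing or stale."""
    if needs_skills_snapshot(session_entry):
        session_entry["skillsSnapshot"] = {
            "version": get_skills_snapshot_version(),
            "skills": build_snapshot(),
        }
    return session_entry["skillsSnapshot"]
```

With this shape, a restart never has to touch sessions.json directly: bumping the counter is enough, and each session rebuilds lazily on its next message.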

extent analysis

Fix Plan

The fix is to invalidate the skillsSnapshot whenever the set of eligible skills may have changed, that is, when a skill is added or when the gateway restarts. The options are:

- Have the /restart command clear the skillsSnapshot field in sessions.json.
- Have the gateway restart path clear the skillsSnapshot field as well.
- Alternatively, compare a hash of the currently eligible skills against the cached snapshot and rebuild the snapshot when they differ.
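The hash-comparison alternative could look like the following minimal sketch. It assumes the eligible skills can be represented as a list of skill names and that a `fingerprint` field may be stored next to the snapshot; both are hypothetical and not taken from the OpenClaw session format.

```python
import hashlib
import json


def skills_fingerprint(eligible_skills):
    """Order-independent SHA-256 digest of the eligible skill names."""
    canonical = json.dumps(sorted(eligible_skills), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def refresh_if_stale(session_entry, eligible_skills):
    """Rebuild the cached snapshot when its fingerprint no longer matches.

    Returns True if the snapshot was rebuilt, False if the cache was reused.
    """
    current = skills_fingerprint(eligible_skills)
    cached = session_entry.get("skillsSnapshot") or {}
    if cached.get("fingerprint") == current:
        return False
    session_entry["skillsSnapshot"] = {
        "fingerprint": current,
        "skills": sorted(eligible_skills),
    }
    return True
```

Hashing only the names detects added or removed skills, which covers the case in this issue. Detecting edits to an existing SKILL.md would require including file contents or modification times in the hashed input.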

Example Code

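```python
import json
import os


def invalidate_skills_snapshot(session_id, agent_dir):
    """Drop the cached skillsSnapshot for one session; return True if removed."""
    sessions_file = os.path.join(agent_dir, "sessions", "sessions.json")
    with open(sessions_file, "r", encoding="utf-8") as f:
        sessions = json.load(f)

    entry = sessions.get(session_id)
    if not isinstance(entry, dict) or "skillsSnapshot" not in entry:
        return False  # nothing cached for this session

    del entry["skillsSnapshot"]
    # Write to a temp file and rename it, so a crash cannot leave a
    # half-written sessions.json behind.
    tmp_file = sessions_file + ".tmp"
    with open(tmp_file, "w", encoding="utf-8") as f:
        json.dump(sessions, f, indent=2)
    os.replace(tmp_file, sessions_file)
    return True


if __name__ == "__main__":
    # Call this when the /restart command is executed or when the gateway restarts.
    # Placeholder values: substitute a real session key and agent directory.
    invalidate_skills_snapshot("session_id", "/path/to/agent/dir")
```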

Verification

To verify that the fix worked, follow these steps:

1. Add a new skill to the workspace skills directory.
2. Restart the gateway or execute the /restart command.
3. Send a message in the session and check if the new skill appears in the available skills list.
4. Verify that the skillsSnapshot field in sessions.json has been updated or cleared.
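The last check can be scripted by comparing two copies of sessions.json, one taken before the restart and one taken after. The helper below is a hypothetical sketch that assumes the file is a JSON object keyed by session id, as the example code on this page does.

```python
def snapshot_status(before, after, session_id):
    """Classify what happened to one session's skillsSnapshot.

    `before` and `after` are the parsed contents of sessions.json taken
    before and after the restart. Returns "cleared", "updated", or "unchanged".
    """
    old = (before.get(session_id) or {}).get("skillsSnapshot")
    new = (after.get(session_id) or {}).get("skillsSnapshot")
    if new is None:
        return "cleared"    # no snapshot cached after the restart
    if new != old:
        return "updated"    # the snapshot was rebuilt with different content
    return "unchanged"      # the stale snapshot was reused
```

A result of "unchanged" after adding a skill and restarting would indicate that the bug described above is still present.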

Extra Tips

Make sure to handle any potential errors when reading or writing to the sessions.json file.
Consider implementing a more robust caching mechanism that can handle changes to the skills directory.
Test the fix thoroughly to ensure that it works as expected in different scenarios.
