[Bug]: openclaw doctor --fix 4-5x slower on 2026.5.20 vs 2026.5.19 (55s → 229s+): session snapshot path traversal bottleneck



Bug Type

Performance Regression

Summary

openclaw doctor --fix is 4-5x slower on 2026.5.20 (e510042) than on 2026.5.19 (a185ca2). On a production VPS (Oracle Cloud, Ubuntu, 4 vCPU, 24GB RAM), the same command took 55 seconds on 5.19 and 229 seconds on 5.20 (second run). The first run on 5.20 timed out after 300 seconds.

Benchmark Data

All tests on the same VPS (Oracle Cloud, Ubuntu 24.04, 4 vCPU, 24GB RAM, NVMe SSD):

| Scenario | Version | Real time | User time | Sys time |
|---|---|---|---|---|
| doctor --fix (steady state) | 2026.5.19 (a185ca2) | 55s | 38s | 7.6s |
| doctor --fix (first run after upgrade) | 2026.5.20 (e510042) | >300s (timeout) | n/a | n/a |
| doctor --fix (second run, same version) | 2026.5.20 (e510042) | 229s | 196s | 20s |
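Taking the 5.19 steady state as the baseline, the benchmark rows give these slowdown factors (times in seconds):

$$
\frac{229}{55} \approx 4.2 \ \text{(real, second run)},\qquad
\frac{196}{38} \approx 5.2 \ \text{(user)},\qquad
\frac{t_\text{first}}{55} > \frac{300}{55} \approx 5.45 \ \text{(real, first run)}
$$

The second-run wall-clock ratio matches the 4-5x figure in the summary. User CPU time grew slightly more, and the timed-out first run was at least 5.45x slower.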

Environment Details

  • OS: Ubuntu 24.04 LTS (Linux 6.8.0-1021-oracle, aarch64)
  • CPU: 4 vCPU (Ampere Altra)
  • RAM: 24GB
  • Disk: NVMe SSD
  • Node: v24.15.0 (via nvm)
  • OpenClaw install: npm global under nvm
  • Install method: openclaw update from 5.19 → 5.20

Steps to Reproduce

  1. Start from a production setup with ~90 agent sessions carrying stale skillsSnapshot metadata across multiple agents (coder, pepper, security, etc.)
  2. Run: time openclaw doctor --fix on 2026.5.19 → observe ~55s
  3. Run: openclaw update to upgrade to 2026.5.20
  4. Run: time openclaw doctor --fix on 2026.5.20 → observe a 3-5x slowdown
  5. The slowdown persists on repeated runs (no caching benefit)

Root Cause Analysis

The bottleneck is in the Session snapshots section of doctor. The log shows:

- Found 90 sessions with stale cached session metadata paths.
  ...and 1567 more stale cached paths.

Each session stores skillsSnapshot.prompt paths that reference the old OpenClaw installation root (~/.local/lib/node_modules/openclaw/skills/...). On 5.20, doctor validates ALL 1567+ cached paths by resolving symlinks and checking file existence. This is CPU-bound (not I/O-bound):

  • During the 229s run: time shows 196s user / 20s sys
  • Two concurrent doctor processes each consume 100% CPU with 10 threads
  • The path validation is pure JS string manipulation and symlink resolution; disk I/O is not the bottleneck
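The CPU-bound reading can be checked against the 5.20 second-run timings. User plus system time covers nearly all of the wall-clock time (times in seconds):

$$
\frac{t_\text{user} + t_\text{sys}}{t_\text{real}} = \frac{196 + 20}{229} = \frac{216}{229} \approx 0.94
$$

The process therefore consumed about 0.94 CPU-seconds per wall-clock second, which means roughly one core was busy throughout. A run that spent most of its time waiting on disk would show a much lower ratio.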

The gateway health check adds further latency. Doctor restarts the gateway during the run, and the 10s gateway timeout on the VPS adds to the total.

Expected Behavior

doctor --fix should complete within a reasonable timeframe (~1 minute) regardless of session count or stale path accumulation.

Actual Behavior

doctor --fix takes 4-5x longer on 5.20, and the first run after the upgrade can time out (>300s).

Impact

  • openclaw update hangs during the doctor phase, making upgrades feel broken
  • openclaw doctor --fix interrupts production workflows for 3+ minutes
  • Users with many agents and sessions (common in multi-agent deployments) are most affected

Suggested Fixes

  1. Cache the path validation result: Once a skills root is validated as healthy, skip re-validation for all sessions using that root
  2. Batch the stale path check: Instead of emitting 1567+ individual warnings, aggregate identical invalid roots
  3. Parallelize session validation: Sessions are independent — validate them concurrently
  4. Add a progress indicator: Show which phase is running so users know it hasn't hung
  5. Fix the root cause: The stale path accumulation (from migrating between different installation methods / versions) should be cleaned up automatically rather than re-checked on every doctor run
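Suggested fixes 1 and 2 can be sketched together in TypeScript. This is a hypothetical illustration, not OpenClaw's code. The names `findStaleRoots`, `skillsRootOf` and the injected probe are invented here. So are the assumed `<root>/skills/<name>/...` path shape and the example paths. The sketch memoizes the expensive validity check per distinct installation root. It then aggregates stale paths into one report per root rather than emitting one warning per path.

```typescript
// Hypothetical sketch: these names are illustrative, not OpenClaw internals.

// Decides whether an installation root is still valid (e.g. by resolving
// symlinks and checking existence). Injected so the grouping logic can run
// without touching the filesystem.
type RootProbe = (root: string) => boolean;

interface StaleRootReport {
  root: string;          // stale installation root shared by the paths
  sessions: Set<string>; // sessions whose snapshots reference this root
  pathCount: number;     // cached paths found under this root
}

// Assumption: a cached skill path has the shape "<root>/skills/<name>/...",
// so the root is everything before the first "/skills/" segment.
function skillsRootOf(cachedPath: string): string {
  const idx = cachedPath.indexOf("/skills/");
  return idx === -1 ? cachedPath : cachedPath.slice(0, idx);
}

function findStaleRoots(
  sessions: Map<string, string[]>, // session id -> cached skill paths
  probe: RootProbe,
): { reports: StaleRootReport[]; probeCalls: number } {
  const verdicts = new Map<string, boolean>(); // memoized probe result per root
  const byRoot = new Map<string, StaleRootReport>();
  let probeCalls = 0;

  for (const [sessionId, paths] of sessions) {
    for (const cachedPath of paths) {
      const root = skillsRootOf(cachedPath);
      let valid = verdicts.get(root);
      if (valid === undefined) {
        valid = probe(root); // the expensive check runs once per distinct root
        probeCalls++;
        verdicts.set(root, valid);
      }
      if (valid) continue;

      // Aggregate instead of emitting one warning per stale path.
      let report = byRoot.get(root);
      if (report === undefined) {
        report = { root, sessions: new Set<string>(), pathCount: 0 };
        byRoot.set(root, report);
      }
      report.sessions.add(sessionId);
      report.pathCount++;
    }
  }
  return { reports: Array.from(byRoot.values()), probeCalls };
}

// Example with made-up paths: two sessions share one stale root,
// and a third session uses a healthy root.
const staleRoot = "~/.local/lib/node_modules/openclaw";
const example = findStaleRoots(
  new Map([
    ["coder", [`${staleRoot}/skills/a/prompt.md`, `${staleRoot}/skills/b/prompt.md`]],
    ["pepper", [`${staleRoot}/skills/a/prompt.md`]],
    ["security", ["/opt/openclaw/skills/c/prompt.md"]],
  ]),
  (root) => root !== staleRoot, // fake probe: only the old root is invalid
);
```

In the example, the probe runs twice (once per distinct root), not four times. It yields a single report covering 3 stale paths across 2 sessions. In Node.js, a real probe could wrap `fs.existsSync` or `fs.realpathSync` from `node:fs`. With per-root memoization, the cost scales with the number of distinct installation roots rather than the 1567+ cached paths.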
