openclaw: [Bug] Session file lock not released properly by watchdog


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Session write lock files persist beyond maxHoldMs timeout; watchdog fails to reclaim stale locks, causing "session file locked" errors on subsequent requests.

Steps to reproduce

  1. Start OpenClaw 2026.5.22 gateway and let it run for extended periods (overnight)
  2. Use multiple sessions or let sessions accumulate
  3. Observe lock files in ~/.openclaw/agents/main/sessions/
  4. Attempt to use OpenClaw after lock has exceeded maxHoldMs
  5. Observe "session file locked" error

Expected behavior

Lock files should be automatically released when:

  • The holding process exits
  • maxHoldMs timeout (300000ms / 5 minutes) is exceeded
  • Lock is marked as stale after staleMs (1800000ms / 30 minutes)

The watchdog should reclaim stale locks without manual intervention.
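
The three release conditions above can be read as a single predicate. The following is a minimal sketch of that reading, not OpenClaw's actual watchdog code: the function names `pid_alive` and `is_reclaimable` are hypothetical, the default thresholds are the configured values from this report, and the issue does not say how OpenClaw measures lock age internally.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid appears to exist (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only performs the existence/permission check
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def is_reclaimable(
    age_ms: int,
    holder_alive: bool,
    max_hold_ms: int = 300_000,   # configured session.writeLock.maxHoldMs
    stale_ms: int = 1_800_000,    # configured session.writeLock.staleMs
) -> bool:
    """Hypothetical predicate: any one of the three conditions is sufficient."""
    return (not holder_alive) or age_ms > max_hold_ms or age_ms > stale_ms
```

Under this reading, a lock held for 8 hours by a live process is still reclaimable on age alone, which is why the observed behavior looks like a watchdog fault rather than a configuration issue.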

Actual behavior

  • Lock files persist for 8+ hours despite maxHoldMs being 300000ms (5 minutes)
  • Lock file shows maxHoldMs: 1020000 (17 minutes) instead of configured 300000ms
  • Watchdog does not reclaim stale locks automatically
  • User must manually delete lock files or restart gateway to resolve
  • Error message: "session file locked (timeout 60000ms): pid=16834 /path/to/session.jsonl.lock"

OpenClaw version

2026.5.22 (a374c3a)

Operating system

macOS Darwin 25.5.0 (arm64)

Install method

npm global

Model

qwen/kimi-k2.5

Provider / routing chain

openclaw -> modelstudio/qwen3.5-plus

Additional provider/model setup details

No response

Logs, screenshots, and evidence

Lock file content:

{
  "pid": 16834,
  "createdAt": "2026-05-28T01:12:54.261Z",
  "maxHoldMs": 1020000
}


Process status:

pid 16834  83.3% CPU  openclaw gateway

(Process running for 170+ minutes, lock held for 8+ hours)

Configuration:
- session.writeLock.acquireTimeoutMs: 60000
- session.writeLock.staleMs: 1800000  
- session.writeLock.maxHoldMs: 300000
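
To make the numbers in this evidence concrete, here is a small worked calculation. It is only an illustration: `now` is an assumed timestamp exactly 8 hours after `createdAt` (the report says "8+ hours"), and `lock_age_ms` is a hypothetical helper, not an OpenClaw function.

```python
import json
from datetime import datetime, timezone

# Lock file content as captured in the report.
LOCK_PAYLOAD = (
    '{"pid": 16834, "createdAt": "2026-05-28T01:12:54.261Z", "maxHoldMs": 1020000}'
)


def lock_age_ms(payload: str, now: datetime) -> int:
    """Age of the lock in milliseconds, measured from its createdAt field."""
    lock = json.loads(payload)
    # Replace the trailing "Z" so older Python versions can parse the timestamp.
    created = datetime.fromisoformat(lock["createdAt"].replace("Z", "+00:00"))
    return int((now - created).total_seconds() * 1000)


# Assumption: the lock is inspected exactly 8 hours after it was created.
now = datetime(2026, 5, 28, 9, 12, 54, 261000, tzinfo=timezone.utc)
age = lock_age_ms(LOCK_PAYLOAD, now)

recorded_max_hold_ms = json.loads(LOCK_PAYLOAD)["maxHoldMs"]  # 1020000 ms = 17 min
configured_max_hold_ms = 300_000                              # 5 min

print(age)                           # 28800000
print(age > recorded_max_hold_ms)    # True
print(age > configured_max_hold_ms)  # True
```

Whichever maxHoldMs value is in effect, the lock is far past it, so the 1020000 vs 300000 discrepancy alone does not explain why the lock was never reclaimed.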

Impact and severity

  • Affected: All OpenClaw users on 2026.5.22 with long-running gateway
  • Severity: Medium (requires manual intervention or workaround)
  • Frequency: Observed multiple times after overnight operation
  • Consequence: Agents fail to respond, user must manually delete lock files or restart gateway

Additional information

Workaround applied:

  1. Extended timeouts via environment variables:

    • OPENCLAW_SESSION_WRITE_LOCK_ACQUIRE_TIMEOUT_MS=120000
    • OPENCLAW_SESSION_WRITE_LOCK_STALE_MS=3600000
    • OPENCLAW_SESSION_WRITE_LOCK_MAX_HOLD_MS=600000
  2. Created a cleanup script, scheduled via crontab, that removes stale locks every 10 minutes
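
The cleanup script itself is not included in the report. The following is a rough sketch of what such a script might look like, under several assumptions: lock files match `*.lock` in the sessions directory mentioned in the reproduction steps, each contains the JSON fields shown in the evidence (`pid`, `createdAt`), and a lock older than `MAX_AGE_MS` is safe to delete. All function and constant names are hypothetical.

```python
import json
import os
from datetime import datetime, timezone
from pathlib import Path
from typing import List, Optional

SESSIONS_DIR = Path.home() / ".openclaw" / "agents" / "main" / "sessions"
MAX_AGE_MS = 600_000  # assumption: reuse the extended MAX_HOLD_MS from the workaround


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid appears to exist (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence and permissions
    except ProcessLookupError:
        return False
    except PermissionError:
        return True
    return True


def is_stale(lock_path: Path, max_age_ms: int, now: Optional[datetime] = None) -> bool:
    """A lock is treated as stale if it is too old or its holder is gone."""
    now = now or datetime.now(timezone.utc)
    try:
        lock = json.loads(lock_path.read_text())
        created = datetime.fromisoformat(lock["createdAt"].replace("Z", "+00:00"))
        pid = int(lock["pid"])
        age_ms = (now - created).total_seconds() * 1000
    except (OSError, ValueError, KeyError, TypeError, AttributeError):
        return False  # unreadable or unexpected format: leave the file alone
    return age_ms > max_age_ms or not pid_alive(pid)


def remove_stale_locks(directory: Path, max_age_ms: int, dry_run: bool = True) -> List[Path]:
    """Return the stale lock files found; delete them unless dry_run is set."""
    stale: List[Path] = []
    if not directory.is_dir():
        return stale
    for lock_path in sorted(directory.glob("*.lock")):
        if is_stale(lock_path, max_age_ms):
            if not dry_run:
                try:
                    lock_path.unlink()
                except FileNotFoundError:
                    pass  # already removed by someone else
            stale.append(lock_path)
    return stale
```

A script that calls `remove_stale_locks(SESSIONS_DIR, MAX_AGE_MS, dry_run=False)` could then be scheduled with a crontab entry such as `*/10 * * * * python3 /path/to/clean_locks.py` (the path is a placeholder). Note that deleting a lock by age while the gateway process is still alive, as in this report, may be unsafe if that process is genuinely mid-write, so a dry run first is advisable.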

Possible root causes:

  1. Watchdog timer not properly checking lock expiration
  2. maxHoldMs value being overridden somewhere (1020000 vs 300000)
  3. Process exit detection not triggering lock cleanup
  4. Race condition in lock acquisition/release
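
For the fourth hypothesis, it may help to recall how a lock file is commonly created without a check-then-write race. This is a generic sketch of exclusive file creation, not a description of how OpenClaw acquires its session lock; `try_acquire` and `release` are hypothetical names, and the payload simply mirrors the fields seen in the captured lock file.

```python
import json
import os
from datetime import datetime, timezone


def try_acquire(lock_path: str, max_hold_ms: int) -> bool:
    """Create the lock file atomically; return False if it already exists."""
    try:
        # O_CREAT | O_EXCL makes "check for existence" and "create" one atomic step.
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    except FileExistsError:
        return False
    payload = {
        "pid": os.getpid(),
        "createdAt": datetime.now(timezone.utc).isoformat(),
        "maxHoldMs": max_hold_ms,
    }
    with os.fdopen(fd, "w") as f:
        json.dump(payload, f)
    return True


def release(lock_path: str) -> None:
    """Remove the lock file; a missing file is not treated as an error."""
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        pass
```

If acquisition were instead implemented as a separate existence check followed by a write, two writers could both pass the check, and a later release by one of them could leave the other's state inconsistent. Whether anything like that happens in OpenClaw is not established by this report.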
