openclaw - 💡(How to fix) Fix Session file lock timeout on model switch — deadlock or stale lock file [1 participants]

GitHub stats
openclaw/openclaw#59991 (fetched 2026-04-08 02:37:52)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Error Message

⚠️ Agent failed before reply: All models failed (3):
  - ollama/llama3.2:3b-lowctx: session file locked (timeout 10000ms)
  - openai/gpt-4.1-mini: session file locked (timeout 10000ms)
  - anthropic/claude-haiku-4-5-20251001: session file locked (timeout 10000ms)

session file: /home/node/.openclaw/agents/main/sessions/155f43cd-4368-4f5c-b95f-f85b5f214812.jsonl.lock
lock held by: pid=10

Issue

When a Discord agent is switched from a local model back to the default model (after a failed ACP task), all three model backends fail with a session file lock timeout.


Reproduction

  1. Start the agent in Discord
  2. Switch to the local model: /model ollama/llama3.2:3b-lowctx
  3. Attempt a task with the local model (e.g., deep research)
  4. The agent reports that ACP is missing, and the task fails
  5. Switch back to the default model: /model anthropic/claude-haiku-4-5-20251001
  6. Result: the session file is locked, all models time out (10s), and no reply is sent

Root Cause (suspected)

  • Stale lock file: the process holding the lock (pid=10) crashed or exited without releasing it
  • Deadlock: multiple agents or models contend for the same session lock
  • File descriptor leak: the lock is not released after the failed task attempt

Symptoms

  • The session is blocked for all model backends (the failure is not model-specific)
  • The lock timeout is a hard 10000 ms, after which the attempt gives up
  • Clearing it requires manual intervention (possibly deleting the .jsonl.lock file)
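Every backend failing after exactly 10000 ms is consistent with a poll-until-deadline acquisition loop. The following is a minimal Python sketch of such a loop, assuming the lock is a separate `<session>.jsonl.lock` file created exclusively and containing the holder's PID. openclaw appears to be a Node.js application, so this is not its code; `acquire_lock` and `release_lock` are hypothetical names.

```python
import os
import time


def acquire_lock(lock_path: str, timeout_ms: int = 10_000, poll_ms: int = 100) -> int:
    """Create lock_path exclusively, retrying until timeout_ms has elapsed.

    Returns the open file descriptor; raises TimeoutError when the deadline passes.
    """
    deadline = time.monotonic() + timeout_ms / 1000
    while True:
        try:
            # O_CREAT | O_EXCL raises FileExistsError if the lock file already exists.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
            os.write(fd, str(os.getpid()).encode())  # record the holder's PID
            return fd
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"session file locked (timeout {timeout_ms}ms)")
            time.sleep(poll_ms / 1000)


def release_lock(lock_path: str, fd: int) -> None:
    """Close the descriptor and delete the lock file so the next caller can acquire it."""
    os.close(fd)
    os.unlink(lock_path)
```

Under this model, waiting never clears a lock file left behind by a failed run: each of the three fallback models spends its full timeout and then gives up, which matches the error above.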

Investigation

  • Identify what process pid=10 is (likely the gateway or a stuck worker)
  • Verify the lock acquisition logic (timeout, retry, cleanup on crash)
  • Test: does deleting the .jsonl.lock file allow normal operation?
  • Add lock staleness detection (if the lock is older than X minutes, force-acquire it)
  • Review session file write logic for proper cleanup on agent failure
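To identify the lock holder, a small helper can check whether the PID exists and read its command line. This is a minimal Python sketch that assumes Linux (it reads `/proc/<pid>/cmdline`); `describe_lock_holder` is a hypothetical name, not part of openclaw.

```python
import os
from pathlib import Path


def describe_lock_holder(pid: int) -> dict:
    """Report whether `pid` exists and, on Linux, which command line it is running."""
    info = {"pid": pid, "alive": False, "cmdline": None}
    if pid <= 0:
        return info  # os.kill(0, ...) would target the process group, not a PID
    try:
        os.kill(pid, 0)  # signal 0: existence check only, no signal is delivered
        info["alive"] = True
    except ProcessLookupError:
        return info
    except PermissionError:
        info["alive"] = True  # the process exists but belongs to another user
    try:
        raw = Path(f"/proc/{pid}/cmdline").read_bytes()
        # Arguments in /proc/<pid>/cmdline are separated by NUL bytes.
        info["cmdline"] = raw.replace(b"\0", b" ").decode(errors="replace").strip()
    except OSError:
        pass  # not Linux, or the process exited in the meantime
    return info


if __name__ == "__main__":
    # pid=10 comes from the "lock held by" line of the error message.
    print(describe_lock_holder(10))
```

PIDs are scoped to a PID namespace, so run this (or simply read `/proc/10/cmdline`) inside the same container as the gateway, for example with `docker exec` if it runs in Docker. If the holder is alive and turns out to be the gateway process itself, the lock was most likely leaked in-process rather than left by a crashed process, and a PID-based staleness check alone will not detect it.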

Impact

  • The agent becomes unresponsive after a failed task followed by a model switch
  • All three model backends are affected (the failure is not isolated)
  • Recovery may require a container restart

Related

  • Possibly related to ACP module failures
  • Session management under error conditions

Extended analysis

TL;DR

Deleting the .jsonl.lock file may resolve the session file lock timeout that appears after a Discord agent is switched from a local model back to the default model, provided no live process still needs the lock.

Guidance

  • Investigate the process with pid=10 to determine why it is holding the lock and whether it can be safely terminated or restarted.
  • Verify the lock acquisition logic to ensure proper timeout, retry, and cleanup mechanisms are in place, especially in the event of a crash.
  • Test deleting the .jsonl.lock file to see if it allows normal operation to resume, which could indicate a stale lock file issue.
  • Consider implementing lock staleness detection to force-acquire the lock if it is older than a certain threshold, preventing future timeouts.
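Staleness detection can combine the two signals mentioned above: the recorded holder PID no longer exists, or the lock file is older than a threshold. The following is a minimal Python sketch, not openclaw's implementation. It assumes the first integer in the lock file is the holder PID (e.g. `10` or `pid=10`); the 5-minute `STALE_AFTER_S` value and all function names are hypothetical.

```python
import os
import re
import time

# Hypothetical threshold: the issue only says "older than X minutes".
STALE_AFTER_S = 5 * 60


def pid_alive(pid: int) -> bool:
    """Return True if a process with this PID exists in the current PID namespace."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 performs the existence check without sending anything
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def lock_is_stale(lock_path: str, stale_after_s: float = STALE_AFTER_S) -> bool:
    """Treat the lock as stale if its recorded holder is gone or the file is too old."""
    try:
        with open(lock_path) as f:
            text = f.read()
        age_s = time.time() - os.path.getmtime(lock_path)
    except FileNotFoundError:
        return False  # no lock, nothing to break
    # Assumed format: the first integer in the file is the holder PID.
    match = re.search(r"\d+", text)
    if match and not pid_alive(int(match.group())):
        return True
    return age_s > stale_after_s


def try_break_stale_lock(lock_path: str, stale_after_s: float = STALE_AFTER_S) -> bool:
    """Delete the lock file if it is stale; return True if this call removed it."""
    if not lock_is_stale(lock_path, stale_after_s):
        return False
    try:
        os.unlink(lock_path)
    except FileNotFoundError:
        return False  # another contender removed it first
    return True
```

The PID check runs first because an age-only rule can break a lock that a slow but live run still holds; the age rule remains as a fallback for cases where the PID has been reused (for example after a container restart). This sketch still has a check-then-delete race if two contenders break the same lock, so a production version would need an atomic handoff.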

Example

The issue does not point at specific source files, so no concrete patch can be given; the key step is reviewing the session file write logic to make sure the lock is always released when an agent run fails.
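To illustrate the release-on-failure property, here is a minimal Python sketch (openclaw appears to be a Node.js application, so this is not its code). It assumes a lock file named `<session>.jsonl.lock` created exclusively; `session_lock` and `run_task` are hypothetical names.

```python
import os
from contextlib import contextmanager


@contextmanager
def session_lock(session_path: str):
    """Hold '<session_path>.lock' for the duration of the with-block."""
    lock_path = session_path + ".lock"
    # O_EXCL: raise FileExistsError immediately if another run already holds the lock.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY, 0o644)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield lock_path
    finally:
        # Runs on normal exit and when the task raises, so a failed run cannot leak the lock.
        os.close(fd)
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass  # already removed, e.g. by a stale-lock breaker


def run_task(session_path: str, task):
    """Run `task` while holding the session lock and return its result."""
    with session_lock(session_path):
        return task()
```

A `finally` block covers exceptions such as a missing ACP module, but not a hard crash or SIGKILL of the holding process; those still leave a lock file behind, which is why staleness detection is needed as well.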

Notes

The exact solution may depend on the specifics of the lock acquisition and release logic, as well as the behavior of the process holding the lock. Further investigation into the process with pid=10 and the lock management code is necessary for a comprehensive fix.

Recommendation

As a temporary workaround, delete the .jsonl.lock file to restore functionality. In parallel, investigate the root cause and implement a permanent fix, such as improved lock management logic.
