openclaw - Fix [Bug]: Stale plugin-runtime-deps lock survives container restart and is never reclaimed (PID collision in containers) [1 comment, 2 participants]

GitHub stats
openclaw/openclaw#75028 (fetched 2026-05-01 05:38:57)
Comments: 1 · Participants: 2 · Timeline events: 6 · Reactions: 2
Timeline (top): labeled ×2, closed ×1, commented ×1, mentioned ×1

When OpenClaw runs in a container with persistent storage for ~/.openclaw/plugin-runtime-deps/, a lock left behind by a prior gateway process is incorrectly treated as live by the next gateway, because lock ownership is identified by PID alone. In a container's PID namespace the new gateway frequently lands on the same PID as the previous one (typically 7), so the liveness check trivially succeeds and the new gateway waits the full 300s timeout per plugin before giving up. All plugins that need bundled runtime deps fail to initialise.

Error Message

[gateway] [plugins] failed to install bundled runtime deps: Error: Timed out waiting for bundled runtime deps lock at .../.openclaw-runtime-mirror.lock (waited=300068ms, ownerFile=ok, pid=7 alive=true, ownerAge=326616ms, ...)

Root Cause

Lock ownership is identified by PID alone. In a container's PID namespace the restarted gateway often gets the same PID as its predecessor (typically 7). The liveness check, effectively kill(pid, 0), therefore finds a live process: the new gateway itself. As a result, the stale lock left on the persistent volume is never reclaimed.

Fix Action

Workaround

Delete the lock directory before starting the gateway:

rm -rf ~/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/.openclaw-runtime-mirror.lock

An entrypoint script can do this conditionally — the lock should never legitimately exist at startup, since it is only held during an active deps install.
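The conditional cleanup described above could be done by a small entrypoint along these lines. This is a sketch, not OpenClaw code, and it makes several assumptions:

- The default ~/.openclaw location is used, and python3 is available in the image.
- The openclaw-<version>-<hash> directory can be matched with a glob.
- The real gateway command is passed as the script's arguments, because the actual launch command is not given in the issue.
- No other gateway shares the volume. Deleting a lock that a running gateway holds would break that gateway's install.

```python
import os
import shutil
import sys
from pathlib import Path

# Lock directory name from the error message; the parent directory
# (openclaw-<version>-<hash>) varies between releases, so it is globbed.
LOCK_NAME = ".openclaw-runtime-mirror.lock"
DEFAULT_DEPS_DIR = Path.home() / ".openclaw" / "plugin-runtime-deps"


def remove_stale_locks(deps_dir: Path = DEFAULT_DEPS_DIR) -> list[Path]:
    """Delete every runtime-deps lock directory under deps_dir and return what was removed.

    Only safe when no other gateway uses this volume: at container start,
    nothing in this container can legitimately hold the lock yet.
    """
    removed = []
    for lock in sorted(deps_dir.glob(f"*/{LOCK_NAME}")):
        if lock.is_dir():
            shutil.rmtree(lock)
            removed.append(lock)
    return removed


def main(command: list[str]) -> None:
    for lock in remove_stale_locks():
        print(f"entrypoint: removed stale lock {lock}", file=sys.stderr)
    # Replace this process with the gateway so it receives container signals directly.
    os.execvp(command[0], command)


# Hypothetical usage: python3 entrypoint.py <gateway command and arguments>
if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1:])
```

Using exec rather than spawning a child keeps the gateway as the container's main process, so `docker stop` still reaches it.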


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No


Environment

  • OpenClaw version: 2026.4.26
  • Deployment: Docker container (Coolify-managed), Debian trixie / Node 24, ARM64
  • Persistent volume mounted at /home/node/.openclaw

Evidence of the PID collision

From a stuck container:

$ docker exec <c> cat .openclaw-runtime-mirror.lock/owner.json
{"pid":7,...,"startedAt":"2026-04-30T07:37:..."}

$ docker exec <c> ps -o pid,etime,cmd -p 7
  PID     ELAPSED CMD
    7    00:09:14 openclaw-gateway

The recorded startedAt is from the previous container; the live pid 7 is the new gateway, started 9 minutes ago. PID alone cannot distinguish them.
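Even with the current owner.json fields (pid plus startedAt), staleness can be detected on Linux. The process that acquired the lock must already have been running when startedAt was written. The hypothetical helper below (not OpenClaw code) computes the wall-clock start of a PID from two sources: field 22 of /proc/<pid>/stat, which counts clock ticks since boot, and the btime line of /proc/stat. It then reports the lock as stale when the live process started after the lock was taken.

It assumes startedAt is an ISO 8601 timestamp written at lock acquisition. The truncated value above does not confirm the exact format. The 2-second slack is an arbitrary allowance for tick rounding.

```python
import datetime as dt
import os


def process_start_epoch(pid: int, proc: str = "/proc") -> float:
    """Wall-clock start of `pid` in seconds since the epoch (Linux /proc only)."""
    with open(f"{proc}/{pid}/stat") as f:
        stat = f.read()
    # Field 2 (comm) is parenthesised and may contain spaces, so count fields
    # from the last ')': the first token after it is field 3.
    start_ticks = int(stat[stat.rindex(")") + 1:].split()[22 - 3])
    with open(f"{proc}/stat") as f:
        # btime: boot time in seconds since the epoch.
        btime = next(int(line.split()[1]) for line in f if line.startswith("btime "))
    return btime + start_ticks / os.sysconf("SC_CLK_TCK")


def parse_started_at(value: str) -> float:
    """Parse an ISO 8601 timestamp such as owner.json's startedAt."""
    if value.endswith("Z"):  # fromisoformat() accepts "Z" only from Python 3.11
        value = value[:-1] + "+00:00"
    return dt.datetime.fromisoformat(value).timestamp()


def owner_is_stale(owner: dict, proc: str = "/proc", slack_s: float = 2.0) -> bool:
    """True if the process now using owner["pid"] cannot be the one that took the lock."""
    try:
        started = process_start_epoch(int(owner["pid"]), proc)
    except FileNotFoundError:
        return True  # nothing runs under that PID
    # The real owner was running before it wrote startedAt; a process that
    # started later has merely reused the PID.
    return started > parse_started_at(owner["startedAt"]) + slack_s
```

This check needs no change to owner.json, but it relies on comparing wall-clock times. The identity fields under Suggested fix avoid that dependency.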


Suggested fix

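Record more discriminating identity in owner.json and validate all of it before treating the lock as live. Process start time or a heartbeat would each make the bug unreachable on its own. Boot ID and hostname help only in some restart scenarios, so combining a couple gives belt and braces:

  • Process start time (/proc/<pid>/stat field 22, or ps -o lstart). Cheap, namespace-agnostic, and uniquely identifies a process across PID reuse.
  • Boot ID (/proc/sys/kernel/random/boot_id). Changes on every host reboot. A standard Docker container sees the host kernel's value, so it stays the same across container restarts on the same host. It is useful alongside process start time, not on its own.
  • Container ID / hostname (/etc/hostname is the short container ID by default). Differs between newly created containers, but not across a docker restart of the same container or when the hostname is set explicitly.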
  • A self-test write: have the lock holder periodically touch a heartbeat file; treat the lock as stale if the heartbeat is older than N seconds.

The owner check should require the recorded and observed values to match, not just kill(pid, 0).
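A minimal sketch of that check follows. It assumes a hypothetical owner.json schema with startTicks, bootId and hostname fields next to pid; these names are illustrative, not OpenClaw's.

The sketch reads three values:

- field 22 of /proc/<pid>/stat;
- /proc/sys/kernel/random/boot_id;
- the kernel hostname, which Docker also writes to /etc/hostname by default.

The lock counts as live only when every recorded value matches what is observed now. Boot ID guards against start-tick values repeating after a host reboot. Records that lack these fields are treated as not live, a trade-off a real migration would need to weigh.

```python
import socket
from typing import Optional


def read_start_ticks(pid: int, proc: str = "/proc") -> Optional[int]:
    """Field 22 of /proc/<pid>/stat (start time in clock ticks since boot), or None if the PID is gone."""
    try:
        with open(f"{proc}/{pid}/stat") as f:
            stat = f.read()
    except FileNotFoundError:
        return None
    # Field 2 (comm) is parenthesised and may contain spaces; the first token
    # after the last ')' is field 3.
    return int(stat[stat.rindex(")") + 1:].split()[22 - 3])


def read_boot_id(proc: str = "/proc") -> str:
    with open(f"{proc}/sys/kernel/random/boot_id") as f:
        return f.read().strip()


def owner_identity(pid: int, proc: str = "/proc") -> dict:
    """Identity fields a lock holder could record in owner.json (hypothetical schema)."""
    return {
        "pid": pid,
        "startTicks": read_start_ticks(pid, proc),
        "bootId": read_boot_id(proc),
        "hostname": socket.gethostname(),
    }


def owner_is_live(recorded: dict, proc: str = "/proc") -> bool:
    """Live only if every identity field matches what is observed for that PID now."""
    observed = owner_identity(int(recorded["pid"]), proc)
    if observed["startTicks"] is None:
        return False  # no process runs under that PID any more
    return all(recorded.get(key) == value for key, value in observed.items())
```

The lock holder would write owner_identity(os.getpid()) into owner.json on acquisition. A waiter would reclaim the lock when owner_is_live returns False on the parsed file. A new gateway that reuses PID 7 has a different startTicks, so the comparison fails where kill(7, 0) succeeds.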

Additional notes

  • Persistent storage of plugin-runtime-deps is desirable — it avoids a 25-minute reinstall on every container restart for plugins like lossless-claw. The fix should preserve that benefit.
  • The error message already prints ownerAge (the wall-clock age of the lock). It might be worth surfacing that more prominently, or warning when ownerAge greatly exceeds typical install duration, so operators notice staleness sooner.
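The two time-based ideas could look roughly like the sketch below. One is the heartbeat from the Suggested fix list; the other is a warning when ownerAge greatly exceeds a typical install. Everything here is hypothetical: the heartbeat file, interval, thresholds and warning text are illustrative, not OpenClaw settings. A heartbeat separates "stale" from "slow", which matters when an install can legitimately run for 25 minutes.

```python
import os
import threading
import time
from pathlib import Path
from typing import Optional


def start_heartbeat(path: Path, interval_s: float, stop: threading.Event) -> threading.Thread:
    """Lock holder side: touch `path` every interval_s seconds until `stop` is set."""
    def beat() -> None:
        while not stop.is_set():
            path.touch()  # creates the file or bumps its mtime
            stop.wait(interval_s)

    thread = threading.Thread(target=beat, daemon=True)
    thread.start()
    return thread


def heartbeat_is_stale(path: Path, max_age_s: float, now: Optional[float] = None) -> bool:
    """Waiter side: stale if the heartbeat file is missing or older than max_age_s."""
    now = time.time() if now is None else now
    try:
        age_s = now - os.stat(path).st_mtime
    except FileNotFoundError:
        return True
    return age_s > max_age_s


def owner_age_warning(owner_age_ms: int, typical_install_ms: int, factor: float = 2.0) -> Optional[str]:
    """Return a warning when the lock is much older than a typical install, else None."""
    if owner_age_ms > factor * typical_install_ms:
        return (
            f"runtime deps lock is {owner_age_ms / 60000:.1f} min old, more than "
            f"{factor:g}x the typical install time; it may be stale"
        )
    return None
```

The max age should span several heartbeat intervals, so that a briefly busy holder is not mistaken for a dead one.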

Steps to reproduce

  1. Run OpenClaw in a container with ~/.openclaw on a persistent volume.
  2. While the gateway is starting (specifically while bundled runtime deps are being installed by any plugin), stop the container ungracefully — docker kill, OOM, host reboot, or docker compose down mid-startup. The lock directory is not removed on exit.
  3. Start a new container against the same volume.

Expected behavior

The new gateway detects that the lock owner is from a previous run and reclaims the lock, then proceeds normally.

Actual behavior

The new gateway reads owner.json, finds pid=7, asks the kernel whether pid 7 is alive, and gets true — because the new gateway process is itself pid 7 in its own namespace. Each plugin needing bundled deps then waits the full 300s lock timeout and fails validation:

[gateway] [plugins] failed to install bundled runtime deps: Error: Timed out waiting
for bundled runtime deps lock at .../.openclaw-runtime-mirror.lock
(waited=300068ms, ownerFile=ok, pid=7 alive=true, ownerAge=326616ms, ...)

Plugins serialise on the lock, so the failures are spaced 300s apart. In our case browser, discord, memory-core, slack, and telegram all failed; the gateway came up with only lossless-claw (which had genuinely held the lock during the previous run's install).
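The failing check described above amounts to a PID-only probe. Below is a hypothetical POSIX-only helper, not OpenClaw's code. Signal 0 delivers nothing and only reports whether the PID exists. When the stale record's PID equals the caller's own PID, as with the new gateway at pid 7, the answer is therefore always "alive".

```python
import os


def pid_alive(pid: int) -> bool:
    """PID-only liveness probe, the equivalent of kill(pid, 0)."""
    try:
        os.kill(pid, 0)  # signal 0: nothing is sent, only existence/permission is checked
    except ProcessLookupError:
        return False  # ESRCH: no process with this PID
    except PermissionError:
        return True  # EPERM: the process exists but belongs to another user
    return True


# Stand-in for an owner.json left by the previous container whose PID happens
# to equal the PID of the process doing the check (pid 7 in the report).
stale_owner = {"pid": os.getpid()}
print(pid_alive(stale_owner["pid"]))  # prints True, whatever happened to the real owner
```

Plugins serialise on the lock, so each of the five failing plugins above pays the 300 s timeout in turn. That adds up to roughly 5 × 300 s = 1500 s, about 25 minutes, before the last failure.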

OpenClaw version

2026.4.26

Operating system

Debian GNU/Linux 13 (trixie)

Install method

npm install -g openclaw@2026.4.26

Model

N/A

Provider / routing chain

N/A


Extended analysis

TL;DR

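As a workaround, delete the stale lock directory before starting the gateway. As a fix, record more than the PID in owner.json (for example, process start time or a heartbeat). Then treat the lock as live only when the recorded and observed values match.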

Guidance

  • Identify the lock directory location and delete it before starting the gateway using a command like rm -rf ~/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/.openclaw-runtime-mirror.lock.
  • Extend the owner.json written by the lock holder with additional identifiers, such as process start time, boot ID, container ID, or a heartbeat file, so that a reused PID is not mistaken for the original owner.
  • Make the ownership check require the recorded and observed values to match, rather than relying on kill(pid, 0) alone.
  • Surface ownerAge more prominently and warn when it greatly exceeds the typical install duration, so operators notice a stale lock sooner.

Example

# Delete lock directory before starting gateway
rm -rf ~/.openclaw/plugin-runtime-deps/openclaw-<version>-<hash>/.openclaw-runtime-mirror.lock

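# Example of an extended owner.json that also records the process start time.
# startTicks is a hypothetical field: field 22 of /proc/7/stat, in clock ticks
# since boot (not seconds since the epoch); the value below is a placeholder.
{
  "pid": 7,
  "startedAt": "2026-04-30T07:37:...",
  "startTicks": <field 22 of /proc/7/stat>
}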

Notes

Both the workaround and the suggested fix keep plugin-runtime-deps on persistent storage, so long reinstalls (such as the 25-minute one for lossless-claw) are still avoided. The workaround assumes a single gateway per volume. If several containers share ~/.openclaw, deleting the lock at startup could break another gateway's in-progress install; in that setup only the identity-based fix is safe.

Recommendation

Apply the workaround (delete the lock directory before starting the gateway) as an immediate mitigation: it removes the stale lock before the new gateway can misread its owner. It does not fix the PID-only ownership check. That still needs the richer owner identity described under Suggested fix.
