openclaw - 💡(How to fix) Fix Stale lock recovery ignores createdAt when recorded PID is reused

Bug

shouldRemoveDeadOwnerOrExpiredLock() currently trusts a live recorded PID before checking createdAt. If the original lock owner died and the OS later reused the same PID for an unrelated process, stale recovery never considers the lock expired.

Current main:

export function shouldRemoveDeadOwnerOrExpiredLock(params: {
  payload: Record<string, unknown> | null;
  staleMs: number;
  nowMs?: number;
  isPidDefinitelyDead?: (pid: number) => boolean;
}): boolean {
  const payload = readLockFileOwnerPayload(params.payload);
  if (payload?.pid) {
    return (params.isPidDefinitelyDead ?? defaultIsPidDefinitelyDead)(payload.pid);
  }
  if (payload?.createdAt) {
    const createdAt = Date.parse(payload.createdAt);
    return !Number.isFinite(createdAt) || (params.nowMs ?? Date.now()) - createdAt > params.staleMs;
  }
  return false;
}

This means a lock payload with both pid and an expired createdAt is kept forever as long as any process currently has that PID.
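
The failure mode can be reproduced without touching the filesystem. The sketch below keeps the check order of current main but simplifies everything around it: `parseOwner` is a hypothetical stand-in for `readLockFileOwnerPayload` (whose real behavior is not shown in this issue), `nowMs` is required so the result is deterministic, and the 30-second `staleMs` is an arbitrary example value, not OpenClaw's default.

```typescript
// Hypothetical stand-in for readLockFileOwnerPayload(): keeps only a numeric
// `pid` and a string `createdAt`, dropping anything else.
type LockOwner = { pid?: number; createdAt?: string };

function parseOwner(payload: Record<string, unknown> | null): LockOwner | null {
  if (!payload) return null;
  const owner: LockOwner = {};
  const pid = payload.pid;
  const createdAt = payload.createdAt;
  if (typeof pid === "number") owner.pid = pid;
  if (typeof createdAt === "string") owner.createdAt = createdAt;
  return owner;
}

// Same check order as current main: a recorded PID short-circuits the
// createdAt expiry check entirely.
function shouldRemoveCurrentOrder(params: {
  payload: Record<string, unknown> | null;
  staleMs: number;
  nowMs: number;
  isPidDefinitelyDead: (pid: number) => boolean;
}): boolean {
  const owner = parseOwner(params.payload);
  if (owner?.pid) {
    return params.isPidDefinitelyDead(owner.pid);
  }
  if (owner?.createdAt) {
    const createdAt = Date.parse(owner.createdAt);
    return !Number.isFinite(createdAt) || params.nowMs - createdAt > params.staleMs;
  }
  return false;
}

// A lock written on 2026-05-23 and checked a day later, whose PID now
// belongs to an unrelated live process.
const expiredButPidReused = shouldRemoveCurrentOrder({
  payload: { pid: 12345, createdAt: "2026-05-23T00:00:00.000Z" },
  staleMs: 30_000, // arbitrary example threshold
  nowMs: Date.parse("2026-05-24T00:00:00.000Z"),
  isPidDefinitelyDead: () => false, // PID 12345 is alive (reused)
});
console.log(expiredButPidReused); // false: the day-old lock is not reclaimed
```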

Production impact

We hit this on macOS with OpenClaw 2026.5.22 (a374c3a) during Discord replies.

  • A stale lock remained at /Users/betalpha/.openclaw/agents/tino/agent/auth-profiles.json.lock from 2026-05-23.
  • The recorded PID had been reused by another macOS process.
  • Post-run markAuthProfileSuccess() tried to acquire that lock after the embedded run finished.
  • Because PID liveness took precedence over createdAt, the lock was not reclaimed.
  • Every Discord reply then waited through the file-lock retry/recovery tail (about 50-60 s), even though the embedded/model run had already completed.
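
Why does a reused PID count as alive? The issue does not show `defaultIsPidDefinitelyDead`, so the following is only a hypothetical sketch of one common Node.js technique: `process.kill(pid, 0)` delivers no signal but fails with `ESRCH` when no process has that PID. A check like this sees only the number, so it cannot tell the original lock owner apart from an unrelated process that later received the same PID.

```typescript
// Hypothetical sketch; not OpenClaw's defaultIsPidDefinitelyDead().
// Signal 0 performs only the existence/permission check.
function isPidDefinitelyDeadSketch(pid: number): boolean {
  if (!Number.isInteger(pid) || pid <= 0) {
    // 0 and negative values address process groups in kill(); never claim "dead".
    return false;
  }
  try {
    process.kill(pid, 0);
    return false; // some process owns this PID now, possibly not the lock owner
  } catch (err) {
    const code = (err as { code?: unknown }).code;
    // ESRCH: no such process. EPERM: exists but owned by another user.
    // Only ESRCH proves the owner is gone.
    return code === "ESRCH";
  }
}

console.log(isPidDefinitelyDeadSketch(process.pid)); // false: this process is alive
```

Treating `EPERM` as "not dead" is the conservative choice, and it is also why PID reuse defeats the check: whether the new process belongs to the same user (kill succeeds) or another user (`EPERM`), the PID reads as alive.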

Related context: #85822. We posted the full production timeline and local hotfix result there: https://github.com/openclaw/openclaw/issues/85822#issuecomment-4541945718

Expected behavior

If a lock payload has createdAt, stale expiry should be evaluated before PID liveness, or PID liveness should only be trusted together with a stronger owner identity than PID alone.
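
The second option, trusting PID liveness only alongside a stronger owner identity, could look roughly like the sketch below. It assumes the lock writer also records the owner process's start time (`ownerStartedAt`, a hypothetical field not in the current payload) and that a platform-specific start-time lookup exists (`StartTimeLookup`, also hypothetical and injected here; Node.js has no portable built-in for another process's start time). A live PID counts as the owner only when both values match.

```typescript
// Hypothetical lock payload: PID plus the owner's process start time (ms since epoch).
type OwnerIdentity = { pid: number; ownerStartedAt: number };

// Hypothetical lookup: start time of whatever process holds `pid` now, or null if none.
type StartTimeLookup = (pid: number) => number | null;

// Allow for rounding between the recorded and re-read start times.
const START_TIME_TOLERANCE_MS = 1_000;

function isRecordedOwnerStillAlive(owner: OwnerIdentity, lookup: StartTimeLookup): boolean {
  const currentStart = lookup(owner.pid);
  if (currentStart === null) return false; // no process with this PID
  // Same PID but a different start time means the PID was reused.
  return Math.abs(currentStart - owner.ownerStartedAt) <= START_TIME_TOLERANCE_MS;
}

// Fake process table: PID 12345 now belongs to a process started a day later.
const fakeTable = new Map<number, number>([[12345, Date.parse("2026-05-24T08:00:00.000Z")]]);
const lookup: StartTimeLookup = (pid) => fakeTable.get(pid) ?? null;

const owner: OwnerIdentity = { pid: 12345, ownerStartedAt: Date.parse("2026-05-23T00:00:00.000Z") };
console.log(isRecordedOwnerStillAlive(owner, lookup)); // false: PID reused
```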

A minimal fix is:

 const payload = readLockFileOwnerPayload(params.payload);
-if (payload?.pid) {
-  return (params.isPidDefinitelyDead ?? defaultIsPidDefinitelyDead)(payload.pid);
-}
 if (payload?.createdAt) {
   const createdAt = Date.parse(payload.createdAt);
-  return !Number.isFinite(createdAt) || (params.nowMs ?? Date.now()) - createdAt > params.staleMs;
+  if (!Number.isFinite(createdAt) || (params.nowMs ?? Date.now()) - createdAt > params.staleMs) {
+    return true;
+  }
 }
+if (payload?.pid) {
+  return (params.isPidDefinitelyDead ?? defaultIsPidDefinitelyDead)(payload.pid);
+}
 return false;

This preserves the existing PID-dead fast path for non-expired locks, but lets explicit stale expiry do what staleMs promises even when the OS has reused the PID.
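
To see what the reordering changes, here is a self-contained sketch of the patched check order, run over the combinations of fresh/expired `createdAt` and live/dead PID. The payload shape is simplified (no `readLockFileOwnerPayload`), `nowMs` is passed explicitly, and the 30-second `STALE_MS` is an arbitrary value for illustration.

```typescript
// Simplified payload shape; the real code reads it via readLockFileOwnerPayload().
type LockPayload = { pid?: number; createdAt?: string } | null;

// Patched order: createdAt expiry first, then the PID-dead fast path.
function shouldRemovePatchedOrder(
  payload: LockPayload,
  staleMs: number,
  nowMs: number,
  isPidDefinitelyDead: (pid: number) => boolean,
): boolean {
  if (payload?.createdAt) {
    const createdAt = Date.parse(payload.createdAt);
    if (!Number.isFinite(createdAt) || nowMs - createdAt > staleMs) {
      return true; // expired or unreadable, whoever holds the PID now
    }
  }
  if (payload?.pid) {
    // Fresh lock: a dead owner can still be reclaimed early.
    return isPidDefinitelyDead(payload.pid);
  }
  return false;
}

const STALE_MS = 30_000; // arbitrary example value
const CREATED_AT = "2026-05-23T00:00:00.000Z";
const freshNow = Date.parse(CREATED_AT) + 1_000; // 1 s old
const expiredNow = Date.parse(CREATED_AT) + STALE_MS + 1; // just past STALE_MS
const alive = (_pid: number) => false;
const dead = (_pid: number) => true;
const payload = { pid: 12345, createdAt: CREATED_AT };

console.table([
  { case: "fresh, PID alive", remove: shouldRemovePatchedOrder(payload, STALE_MS, freshNow, alive) },
  { case: "fresh, PID dead", remove: shouldRemovePatchedOrder(payload, STALE_MS, freshNow, dead) },
  { case: "expired, PID alive (reused)", remove: shouldRemovePatchedOrder(payload, STALE_MS, expiredNow, alive) },
  { case: "expired, PID dead", remove: shouldRemovePatchedOrder(payload, STALE_MS, expiredNow, dead) },
]);
// remove column: false, true, true, true
```

One side effect of the reordering: an unparseable `createdAt` now forces removal even when a live PID is recorded, whereas on current main the PID check alone decided that case.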

Regression test suggestion

Add a test for a payload like:

{
  pid: 12345,
  createdAt: "2026-05-23T00:00:00.000Z"
}

with:

  • nowMs later than createdAt + staleMs
  • isPidDefinitelyDead: () => false

The expected result is true: the lock has expired, even though the PID is currently alive.
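
A sketch of that regression test using Node's built-in `node:test` runner and `node:assert`. OpenClaw's actual test framework is not stated in this issue, so adapt the structure to it. The function below is a local stand-in that copies the patched logic (in the repository you would import the real `shouldRemoveDeadOwnerOrExpiredLock` instead), and `staleMs` is an arbitrary value. `nowMs` is derived from `createdAt + staleMs` so the test does not depend on the wall clock.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Local stand-in copying the patched logic; import the real function in the repo.
function shouldRemoveDeadOwnerOrExpiredLock(params: {
  payload: { pid?: number; createdAt?: string } | null;
  staleMs: number;
  nowMs?: number;
  isPidDefinitelyDead: (pid: number) => boolean;
}): boolean {
  const payload = params.payload;
  if (payload?.createdAt) {
    const createdAt = Date.parse(payload.createdAt);
    if (!Number.isFinite(createdAt) || (params.nowMs ?? Date.now()) - createdAt > params.staleMs) {
      return true;
    }
  }
  if (payload?.pid) {
    return params.isPidDefinitelyDead(payload.pid);
  }
  return false;
}

const staleMs = 30_000; // arbitrary for the test
const payload = { pid: 12345, createdAt: "2026-05-23T00:00:00.000Z" };

test("expired lock is removed even when the recorded PID is alive (reused)", () => {
  const nowMs = Date.parse(payload.createdAt) + staleMs + 1;
  assert.equal(
    shouldRemoveDeadOwnerOrExpiredLock({ payload, staleMs, nowMs, isPidDefinitelyDead: () => false }),
    true,
  );
});

test("fresh lock with a live PID is kept", () => {
  const nowMs = Date.parse(payload.createdAt) + 1_000;
  assert.equal(
    shouldRemoveDeadOwnerOrExpiredLock({ payload, staleMs, nowMs, isPidDefinitelyDead: () => false }),
    false,
  );
});
```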
