openclaw - 💡(How to fix) Fix file_lock_stale race condition under rapid subagent spawns

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…

When spawning multiple subagents rapidly (N ≥ 10) from a single parent session, some children fail immediately with:

Error: file lock stale for /path/to/agents/<agent>/sessions/<id>.jsonl: code=file_lock_stale

The child never writes its first message — the lock is already stale before the session starts writing.

Error Message

Error: file lock stale for /path/to/agents/<agent>/sessions/<id>.jsonl: code=file_lock_stale

Root Cause

When spawning multiple subagents rapidly (N ≥ 10) from a single parent session, some children fail immediately with:

Error: file lock stale for /path/to/agents/<agent>/sessions/<id>.jsonl: code=file_lock_stale

The child never writes its first message — the lock is already stale before the session starts writing.

Fix Action

Workaround

Staggering spawns with >=2s delay between branches eliminates the race in practice. But this should not be necessary for concurrent spawn patterns.

Code Example

Error: file lock stale for /path/to/agents/<agent>/sessions/<id>.jsonl: code=file_lock_stale
RAW_BUFFERClick to expand / collapse

Description

When spawning multiple subagents rapidly (N ≥ 10) from a single parent session, some children fail immediately with:

Error: file lock stale for /path/to/agents/<agent>/sessions/<id>.jsonl: code=file_lock_stale

The child never writes its first message — the lock is already stale before the session starts writing.

Reproduction

  1. Spawn 10+ Max subagents in rapid succession (no delay between sessions_spawn calls)
  2. Observe that ~10-20% fail with file_lock_stale
  3. Children that survive the lock race complete normally

Environment

  • OpenClaw: 2026.5.12
  • OS: Linux 6.17.0-29-generic (x64)
  • Node: v25.9.0
  • Config: agents.defaults.subagents.maxConcurrent=20, maxChildrenPerAgent=20, maxSpawnDepth=2

Evidence

  • 9 unique Max sessions hit file_lock_stale over ~48 hours of heavy swarm testing
  • One dispatcher session (multi-spawn controller) hit it 9 times in a single run
  • Retest with 10 Max branches: 2/10 lost to stale-lock
  • c50 test (50 simultaneous Max children) had 0 stale-lock failures, suggesting timing sensitivity

Workaround

Staggering spawns with >=2s delay between branches eliminates the race in practice. But this should not be necessary for concurrent spawn patterns.

Suspected Cause

The session JSONL file lock has a TTL or contention window that is too short when multiple sessions are created in rapid succession under the same agent. The lock acquisition may race with lock creation/rotation from the parent process.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING