openclaw: Fix session file locked when gateway times out and falls back to embedded runner

GitHub stats
openclaw/openclaw#62981 (fetched 2026-04-09 07:59:50)
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

Bug

When openclaw agent times out at the gateway and falls back to the embedded runner, the embedded runner reuses the same sessionId. The gateway process still holds the write lock on that session file, so the embedded runner cannot acquire it and fails with:

session file locked (timeout 10000ms): pid=XXXX *.jsonl.lock

This cascades to all fallback models, resulting in FallbackSummaryError: All models failed.

Root Cause

In src/commands/agent-via-gateway.ts, the fallback path passes agentCommand() the same sessionId that the gateway already holds locked:

} catch (err) {
  // gateway still holds the lock on localOpts.sessionId
  return await agentCommand(localOpts, runtime, deps)  // ← same sessionId
}
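To make the conflict concrete, here is a toy in-memory model of per-session locking. It is only a sketch: SessionLockTable and its methods are hypothetical names, and openclaw's real locking is file-based (the *.jsonl.lock file in the error), which this model does not reproduce. It shows why a second holder fails on the same sessionId but succeeds on a different one.

```typescript
// Toy model (assumption): one lock per sessionId, owned by a single pid.
// This is NOT openclaw's implementation; it only mirrors the observed behavior.
class SessionLockTable {
  private holders = new Map<string, number>();

  // Throws if another pid already holds the lock for this sessionId.
  acquire(sessionId: string, pid: number): void {
    const holder = this.holders.get(sessionId);
    if (holder !== undefined && holder !== pid) {
      throw new Error(`session file locked: pid=${holder} ${sessionId}.jsonl.lock`);
    }
    this.holders.set(sessionId, pid);
  }

  // Only the current holder can release the lock.
  release(sessionId: string, pid: number): void {
    if (this.holders.get(sessionId) === pid) {
      this.holders.delete(sessionId);
    }
  }
}

const locks = new SessionLockTable();
const gatewayPid = 47321; // pid taken from the error output in this issue
const embeddedPid = 50000; // hypothetical pid for the embedded runner

locks.acquire("session-a", gatewayPid); // gateway takes the lock first

let sameIdFailed = false;
try {
  locks.acquire("session-a", embeddedPid); // fallback reuses the sessionId
} catch (err) {
  sameIdFailed = true;
}

locks.acquire("session-b", embeddedPid); // a different sessionId is not blocked
```

In this model the fallback only succeeds on "session-a" after the gateway calls release(), which corresponds to the two options listed under Expected Behavior.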

Reproduction

openclaw config set agents.defaults.timeoutSeconds 60
openclaw agent --agent main --message "<long task requiring >60s to respond>"

Full error output:

Gateway agent failed; falling back to embedded: Error: gateway timeout after 90000ms
session file locked (timeout 10000ms): pid=47321 ...9e051a52.jsonl.lock
FallbackSummaryError: All models failed (3): qiniu/deepseek-v3: session file locked ... (timeout) | qiniu/kimi-k2: Request was aborted (timeout) | qiniu/glm-4.5: Request was aborted (timeout)

Confirmed on 2026.4.7-1 and 2026.4.8.

Expected Behavior

The embedded fallback should either use a new sessionId (a fresh session) or wait for the gateway to release the lock before trying to acquire it.
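The "wait for the gateway to release the lock" option could be sketched as a bounded poll with exponential backoff. Everything here is an assumption for illustration: backoffDelays, waitForRelease and the isLocked callback are hypothetical names, and the 10000 ms budget simply mirrors the lock timeout shown in the error output.

```typescript
// Build a list of wait times that grows by `factor`, is capped at `capMs`,
// and sums to exactly `budgetMs` (the last delay is clipped to fit).
function backoffDelays(
  baseMs: number,
  factor: number,
  capMs: number,
  budgetMs: number,
): number[] {
  if (baseMs <= 0 || capMs <= 0 || factor < 1) {
    throw new RangeError("baseMs and capMs must be > 0 and factor must be >= 1");
  }
  const delays: number[] = [];
  let next = baseMs;
  let spent = 0;
  while (spent < budgetMs) {
    const delay = Math.min(next, capMs, budgetMs - spent);
    delays.push(delay);
    spent += delay;
    next *= factor;
  }
  return delays;
}

// Poll `isLocked` before each delay and once more after the last one.
// Resolves to true as soon as the lock is observed free, false otherwise.
async function waitForRelease(
  isLocked: () => Promise<boolean>,
  delays: number[],
): Promise<boolean> {
  for (const delay of delays) {
    if (!(await isLocked())) return true;
    await new Promise<void>((resolve) => setTimeout(resolve, delay));
  }
  return !(await isLocked());
}
```

With these parameters, backoffDelays(250, 2, 4000, 10000) yields [250, 500, 1000, 2000, 4000, 2250]. Whether waiting is useful at all depends on whether the gateway actually releases the lock after its own timeout, which this issue does not establish.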

Extended analysis

TL;DR

Generate a new sessionId for the embedded runner when falling back from the gateway to avoid session lock conflicts.

Guidance

  • Locate where the fallback path obtains its sessionId and change it so that a new session ID is generated when the gateway times out and the embedded runner takes over.
  • Verify that the gateway releases its lock on the original session once its request has timed out, so that the original session file does not stay locked.
  • If the embedded runner must keep the original sessionId, consider retrying lock acquisition with a backoff strategy to cover the window in which the gateway still holds the lock.
  • Review agentCommand() to confirm that it accepts a new sessionId and creates or updates the corresponding session file.
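When checking whether the lock is ever released, it can help to know whether the process named in the error is still running. The sketch below pulls the pid out of a message shaped like the one in this issue and probes it with signal 0. The helper names are hypothetical, the message format is inferred only from the log lines quoted here, and a live pid does not prove that the process still needs the lock.

```typescript
// Extract the lock holder's pid from a "session file locked ... pid=NNN" message.
// Returns null when the message does not match that shape.
function parseLockHolderPid(message: string): number | null {
  const match = /session file locked.*?pid=(\d+)/.exec(message);
  return match ? Number(match[1]) : null;
}

// Signal 0 sends nothing; Node only checks that the target process exists.
function isProcessAlive(pid: number): boolean {
  try {
    process.kill(pid, 0);
    return true;
  } catch (err) {
    // EPERM: the process exists but we lack permission to signal it.
    return (err as { code?: string }).code === "EPERM";
  }
}

const sample =
  "session file locked (timeout 10000ms): pid=47321 ...9e051a52.jsonl.lock";
const holderPid = parseLockHolderPid(sample);
if (holderPid !== null) {
  const state = isProcessAlive(holderPid) ? "still running" : "not running";
  console.log(`lock holder pid=${holderPid} is ${state}`);
}
```

If the holder is a long-lived gateway process, it will normally be reported as running, in which case the open question is whether the gateway drops the lock once its own request times out.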

Example

} catch (err) {
  // Generate a new sessionId for the embedded runner
  const newSessionId = generateNewSessionId();
  return await agentCommand({ ...localOpts, sessionId: newSessionId }, runtime, deps);
}

Notes

This solution assumes that generating a new sessionId is feasible and does not interfere with the existing session management logic. The generateNewSessionId() function is not defined in the snippet above; it is a placeholder and should follow whatever scheme the codebase already uses to generate session IDs. A fresh session would also presumably start without the history stored in the original session file.
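One possible shape for the placeholder, as a sketch only: it assumes session IDs can be random UUIDs, which the issue does not confirm, and withFreshSession is a hypothetical helper added for illustration. The real implementation should reuse openclaw's own ID generation.

```typescript
import { randomUUID } from "node:crypto";

// Assumption: a random UUID is an acceptable session ID format.
function generateNewSessionId(): string {
  return randomUUID();
}

// Return a copy of the options with a fresh sessionId; the input is not mutated.
function withFreshSession<T extends { sessionId?: string }>(
  opts: T,
): T & { sessionId: string } {
  return { ...opts, sessionId: generateNewSessionId() };
}

// Hypothetical options object standing in for localOpts from the issue.
const localOpts = { sessionId: "locked-by-gateway", agent: "main" };
const fallbackOpts = withFreshSession(localOpts);
console.log(localOpts.sessionId !== fallbackOpts.sessionId); // prints true
```

Copying the options instead of mutating them keeps the original sessionId available, for example for logging which session the fallback was split from.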

Recommendation

Apply workaround: Generate a new sessionId for the embedded runner to avoid session lock conflicts, as this is a more straightforward solution than modifying the gateway's lock release mechanism.
