openclaw - Model fallback chain not triggered on provider-wide quota exhaustion + EmbeddedAttemptSessionTakeoverError

openclaw · 2026-05-21 21:24:46


Error Message

[openai-transport] error provider=openai-codex model=gpt-5.5 status=429 type=usage_limit_reached
[agent/embedded] embedded run agent end: isError=true model=gpt-5.5 error=rate limit
(repeated 4x for the same runId, same model)
[diagnostic] lane task error: EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released
Embedded agent failed before reply


Environment

OpenClaw version: 2026.5.19 (a185ca2)
Primary model: openai-codex/gpt-5.5
Fallback chain configured: gpt-5.5 → gpt-5.4-mini → deepseek-v4-pro → deepseek-v4-flash → kimi-k2.6

What happened

The OpenAI Codex provider hit a quota limit (429 usage_limit_reached). Instead of falling back through the configured model chain, the embedded run retried the same model (gpt-5.5) 4 times before crashing with an EmbeddedAttemptSessionTakeoverError.

Expected behavior

The embedded run should detect the provider-wide 429, exhaust any profile rotations quickly, then escalate to the model fallback chain. The second candidate (gpt-5.4-mini) would also fail (same provider), but the third (deepseek-v4-pro) should have succeeded.
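The expected escalation can be sketched as follows. This is a minimal illustration, not OpenClaw's implementation: the `Candidate` and `AttemptResult` types, the `runWithFallback` function, and the provider names `deepseek` and `moonshot` are all assumptions made for the example. As a small variation on the behavior described above, once a provider returns a provider-wide 429 the sketch skips the remaining candidates from that provider instead of trying them and failing again.

```typescript
// Hypothetical types and names: none of these come from the OpenClaw codebase.
type Candidate = { provider: string; model: string };
type AttemptResult =
  | { ok: true; reply: string }
  | { ok: false; status: number; type: string };

// Walk the chain in order. A 429 usage_limit_reached marks the whole provider
// as exhausted, so later candidates on that provider are skipped, not retried.
function runWithFallback(
  chain: Candidate[],
  attempt: (c: Candidate) => AttemptResult,
): { used: Candidate | null; tried: string[]; skipped: string[] } {
  const exhaustedProviders = new Set<string>();
  const tried: string[] = [];
  const skipped: string[] = [];
  for (const candidate of chain) {
    if (exhaustedProviders.has(candidate.provider)) {
      skipped.push(candidate.model);
      continue;
    }
    tried.push(candidate.model);
    const result = attempt(candidate);
    if (result.ok) {
      return { used: candidate, tried, skipped };
    }
    if (result.status === 429 && result.type === "usage_limit_reached") {
      exhaustedProviders.add(candidate.provider);
    }
  }
  return { used: null, tried, skipped };
}

// The chain from the report. Provider names other than openai-codex are assumed.
const chain: Candidate[] = [
  { provider: "openai-codex", model: "gpt-5.5" },
  { provider: "openai-codex", model: "gpt-5.4-mini" },
  { provider: "deepseek", model: "deepseek-v4-pro" },
  { provider: "deepseek", model: "deepseek-v4-flash" },
  { provider: "moonshot", model: "kimi-k2.6" },
];

// Simulated transport: every openai-codex call hits the quota limit.
const simulatedAttempt = (c: Candidate): AttemptResult =>
  c.provider === "openai-codex"
    ? { ok: false, status: 429, type: "usage_limit_reached" }
    : { ok: true, reply: "ok" };

const outcome = runWithFallback(chain, simulatedAttempt);
```

With the simulated quota failure, `outcome.used` is the `deepseek-v4-pro` candidate, which matches the expectation that the third entry in the chain should have served the request.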

What actually happened

All 4 attempts used the same runId, same model, same provider. No model fallback decision was logged — the fallback chain was never entered. Relevant gateway log sequence:

[openai-transport] error provider=openai-codex model=gpt-5.5 status=429 type=usage_limit_reached
[agent/embedded] embedded run agent end: isError=true model=gpt-5.5 error=rate limit
(repeated 4x for the same runId, same model)
[diagnostic] lane task error: EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released
Embedded agent failed before reply

Two distinct issues

1. Fallback chain not triggered on provider-wide rate limit

The embedded run retry loop appears to do profile/auth rotations on the same model but does not escalate to the model fallback chain after exhausting them. The retry_limit stage (in pi-embedded error resolution) should detect rate_limit + fallbackConfigured and return action: fallback_model, but this never happened.
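The decision this stage is expected to make can be written down as a small pure function. The sketch below is an assumption about the shape of that logic, not the actual pi-embedded code: only `rate_limit`, `fallbackConfigured`, and `fallback_model` appear in the report, while `RetryState`, `resolveRetryLimit`, `sameProviderAttempts`, `maxSameProviderAttempts`, and the other action names are hypothetical.

```typescript
// Hypothetical shapes: only rate_limit, fallbackConfigured and fallback_model
// are named in the report. Everything else is illustrative.
type FailureReason = "rate_limit" | "billing" | "usage_limit_reached" | "other";
type Action = "retry_same_model" | "fallback_model" | "fail";

interface RetryState {
  reason: FailureReason;
  sameProviderAttempts: number;    // attempts already made on the current model
  maxSameProviderAttempts: number; // budget for profile/auth rotations
  fallbackConfigured: boolean;
}

function isQuotaLike(reason: FailureReason): boolean {
  return reason === "rate_limit" || reason === "billing" || reason === "usage_limit_reached";
}

// Once the same-provider budget is spent, quota-like failures escalate to the
// model fallback chain if one is configured; otherwise the run fails.
function resolveRetryLimit(state: RetryState): { action: Action } {
  if (state.sameProviderAttempts < state.maxSameProviderAttempts) {
    return { action: "retry_same_model" };
  }
  if (isQuotaLike(state.reason) && state.fallbackConfigured) {
    return { action: "fallback_model" };
  }
  return { action: "fail" };
}

// The situation from the logs: four attempts on gpt-5.5, fallback configured.
const decision = resolveRetryLimit({
  reason: "rate_limit",
  sameProviderAttempts: 4,
  maxSameProviderAttempts: 4,
  fallbackConfigured: true,
});
```

Under these assumptions `decision.action` is `fallback_model`, which is the outcome the logs show was never reached.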

2. EmbeddedAttemptSessionTakeoverError race condition

After the 4th retry, contention on the session file lock killed the entire attempt:

EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released

This suggests a race condition between the retry loop releasing the session lock and another lane/process modifying the session file.
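The suspected mechanism can be reproduced with an in-memory model. This is only an illustration of how a check of this kind could trip, assuming the lock holder snapshots a revision marker on release and compares it on reacquire. `SessionFile`, `PromptLock`, and the `revision` counter are hypothetical and do not describe how OpenClaw actually tracks the session file.

```typescript
// In-memory stand-in for the session file. The revision counter is a
// hypothetical change marker (a real check might use mtime or a content hash).
interface SessionFile {
  revision: number;
  content: string;
}

type ReacquireResult = { ok: true } | { ok: false; error: string };

class PromptLock {
  private file: SessionFile;
  private snapshot: number | null = null;

  constructor(file: SessionFile) {
    this.file = file;
  }

  // Remember what the file looked like when the lock was given up.
  release(): void {
    this.snapshot = this.file.revision;
  }

  // On reacquire, any change made in the meantime counts as a takeover.
  reacquire(): ReacquireResult {
    const changed = this.snapshot !== null && this.snapshot !== this.file.revision;
    this.snapshot = null;
    if (changed) {
      return { ok: false, error: "session file changed while embedded prompt lock was released" };
    }
    return { ok: true };
  }
}

// Case 1: another lane writes to the session while the lock is released.
const contendedFile: SessionFile = { revision: 1, content: "turn 1" };
const contendedLock = new PromptLock(contendedFile);
contendedLock.release();
contendedFile.content = "turn 1 + write from another lane";
contendedFile.revision += 1;
const contended = contendedLock.reacquire();

// Case 2: nothing touches the session between release and reacquire.
const quietFile: SessionFile = { revision: 1, content: "turn 1" };
const quietLock = new PromptLock(quietFile);
quietLock.release();
const quiet = quietLock.reacquire();
```

In this model every retry opens another release/reacquire window, so four back-to-back retries give a concurrent writer four chances to change the file. That is consistent with the error appearing after the repeated attempts, although the report does not confirm this is the actual cause.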

Impact

Users get a generic error when a provider hits quota, even though perfectly functional fallback models are configured. Starting a new session (/new) picks up a working fallback model, but the previous session context is lost.

Suggested fix

1. Ensure the embedded run retry_limit handler correctly escalates rate_limit / billing / usage_limit_reached reasons to model fallback after a reasonable number of same-provider retries.
2. Investigate the session file lock race condition that causes EmbeddedAttemptSessionTakeoverError during retry loops.
