openclaw: Model fallback chain not triggered on provider-wide quota exhaustion + EmbeddedAttemptSessionTakeoverError

Error Message

[openai-transport] error provider=openai-codex model=gpt-5.5 status=429 type=usage_limit_reached
[agent/embedded] embedded run agent end: isError=true model=gpt-5.5 error=rate limit
(repeated 4x for the same runId, same model)
[diagnostic] lane task error: EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released
Embedded agent failed before reply

Environment

  • OpenClaw version: 2026.5.19 (a185ca2)
  • Primary model: openai-codex/gpt-5.5
  • Fallback chain configured: gpt-5.5 → gpt-5.4-mini → deepseek-v4-pro → deepseek-v4-flash → kimi-k2.6

What happened

The OpenAI Codex provider hit a quota limit (429 usage_limit_reached). Instead of falling back through the configured model chain, the embedded run retried the same model (gpt-5.5) 4 times before crashing with an EmbeddedAttemptSessionTakeoverError.

Expected behavior

The embedded run should detect the provider-wide 429, exhaust any profile rotations quickly, then escalate to the model fallback chain. The second candidate (gpt-5.4-mini) would also fail (same provider), but the third (deepseek-v4-pro) should have succeeded.
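The expected escalation can be sketched as a simple walk over the configured chain. This is a minimal illustration, not OpenClaw's implementation: `Candidate`, `ProviderQuotaError`, `run_with_fallback`, and the `deepseek` provider name are all hypothetical names introduced here, and the fake transport only simulates the 429 described above.

```python
from dataclasses import dataclass


class ProviderQuotaError(Exception):
    """Hypothetical error standing in for a 429 usage_limit_reached response."""


@dataclass(frozen=True)
class Candidate:
    provider: str
    model: str


def run_with_fallback(chain, call_model):
    """Try each candidate in order and return the first successful reply.

    Returns (candidate, reply, attempted_models).
    """
    attempted = []
    for candidate in chain:
        attempted.append(candidate.model)
        try:
            return candidate, call_model(candidate), attempted
        except ProviderQuotaError:
            # Quota exhausted: move on to the next model instead of retrying.
            continue
    raise RuntimeError(f"all fallback candidates failed: {attempted}")


# First three entries of the chain from this report; the provider assigned
# to deepseek-v4-pro is an assumption made for the example.
chain = [
    Candidate("openai-codex", "gpt-5.5"),
    Candidate("openai-codex", "gpt-5.4-mini"),
    Candidate("deepseek", "deepseek-v4-pro"),
]


def fake_call(candidate):
    # Simulated transport: every openai-codex model is out of quota.
    if candidate.provider == "openai-codex":
        raise ProviderQuotaError("429 usage_limit_reached")
    return f"reply from {candidate.model}"


winner, reply, attempted = run_with_fallback(chain, fake_call)
print(winner.model, attempted)
# prints: deepseek-v4-pro ['gpt-5.5', 'gpt-5.4-mini', 'deepseek-v4-pro']
```

A variant could remember which providers have already returned a provider-wide 429 and skip their remaining models, which would avoid the wasted call to gpt-5.4-mini.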

What actually happened

All 4 attempts used the same runId, same model, same provider. No model fallback decision was logged — the fallback chain was never entered. Relevant gateway log sequence:

[openai-transport] error provider=openai-codex model=gpt-5.5 status=429 type=usage_limit_reached
[agent/embedded] embedded run agent end: isError=true model=gpt-5.5 error=rate limit
(repeated 4x for the same runId, same model)
[diagnostic] lane task error: EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released
Embedded agent failed before reply
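To confirm this pattern in a longer gateway log, the transport errors can be counted per provider and model. The sketch below assumes each transport error line has exactly the shape shown in the excerpt above; the regular expression and the helper names `quota_errors_by_model` and `fallback_never_entered` are illustrative and not part of OpenClaw.

```python
import re
from collections import Counter

# Matches lines shaped like the [openai-transport] line in the excerpt above.
TRANSPORT_ERROR = re.compile(
    r"\[openai-transport\] error provider=(?P<provider>\S+) "
    r"model=(?P<model>\S+) status=(?P<status>\d+) type=(?P<type>\S+)"
)


def quota_errors_by_model(log_text):
    """Count 429 transport errors per (provider, model) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TRANSPORT_ERROR.search(line)
        if match and match["status"] == "429":
            counts[(match["provider"], match["model"])] += 1
    return counts


def fallback_never_entered(counts):
    """True when repeated 429s all hit one model, i.e. no other model was tried."""
    return len(counts) == 1 and max(counts.values()) > 1


# Rebuild the reported sequence: the first two lines repeated 4 times.
transport = ("[openai-transport] error provider=openai-codex model=gpt-5.5 "
             "status=429 type=usage_limit_reached")
agent_end = ("[agent/embedded] embedded run agent end: isError=true "
             "model=gpt-5.5 error=rate limit")
sample_log = "\n".join([transport, agent_end] * 4)

counts = quota_errors_by_model(sample_log)
print(dict(counts), fallback_never_entered(counts))
# prints: {('openai-codex', 'gpt-5.5'): 4} True
```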

Two distinct issues

1. Fallback chain not triggered on provider-wide rate limit

The embedded run retry loop appears to do profile/auth rotations on the same model but does not escalate to the model fallback chain after exhausting them. The retry_limit stage (in pi-embedded error resolution) should detect rate_limit + fallbackConfigured and return action: fallback_model, but this never happened.
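The decision described here can be sketched as a small pure function. Only the `fallback_model` action and the reasons come from this report; `RetryState`, `resolve_retry_limit`, and the `rotate_profile` and `fail` action names are assumptions made for illustration and may not match the names used in pi-embedded.

```python
from dataclasses import dataclass

# Reasons this report treats as provider-wide (see the suggested fix).
PROVIDER_WIDE_REASONS = {"rate_limit", "billing", "usage_limit_reached"}


@dataclass
class RetryState:
    reason: str                # classified failure reason
    fallback_configured: bool  # whether a model fallback chain exists
    profiles_remaining: int    # untried auth profiles on the current provider


def resolve_retry_limit(state):
    """Pick the next action once same-model retries hit their limit."""
    if state.reason in PROVIDER_WIDE_REASONS:
        if state.profiles_remaining > 0:
            # Another profile on the same provider may still have quota.
            return "rotate_profile"
        if state.fallback_configured:
            # Profiles exhausted: escalate to the next model in the chain.
            return "fallback_model"
    return "fail"


# The situation in this report: rate limit, no profiles left, chain configured.
action = resolve_retry_limit(
    RetryState(reason="rate_limit", fallback_configured=True, profiles_remaining=0)
)
print(action)
# prints: fallback_model
```

Keeping the decision free of I/O makes the missing escalation easy to cover with a unit test for each reason.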

2. EmbeddedAttemptSessionTakeoverError race condition

After the 4th retry, session file lock contention killed the entire attempt:

EmbeddedAttemptSessionTakeoverError: session file changed while embedded prompt lock was released

This suggests a race condition between the retry loop releasing the session lock and another lane/process modifying the session file.
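One way such an error could arise is a check-on-reacquire pattern: fingerprint the session file, release the lock for the slow provider call, then compare after reacquiring. The sketch below reproduces that pattern in isolation. How OpenClaw actually detects the change is not shown in this report, so `SessionTakeoverError`, `fingerprint`, and `call_with_lock_released` are hypothetical stand-ins.

```python
import hashlib
import os
import tempfile
import threading


class SessionTakeoverError(Exception):
    """Stand-in for EmbeddedAttemptSessionTakeoverError in this sketch."""


def fingerprint(path):
    """Hash the file contents so any modification is detectable."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def call_with_lock_released(lock, session_path, slow_call):
    """Release a held lock around slow_call, then verify the file is unchanged."""
    before = fingerprint(session_path)
    lock.release()
    try:
        result = slow_call()
    finally:
        lock.acquire()  # the caller always gets the lock back
    if fingerprint(session_path) != before:
        raise SessionTakeoverError("session file changed while lock was released")
    return result


with tempfile.TemporaryDirectory() as tmp:
    session_path = os.path.join(tmp, "session.jsonl")
    with open(session_path, "w") as handle:
        handle.write("turn 1\n")
    lock = threading.Lock()

    def other_lane_writes():
        # Simulates another lane appending while the lock is released.
        with lock:
            with open(session_path, "a") as handle:
                handle.write("turn from another lane\n")
        return "reply"

    lock.acquire()
    try:
        call_with_lock_released(lock, session_path, other_lane_writes)
        outcome = "no conflict"
    except SessionTakeoverError:
        outcome = "takeover detected"
    finally:
        lock.release()

print(outcome)
# prints: takeover detected
```

If the real mechanism resembles this, each retry in the loop opens another window in which a second lane can write, so repeated same-model retries would make the takeover more likely.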

Impact

Users get a generic error when a provider hits quota, even though perfectly functional fallback models are configured. Starting a new session (/new) picks up a working fallback model, but the previous session context is lost.

Suggested fix

  1. Ensure the embedded run retry_limit handler correctly escalates rate_limit / billing / usage_limit_reached reasons to model fallback after a reasonable number of same-provider retries.
  2. Investigate the session file lock race condition that causes EmbeddedAttemptSessionTakeoverError during retry loops.