openclaw - ✅ (Solved) Fix [Bug]: OpenClaw 2026.4.5 fails to fallback on Anthropic overloaded_error (503), retries indefinitely [1 pull request, 1 participant]

GitHub stats
openclaw/openclaw#62141 · Fetched 2026-04-08 03:08:28
Comments: 0 · Participants: 1 · Timeline: 0 · Reactions: 0

OpenClaw 2026.4.5 fails to correctly utilize its configured model fallback chain when the primary Anthropic provider returns an overloaded_error (HTTP 503). Instead of attempting the next model in the fallback sequence (e.g., Gemini or OpenAI), OpenClaw retries the same overloaded Anthropic provider repeatedly until the session times out. This leads to persistent unavailability of Anthropic models and unnecessary session delays.

This issue is explicitly documented in previous GitHub issues #32533 and #49079, indicating it's a known, unaddressed bug.

Error Message

  1. Upon receiving an overloaded_error (503) from Anthropic, OpenClaw logs WARN messages: error: "The AI service is temporarily overloaded. Please try again in a moment."
  2. The fallback mechanism (which should escalate to other providers) is not triggered or is ignored for this specific error type.

{"0":"{\"subsystem\":\"agent/embedded\"}","1":{"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"cd02d139-a1cc-4adf-b68c-31a6fd2d6397","isError":true,"error":"The AI service is temporarily overloaded. Please try again in a moment.","failoverReason":"overloaded","model":"claude-sonnet-4-6","provider":"anthropic","rawErrorPreview":"{\"type\":\"error\",\"error\":{\"details\":null,\"type\":\"overloaded_error\",\"message\":\"Overloaded\"},\"request_id\":\"sha256:0e4608081233\"}","rawErrorHash":"sha256:1f16afbae93a","rawErrorFingerprint":"sha256:cab93443e944","providerErrorType":"overloaded_error","providerErrorMessagePreview":"Overloaded","requestIdHash":"sha256:0e4608081233"},"2":"embedded run agent end"}

OpenClaw's model fallback logic needs to be updated to correctly interpret and handle overloaded_error (503) responses from providers like Anthropic. Upon receiving such an error, OpenClaw should immediately attempt the next model in the configured fallback chain, rather than retrying the same overloaded provider.

PR fix notes

PR #1681: fix(openclaw): prevent gateway retry from spawning duplicate error messages

Description (problem / solution / changelog)

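Problem

After a provider returns an error (for example MiniMax HTTP 500 / error code 2061, "the plan does not support this model"), the openclaw-runtime gateway retries with exponential backoff using the same runId. Every failed retry emits a lifecycle phase=error event.

Because phase=error travels on stream='lifecycle' (not stream='error'), the existing exclusion conditions in handleAgentEvent cannot stop these events from rebuilding the ActiveTurn. As a result, every retry runs the full error-handling flow, and the UI keeps appending error messages (the raw error plus the localized "A server-side error occurred"), one new pair per retry.

Confirmed from the gateway logs:

  • All retries reuse the same runId (openclaw/openclaw#63335)
  • Each phase=error → ActiveTurn rebuilt → messages appended

Fix

Introduce terminatedRunIds: Set<string>. dispatchAgentEvent records the runId in this set as soon as it receives a lifecycle phase=error event; handleAgentEvent checks the set before rebuilding the ActiveTurn and, on a hit, drops all subsequent events for that run.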
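
The fix described above can be sketched roughly as follows. This is a minimal illustration and not the code from the PR: only terminatedRunIds, dispatchAgentEvent, and handleAgentEvent are names taken from the PR notes, while the AgentEvent shape, the activeTurns map, and the renderedErrors array are hypothetical stand-ins for the real runtime adapter state.

```typescript
// Hypothetical event shape; the real openclaw-runtime event type is not shown in the PR notes.
interface AgentEvent {
  runId: string;
  stream: string; // e.g. "lifecycle", "error", "assistant"
  phase?: string; // e.g. "start", "error", "end"
  message?: string;
}

// runIds that have already reported a lifecycle error (name taken from the PR notes).
const terminatedRunIds = new Set<string>();

// Hypothetical stand-ins for the adapter's ActiveTurn state and the UI message list.
const activeTurns = new Map<string, string[]>();
const renderedErrors: string[] = [];

function handleAgentEvent(event: AgentEvent): void {
  let turn = activeTurns.get(event.runId);
  if (!turn) {
    // No live turn: before rebuilding one, drop events for runs that already failed.
    if (terminatedRunIds.has(event.runId)) return;
    turn = [];
    activeTurns.set(event.runId, turn);
  }
  if (event.stream === "lifecycle" && event.phase === "error") {
    // Show the error once and end the turn.
    renderedErrors.push(event.message ?? "unknown error");
    activeTurns.delete(event.runId);
  } else {
    turn.push(event.message ?? event.phase ?? "");
  }
}

function dispatchAgentEvent(event: AgentEvent): void {
  // Record the runId as soon as a lifecycle error arrives, then handle the event.
  if (event.stream === "lifecycle" && event.phase === "error") {
    terminatedRunIds.add(event.runId);
  }
  handleAgentEvent(event);
}
```

In this sketch the first error still reaches the UI because the turn for that run already exists when the error arrives; later events with the same runId, including a late success, find no live turn and are dropped. That is the ghost-output trade-off the PR describes.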

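Known risks and trade-offs

The openclaw gateway always reuses the same runId when retrying, whether the retry eventually succeeds or fails. If a retry succeeds (for example after a genuine transient network blip), its events are silently dropped because the runId is already in terminatedRunIds, and the user sees the task end with no output (ghost output).

Why this is acceptable:

  • For permanent errors (plan limits, authentication failures), retries never succeed, so no data is lost
  • The correct upstream fix is for openclaw to carry retrying: true on recoverable lifecycle errors, which would let the frontend distinguish recoverable errors from final failures (openclaw/openclaw#64051, currently open)
  • Once openclaw-runtime supports the retrying semantics, this workaround should be replaced with a check of that field

Related openclaw issues

  • openclaw/openclaw#64051: lifecycle contract bug in which recoverable errors and final failures are conflated, causing ghost output and the UI ending the turn too early (authoritative description)
  • openclaw/openclaw#63335: all retries reuse the same runId (confirmed behavior)
  • openclaw/openclaw#62141: the gateway retries indefinitely on 503 instead of falling back (the same class of misclassified-retry problem)

Testing

Sent a message from a MiniMax account without a subscribed plan and confirmed that error messages no longer accumulate with each retry.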

🤖 Generated with Claude Code

Changed files

  • src/main/libs/agentEngine/openclawRuntimeAdapter.ts (modified, +13/-1)

[Bug]: OpenClaw 2026.4.5 fails to fallback on Anthropic overloaded_error (503), retries indefinitely

Bug type

Behavior bug (incorrect output/state without crash)

Summary

OpenClaw 2026.4.5 fails to correctly utilize its configured model fallback chain when the primary Anthropic provider returns an overloaded_error (HTTP 503). Instead of attempting the next model in the fallback sequence (e.g., Gemini or OpenAI), OpenClaw retries the same overloaded Anthropic provider repeatedly until the session times out. This leads to persistent unavailability of Anthropic models and unnecessary session delays.

This issue is explicitly documented in previous GitHub issues #32533 and #49079, indicating it's a known, unaddressed bug.

Steps to reproduce

  1. Configure Anthropic as the primary model with fallback models (e.g., Gemini) in openclaw.json.
  2. Ensure Anthropic API key is valid and network connectivity is present.
  3. Initiate conversations when the Anthropic API is under heavy load or returning overloaded_error (503).
  4. Observe OpenClaw logs.

Expected behavior

When the Anthropic API returns an overloaded_error (503), OpenClaw should:

  1. Immediately cease attempts to use the overloaded Anthropic provider.
  2. Gracefully transition to the next available model in the configured fallbacks chain.
  3. Continue the conversation with the fallback model without significant delay or session timeouts.
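
The expected behavior listed above could look roughly like the sketch below. It is not OpenClaw's implementation: runWithFallback, Candidate, CallModel, and failoverReasonOf are hypothetical names, and the only detail borrowed from the issue is the failoverReason value "overloaded" that appears in the logs. The sketch also assumes that errors with any other reason are simply rethrown.

```typescript
// Hypothetical types; OpenClaw's internal candidate and error types are not shown in the issue.
interface Candidate {
  provider: string;
  model: string;
}

type CallModel = (candidate: Candidate) => Promise<string>;

interface FallbackResult {
  candidate: Candidate;
  output: string;
  attempts: number;
}

// Reads a failoverReason property from a thrown value, if one is present.
function failoverReasonOf(err: unknown): string | undefined {
  if (typeof err === "object" && err !== null && "failoverReason" in err) {
    return String((err as { failoverReason: unknown }).failoverReason);
  }
  return undefined;
}

// Tries each candidate once, in order. An overloaded candidate is skipped
// immediately instead of being retried; any other error is rethrown as-is.
async function runWithFallback(chain: Candidate[], call: CallModel): Promise<FallbackResult> {
  let lastError: unknown = new Error("fallback chain is empty");
  let attempts = 0;
  for (const candidate of chain) {
    attempts += 1;
    try {
      const output = await call(candidate);
      return { candidate, output, attempts };
    } catch (err) {
      if (failoverReasonOf(err) !== "overloaded") throw err;
      lastError = err; // overloaded: move on to the next candidate
    }
  }
  throw lastError; // every candidate was overloaded
}
```

The design choice worth noting is that an overloaded candidate is attempted exactly once per request, so the time spent on the failing provider is bounded by a single call rather than by the session timeout.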

Actual behavior

OpenClaw exhibits the following behavior:

  1. Upon receiving an overloaded_error (503) from Anthropic, OpenClaw logs WARN messages: error: "The AI service is temporarily overloaded. Please try again in a moment."
  2. OpenClaw does not switch to a fallback model. It continues to retry the same Anthropic provider repeatedly.
  3. These retries eventually exhaust the session timeout, producing "FailoverError: Request was aborted." The session reverts to a default model (e.g., Gemini) only after a significant delay.
  4. The fallback mechanism (which should escalate to other providers) is not triggered or is ignored for this specific error type.

Evidence from OpenClaw logs

{"0":"{\"subsystem\":\"agent/embedded\"}","1":{"event":"embedded_run_agent_end","tags":["error_handling","lifecycle","agent_end","assistant_error"],"runId":"cd02d139-a1cc-4adf-b68c-31a6fd2d6397","isError":true,"error":"The AI service is temporarily overloaded. Please try again in a moment.","failoverReason":"overloaded","model":"claude-sonnet-4-6","provider":"anthropic","rawErrorPreview":"{\"type\":\"error\",\"error\":{\"details\":null,\"type\":\"overloaded_error\",\"message\":\"Overloaded\"},\"request_id\":\"sha256:0e4608081233\"}","rawErrorHash":"sha256:1f16afbae93a","rawErrorFingerprint":"sha256:cab93443e944","providerErrorType":"overloaded_error","providerErrorMessagePreview":"Overloaded","requestIdHash":"sha256:0e4608081233"},"2":"embedded run agent end"}
{"0":"{\"subsystem\":\"model-fallback/decision\"}","1":{"event":"model_fallback_decision","tags":["error_handling","model_fallback","candidate_failed"],"runId":"cd02d139-a1cc-4adf-b68c-31a6fd2d6397","decision":"candidate_failed","requestedProvider":"anthropic","requestedModel":"claude-sonnet-4-6","candidateProvider":"anthropic","candidateModel":"claude-sonnet-4-6","attempt":1,"total":2,"reason":"overloaded","status":503,"errorPreview":"The AI service is temporarily overloaded. Please try again in a moment.","errorHash":"sha256:4003210ceba6","nextCandidateProvider":"openai","nextCandidateModel":"gpt-5.2","isPrimary":true,"requestedModelMatched":true,"fallbackConfigured":true},"2":"model fallback decision"}
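
When triaging records like the two above, it can help to pull out just the fallback fields. The sketch below is a hypothetical helper, not part of OpenClaw: the function name and the returned shape are illustrative, and it assumes every record has the layout seen here, where field "0" is a JSON-encoded string, field "1" is the event payload, and field "2" is the message.

```typescript
// Hypothetical summary shape; the field names on the right-hand side mirror the log record.
interface FallbackDecision {
  subsystem: string;
  decision: string;
  reason: string;
  status: number;
  attempt: number;
  total: number;
  candidate: string; // "provider/model"
  nextCandidate: string | null; // null when the record names no next candidate
}

// Parses one log line and returns a summary for model_fallback_decision events,
// or null for any other event type.
function parseFallbackDecision(line: string): FallbackDecision | null {
  const record = JSON.parse(line);
  const data = record["1"];
  if (!data || data.event !== "model_fallback_decision") return null;
  // Field "0" is itself a JSON string and needs a second parse.
  const meta = JSON.parse(record["0"]);
  const nextCandidate =
    data.nextCandidateProvider && data.nextCandidateModel
      ? `${data.nextCandidateProvider}/${data.nextCandidateModel}`
      : null;
  return {
    subsystem: meta.subsystem,
    decision: data.decision,
    reason: data.reason,
    status: data.status,
    attempt: data.attempt,
    total: data.total,
    candidate: `${data.candidateProvider}/${data.candidateModel}`,
    nextCandidate,
  };
}
```

Applied to the second record above, this would report decision candidate_failed, reason overloaded, status 503, attempt 1 of 2, and a next candidate of openai/gpt-5.2.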

Environment

  • OpenClaw version: 2026.4.5 (3e72c03)
  • OS: macOS 26.3.1 (arm64)
  • Node: v22.22.1
  • Install method: pnpm (global)
  • Gateway: local, LaunchAgent
  • Anthropic provider: Direct API key auth, baseUrl https://api.anthropic.com
  • Affected models: All Anthropic models (claude-sonnet-4-6, claude-opus-4-6)
  • Related issues: #32533, #49079 (highlighting this as a long-standing issue)
  • Related incident: part of a broader instability incident (raw/incident-reports/OC-2026-0406-GPT5-404.md), which also covers OpenAI GPT-5.x parameter compatibility issues.

Suggested fix

OpenClaw's model fallback logic needs to be updated to correctly interpret and handle overloaded_error (503) responses from providers like Anthropic. Upon receiving such an error, OpenClaw should immediately attempt the next model in the configured fallback chain, rather than retrying the same overloaded provider.
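
One way to read "correctly interpret" is sketched below: derive a failover reason from the HTTP status together with the provider's raw error body. This is an assumption-laden illustration rather than OpenClaw's classifier. classifyProviderError and the "unknown" bucket are hypothetical; the body layout (error.type set to overloaded_error) and the 503 status are taken from the log evidence in this issue.

```typescript
// "overloaded" matches the failoverReason seen in the logs; "unknown" is a
// hypothetical catch-all for everything this sketch does not classify.
type FailoverReason = "overloaded" | "unknown";

function classifyProviderError(status: number, rawBody: string): FailoverReason {
  let providerErrorType: unknown;
  try {
    // Expected layout, per the rawErrorPreview field in the logs:
    // {"type":"error","error":{"type":"overloaded_error","message":"Overloaded"}}
    providerErrorType = JSON.parse(rawBody)?.error?.type;
  } catch {
    // Non-JSON body: rely on the status code alone.
  }
  if (providerErrorType === "overloaded_error" || status === 503) {
    return "overloaded";
  }
  return "unknown";
}
```

Checking the error type as well as the status means the classification still holds if the same overloaded_error body arrives with a different status code, or if a 503 arrives with a body that is not JSON.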

extent analysis

TL;DR

Update OpenClaw's model fallback logic to handle overloaded_error (503) responses from Anthropic by immediately switching to the next model in the fallback chain.

Guidance

  • Review the OpenClaw configuration file (openclaw.json) to ensure that the fallback chain is correctly set up with alternative models like Gemini or OpenAI.
  • Modify the OpenClaw code to catch overloaded_error (503) responses from Anthropic and trigger the fallback mechanism to switch to the next available model.
  • Verify that the fallback logic is working correctly by testing with a simulated overloaded_error (503) response from Anthropic and checking that OpenClaw switches to the next model in the chain.
  • Consider adding logging or monitoring to track instances of overloaded_error (503) responses and fallback attempts to improve debugging and analytics.

Example

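Illustrative fallback chain configuration for openclaw.json (the key names below are not confirmed against the actual OpenClaw schema; check the configuration reference before use):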
{
  "models": [
    {
      "name": "claude-sonnet-4-6",
      "provider": "anthropic",
      "fallbacks": [
        {
          "name": "gpt-5.2",
          "provider": "openai"
        }
      ]
    }
  ]
}

Notes

The provided logs and issue description suggest that OpenClaw's current implementation does not correctly handle overloaded_error (503) responses from Anthropic, leading to repeated retries and session timeouts. Updating the fallback logic to handle this specific error type should resolve the issue.

Recommendation

Apply a workaround by modifying the OpenClaw code to handle overloaded_error (503) responses from Anthropic, as this is a known issue with a clear solution path.
