openclaw - ✅(Solved) Fix [Bug] server_is_overloaded / service_unavailable_error does not trigger model fallback, causing repeated retries against the same overloaded endpoint [1 pull request, 1 participant]

Sway-Chan · 2026-04-22T10:17:19Z



openclaw/openclaw#70120•Fetched 2026-04-23 07:29:04


Error Message

17:52:18 embedded run agent end ... error=server_is_overloaded
17:54:49 embedded run agent end ... error=server_is_overloaded
18:00:45 embedded run agent end ... error=server_is_overloaded
18:00:48 embedded run agent end ... error=server_is_overloaded
18:01:01 embedded run agent end ... error=server_is_overloaded
18:01:11 embedded run agent end ... error=server_is_overloaded
18:01:39 embedded run agent end ... error=server_is_overloaded
18:02:06 embedded run agent end ... error=server_is_overloaded

PR fix notes

PR #70743: [codex] Harden GPT-5.4 runtime paths

Repository: openclaw/openclaw
Author: 100yenadmin
State: closed | merged: True
Link: https://github.com/openclaw/openclaw/pull/70743

Description (problem / solution / changelog)

Summary

This PR hardens the GPT-5.4 embedded-agent hot path after auditing v2026.4.22. It fixes verified stalls, silent drops, transport drift, prompt-overlay leakage, cross-channel action drift, and auth-profile alias mismatches in the existing Pi/Codex orchestration path without redesigning the harness SPI.

This is the point-fix PR. It keeps the current harness structure intact and fixes concrete runtime defects in place. The follow-up additive extension-seam work is in #70772.

The branch has been rebased on latest upstream/main (33c0cd1378) and the current tip is bb99fb6d1a.

Runtime Routing Map

Selecting GPT-5.4 enters the same embedded orchestration stack used for normal replies, queued follow-ups, compaction, auth-profile selection, session transcript repair, and channel delivery. openai/* and openai-codex/* still use the built-in Pi/OpenAI path. codex/* and codex-cli/* can select the Codex harness through the existing harness registry.

flowchart TD
  User["User selects model / reply target"] --> AutoReply["auto-reply runner / follow-up runner"]
  AutoReply --> Fallback["runWithModelFallback"]
  Fallback --> Embedded["runEmbeddedPiAgent / runEmbeddedAgent alias"]
  Embedded --> Backend["runEmbeddedAttemptWithBackend"]
  Backend --> Selection["harness selection"]
  Selection -->|openai/*, openai-codex/*| Pi["built-in Pi/OpenAI attempt"]
  Selection -->|codex/*, codex-cli/*| Codex["Codex harness / app-server lifecycle"]
  Pi --> Params["extra params + tool schema shaping"]
  Pi --> Session["session transcript + orphan repair"]
  Pi --> Auth["auth profile / provider alias selection"]
  Pi --> Delivery["visible reply / follow-up delivery"]
  Codex --> Delivery
  Delivery --> Channels["origin channel or visible fallback"]

Failure Classes Fixed

| Area | Before | After | Primary files |
| --- | --- | --- | --- |
| GPT-5.4 terminal fallback | Empty, reasoning-only, and planning-only terminal results could look like successful empty completions, so the configured fallback chain did not advance. | Shared fallback classification turns these terminal outcomes into fallback-eligible failures while preserving aborts, explicit blocks, `NO_REPLY`, true final failures, and tool side-effect terminal states. | `src/agents/model-fallback.ts`, `src/agents/pi-embedded-runner/result-fallback-classifier.ts`, `src/auto-reply/reply/agent-runner-execution.ts`, `src/auto-reply/reply/followup-runner.ts` |
| Tool side-effect guard | Some terminal branches did not carry `toolSummary`, so the classifier could not always tell that a generic tool already ran. | `toolSummary` is built once from `attempt.toolMetas` and propagated through timeout, block, reasoning-only, incomplete-turn, and success metadata. | `src/agents/pi-embedded-runner/run.ts`, `src/agents/model-fallback.run-embedded.e2e.test.ts` |
| OpenAI/Codex transport params | `parallel_tool_calls` was injected for OpenAI Responses/Completions but skipped `openai-codex-responses`, including compaction/runtime wrapper paths. | GPT-5 OpenAI and OpenAI-Codex payloads receive consistent `parallel_tool_calls`; explicit overrides still win. | `src/agents/provider-api-families.ts`, `src/agents/pi-embedded-runner/extra-params.ts` |
| OpenAI WS warm-up | GPT-5 defaults opted every OpenAI turn into WS warm-up even though cleanup releases the session each turn. | Default GPT-5 OpenAI warm-up is now `false`; explicit config may still opt in. Pooling remains follow-up/gated work. | `src/agents/pi-embedded-runner/extra-params.ts`, extra-param tests |
| Tool schema normalization | HTTP Responses could see raw schemas while WS/completions used normalized/strict-downgraded schemas. | Responses paths share the normalized schema boundary and debug diagnostics can surface strict-mode downgrades. | `src/agents/openai-tool-schema.ts`, `src/agents/openai-transport-stream.ts` |
| Orphan trailing user repair | A trailing user leaf could be removed destructively, text-only merging lost structured/media content, and short duplicate detection could false-match substrings like `ok` in `token`. | Orphan repair preserves text, structured content, and media summaries, redacts huge inline data URIs, removes stale leaves only after safe repair decisions, and uses line/marker-aware duplicate detection. | `src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts`, `src/agents/pi-embedded-runner/run/attempt.ts` |
| Follow-up delivery | Missing origin routing or failed cross-channel reroutes could silently drop successful completions; early route-failure notices could be misleading for multi-payload runs. | Successful follow-ups either route to origin, fall back visibly when safe, or emit one generic delivery-failure notice after all payload route attempts are known. | `src/auto-reply/reply/followup-runner.ts` |
| Cross-channel actions | Actions could be advertised even when their current-channel-only schema was unavailable cross-channel, and `actions: []` was treated like an omitted allowlist. | Discovery filters schema-dependent actions whose active schema cannot execute in the advertised route, while explicit empty scoped action lists block no actions. | `src/channels/plugins/message-action-discovery.ts`, `src/channels/plugins/message-actions.test.ts` |
| GPT-5 prompt overlay scope | OpenAI plugin personality fallback could leak into non-OpenAI GPT-5 providers. | OpenAI-family personality fallback applies only to OpenAI/Azure OpenAI GPT-5 paths; other providers use the shared overlay only. | `src/agents/gpt5-prompt-overlay.ts`, `src/plugins/provider-runtime.ts` |
| Auth profile aliases | `codex-cli/gpt-5.4`, `openai-codex/*`, session overrides, CLI handoff, and embedded runner lock checks could compare different provider strings for the same auth profile family. | Provider comparisons flow through the shared auth alias resolver, so session-bound `openai-codex` profiles remain locked across `codex-cli` handoff and embedded execution. | `src/agents/provider-auth-aliases.ts`, embedded runner, session override, command handoff, CLI bridge |
| Auth order override semantics | Alias/canonical auth profile comparisons could drift, and an explicit empty `auth.order.<provider> = []` must still mean "use no stored profiles". | Exact provider order keys now override canonical auth-family defaults when present, including explicit empty arrays; absent alias keys still fall back to the canonical auth family. | `src/agents/auth-profiles/order.ts`, auth order tests |
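
The "Orphan trailing user repair" row mentions that short duplicate detection could false-match `ok` inside `token`. The sketch below contrasts a substring check with a whole-line check to illustrate that failure mode. The function names are hypothetical and the logic is a simplified assumption; the real helper in `attempt.prompt-helpers.ts` is described as line/marker-aware and likely handles more cases (multi-line messages, markers).

```typescript
// Hypothetical helpers for illustration only; not taken from the OpenClaw source.

/** Naive check: substring containment, which false-matches "ok" inside "token". */
function isDuplicateNaive(queued: string, promptText: string): boolean {
  return promptText.includes(queued.trim());
}

/** Line-aware check: the queued text must equal one whole line of the prompt. */
function isDuplicateLineAware(queued: string, promptText: string): boolean {
  const needle = queued.trim();
  if (needle.length === 0) return false; // nothing to compare
  return promptText.split(/\r?\n/).some((line) => line.trim() === needle);
}

const nextPrompt = "Please rotate the token\nthanks";
console.log(isDuplicateNaive("ok", nextPrompt)); // true (false positive)
console.log(isDuplicateLineAware("ok", nextPrompt)); // false
console.log(isDuplicateLineAware("thanks", nextPrompt)); // true
```

Comparing whole lines keeps a short queued message from being dropped as a "duplicate" just because its characters appear inside a longer word.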

GPT-5.4 Fallback Flow

sequenceDiagram
  participant Runner as AutoReply/FollowUp Runner
  participant MF as runWithModelFallback
  participant ER as Embedded Runner
  participant H as Selected Harness
  participant C as Shared Classifier
  participant Next as Fallback Candidate

  Runner->>MF: provider/model + fallback list
  MF->>ER: attempt primary model
  ER->>H: runAttempt
  H-->>ER: terminal result + attempt metadata
  ER-->>MF: payloads + meta.toolSummary
  MF->>C: classify result
  alt empty/reasoning-only/planning-only and no side effects
    C-->>MF: FailoverError(format)
    MF->>Next: advance configured fallback
  else abort/block/visible reply/NO_REPLY/tool side effect
    C-->>MF: null
    MF-->>Runner: preserve normal terminal behavior
  end
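
The two branches of the sequence diagram above can be sketched as a small classifier. This is a minimal illustration under stated assumptions: the `TerminalKind` values, the `AttemptResult` shape, and the `toolsRan` counter are hypothetical; only the `FailoverError(format)` outcome, the `null` outcome, and the `toolSummary` name come from the PR description.

```typescript
// Hypothetical types for illustration; the real classifier lives in
// src/agents/pi-embedded-runner/result-fallback-classifier.ts and may differ.
type TerminalKind =
  | "visible_reply"
  | "empty"
  | "reasoning_only"
  | "planning_only"
  | "aborted"
  | "blocked"
  | "no_reply";

interface AttemptResult {
  kind: TerminalKind;
  // Assumed shape: a count of tools that already ran during the attempt.
  toolSummary?: { toolsRan: number };
}

class FailoverError extends Error {
  readonly reason: string;
  constructor(reason: string) {
    super(`failover: ${reason}`);
    this.name = "FailoverError";
    this.reason = reason;
  }
}

/** Returns a FailoverError when the fallback chain should advance, else null. */
function classifyForFallback(result: AttemptResult): FailoverError | null {
  const looksEmpty =
    result.kind === "empty" ||
    result.kind === "reasoning_only" ||
    result.kind === "planning_only";
  if (!looksEmpty) return null; // abort, block, visible reply, NO_REPLY

  // Replaying a turn whose tools already ran could repeat side effects.
  const hadSideEffects = (result.toolSummary?.toolsRan ?? 0) > 0;
  if (hadSideEffects) return null;

  return new FailoverError("format");
}

console.log(classifyForFallback({ kind: "reasoning_only" })?.reason); // "format"
console.log(classifyForFallback({ kind: "empty", toolSummary: { toolsRan: 1 } })); // null
```

Returning `null` rather than throwing keeps the "preserve normal terminal behavior" path explicit for the caller.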

Channel, Session, And Auth Delivery Flow

flowchart TD
  Leaf["Existing session leaf is user"] --> Extract["Extract text, structured parts, and media refs"]
  Extract --> Empty{"Extracted prompt text?"}
  Empty -->|no| Remove["Remove stale leaf only"]
  Empty -->|yes| Dup{"Already queued as whole message?"}
  Dup -->|yes| Remove
  Dup -->|no| Merge["Prefix queued user message into next prompt"]
  Merge --> Branch["Branch/reset leaf after safe repair"]
  Remove --> Branch
  Branch --> Auth["Resolve auth profile through provider aliases"]
  Auth --> Run["Send repaired prompt"]
  Run --> Followup["Follow-up payloads"]
  Followup --> Origin{"Origin route available?"}
  Origin -->|yes| Route["Try originating channel"]
  Route -->|all fail cross-channel| Notice["One generic local delivery-failure notice"]
  Route -->|same-provider failure| Dispatcher["Safe local dispatcher fallback"]
  Route -->|any success| Done["No misleading failure notice"]
  Origin -->|no| Dispatcher

Safety Boundaries

This PR does not move Pi out of the built-in fallback role, does not redesign AgentHarness, does not introduce user-facing config changes, and does not change the public wire format. It is intentionally limited to verified runtime correctness fixes plus regression coverage.

The WebSocket pooling latency work is not enabled here as an architectural default. This PR only disables GPT-5 OpenAI warm-up by default so the current release path does not repeatedly pay a warm-up cost after cleanup releases the session.
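
The "default is `false`, explicit config may still opt in" rule can be sketched as a per-key merge. The `openaiWsWarmup` key name and the `true` default for `parallel_tool_calls` are assumptions made for illustration; the PR description only states that warm-up now defaults to `false` and that explicit overrides win.

```typescript
// Hypothetical parameter shape; key names are assumptions, not OpenClaw's config schema.
interface ExtraParams {
  parallel_tool_calls?: boolean;
  openaiWsWarmup?: boolean;
}

// Assumed GPT-5 defaults: warm-up off (per the PR), parallel tool calls on (assumption).
const GPT5_DEFAULTS: Required<ExtraParams> = {
  parallel_tool_calls: true,
  openaiWsWarmup: false,
};

/** Applies defaults per key; any explicitly set value wins over the default. */
function resolveExtraParams(explicit: ExtraParams = {}): Required<ExtraParams> {
  return {
    parallel_tool_calls: explicit.parallel_tool_calls ?? GPT5_DEFAULTS.parallel_tool_calls,
    openaiWsWarmup: explicit.openaiWsWarmup ?? GPT5_DEFAULTS.openaiWsWarmup,
  };
}

console.log(resolveExtraParams()); // warm-up stays off by default
console.log(resolveExtraParams({ openaiWsWarmup: true })); // explicit opt-in is kept
```

Using `??` per key (instead of spreading the explicit object) avoids an explicitly `undefined` value silently erasing a default.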

Related Work And Issue Map

This PR intentionally does not use Closes: for broad GPT-5.4/Codex tickets unless the exact reported scenario is covered. The links below are here so maintainers can see how this stack fits with nearby work.

| Link | Relationship |
| --- | --- |
| #41282 | Historical openai-codex/GPT-5.4 timeout/stall report. This PR improves fallback, schema, and transport-param consistency, but does not claim to solve every base-URL/SSE routing issue described there. |
| #64251 | CLI-backed `codex-cli/gpt-5.4` follow-up instability. This PR helps by normalizing auth aliases and preventing successful follow-up payload drops. |
| #51063 / #65152 | OpenAI-Codex tool execution/tool-definition symptoms. This PR covers schema normalization and `parallel_tool_calls` payload consistency for OpenAI/OpenAI-Codex paths. |
| #65844 / #57286 / #63856 | OpenAI-Codex auth profile/order drift. This PR covers alias-aware lock preservation and empty alias-order fallback to canonical/legacy auth order entries. |
| #59928 / #65234 / #54698 | Fallback-chain/session-model issues. This PR is narrower: it classifies GPT-5.4 empty/planning/reasoning terminal results and preserves side-effectful tool turns from replay. |
| #45761 / #60830 / #59680 | Prior fallback classifier hardening. This PR builds on that line by adding GPT-5.4 embedded terminal classification and side-effect guards. |
| #52903 / #63608 | Prior retry/session transcript integrity work. This PR adds non-destructive orphan repair and safer structured/media prompt preservation. |
| #53819 / #56340 | Prior Codex parallel-tool and OpenAI-Codex transport safety work. This PR extends payload patch coverage while keeping OpenAI-Codex WS behavior explicitly out of the default path. |
| #70904 / #70911 / #63369 | Adjacent reasoning-effort injection issue. Not fixed here; #70911 is the focused PR for missing `body.reasoning` when OpenAI/Codex Responses payloads start with `reasoning: undefined`. |
| #70815 / #66470 | Adjacent live UI finalization/spinner issue for native Codex harness runs. Not fixed here; this PR focuses on backend delivery/fallback semantics. |
| #69453 / #55461 / #42225 | Adjacent GPT-5.4 context-window/catalog mismatch issues. Not fixed here. |
| #56487 / #50647 / #57917 | Adjacent UI/model-picker provider-prefix issues. Not fixed here. |

Live Search Additions (2026-04-24)

I re-ran live GitHub search across GPT-5.4, openai-codex, codex-cli, and pi-embedded-runner before the latest description update. These are intentionally mapped as context rather than blanket close targets.

| Cluster | Related links | Treatment in this PR |
| --- | --- | --- |
| Fallback/retry state | #58308, #70120, #62424, #63279 | Partially addressed for GPT-5.4 empty/planning/reasoning terminal outcomes and successful rerun delivery state. Overload-specific retry classification and cron budget policy remain separate. |
| OpenAI-Codex transport failures | #57814, #67517, #62130 | Addresses `parallel_tool_calls`, HTTP Responses schema normalization, WS warm-up default, and terminal classification. Does not claim to fix Cloudflare/base-url/network failures. |
| Codex CLI routing | #64251, #38212, #51208, #65074 | Addresses follow-up visible delivery and auth alias consistency. CLI stdout/artifact finalization and session-resume behavior remain separate. |
| Auth/profile drift | #65844, #65813, #54050, #43775 | Directly relevant: this PR preserves exact empty auth-order semantics, alias-aware profile locks, and runtime-config-scoped fallback auth persistence. |
| Embedded runner integrity | #64570, #64888, #67878, #68329 | Addresses GPT-5.4 thinking/reasoning-only fallback classification and orphan repair. Broader cancellation/liveness and CLI compaction remain separate. |
| Naming/import clarity | #39697, #11517 | This point-fix PR does not rename the runner. #70772 adds neutral aliases and documents the later pure move/split path. |

Latest Validation

Post-rebase verification on the final branch:

Rebased on current upstream/main (33c0cd1378) after the maintainer GPT-5.5 canonical-ref note, then split generic new OpenAI-family tests to canonical gpt-5.5 while leaving gpt-5.4/codex-cli refs only as explicit regression or legacy-compat coverage.
node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 69 tests after the current-main rebase and canonical-ref cleanup.
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/openai-transport-stream.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/model-fallback.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/agent-command.live-model-switch.test.ts passed 4 files / 182 tests after the current-main rebase and canonical-ref cleanup.
node scripts/run-vitest.mjs run --config test/vitest/vitest.plugins.config.ts src/plugins/provider-runtime.test.ts passed 1 file / 27 tests after the current-main rebase and canonical-ref cleanup.
node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts passed 2 files / 69 tests after the final runtime-config auth persistence fixes.
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/command/attempt-execution.cli.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/pi-embedded-runner-extraparams-resolve.test.ts src/agents/model-fallback.test.ts src/agents/auth-profiles/order.test.ts src/agents/auth-profiles.resolve-auth-profile-order.uses-stored-profiles-no-config-exists.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/provider-auth-aliases.test.ts src/agents/agent-command.live-model-switch.test.ts passed 7 files / 192 tests.
node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/followup-runner.test.ts passed 1 file / 23 tests.
node scripts/run-vitest.mjs run --config test/vitest/vitest.e2e.config.ts src/agents/model-fallback.run-embedded.e2e.test.ts passed 1 file / 17 tests.

Earlier focused/broad local verification on this PR also covered:

pnpm lint
pnpm tsgo:core:test
node scripts/run-vitest.mjs run --config test/vitest/vitest.full-core-support-boundary.config.ts test/scripts/lint-suppressions.test.ts
node scripts/run-vitest.mjs run --config test/vitest/vitest.auto-reply.config.ts src/auto-reply/reply/agent-runner-execution.test.ts src/auto-reply/reply/followup-runner.test.ts
node scripts/run-vitest.mjs run --config test/vitest/vitest.agents.config.ts src/agents/model-fallback.test.ts src/agents/pi-embedded-runner/run/attempt.test.ts src/agents/pi-embedded-runner-extraparams.test.ts src/agents/openai-transport-stream.test.ts src/agents/auth-profiles/session-override.test.ts src/agents/auth-profiles/order.test.ts src/agents/command/attempt-execution.cli.test.ts src/agents/provider-auth-aliases.test.ts src/agents/tools/message-tool.test.ts src/agents/agent-command.live-model-switch.test.ts src/plugins/provider-runtime.test.ts
node scripts/run-vitest.mjs run --config test/vitest/vitest.channels.config.ts src/channels/plugins/message-actions.test.ts
OPENCLAW_VITEST_NO_OUTPUT_TIMEOUT_MS=0 node scripts/run-vitest.mjs run --config test/vitest/vitest.extension-messaging.config.ts
pnpm exec oxfmt --check <changed files>
git diff --check

Review State

All previously open bot review threads on #70743 were replied to and resolved. The final review-fix commits after the latest rebase are:

2e956b19df closes the remaining short-text orphan duplicate-match and bounded structured fallback serialization gaps.
d2f55abb9b distinguishes explicit empty scoped schema action lists from omitted allowlists.
961567766a preserves aliased embedded auth locks.
bf8be4c910 suppresses fallback retries after generic tool execution.
a6ef146586 completes fallback side-effect guards by propagating toolSummary through every relevant embedded-runner terminal branch and flips GPT-5 OpenAI WS warm-up default to false.
35f7c348e9 updates the rebased CLI attempt-execution test mock for upstream's provider auth alias-map export.
10b74a4459 addresses fresh bot review by keeping stripped NO_REPLY terminal turns out of fallback and preserving explicit empty auth-order overrides, including exact alias keys such as codex-cli: [].
f73022e4f4 addresses fresh follow-up routing review by emitting a visible partial-delivery notice when any cross-channel payload fails, even if another payload in the same completion routes successfully.
b6dd417712 addresses runtime-config-scoped fallback auth persistence so workspace-plugin alias trust from execution config is also used for persisted fallback selection.
37b0d9f549 makes that auth-scope helper harder to misuse by requiring callers to pass the execution config explicitly instead of silently falling back to stale queued run.config.
bb99fb6d1a responds to the maintainer GPT-5.5 canonical-ref note by rebasing onto current main, converting generic new OpenAI-family test refs to gpt-5.5, and documenting remaining gpt-5.4/codex-cli refs as intentional regression or legacy-compat coverage.
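
The auth-order semantics referenced in the commits above (an exact provider key wins, including an explicit empty array such as `codex-cli: []`, while an absent alias key falls back to the canonical family) can be sketched as follows. The alias table and function name are hypothetical; in particular, mapping `codex-cli` to `openai-codex` is an assumption inferred from the PR description, not a confirmed entry in `provider-auth-aliases.ts`.

```typescript
// Hypothetical config shape: provider key -> ordered list of stored profile ids.
type AuthOrderConfig = Record<string, string[] | undefined>;

// Assumed alias table for illustration only.
const CANONICAL_AUTH_FAMILY: Record<string, string | undefined> = {
  "codex-cli": "openai-codex",
};

/** Resolves the profile order for a provider; undefined means "no order configured". */
function resolveAuthOrder(provider: string, order: AuthOrderConfig): string[] | undefined {
  const exact = order[provider];
  // An exact key wins even when it is [], which means "use no stored profiles".
  if (exact !== undefined) return exact;
  const canonical = CANONICAL_AUTH_FAMILY[provider] ?? provider;
  return order[canonical];
}

const familyOnly: AuthOrderConfig = { "openai-codex": ["work", "personal"] };
const aliasEmptied: AuthOrderConfig = { "openai-codex": ["work"], "codex-cli": [] };

console.log(resolveAuthOrder("codex-cli", familyOnly)); // [ 'work', 'personal' ]
console.log(resolveAuthOrder("codex-cli", aliasEmptied)); // []
```

Checking `!== undefined` instead of truthiness or length is what keeps an explicit empty array distinct from a missing key.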

Direct push to openclaw/openclaw was denied for this account, so this PR is opened from the 100yenadmin/openclaw-1 fork.

Changed files

CHANGELOG.md (modified, +1/-0)
extensions/codex/src/app-server/run-attempt.ts (modified, +8/-1)
extensions/matrix/src/actions.ts (modified, +1/-0)
extensions/msteams/src/actions.ts (modified, +1/-0)
extensions/msteams/src/channel.ts (modified, +1/-0)
extensions/openai/speech-provider.test.ts (modified, +1/-0)
extensions/openai/tts.test.ts (modified, +1/-0)
extensions/openai/tts.ts (modified, +57/-63)
src/agents/agent-command.live-model-switch.test.ts (modified, +69/-4)
src/agents/agent-command.ts (modified, +9/-1)
src/agents/auth-profiles/order.test.ts (modified, +152/-0)
src/agents/auth-profiles/order.ts (modified, +12/-2)
src/agents/auth-profiles/session-override.test.ts (modified, +42/-0)
src/agents/auth-profiles/session-override.ts (modified, +5/-3)
src/agents/command/attempt-execution.cli.test.ts (modified, +53/-1)
src/agents/command/attempt-execution.ts (modified, +10/-1)
src/agents/gpt5-prompt-overlay.ts (modified, +20/-2)
src/agents/model-fallback.run-embedded.e2e.test.ts (modified, +46/-0)
src/agents/model-fallback.test.ts (modified, +145/-0)
src/agents/model-fallback.ts (modified, +74/-1)
src/agents/models-config.uses-first-github-copilot-profile-env-tokens.test.ts (modified, +1/-0)
src/agents/openai-responses-payload-policy.ts (modified, +5/-1)
src/agents/openai-tool-schema.ts (modified, +94/-0)
src/agents/openai-transport-stream.test.ts (modified, +74/-0)
src/agents/openai-transport-stream.ts (modified, +51/-18)
src/agents/pi-embedded-runner-extraparams-resolve.test.ts (modified, +2/-2)
src/agents/pi-embedded-runner-extraparams.test.ts (modified, +47/-2)
src/agents/pi-embedded-runner.run-embedded-pi-agent.auth-profile-rotation.e2e.test.ts (modified, +65/-13)
src/agents/pi-embedded-runner/compact.ts (modified, +2/-1)
src/agents/pi-embedded-runner/extra-params.ts (modified, +2/-1)
src/agents/pi-embedded-runner/openai-stream-wrappers.ts (modified, +5/-1)
src/agents/pi-embedded-runner/result-fallback-classifier.ts (added, +111/-0)
src/agents/pi-embedded-runner/run.ts (modified, +21/-9)
src/agents/pi-embedded-runner/run/attempt.prompt-helpers.ts (modified, +176/-19)
src/agents/pi-embedded-runner/run/attempt.test.ts (modified, +120/-3)
src/agents/pi-embedded-runner/run/attempt.ts (modified, +20/-9)
src/agents/pi-model-discovery.synthetic-auth.test.ts (modified, +2/-0)
src/agents/provider-auth-aliases.test.ts (added, +35/-0)
src/agents/provider-auth-aliases.ts (modified, +39/-14)
src/agents/tools-effective-inventory.ts (modified, +2/-1)
src/agents/tools/message-tool.test.ts (modified, +51/-0)
src/agents/tools/message-tool.ts (modified, +3/-2)
src/auto-reply/reply/agent-runner-auth-profile.ts (modified, +18/-2)
src/auto-reply/reply/agent-runner-execution.test.ts (modified, +290/-2)
src/auto-reply/reply/agent-runner-execution.ts (modified, +58/-6)
src/auto-reply/reply/followup-runner.test.ts (modified, +56/-5)
src/auto-reply/reply/followup-runner.ts (modified, +43/-11)
src/channels/plugins/message-action-discovery.ts (modified, +42/-0)
src/channels/plugins/message-actions.test.ts (modified, +112/-0)
src/channels/plugins/types.core.ts (modified, +6/-0)
src/plugins/provider-runtime.test.ts (modified, +65/-0)
src/plugins/provider-runtime.ts (modified, +8/-3)


Problem

When an LLM provider returns server_is_overloaded (HTTP 503) or service_unavailable_error, OpenClaw does not trigger model fallback and instead retries the same endpoint repeatedly until timeout or manual intervention.

In contrast, timeout and rate_limit errors correctly trigger fallback and switch to the next candidate model.

Steps to Reproduce

Configure an OpenAI model (e.g. gpt-5.4) as the primary model
Wait for OpenAI to return server_is_overloaded errors
Observe: the agent retries the same model repeatedly without falling back

Log Evidence

# All 8 requests hit server_is_overloaded — zero fallbacks triggered:
17:52:18 embedded run agent end ... error=server_is_overloaded
17:54:49 embedded run agent end ... error=server_is_overloaded
18:00:45 embedded run agent end ... error=server_is_overloaded
18:00:48 embedded run agent end ... error=server_is_overloaded
18:01:01 embedded run agent end ... error=server_is_overloaded
18:01:11 embedded run agent end ... error=server_is_overloaded
18:01:39 embedded run agent end ... error=server_is_overloaded
18:02:06 embedded run agent end ... error=server_is_overloaded

# For comparison — timeout and rate_limit correctly trigger fallback:
17:44:30 failover decision: reason=timeout from=gpt-5.4 → next=zai/glm-5-turbo ✅
18:06:46 failover decision: reason=rate_limit from=gpt-5.3-codex → next=zai/glm-5-turbo ✅

Impact

Downstream session blocking: subagent or main session stuck in retry loop, lane not released, subsequent messages queue indefinitely
Resource waste: each retry consumes API quota and wall-clock time
Poor UX: group/DM chats become unresponsive for extended periods

Expected Behavior

server_is_overloaded and service_unavailable_error should be treated the same as timeout and rate_limit — immediately trigger fallback to the next candidate model instead of retrying the same overloaded endpoint.
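
The requested behavior can be sketched as an error-code classifier feeding the fallback chain. This is an illustration of what the issue asks for, not the change made in PR #70743, whose description lists overload-specific retry classification as separate work. All names here (`FailoverReason`, `classifyProviderError`, `nextCandidate`, `decide`, and the `overloaded` reason) are hypothetical.

```typescript
// Hypothetical names throughout; a sketch of the requested behavior, not OpenClaw's code.
type FailoverReason = "timeout" | "rate_limit" | "overloaded";

/** Maps a provider error code to a failover reason, or null when no fallback applies. */
function classifyProviderError(code: string): FailoverReason | null {
  switch (code) {
    case "timeout":
      return "timeout";
    case "rate_limit":
      return "rate_limit";
    // The two codes from this issue, treated like the cases above.
    case "server_is_overloaded":
    case "service_unavailable_error":
      return "overloaded";
    default:
      return null;
  }
}

/** Returns the candidate after `current`, or null when the chain is exhausted. */
function nextCandidate(chain: string[], current: string): string | null {
  const index = chain.indexOf(current);
  if (index < 0) return null;
  return chain[index + 1] ?? null;
}

/** Produces a log-style decision string for one failed attempt. */
function decide(code: string, chain: string[], current: string): string {
  const reason = classifyProviderError(code);
  const next = reason ? nextCandidate(chain, current) : null;
  return next
    ? `failover decision: reason=${reason} from=${current} next=${next}`
    : `no failover: error=${code} model=${current}`;
}

const fallbackChain = ["openai-codex/gpt-5.4", "zai/glm-5-turbo"];
console.log(decide("server_is_overloaded", fallbackChain, "openai-codex/gpt-5.4"));
```

With this mapping, the first `server_is_overloaded` response would advance to `zai/glm-5-turbo` instead of retrying the same endpoint.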

Environment

OpenClaw version: latest (2026-04-22)
Affected models: openai-codex/gpt-5.4, openai-codex/gpt-5.3-codex
Fallback candidate: zai/glm-5-turbo (works correctly on timeout/rate_limit)

extent analysis

TL;DR

Update the error handling logic in OpenClaw to treat server_is_overloaded and service_unavailable_error as fallback triggers, similar to timeout and rate_limit errors.

Guidance

Review the current error handling logic in OpenClaw to identify why server_is_overloaded and service_unavailable_error are not triggering fallbacks.
Update the logic to include these error types as fallback triggers, ensuring consistency with timeout and rate_limit handling.
Verify the changes by simulating server_is_overloaded and service_unavailable_error scenarios and checking if the fallback to the next candidate model is correctly triggered.
Test the updated logic with different models and error scenarios to ensure the fix is robust and reliable.

Example

No code snippet is provided as the issue does not include specific code references. However, the fix should involve updating the error handling logic to include server_is_overloaded and service_unavailable_error in the list of errors that trigger fallbacks.

Notes

The fix assumes that the error handling logic is modifiable and that the fallback mechanism is correctly implemented for timeout and rate_limit errors. Additional testing and verification may be necessary to ensure the fix does not introduce new issues.

Recommendation

Update the error handling logic so that server_is_overloaded and service_unavailable_error trigger fallback to the next candidate model. This should stop the repeated retries against the overloaded endpoint and improve the reliability and responsiveness of OpenClaw.
