openclaw - ✅(Solved) Fix [Bug]: Generic "An unknown error occurred" stream errors don't trigger model fallback for non-Anthropic providers [2 pull requests, 1 participants]

GitHub stats
openclaw/openclaw#71620 · Fetched 2026-04-26 05:10:33
Comments: 0 · Participants: 1 · Timeline: 6 · Reactions: 0
Timeline (top): cross-referenced ×2, referenced ×2, closed ×1, labeled ×1

Provider gate in isAnthropicGenericUnknownError (dist/errors-CJULmF31.js:423) prevents the configured model fallback chain from rotating when non-Anthropic providers hit the generic "An unknown error occurred" stream-error path; the literal error string is surfaced to end users.

Error Message

The user-facing reply contains the literal text An unknown error occurred. The journal shows [agent/embedded] embedded run agent end ... isError=true ... provider=google error=An unknown error occurred rawError=An unknown error occurred, with no fallback rotation attempted.

Root Cause

classifyFailoverReason returns null for non-Anthropic providers because the gate at line 423 is isProvider(provider, "anthropic") && .... isFailoverAssistantError returns false, failoverFailure is false, fallback rotation is skipped, and the literal stream-wrapper error is returned to the user.
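The gap can be illustrated with a minimal sketch. The function names mirror those in the issue, but the bodies, the FailoverReason type, and the isProvider implementation are assumptions made for illustration (the PR notes only say the gate matched when the provider name contained "anthropic"); this is not the shipped OpenClaw code.

```typescript
// Hypothetical reduction of the classifier described in the issue.
type FailoverReason = "timeout" | null;

// Assumed helper: matches when the provider name contains the given token.
function isProvider(provider: string | undefined, token: string): boolean {
  return (provider ?? "").toLowerCase().includes(token);
}

// Provider-gated matcher: only Anthropic is treated as transient.
function isAnthropicGenericUnknownError(raw: string, provider?: string): boolean {
  return (
    isProvider(provider, "anthropic") &&
    raw.toLowerCase().includes("an unknown error occurred")
  );
}

// With the gate in place, every other provider falls through to null.
function classifyFailoverReason(raw: string, provider?: string): FailoverReason {
  return isAnthropicGenericUnknownError(raw, provider) ? "timeout" : null;
}

const rawMessage = "An unknown error occurred";
console.log(classifyFailoverReason(rawMessage, "anthropic")); // timeout
console.log(classifyFailoverReason(rawMessage, "google")); // null
```

A null classification is what keeps isFailoverAssistantError false for the Google provider, so the same message that rotates models on Anthropic is returned verbatim everywhere else.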

Fix Action

Fix / Workaround

Suggested fix: rename isAnthropicGenericUnknownError → isGenericUnknownStreamError and drop the isProvider(provider, "anthropic") && gate. Alternative is to fix upstream by having the streaming wrapper throw a provider-tagged error string so each provider's classifier can pattern-match its own variant. Hotfix applied locally on 2026.4.15 (one-line patch removing the gate); gateway restarted cleanly and channels reconnected normally.

PR fix notes

PR #71634: fix(agents): fail over generic unknown stream errors

Description (problem / solution / changelog)

fix(agents): fail over generic unknown stream errors for non-OpenRouter providers

Summary

This fixes fallback classification for provider-scoped "An unknown error occurred" stream wrapper errors so non-OpenRouter providers (for example Google Gemini) can rotate to configured fallback models.

Fixes #71620.

Root Cause

  • src/agents/transport-stream-shared.ts throws "An unknown error occurred" for transport stopReason === "error" | "aborted" regardless of provider.
  • src/agents/pi-embedded-helpers/errors.ts only classified that phrase as timeout when provider matched Anthropic.
  • For providers like google, failover classification returned null, so fallback rotation was skipped and raw error text surfaced to users.

What Changed

  • Replaced Anthropic-only generic-unknown matcher with provider-scoped matcher:
    • classify "An unknown error occurred" as timeout when provider context exists and is not OpenRouter.
    • keep existing OpenRouter-specific handling ("Provider returned error").
  • Added regression assertions in:
    • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts
    • src/agents/failover-error.test.ts
  • Updated failover docs in docs/concepts/model-failover.md to reflect current behavior.
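The provider-scoped rule in this PR might look roughly like the sketch below. The function name classifyGenericUnknown, the substring comparison, and the way the provider string is normalized are assumptions; the PR description only states that the phrase is classified as timeout when provider context exists and is not OpenRouter.

```typescript
type FailoverReason = "timeout" | null;

// Hypothetical matcher name; not taken from the PR diff.
function classifyGenericUnknown(raw: string, provider?: string): FailoverReason {
  const scope = (provider ?? "").trim().toLowerCase();
  // No provider context, or OpenRouter: leave the message to the explicit matchers.
  if (scope === "" || scope.includes("openrouter")) return null;
  // Assumed comparison: case-insensitive substring match on the wrapper phrase.
  return raw.toLowerCase().includes("an unknown error occurred") ? "timeout" : null;
}

console.log(classifyGenericUnknown("An unknown error occurred", "google")); // timeout
console.log(classifyGenericUnknown("An unknown error occurred", "openrouter")); // null
console.log(classifyGenericUnknown("An unknown error occurred")); // null
```

Under this rule the classification still depends on the caller passing a provider, which is the main behavioral difference from the provider-agnostic approach in PR #71647.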

Validation

  • pnpm test src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts
  • pnpm check:changed (fails in this working tree due to unrelated pre-existing repo-wide typecheck errors; targeted touched-surface tests passed)

Risk / Follow-up

  • Risk is low and scoped to generic "An unknown error occurred" classification with provider context.
  • OpenRouter remains conservative for this exact phrase and still relies on its explicit matcher path.

AI-assisted

  • AI-assisted and manually reviewed.

Changed files

  • docs/concepts/model-failover.md (modified, +10/-8)
  • src/agents/failover-error.test.ts (modified, +6/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +3/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +13/-6)

PR #71647: fix(agents/failover): classify bare pi-ai stream wrapper as timeout for all providers

Description (problem / solution / changelog)

Fixes #71620.

Root cause

@mariozechner/pi-ai providers (anthropic, google, google-vertex, openai-completions, openai-responses, mistral, amazon-bedrock, azure-openai-responses, google-gemini-cli) all throw a literal Error("An unknown error occurred") from a shared stream-wrapper pattern when the stream ends with stopReason: "aborted" | "error" and no specific error info is attached.

OpenClaw classified that bare string as transient (timeout) for failover only when the active provider name contained "anthropic", via isAnthropicGenericUnknownError() in src/agents/pi-embedded-helpers/errors.ts:756. For non-Anthropic primaries (the issue reporter is on google/gemini-2.5-flash) the classifier returned null, so:

  • isFailoverAssistantError() returns false
  • failoverFailure branch in pi-embedded-runner/run.ts:1301 never fires
  • The configured agents.defaults.model.fallbacks chain is silently bypassed
  • The literal An unknown error occurred text is surfaced to end users

This is provider-agnostic upstream, but the gate was provider-scoped.

Fix

src/agents/pi-embedded-helpers/errors.ts:756: rename isAnthropicGenericUnknownError(raw, provider) to isGenericUnknownStreamError(raw) and drop the isProvider(provider, "anthropic") gate. Replace the includes("an unknown error occurred") substring check with an anchored regex /^\s*an unknown error occurred\.?\s*$/i so we only catch the bare wrapper message and not user-text or assistant prose that happens to contain the phrase.
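The difference between the old substring check and the anchored regex can be seen in a short sketch. The regex is the one quoted in the PR description; the helper substringMatch and the last sample sentence are illustrative assumptions.

```typescript
// Anchored pattern quoted in the PR: matches only the bare wrapper message.
const GENERIC_UNKNOWN_STREAM_ERROR = /^\s*an unknown error occurred\.?\s*$/i;

function isGenericUnknownStreamError(raw: string): boolean {
  return GENERIC_UNKNOWN_STREAM_ERROR.test(raw);
}

// Illustrative stand-in for the previous substring-based check.
function substringMatch(raw: string): boolean {
  return raw.toLowerCase().includes("an unknown error occurred");
}

const samples = [
  "An unknown error occurred",
  "  AN UNKNOWN ERROR OCCURRED.  ",
  "LLM request failed with an unknown error.",
  "The tool said an unknown error occurred while parsing.", // hypothetical prose
];

for (const sample of samples) {
  console.log(JSON.stringify(sample), {
    anchored: isGenericUnknownStreamError(sample),
    substring: substringMatch(sample),
  });
}
```

Only the last sample separates the two checks: the substring test accepts it, while the anchored regex rejects it because the phrase is embedded in a longer sentence.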

Tests previously locked the old behavior in two places:

  • pi-embedded-helpers.isbillingerrormessage.test.ts:742, "classifies provider-scoped generic upstream messages": kept the OpenRouter assertions, dropped the "An unknown error occurred" line because it no longer needs scoping.
  • pi-embedded-helpers.isbillingerrormessage.test.ts:754 and failover-error.test.ts:459, "does not classify provider-scoped generic upstream messages without provider context": kept the Provider returned error / Key limit exceeded lines, dropped the "An unknown error occurred" lines that were the bug.

Added new tests covering bare/case/whitespace variants across providers anthropic, google, openrouter, plus negative tests for wrapped messages (LLM request failed with an unknown error. and a sentence containing the phrase mid-string) so they keep returning null.

Verification

  • pnpm test src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts src/agents/failover-error.test.ts — 189/189 passed.
  • pnpm test src/plugins/provider-runtime.test.ts — 30/30 passed (uses the same classifier through isFailoverErrorMessage).
  • npx oxlint <changed files> — 0 errors, 0 warnings.
  • pnpm format:check <changed files> — clean.
  • pnpm tsgo:core — same set of pre-existing unrelated errors as on c070509b7f (ui/src/ui/views/agents-*.ts, src/mcp/*, src/media/qr-runtime.ts, src/plugin-sdk/*, src/trajectory/metadata.ts); confirmed by stashing and re-running on clean main.

Test plan

  • Existing failover regression tests pass with new behavior.
  • Added regression coverage for #71620 across providers and case/whitespace variants.
  • Negative tests prevent the substring match from accidentally classifying wrapped or descriptive prose.
  • Live verification with a non-Anthropic primary that hits stopReason: "error" (issue reporter has confirmed the local one-line patch works on 2026.4.15).

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • docs/concepts/model-failover.md (modified, +12/-9)
  • src/agents/failover-error.test.ts (modified, +18/-9)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +31/-6)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +8/-6)


Bug type

Behavior bug (incorrect output/state without crash)

Beta release blocker

No

Summary

Provider gate in isAnthropicGenericUnknownError (dist/errors-CJULmF31.js:423) prevents the configured model fallback chain from rotating when non-Anthropic providers hit the generic "An unknown error occurred" stream-error path; the literal error string is surfaced to end users.

Steps to reproduce

  1. Configure agents.defaults.model.primary = "google/gemini-2.5-flash" with non-empty agents.defaults.model.fallbacks.
  2. Send the gateway an agent request that causes the Gemini stream to end with stopReason: "error".
  3. Observe the user-facing reply contains the literal text An unknown error occurred. Journal shows [agent/embedded] embedded run agent end ... isError=true ... provider=google error=An unknown error occurred rawError=An unknown error occurred with no fallback rotation attempted.
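The precondition in step 1 can be written out as a small sketch. The nested shape below is inferred from the dotted keys in the report, and the model names come from the reporter's setup; the actual on-disk configuration format is not shown in the issue, so treat the object layout as an assumption.

```typescript
// Shape inferred from agents.defaults.model.primary / .fallbacks; layout is assumed.
interface ModelConfig {
  primary: string;
  fallbacks: string[];
}

const config: { agents: { defaults: { model: ModelConfig } } } = {
  agents: {
    defaults: {
      model: {
        primary: "google/gemini-2.5-flash",
        fallbacks: ["google/gemini-flash-latest", "google/gemini-3.1-pro-preview"],
      },
    },
  },
};

// The bug needs a non-Anthropic primary and at least one fallback to bypass.
const model = config.agents.defaults.model;
const reproducible = !model.primary.startsWith("anthropic/") && model.fallbacks.length > 0;
console.log(reproducible);
```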

Expected behavior

Configured fallback models are tried in sequence, matching the behavior already implemented for provider === "anthropic" (classified as timeout and routed through the failoverFailure path in dist/pi-embedded-runner-*.js:7340).
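A minimal sketch of the expected rotation follows, assuming a synchronous attempt function for brevity. The names runWithFallbacks, Attempt, and isFailoverError are hypothetical and do not correspond to identifiers in pi-embedded-runner; the message check reuses the anchored regex quoted in PR #71647.

```typescript
// Hypothetical reply function: returns text for a model or throws on stream failure.
type Attempt = (model: string) => string;

// Treat the bare wrapper message as transient, regardless of provider.
function isFailoverError(err: unknown): boolean {
  return err instanceof Error && /^\s*an unknown error occurred\.?\s*$/i.test(err.message);
}

function runWithFallbacks(
  models: string[],
  attempt: Attempt,
  shouldFailover: (err: unknown) => boolean = isFailoverError,
): string {
  let lastError: unknown = new Error("no models configured");
  for (const model of models) {
    try {
      return attempt(model);
    } catch (err) {
      if (!shouldFailover(err)) throw err; // unclassified: surface immediately
      lastError = err; // transient: rotate to the next model
    }
  }
  throw lastError; // every model in the chain failed
}

const chain = [
  "google/gemini-2.5-flash",
  "google/gemini-flash-latest",
  "google/gemini-3.1-pro-preview",
];
const tried: string[] = [];
const reply = runWithFallbacks(chain, (model) => {
  tried.push(model);
  if (model === "google/gemini-2.5-flash") throw new Error("An unknown error occurred");
  return `reply from ${model}`;
});
console.log(tried, reply);
```

Passing a classifier that always returns false reproduces the reported behavior: the first error is rethrown and no fallback is attempted.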

Actual behavior

classifyFailoverReason returns null for non-Anthropic providers because the gate at line 423 is isProvider(provider, "anthropic") && .... isFailoverAssistantError returns false, failoverFailure is false, fallback rotation is skipped, and the literal stream-wrapper error is returned to the user.

OpenClaw version

2026.4.15 (build 041266a)

Operating system

Ubuntu 24.04 (ARM64, Oracle Cloud)

Install method

npm global

Model

google/gemini-2.5-flash (primary); fallbacks: google/gemini-flash-latest, google/gemini-3.1-pro-preview

Provider / routing chain

openclaw -> google (direct, api_key auth profile)

Additional provider/model setup details

Standard direct Google provider via api_key auth profile. No proxy/router layer. Fallbacks configured under agents.defaults.model.fallbacks.

Logs, screenshots, and evidence

Logs, screenshots, and evidence (in the shell-rendered block):
  2026-04-25T13:43:48.879+00:00 [whatsapp] Inbound message (145 chars)
  2026-04-25T13:43:55.770+00:00 [gateway] cron: job updated
  2026-04-25T13:44:00.035+00:00 [agent/embedded] embedded run agent end:
    runId=<redacted> isError=true model=gemini-2.5-flash provider=google
    error=An unknown error occurred rawError=An unknown error occurred

  Code references in installed dist/:
  - errors-CJULmF31.js:423  (provider-gated branch in isAnthropicGenericUnknownError)
  - errors-CJULmF31.js:458  (call site in classifyFailoverClassificationFromMessage)
  - anthropic-vertex-stream-BpkPWKP9.js:4720, 6211, 6349
      (provider-agnostic stream wrapper that throws the generic
       Error("An unknown error occurred") for any provider when
       stopReason === "aborted" | "error")
  - pi-embedded-runner-DN0VbqlW.js:7340
      (failoverFailure branch that never fires due to classification gap)

Impact and severity

  • Affected: deployments using a non-Anthropic primary model with configured fallbacks (Google Gemini in our case).
  • Severity: High for user-facing channels; raw error text surfaces to end users instead of the fallback response.
  • Frequency: Consistent when the Gemini stream ends in stopReason === "error". Observed in production 2026-04-25 13:44 UTC; defeats the purpose of having fallbacks configured.
  • Consequence: configured fallback chain is silently bypassed.

Additional information

Suggested fix: rename isAnthropicGenericUnknownError → isGenericUnknownStreamError and drop the isProvider(provider, "anthropic") && gate. Alternative is to fix upstream by having the streaming wrapper throw a provider-tagged error string so each provider's classifier can pattern-match its own variant. Hotfix applied locally on 2026.4.15 (one-line patch removing the gate); gateway restarted cleanly and channels reconnected normally.
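The upstream alternative could be sketched as follows. Everything here is hypothetical: the "[provider] message" tag format, makeStreamError, parseTaggedError, and the classifier registry are invented for illustration and are not part of @mariozechner/pi-ai or OpenClaw.

```typescript
// Hypothetical tagged format: "[provider] An unknown error occurred (stopReason=...)".
function makeStreamError(provider: string, stopReason: "aborted" | "error"): Error {
  return new Error(`[${provider}] An unknown error occurred (stopReason=${stopReason})`);
}

// Split a tagged message into its provider tag and the remaining detail.
function parseTaggedError(message: string): { provider: string; detail: string } | null {
  const match = /^\[([^\]]+)\]\s*(.*)$/.exec(message);
  return match ? { provider: match[1] ?? "", detail: match[2] ?? "" } : null;
}

// Each provider registers its own matcher for its own variant.
const transientMatchers: Record<string, (detail: string) => boolean> = {
  google: (detail) => detail.toLowerCase().startsWith("an unknown error occurred"),
};

function isTransientTaggedError(message: string): boolean {
  const tagged = parseTaggedError(message);
  if (!tagged) return false; // untagged messages are not classified here
  const matcher = transientMatchers[tagged.provider];
  return matcher ? matcher(tagged.detail) : false;
}

console.log(isTransientTaggedError(makeStreamError("google", "error").message));
console.log(isTransientTaggedError("An unknown error occurred"));
```

The trade-off is that every provider must register a matcher before its errors trigger failover, whereas dropping the gate covers all providers in one change.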

extent analysis

TL;DR

The most likely fix is to rename isAnthropicGenericUnknownError to isGenericUnknownStreamError and remove the isProvider(provider, "anthropic") gate.

Guidance

  • Verify that the issue is caused by the isAnthropicGenericUnknownError function not being triggered for non-Anthropic providers by checking the code at errors-CJULmF31.js:423.
  • Remove the isProvider(provider, "anthropic") gate from the isAnthropicGenericUnknownError function to allow it to trigger for all providers.
  • Test the fallback chain with a non-Anthropic primary model to ensure it rotates correctly when the primary model encounters an error.
  • Consider fixing the upstream issue by having the streaming wrapper throw a provider-tagged error string, allowing each provider's classifier to pattern-match its own variant.

Example

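// Before (signature per PR #71647; the lowercasing step is an assumption)
function isAnthropicGenericUnknownError(raw, provider) {
  // isProvider is the project's existing helper.
  return isProvider(provider, "anthropic") &&
    raw.toLowerCase().includes("an unknown error occurred");
}

// After: no provider gate; anchored regex as described in PR #71647
function isGenericUnknownStreamError(raw) {
  return /^\s*an unknown error occurred\.?\s*$/i.test(raw);
}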

Notes

This fix assumes that the issue is caused by the isAnthropicGenericUnknownError function not being triggered for non-Anthropic providers. If the issue is more complex, additional debugging may be required.

Recommendation

Apply the workaround by renaming isAnthropicGenericUnknownError to isGenericUnknownStreamError and removing the isProvider(provider, "anthropic") gate, as this is a straightforward fix that can be implemented quickly.


