openclaw - ✅ (Solved) fix(agents): provider-specific errors misclassified during failover [2 pull requests, 1 participant]

GitHub stats

openclaw/openclaw#58837 (fetched 2026-04-08 02:32:03)
Comments: 0 · Participants: 1 · Timeline: 4 · Reactions: 0
Timeline (top): cross-referenced ×2, closed ×1, locked ×1

Error Message

classifyFailoverReason() in src/agents/pi-embedded-helpers/errors.ts relies on generic string matching that misclassifies errors from several providers. This causes the failover engine to choose wrong recovery strategies: for example, retrying with the same oversized context instead of triggering compaction, or treating a rate limit as an unknown error.

Provider-specific error patterns should be normalized to the existing FailoverReason enum so the failover engine can make correct decisions regardless of which provider returned the error.

Fix Action

Fixed

PR fix notes

PR #58845: fix(agents): normalize provider errors for better failover

Description (problem / solution / changelog)

Fixes #58837

Related: #58347 #58305 #58377 #58392 (LiveSessionModelSwitchError cluster)

Summary

Add provider-specific error patterns for AWS Bedrock, Ollama, Mistral, Cohere, DeepSeek, Together AI, and Cloudflare Workers AI. These providers return errors in non-standard formats that the generic classifiers miss, causing incorrect failover behavior.

  • Create provider-error-patterns.ts with provider-specific matchers
  • matchesProviderContextOverflow() catches context overflow patterns the generic regex misses
  • classifyProviderSpecificError() maps provider errors to FailoverReason
  • Wire into isContextOverflowError() and classifyFailoverReason() as catch-all layers

Examples fixed:

| Provider | Error | Was | Now |
|---|---|---|---|
| Bedrock | ThrottlingException | unclassified | rate_limit |
| Bedrock | ModelNotReadyException | unclassified | overloaded |
| Groq | model_is_deactivated | unclassified | model_not_found |
| Together AI | concurrency limit reached | unclassified | rate_limit |
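
A minimal sketch of what a pattern module along these lines might look like. Only the two function names and the error/reason pairs come from the PR description and the issue; the FailoverReason string union, the pattern-table shape, and every regex are assumptions made for illustration, not the merged code.

```typescript
// Hypothetical sketch of a provider-error-patterns module.
// FailoverReason is a stand-in listing only the reasons named in the PR.
type FailoverReason = "rate_limit" | "overloaded" | "model_not_found";

interface ProviderErrorPattern {
  pattern: RegExp;
  reason: FailoverReason;
}

// Error/reason pairs taken from the "Examples fixed" list; regexes are assumed.
const PROVIDER_ERROR_PATTERNS: readonly ProviderErrorPattern[] = [
  { pattern: /ThrottlingException/i, reason: "rate_limit" }, // AWS Bedrock
  { pattern: /ModelNotReadyException/i, reason: "overloaded" }, // AWS Bedrock
  { pattern: /model_is_deactivated/i, reason: "model_not_found" }, // Groq
  { pattern: /concurrency limit (has been )?reached/i, reason: "rate_limit" }, // Together AI
];

// Only the Google Vertex wording is quoted in the issue; a real list would
// carry one entry per provider.
const PROVIDER_CONTEXT_OVERFLOW_PATTERNS: readonly RegExp[] = [
  /exceeds the maximum input tokens/i,
];

// True when the message looks like a provider-specific context overflow.
function matchesProviderContextOverflow(message: string): boolean {
  return PROVIDER_CONTEXT_OVERFLOW_PATTERNS.some((re) => re.test(message));
}

// Returns the first matching reason, or null when no provider pattern applies.
function classifyProviderSpecificError(message: string): FailoverReason | null {
  for (const entry of PROVIDER_ERROR_PATTERNS) {
    if (entry.pattern.test(message)) return entry.reason;
  }
  return null;
}
```

Keeping the patterns in a data table rather than in branching code means a new provider can be supported by adding one row, and the table can be iterated directly in tests.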

Test plan

  • Unit tests for matchesProviderContextOverflow (8 provider samples + negatives)
  • Unit tests for classifyProviderSpecificError (5 provider mappings)
  • Integration tests with isContextOverflowError and classifyFailoverReason
  • Existing failover tests pass (35 tests)

Changed files

  • src/agents/pi-embedded-helpers/errors.ts (modified, +12/-1)
  • src/agents/pi-embedded-helpers/provider-error-patterns.test.ts (added, +92/-0)
  • src/agents/pi-embedded-helpers/provider-error-patterns.ts (added, +117/-0)

PR #58856: fix(agents): normalize provider errors for better failover

Description (problem / solution / changelog)

Fixes #58837

Related: #58347 #58305 #58377 #58392 (LiveSessionModelSwitchError cluster)

Summary

Add provider-specific error patterns for AWS Bedrock, Ollama, Mistral, Cohere, DeepSeek, Together AI, and Cloudflare Workers AI. These providers return errors in non-standard formats that the generic classifiers miss, causing incorrect failover behavior.

  • Create provider-error-patterns.ts with provider-specific matchers
  • matchesProviderContextOverflow() catches context overflow patterns the generic regex misses
  • classifyProviderSpecificError() maps provider errors to FailoverReason
  • Wire into isContextOverflowError() and classifyFailoverReason() as catch-all layers

Examples fixed:

| Provider | Error | Was | Now |
|---|---|---|---|
| Bedrock | ThrottlingException | unclassified | rate_limit |
| Bedrock | ModelNotReadyException | unclassified | overloaded |
| Groq | model_is_deactivated | unclassified | model_not_found |
| Together AI | concurrency limit reached | unclassified | rate_limit |
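
The "catch-all layer" wiring can be pictured as a fallback chain: the generic classifier runs first, and the provider-specific one is consulted only when it returns nothing. The sketch below is an assumption about that ordering; classifyGeneric, classifyProviderSpecific, and classifyWithFallback are hypothetical stand-ins, and their regexes are illustrative rather than the project's real rules.

```typescript
// Hypothetical sketch of catch-all layering; only the ordering is the point.
type FailoverReason = "rate_limit" | "overloaded" | "model_not_found";

// Stand-in for the existing generic string matching.
function classifyGeneric(message: string): FailoverReason | null {
  if (/rate limit|too many requests/i.test(message)) return "rate_limit";
  if (/overloaded/i.test(message)) return "overloaded";
  return null;
}

// Stand-in for classifyProviderSpecificError().
function classifyProviderSpecific(message: string): FailoverReason | null {
  if (/ThrottlingException/i.test(message)) return "rate_limit";
  if (/ModelNotReadyException/i.test(message)) return "overloaded";
  return null;
}

// Generic rules run first, so messages they already classify keep their
// result; the provider layer only sees messages that would otherwise be null.
function classifyWithFallback(message: string): FailoverReason | null {
  return classifyGeneric(message) ?? classifyProviderSpecific(message);
}
```

Placing the provider layer last keeps the change low-risk: it can only turn a previously unclassified error into a classified one, never override an existing classification.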

Test plan

  • Unit tests for matchesProviderContextOverflow (8 provider samples + negatives)
  • Unit tests for classifyProviderSpecificError (5 provider mappings)
  • Integration tests with isContextOverflowError and classifyFailoverReason
  • Existing failover tests pass (35 tests)
  • tsgo --noEmit clean, oxlint clean, oxfmt --check clean
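
A table-driven check is one way the provider samples and negatives in this test plan could be expressed. The sketch below runs without a test framework; the classifier body is a stand-in so the example is self-contained, and ClassificationCase, CASES, and failingCases are hypothetical names. The sample messages are the ones quoted in the issue, plus two assumed negatives.

```typescript
// Hypothetical table-driven check; the classifier is a stand-in, not the PR's code.
type FailoverReason = "rate_limit" | "overloaded" | "model_not_found";

function classifyProviderSpecificError(message: string): FailoverReason | null {
  if (/ThrottlingException/i.test(message)) return "rate_limit";
  if (/ModelNotReadyException/i.test(message)) return "overloaded";
  if (/model_is_deactivated/i.test(message)) return "model_not_found";
  if (/concurrency limit (has been )?reached/i.test(message)) return "rate_limit";
  return null;
}

interface ClassificationCase {
  provider: string;
  message: string;
  expected: FailoverReason | null;
}

const CASES: readonly ClassificationCase[] = [
  { provider: "AWS Bedrock", message: "ThrottlingException: Too many concurrent requests", expected: "rate_limit" },
  { provider: "AWS Bedrock", message: "ModelNotReadyException: model is not ready", expected: "overloaded" },
  { provider: "Groq", message: "model_is_deactivated", expected: "model_not_found" },
  { provider: "Together AI", message: "concurrency limit has been reached", expected: "rate_limit" },
  // Negatives: unrelated messages must stay unclassified.
  { provider: "negative", message: "connection reset by peer", expected: null },
  { provider: "negative", message: "model loaded successfully", expected: null },
];

// Returns a description of every case whose classification differs from expected.
function failingCases(cases: readonly ClassificationCase[]): string[] {
  return cases
    .filter((c) => classifyProviderSpecificError(c.message) !== c.expected)
    .map((c) => `${c.provider}: ${c.message}`);
}
```

Including negatives guards against patterns that are too broad, which would otherwise reintroduce misclassification in the opposite direction.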

Changed files

  • CHANGELOG.md (modified, +1/-0)
  • src/agents/failover-error.test.ts (modified, +18/-0)
  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +29/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +96/-79)
  • src/agents/pi-embedded-helpers/provider-error-patterns.test.ts (added, +100/-0)
  • src/agents/pi-embedded-helpers/provider-error-patterns.ts (added, +111/-0)

Bug Description

classifyFailoverReason() in src/agents/pi-embedded-helpers/errors.ts relies on generic string matching that misclassifies errors from several providers. This causes the failover engine to choose wrong recovery strategies — for example, retrying with the same oversized context instead of triggering compaction, or treating a rate limit as an unknown error.

Examples of Misclassification

| Provider | Error Message | Expected | Actual |
|---|---|---|---|
| AWS Bedrock | ThrottlingException: Too many concurrent requests | rate_limit | null (unclassified) |
| AWS Bedrock | ModelNotReadyException: model is not ready | overloaded | null |
| Groq | model_is_deactivated | model_not_found | null |
| Together AI | concurrency limit has been reached | rate_limit | null |
| Google Vertex | INVALID_ARGUMENT: exceeds the maximum input tokens | context overflow | null |

Impact

  • Incorrect failover decisions waste API credits (retrying instead of switching model/provider)
  • May contribute to LiveSessionModelSwitchError issues (#58347, #58305, #58377, #58392) when failover misclassifies errors and picks wrong recovery paths
  • Users on providers like Bedrock, Together AI, Groq see unexpected failures instead of graceful failover

Expected Behavior

Provider-specific error patterns should be normalized to the existing FailoverReason enum so the failover engine can make correct decisions regardless of which provider returned the error.
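
To illustrate why a normalized reason matters, the sketch below shows how a failover engine might pick a recovery path from it. Everything here is an assumption: the RecoveryAction names, the chooseRecovery function, the specific reason-to-action mapping, and the treatment of context overflow as a member of the reason type are hypothetical and do not describe the project's actual engine.

```typescript
// Hypothetical sketch: mapping a normalized reason to a recovery path.
// "context_overflow" is assumed to be a reason here purely for illustration.
type FailoverReason = "rate_limit" | "overloaded" | "model_not_found" | "context_overflow";
type RecoveryAction = "compact_context" | "retry_with_backoff" | "switch_model" | "surface_error";

function chooseRecovery(reason: FailoverReason | null): RecoveryAction {
  switch (reason) {
    case "context_overflow":
      // Retrying with the same oversized context would fail again.
      return "compact_context";
    case "rate_limit":
    case "overloaded":
      // Transient conditions: waiting may be enough.
      return "retry_with_backoff";
    case "model_not_found":
      // The model cannot serve the request at all.
      return "switch_model";
    default:
      // Unclassified (null): no informed choice is possible.
      return "surface_error";
  }
}
```

Under this sketch, an unclassified Bedrock ThrottlingException falls into the default branch, whereas the same error normalized to rate_limit takes the back-off path.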

Affected Code

  • src/agents/pi-embedded-helpers/errors.ts:1002 (classifyFailoverReason())
  • src/agents/pi-embedded-helpers/failover-matches.ts — generic pattern matchers

extent analysis

TL;DR

Update the classifyFailoverReason() function in src/agents/pi-embedded-helpers/errors.ts to include provider-specific error patterns.

Guidance

  • Review the classifyFailoverReason() function to identify areas where generic string matching is used and replace it with provider-specific error patterns.
  • Update the failover-matches.ts file to include additional pattern matchers for each provider's unique error messages.
  • Verify the changes by testing the failover engine with various error scenarios from different providers.
  • Consider adding unit tests to ensure the classifyFailoverReason() function correctly classifies errors for each provider.

Example

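// Example of updated classifyFailoverReason() function.
// FailoverReason is a stand-in for the existing enum (only members used here).
enum FailoverReason {
  rate_limit = "rate_limit",
  overloaded = "overloaded",
  model_not_found = "model_not_found",
}

function classifyFailoverReason(errorMessage: string): FailoverReason | null {
  // Provider-specific error patterns
  if (errorMessage.includes("ThrottlingException: Too many concurrent requests")) {
    return FailoverReason.rate_limit; // AWS Bedrock
  } else if (errorMessage.includes("ModelNotReadyException: model is not ready")) {
    return FailoverReason.overloaded; // AWS Bedrock
  } else if (errorMessage.includes("model_is_deactivated")) {
    return FailoverReason.model_not_found; // Groq
  }
  // ... existing generic classification ...
  return null; // unclassified
}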

Notes

The updated classifyFailoverReason() function should handle errors from various providers, including AWS Bedrock, Groq, Together AI, and Google Vertex.

Recommendation

Apply workaround: Update the classifyFailoverReason() function to include provider-specific error patterns, as this will allow the failover engine to make correct decisions regardless of which provider returned the error.
