hermes: Fix error_classifier: 500/502 server errors should default to should_fallback=True

hermes · 2026-05-27 02:34:42



Error Message

Model deprecation: An OpenAI-compatible proxy (including ai-cli-proxy) returns HTTP 502 for an unknown or deprecated model name. The error message does not match _REQUEST_VALIDATION_PATTERNS, and the error code is not in the known set (invalid_request_error, unknown_parameter, unsupported_parameter), so the code falls through to line 783 and returns with should_fallback=False. The fallback chain does not activate.

Contrast with connection-error path

We encountered this when a proxy returned 502 for a deprecated model name. T6 (connection-error) fallback worked correctly; T5 (502-from-proxy) did not trigger the chain. Internal verdict R-2026-05-27-HERMES-T5-FALLBACK-GAP classified this as a known limitation pending an upstream fix. Internal amendment A13 documents the gap and the interim workaround (manual model config update).

Code Example

Current behavior (agent/error_classifier.py, line 783):

return result_fn(FailoverReason.server_error, retryable=True)

Proposed (Option A):

return result_fn(FailoverReason.server_error, retryable=True, should_fallback=True)


Problem

In agent/error_classifier.py, line 783, the final (fall-through) branch for HTTP 500/502 responses returns:

return result_fn(FailoverReason.server_error, retryable=True)

ClassifiedError.should_fallback defaults to False (line 83), so this path never triggers the fallback_providers chain. Hermes retries the same provider rather than advancing to the next configured fallback.
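The effect of the default can be reproduced in isolation. The sketch below is a minimal stand-in, not the actual Hermes code: the field layout of ClassifiedError and the signature of result_fn are assumptions made for illustration. Only the names and the should_fallback=False default come from the issue.

```python
from dataclasses import dataclass
from enum import Enum


class FailoverReason(Enum):
    # Only server_error is named in the issue; the value is illustrative.
    server_error = "server_error"


@dataclass
class ClassifiedError:
    # Assumed field layout. The issue only states that
    # should_fallback defaults to False.
    reason: FailoverReason
    retryable: bool = False
    should_fallback: bool = False


def result_fn(reason: FailoverReason, **overrides: bool) -> ClassifiedError:
    # Hypothetical stand-in for the classifier's result helper.
    return ClassifiedError(reason=reason, **overrides)


# Current call at line 783: should_fallback is never passed.
current = result_fn(FailoverReason.server_error, retryable=True)

# Option A: pass should_fallback explicitly.
proposed = result_fn(
    FailoverReason.server_error, retryable=True, should_fallback=True
)

print(current.should_fallback)   # False
print(proposed.should_fallback)  # True
```

Because the keyword is omitted rather than set to False, the current behavior is easy to miss when reading the call site alone.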

Affected scenarios

Transient or persistent 5xx outages: A provider returning HTTP 500 during a degraded period also hits this path. The client retries the same provider instead of falling back.

Note: the earlier branch at lines 773–782 correctly sets should_fallback=True when the error looks like a request-validation problem (format_error path). The gap is specifically the non-validation 500/502 case at line 783.
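The two branches described in the note can be sketched as follows. This is an illustrative reconstruction, not the code in agent/error_classifier.py: the pattern list, the Verdict type, the classify_5xx function, and the retryable value on the validation branch are all assumptions. The error codes and the should_fallback outcomes are taken from the issue.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical patterns: the real _REQUEST_VALIDATION_PATTERNS are not
# shown in the issue.
_REQUEST_VALIDATION_PATTERNS = [
    re.compile(r"invalid request", re.IGNORECASE),
    re.compile(r"unsupported parameter", re.IGNORECASE),
]

# Error codes listed in the issue as the known validation set.
_KNOWN_VALIDATION_CODES = {
    "invalid_request_error",
    "unknown_parameter",
    "unsupported_parameter",
}


class Verdict(NamedTuple):
    reason: str
    retryable: bool
    should_fallback: bool


def classify_5xx(status: int, message: str, code: Optional[str] = None) -> Verdict:
    """Classify an HTTP 500/502 response (sketch of the current behavior)."""
    if status not in (500, 502):
        raise ValueError("this sketch only covers HTTP 500 and 502")

    looks_like_validation = code in _KNOWN_VALIDATION_CODES or any(
        pattern.search(message) for pattern in _REQUEST_VALIDATION_PATTERNS
    )
    if looks_like_validation:
        # Mirrors the branch at lines 773-782: fallback is enabled.
        # retryable=False is a guess; the issue does not state it.
        return Verdict("format_error", retryable=False, should_fallback=True)

    # Mirrors line 783: should_fallback keeps its False default.
    return Verdict("server_error", retryable=True, should_fallback=False)


# A proxy 502 for a deprecated model matches neither check.
print(classify_5xx(502, "model gpt-legacy not found"))
```

Under these assumptions, a deprecated-model 502 lands on the second return and never signals fallback.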

Contrast with connection-error path

Connection errors (TCP-level failures) do trigger fallback. The 500/502 path should behave similarly for any error that is not provider-side request validation.
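To make the contrast concrete, here is a small sketch of how a fallback chain might consume the two flags. The run_chain function, the Outcome type, and the retry count are hypothetical and are not taken from Hermes; the sketch only assumes that should_fallback is what advances the chain, as the issue describes.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Outcome:
    ok: bool
    retryable: bool = False
    should_fallback: bool = False


def run_chain(
    providers: Sequence[str],
    call: Callable[[str], Outcome],
    max_retries: int = 2,
) -> List[str]:
    """Return the providers attempted, in order (hypothetical dispatcher)."""
    attempts: List[str] = []
    for name in providers:
        for _ in range(1 + max_retries):
            outcome = call(name)
            attempts.append(name)
            if outcome.ok:
                return attempts
            if outcome.should_fallback:
                break  # advance to the next provider in the chain
            if not outcome.retryable:
                return attempts  # neither retryable nor fallback: give up
        else:
            # Retries exhausted without a fallback signal: the chain stops.
            return attempts
    return attempts


def server_error_today(name: str) -> Outcome:
    # 500/502 path as it behaves now: retryable, but no fallback signal.
    return Outcome(ok=(name == "backup"), retryable=True)


def connection_error(name: str) -> Outcome:
    # Connection-error path: fallback is signalled.
    return Outcome(ok=(name == "backup"), retryable=True, should_fallback=True)


print(run_chain(["primary", "backup"], server_error_today))
# ['primary', 'primary', 'primary']
print(run_chain(["primary", "backup"], connection_error))
# ['primary', 'backup']
```

In this model the healthy backup provider is never reached on the 500/502 path, which matches the symptom reported above.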

Proposed fix

Option A — Set should_fallback=True on the non-validation 500/502 path:

return result_fn(FailoverReason.server_error, retryable=True, should_fallback=True)

Option B — Add a config flag (e.g. fallback_on_server_error: true in config.yaml) so operators can opt into this behavior without changing the default.

Option A is simpler and matches the intent of fallback_providers: if a provider is returning server errors, the next provider in the chain is a reasonable alternative.
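For comparison, Option B could look roughly like the sketch below. The flag name fallback_on_server_error is the one suggested above; the function names, the dict-shaped result, and the idea that the parsed config.yaml is available as a mapping are assumptions for illustration only.

```python
from typing import Any, Dict, Mapping


def fallback_on_server_error(config: Mapping[str, Any]) -> bool:
    # A missing key keeps today's behavior (no fallback), so existing
    # deployments are unaffected unless they opt in.
    return bool(config.get("fallback_on_server_error", False))


def classify_non_validation_5xx(config: Mapping[str, Any]) -> Dict[str, Any]:
    # Hypothetical replacement for the return at line 783.
    return {
        "reason": "server_error",
        "retryable": True,
        "should_fallback": fallback_on_server_error(config),
    }


print(classify_non_validation_5xx({}))
print(classify_non_validation_5xx({"fallback_on_server_error": True}))
```

The trade-off is one more setting to document and test, in exchange for leaving the default untouched.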

Context

Happy to open a PR for Option A if the change is welcome.
