openclaw - ✅(Solved) Fix [Bug]: OpenRouter 404 'No endpoints found' classified as candidate_succeeded, halts fallback chain [2 pull requests, 2 comments, 3 participants]

GitHub stats

openclaw/openclaw#61411 (fetched 2026-04-08 02:58:57)

  • Comments: 2
  • Participants: 3
  • Timeline events: 5 (commented ×2, cross-referenced ×2, closed ×1)
  • Reactions: 0

Error Message

[agent/embedded] embedded run agent end: runId=88f84408-... isError=true model=deepseek/deepseek-r1:free provider=openrouter error=404 No endpoints found for deepseek/deepseek-r1:free.

Root Cause

This is distinct from #51571 (OpenRouter JSON 404 payloads with different message format) and #59524 (finish_reason: error misclassification), though the root cause in the fallback classifier is likely shared: any non-auth, non-timeout API response is treated as success.

Fix Action

Fixed

PR fix notes

PR #61472: fix(agents): continue fallback after OpenRouter no-endpoints 404

Description (problem / solution / changelog)

Summary

  • Problem: OpenRouter 404 No endpoints found for <model> errors were not classified as model_not_found, so fallback could stop on a dead candidate.
  • Why it matters: a retired OpenRouter endpoint could halt the configured fallback chain and leave healthy later candidates unused.
  • What changed: taught the shared model-not-found matcher to recognize OpenRouter's No endpoints found for wording, reused that matcher in failover classification, and added regression coverage for classification plus embedded fallback behavior.
  • What did NOT change (scope boundary): no broader HTTP error remapping, no provider auth changes, and no fallback-policy changes outside this exact not-found detection path.
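The matcher change described above can be pictured with a minimal sketch. It is written in Python to match the example later on this page, whereas the project itself is TypeScript. The function name, the pattern list, and the generic "model ... not found" pattern are assumptions for illustration; only the "No endpoints found for" wording comes from the issue.

```python
import re

# Hypothetical pattern list; the real shared matcher lives in the project's
# TypeScript helpers and may look quite different.
_MODEL_NOT_FOUND_PATTERNS = [
    # OpenRouter wording quoted in this issue.
    re.compile(r"\bno endpoints found for\b", re.IGNORECASE),
    # Generic wording, assumed here only to show how patterns sit side by side.
    re.compile(r"\bmodel\b.*\bnot found\b", re.IGNORECASE),
]


def is_model_not_found_message(message: str) -> bool:
    """Return True when the error text looks like a missing model or endpoint."""
    return any(pattern.search(message) for pattern in _MODEL_NOT_FOUND_PATTERNS)
```

Keeping the patterns in one list is one way to avoid the matcher drift between helpers that the PR mentions: every caller goes through the same function.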

Change Type (select all)

  • Bug fix
  • Feature
  • Refactor required for the fix
  • Docs
  • Security hardening
  • Chore/infra

Scope (select all touched areas)

  • Gateway / orchestration
  • Skills / tool execution
  • Auth / tokens
  • Memory / storage
  • Integrations
  • API / contracts
  • UI / DX
  • CI/CD / infra

Linked Issue/PR

  • Closes #61411
  • Related #51571
  • This PR fixes a bug or regression

Root Cause (if applicable)

  • Root cause: the shared model-not-found heuristic did not recognize OpenRouter's No endpoints found for <model> wording, so assistant-side failover classification could miss the not-found path.
  • Missing detection / guardrail: there was no regression test for this OpenRouter-specific 404 wording, and the matcher logic had drifted across helpers.
  • Contributing context (if known): the fallback loop already handled model_not_found correctly once the error was classified.

Regression Test Plan (if applicable)

  • Coverage level that should have caught this:
    • Unit test
    • Seam / integration test
    • End-to-end test
    • Existing coverage already sufficient
  • Target test or file: src/agents/failover-error.test.ts, src/agents/live-model-errors.test.ts, src/agents/model-fallback.run-embedded.e2e.test.ts, src/agents/models.profiles.live.test.ts
  • Scenario the test should lock in: OpenRouter 404 No endpoints found for ... is classified as model_not_found and the embedded fallback chain advances to the next candidate.
  • Why this is the smallest reliable guardrail: it covers both the shared matcher and the runtime failover seam without needing a live OpenRouter dependency.
  • Existing test that already covers this (if any): None.
  • If no new test is added, why not: N/A.

User-visible / Behavior Changes

  • Fallback chains now continue past OpenRouter models that return 404 No endpoints found for <model> instead of stopping on that dead candidate.

Diagram (if applicable)

N/A

Security Impact (required)

  • New permissions/capabilities? (Yes/No) No
  • Secrets/tokens handling changed? (Yes/No) No
  • New/changed network calls? (Yes/No) No
  • Command/tool execution surface changed? (Yes/No) No
  • Data access scope changed? (Yes/No) No
  • If any Yes, explain risk + mitigation:

Repro + Verification

Environment

  • OS: macOS
  • Runtime/container: local checkout
  • Model/provider: OpenRouter fallback classification
  • Integration/channel (if any): N/A
  • Relevant config (redacted): fallback chain with an OpenRouter model that returns 404 No endpoints found for <model>

Steps

  1. Configure a fallback chain that can reach a retired OpenRouter model.
  2. Trigger a run so the chain reaches that OpenRouter candidate.
  3. Observe whether the runner classifies the 404 as model_not_found and advances.

Expected

  • The candidate is recorded as model_not_found and the next fallback model is tried.

Actual

  • Before this fix, the OpenRouter candidate could stop the chain instead of continuing.

Evidence

Attach at least one:

  • Failing test/log before + passing after
  • Trace/log snippets
  • Screenshot/recording
  • Perf numbers (if relevant)

Human Verification (required)

  • Verified scenarios: targeted regression tests for shared matcher coverage and embedded fallback continuation.
  • Edge cases checked: bare 404 No endpoints found for ..., explicit status: 404 payloads, and reuse of the shared matcher in live-helper tests.
  • What you did not verify: no live OpenRouter call; repo-wide pnpm check remains blocked by unrelated existing tsgo failures outside this diff.

Review Conversations

  • I replied to or resolved every bot review conversation I addressed in this PR.
  • I left unresolved only the conversations that still need reviewer or maintainer judgment.

Compatibility / Migration

  • Backward compatible? (Yes/No) Yes
  • Config/env changes? (Yes/No) No
  • Migration needed? (Yes/No) No
  • If yes, exact upgrade steps:

Risks and Mitigations

  • Risk: broadening the generic model-not-found matcher could accidentally catch an unrelated provider message.
    • Mitigation: the new wording is narrow and backed by unit plus failover-seam regression tests.

Changed files

  • src/agents/failover-error.test.ts (modified, +22/-0)
  • src/agents/live-model-errors.test.ts (modified, +6/-0)
  • src/agents/live-model-errors.ts (modified, +23/-1)
  • src/agents/model-fallback.run-embedded.e2e.test.ts (modified, +23/-1)
  • src/agents/models.profiles.live.test.ts (modified, +7/-29)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +2/-30)

PR #61635: fix(agents): classify OpenRouter no-endpoints 404s

Description (problem / solution / changelog)

Summary

  • classify OpenRouter "No endpoints found" 404 responses as model_not_found
  • treat 404s as model_not_found only when the message classification already indicates a missing model / endpoint
  • add a focused regression test for the OpenRouter no-endpoints response shape
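The second bullet (a 404 counts as model_not_found only when the message also points at a missing model or endpoint) could be sketched as follows. This is an illustrative Python sketch, not the PR's TypeScript code; the function name and the hint strings other than "no endpoints found for" are assumptions.

```python
from typing import Optional

# Hypothetical message hints; only the first one is taken from the issue.
_MISSING_MODEL_HINTS = ("no endpoints found for", "model not found")


def classify_404(status: Optional[int], message: str) -> Optional[str]:
    """Return 'model_not_found' only for a 404 whose message indicates a
    missing model or endpoint; return None to defer to other classifiers."""
    text = message.lower()
    looks_missing = any(hint in text for hint in _MISSING_MODEL_HINTS)
    if status == 404 and looks_missing:
        return "model_not_found"
    return None
```

Gating on both the status and the message keeps an unrelated 404 (for example a mistyped API path) from being reported as a missing model.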

Why

OpenRouter can return HTTP 404 with an error like "No endpoints found for deepseek/deepseek-r1:free." when a free model has no available upstream endpoints. Today that response is not classified as model_not_found, so the failover path can stop too early instead of treating it like a missing model.

Verification

  • pnpm exec vitest run src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts --testNamePattern "OpenRouter 404 no-endpoints"
  • pnpm exec oxfmt --check src/agents/pi-embedded-helpers/errors.ts src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts

Notes

I also tried the full src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts file locally, but it currently trips the shared loadConfig unhandled rejection / timeout that is not introduced by this change.

Related #61411

Changed files

  • src/agents/pi-embedded-helpers.isbillingerrormessage.test.ts (modified, +9/-0)
  • src/agents/pi-embedded-helpers/errors.ts (modified, +4/-0)


Bug

When an OpenRouter model returns HTTP 404 with the message No endpoints found for <model>, the model-fallback system logs decision=candidate_succeeded and stops the fallback chain. Remaining healthy candidates are never tried.

This is distinct from #51571 (OpenRouter JSON 404 payloads with different message format) and #59524 (finish_reason: error misclassification), though the root cause in the fallback classifier is likely shared: any non-auth, non-timeout API response is treated as success.

Reproduction

  1. Configure a fallback chain with an OpenRouter model whose endpoint has been removed (e.g., deepseek/deepseek-r1:free which was retired).
  2. Ensure earlier models in the chain fail (e.g., auth misconfiguration) so the chain reaches the dead model.
  3. Observe the fallback chain halt at the dead model instead of continuing.

Log evidence

From production (OpenClaw v2026.3.12, gateway PID 2430658, 2026-04-04T23:54:04 EDT):

# Auth failures cascade correctly:
[model-fallback/decision] decision=candidate_failed requested=google/gemini-2.5-pro candidate=google/gemini-2.5-pro reason=auth next=google/gemini-2.5-flash
[model-fallback/decision] decision=candidate_failed requested=google/gemini-2.5-pro candidate=google/gemini-2.5-flash reason=auth next=openrouter/deepseek/deepseek-r1:free

# OpenRouter returns 404 — but fallback treats it as success:
[agent/embedded] embedded run agent end: runId=88f84408-... isError=true model=deepseek/deepseek-r1:free provider=openrouter error=404 No endpoints found for deepseek/deepseek-r1:free.
[model-fallback/decision] decision=candidate_succeeded requested=google/gemini-2.5-pro candidate=openrouter/deepseek/deepseek-r1:free reason=unknown next=none

Two more healthy fallbacks (meta-llama/llama-3.3-70b-instruct:free, qwen/qwen3.6-plus:free) were configured but never attempted.

This pattern repeated on every inbound message, making the bot completely unresponsive.
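To spot this pattern in a log dump, a small script along these lines may help. It is a sketch that assumes the log format is exactly as quoted above (a bracketed tag followed by space-separated key=value pairs); the function names are hypothetical and not part of OpenClaw.

```python
import re

_LINE = re.compile(r"^\[(?P<tag>[^\]]+)\]\s+(?P<rest>.*)$")


def parse_decision(line: str):
    """Parse a [model-fallback/decision] line into a dict, else return None."""
    match = _LINE.match(line)
    if not match or match.group("tag") != "model-fallback/decision":
        return None
    tokens = match.group("rest").split()
    return dict(token.split("=", 1) for token in tokens if "=" in token)


def find_suspicious_successes(lines):
    """List candidates logged as candidate_succeeded right after a run
    that ended with isError=true."""
    suspicious = []
    last_run_failed = False
    for line in lines:
        if line.startswith("[agent/embedded]") and "isError=true" in line:
            last_run_failed = True
            continue
        fields = parse_decision(line)
        if fields is None:
            continue  # comments, blank lines, unrelated log lines
        if fields.get("decision") == "candidate_succeeded" and last_run_failed:
            suspicious.append(fields.get("candidate", "?"))
        last_run_failed = False
    return suspicious
```

Feeding it the four log lines above should flag openrouter/deepseek/deepseek-r1:free, the candidate that failed but was recorded as a success.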

Expected behavior

HTTP 404 from OpenRouter (any message variant, including "No endpoints found") should be classified as candidate_failed with reason=model_not_found, continuing to the next fallback candidate.
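The expected behavior can be illustrated with a toy fallback loop. This is a Python sketch under stated assumptions, not OpenClaw's runner: CandidateError, reason_for, and run_with_fallback are hypothetical names, and the key point is only that a raised error is recorded as candidate_failed and the loop moves on.

```python
class CandidateError(Exception):
    """Hypothetical error carrying the provider's HTTP status and message."""

    def __init__(self, status: int, message: str):
        super().__init__(message)
        self.status = status
        self.message = message


def reason_for(err: CandidateError) -> str:
    # Only the 404 case matters for this sketch; everything else is 'unknown'
    # but is still treated as a failure by the loop below.
    return "model_not_found" if err.status == 404 else "unknown"


def run_with_fallback(candidates, call):
    """Try each candidate in order; return (result, decisions)."""
    decisions = []
    for model in candidates:
        try:
            result = call(model)
        except CandidateError as err:
            decisions.append((model, "candidate_failed", reason_for(err)))
            continue  # advance to the next candidate instead of halting
        decisions.append((model, "candidate_succeeded", None))
        return result, decisions
    return None, decisions  # every candidate failed
```

With a call function that raises CandidateError(404, ...) for the retired model, the loop records that candidate as failed and returns the reply from the next healthy one.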

Suggested fix

The fallback decision function appears to only classify auth and timeout as failure reasons. Any other API error (including 4xx HTTP responses) falls through to a default candidate_succeeded path. The classifier should treat all non-2xx HTTP responses as candidate failures:

  • 401/403 → reason=auth (existing behavior, works correctly)
  • 404 → reason=model_not_found (currently broken)
  • 408/timeout → reason=timeout (existing behavior, works correctly)
  • 429 → reason=rate_limit
  • 5xx → reason=provider_error

Environment

  • OpenClaw version: 2026.4.2 (confirmed unfixed; original repro on 2026.3.12)
  • Provider: OpenRouter
  • Model: deepseek/deepseek-r1:free (removed endpoint)
  • Platform: Linux (systemd user service)

Related issues

  • #51571 — OpenRouter JSON 404 payloads (different message format, same root cause)
  • #59524 — finish_reason: error treated as success (same fallback classifier bug)
  • #48680 — HTTP 403 treated as candidate_succeeded
  • #4992 — Original 404 failover bug (closed, partially fixed for some 404 variants but not all)

extent analysis

TL;DR

Update the fallback decision function to classify non-2xx HTTP responses, including 404 errors, as candidate failures.

Guidance

  • Review the fallback decision function to ensure it correctly handles non-2xx HTTP responses, such as 404 errors, and updates the reason field accordingly.
  • Verify that the function treats 404 errors as candidate_failed with reason=model_not_found, allowing the fallback chain to continue.
  • Test the updated function with different HTTP error codes (e.g., 401, 403, 408, 429, 5xx) to ensure correct classification.
  • Consider updating the logging to include more detailed information about the error, such as the HTTP status code and response message.
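For the logging suggestion in the last bullet, one possible shape is sketched below. The status and error fields are hypothetical additions that do not appear in the existing log lines, and format_decision is an illustrative helper, not a project function.

```python
import json


def format_decision(decision, requested, candidate, reason,
                    next_candidate=None, status=None, error=None):
    """Build a decision log line in the key=value style quoted in the issue,
    optionally appending the HTTP status and the provider's error text."""
    fields = [
        ("decision", decision),
        ("requested", requested),
        ("candidate", candidate),
        ("reason", reason),
        ("next", next_candidate or "none"),
    ]
    if status is not None:
        fields.append(("status", str(status)))  # hypothetical extra field
    if error:
        # Quote the message so embedded spaces do not break key=value parsing.
        fields.append(("error", json.dumps(error)))  # hypothetical extra field
    return "[model-fallback/decision] " + " ".join(f"{k}={v}" for k, v in fields)
```

Having the status and message on the decision line itself would make a misclassification like the one in this issue visible without cross-referencing the preceding agent/embedded line.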

Example

def fallback_decision(http_status, response_message):
    """Map an HTTP status to a (decision, reason) pair for the fallback chain."""
    if http_status in (401, 403):
        return 'candidate_failed', 'auth'
    elif http_status == 404:
        return 'candidate_failed', 'model_not_found'
    elif http_status == 408:
        return 'candidate_failed', 'timeout'
    elif http_status == 429:
        return 'candidate_failed', 'rate_limit'
    elif http_status >= 500:
        return 'candidate_failed', 'provider_error'
    # ... existing logic for other cases ...

Notes

The provided example is a simplified illustration and may require adaptation to the actual implementation. The fix should be applied to the fallback decision function, and thorough testing should be performed to ensure correct behavior for various HTTP error codes.

Recommendation

Update the fallback decision function so that non-2xx HTTP responses are classified as candidate failures. This lets the fallback chain continue and prevents the bot from becoming unresponsive.
