openclaw - 💡(How to fix) Fix v2026.4.25: Agents with no fallbacks array auto-fail-over to every registered models.providers.* entry [3 comments, 3 participants]

GitHub stats
openclaw/openclaw#73332, fetched 2026-04-29 06:20:52
Comments: 3
Participants: 3
Timeline: 4 (commented ×3, closed ×1)
Reactions: 0

Starting in v2026.4.25, when an agent's primary model fails, the gateway will attempt every model registered under models.providers.* as a fallback — even when the agent's model config has no fallbacks array configured at all. In v2026.4.24 and earlier, absence of the key meant "no fallbacks". In v4.25, absence of the key means "use every registered custom provider as an implicit fallback chain".

This caused a real outage on this host: a models.providers.* entry kept solely for ad-hoc infer model run probing got pulled into the fallback path of a cron-driven agent, where it tripled the latency of a recoverable model failure and ended in a 127s lane timeout that stalled Telegram polling.

The fix on the user side is to pin "fallbacks": [] explicitly on every cost-sensitive agent and never keep a known-broken model registered. But this is a behavioral break: configs that worked correctly on v4.24 silently changed semantics on v4.25, and the change is not mentioned in the v4.25 release notes.

Error Message

08:02:33 lane error after 127s

Fix Action

Workaround

User-side configs need a two-part discipline on v4.25+:

  1. Pin "fallbacks": [] explicitly on every agent where the design intent is "no fallback". The empty array is machine-readable and survives the v4.x default-merge:
    "model": {
      "primary": "opencode-go/minimax-m2.7",
      "fallbacks": []
    }
  2. Never keep a model registered in models.providers.* unless it is a viable production fallback for at least one agent. "Registered for ad-hoc probing only" is no longer a safe state on v4.25+. To probe a non-viable model, register it transiently (add → probe → remove, all in one session, before next gateway restart).
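The first part of the workaround can be applied mechanically. The following is a minimal sketch, assuming the config is JSON-shaped with agents under agents.list[] (as in the reproduction steps) and each agent's model block is an object. The "id" field used to select agents and the second agent named "other" are hypothetical, introduced only for illustration; pin_empty_fallbacks is not an OpenClaw API.

```python
import copy


def pin_empty_fallbacks(config, agent_ids):
    """Return a copy of config with "fallbacks": [] set on the named agents
    whose model block has no fallbacks key. Existing fallbacks are untouched."""
    patched = copy.deepcopy(config)
    for agent in patched.get("agents", {}).get("list", []):
        model = agent.get("model")
        # "id" is an assumed field name for the agent identifier.
        if agent.get("id") in agent_ids and isinstance(model, dict):
            model.setdefault("fallbacks", [])
    return patched


config = {
    "agents": {"list": [
        {"id": "fast", "model": {"primary": "opencode-go/minimax-m2.7"}},
        {"id": "other", "model": {"primary": "opencode-go/minimax-m2.7"}},
    ]},
}

patched = pin_empty_fallbacks(config, {"fast"})
print(patched["agents"]["list"][0]["model"])
```

The function takes an explicit set of agent ids rather than pinning every agent, because the workaround applies only where the design intent is "no fallback".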


Environment

  • OpenClaw: regression introduced in v2026.4.25; persists in v2026.4.26 (release notes don't mention it being reverted).
  • Affected agent on this host: fast (Dash), configured { "primary": "opencode-go/minimax-m2.7" } with no fallbacks array — by design, per a "no fallback for the cron-traffic agent" decision.
  • Registered custom provider: models.providers.custom-opencode-go-extras proxying through https://opencode.ai/zen/go/v1 with one model: deepseek-v4-flash (kept solely for ad-hoc probing via openclaw infer model run).

Steps to reproduce

  1. On v2026.4.25 or v2026.4.26, define an agent in agents.list[] with the following model block:
    "model": {
      "primary": "<a primary model id that can fail>"
    }
    No fallbacks key at all.
  2. Register at least one entry under models.providers.* (any entry — bundled or custom). It does not have to be in any agent's fallback list.
  3. Trigger a primary-model failure (e.g., a model that intermittently returns malformed payloads, or stop the upstream provider).
  4. Observe the gateway log: a model_fallback_decision event will fire and the gateway will attempt the registered models.providers.* model even though the agent has no fallbacks configured.
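The difference between the two semantics can be illustrated with a toy model. This is a sketch of the behavior as reported in this issue, not the gateway's actual resolution code; the function name fallback_chain and the implicit_registered flag are hypothetical.

```python
def fallback_chain(model_block, registered_models, implicit_registered):
    """Toy model of the reported semantics, not the gateway's code.

    implicit_registered=False mimics v4.24 and earlier (missing key = no fallbacks).
    implicit_registered=True mimics the v4.25 behavior described in this report.
    """
    if "fallbacks" in model_block:
        # An explicit list, including an empty one, is always honored.
        return list(model_block["fallbacks"])
    if implicit_registered:
        primary = model_block["primary"]
        return [m for m in registered_models if m != primary]
    return []


registered = ["custom-opencode-go-extras/deepseek-v4-flash"]
agent_model = {"primary": "opencode-go/minimax-m2.7"}

old_chain = fallback_chain(agent_model, registered, implicit_registered=False)
new_chain = fallback_chain(agent_model, registered, implicit_registered=True)
pinned_chain = fallback_chain({**agent_model, "fallbacks": []}, registered,
                              implicit_registered=True)
print(old_chain, new_chain, pinned_chain)
```

Under this model only the missing-key case changes between versions, which is why an explicit empty array works as an opt-out.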

Expected behavior

Absence of fallbacks in an agent's model block means no fallback. The gateway should surface a clean primary-model failure to the requester. This was the v4.24 and earlier behavior.

If the design intent is that models.providers.* registrations participate in default fallback resolution, this should be:

  • Documented in the v4.25 release notes (it isn't), and
  • Opt-in, not opt-out (e.g., gated behind an explicit agents.defaults.useRegisteredProvidersAsFallback: true knob).

Actual behavior (real-world incident, 2026-04-28 08:00 WIB)

A Telegram-routed request to fast (Dash) hit a v4.25 schema mismatch on the primary:

opencode-go/minimax-m2.7: ended with an incomplete terminal response (format)

The gateway then auto-failed over to custom-opencode-go-extras/deepseek-v4-flash — which had been registered for ad-hoc probing only and is known to fail on v4.25's OpenAI-compat passthrough due to a reasoning_content schema bug. Cascade:

08:00:26 minimax-m2.7 fails (incomplete terminal response)
08:00:26 fallback decision → custom-opencode-go-extras/deepseek-v4-flash
08:00:??  DSV4 returns: 400 The "reasoning_content" in the thinking mode must be passed back to the API.
08:02:33 lane error after 127s
08:02:33 "Embedded agent failed before reply"
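The 127s figure matches the gap between the first and last timestamps in the log above. A small check, assuming both timestamps fall on the same day:

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%H:%M:%S"):
    """Seconds between two same-day wall-clock timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Primary failure at 08:00:26, lane error at 08:02:33.
lane_seconds = elapsed_seconds("08:00:26", "08:02:33")
print(lane_seconds)  # 127
```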

Telegram polling stalled for 125s during the cascade, which forced a transport restart. The health monitor logged the polling stall, but the underlying fallback-chain behavior was opaque: the models.providers.custom-opencode-go-extras entry was never named in any agent's fallback list, so a config audit didn't surface the connection.

Diagnostic evidence

  • runId: 36894cce-b406-434c-8529-df4f3d8afbaa
  • sessionId: 1307d383-6863-496f-91c5-fd1a332e8cbb
  • Gateway log around the incident shows the model_fallback_decision event referencing custom-opencode-go-extras/deepseek-v4-flash even though the agent's model config has only { "primary": "opencode-go/minimax-m2.7" }.
  • feedback_v425_fallback_default.md (internal memory) records the same finding.

Suggested fix

Either:

  • Restore the v4.24 semantic where absence of fallbacks means "no fallbacks", and require explicit opt-in for "use registered providers as fallback chain", or
  • Document the new behavior in the v4.25 release notes and provide a migration note (a one-time openclaw doctor warning would be ideal: "agent X has no fallbacks but is implicitly using registered provider Y — pin fallbacks: [] to opt out").
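Until such a warning exists, a similar audit can be run outside the gateway. The sketch below is not an existing openclaw doctor check; it assumes a JSON-shaped config with agents.list[] and a models.providers object keyed by provider name, and the "id" field and the function name implicit_fallback_warnings are hypothetical.

```python
def implicit_fallback_warnings(config):
    """List agents that have no fallbacks key while providers are registered."""
    providers = sorted(config.get("models", {}).get("providers", {}))
    warnings = []
    for agent in config.get("agents", {}).get("list", []):
        model = agent.get("model")
        if isinstance(model, dict) and "fallbacks" not in model and providers:
            # "id" is an assumed field name for the agent identifier.
            name = agent.get("id", "<unnamed>")
            warnings.append(
                f"agent {name} has no fallbacks but may implicitly use "
                f"registered provider(s) {', '.join(providers)}; "
                "pin fallbacks: [] to opt out"
            )
    return warnings


config = {
    "agents": {"list": [
        {"id": "fast", "model": {"primary": "opencode-go/minimax-m2.7"}},
    ]},
    "models": {"providers": {"custom-opencode-go-extras": {}}},
}

warnings = implicit_fallback_warnings(config)
for line in warnings:
    print(line)
```

An agent with an explicit "fallbacks": [] produces no warning, so the audit goes quiet once the workaround is applied.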

Severity

Medium. The regression is silent (no warning at config-validate or doctor time), turns recoverable single-model failures into multi-model timeout cascades, and disproportionately affects users who keep registered custom providers for probing. There is a clean workaround once the behavior is known, which is why this is medium and not high.

Extent analysis

TL;DR

To fix the issue, pin "fallbacks": [] explicitly on every agent where no fallback is intended and ensure only viable production fallback models are registered in models.providers.*.

Guidance

  • Be aware that the absence of a fallbacks array in an agent's model block no longer means "no fallback" in v2026.4.25 and later.
  • To avoid unintended fallbacks, explicitly set "fallbacks": [] in the agent's configuration.
  • Review and clean up models.providers.* registrations to only include viable production fallback models.
  • Consider implementing a transient registration process for probing non-viable models to avoid unintended fallbacks.

Example

"model": {
  "primary": "opencode-go/minimax-m2.7",
  "fallbacks": []
}

This configuration explicitly sets no fallbacks for the agent, preventing unintended fallbacks to registered providers.

Notes

The suggested fix requires either restoring the previous semantic or documenting the new behavior and providing a migration note. The workaround provided is a temporary solution until the underlying issue is addressed.

Recommendation

Apply the workaround by pinning "fallbacks": [] on every agent where no fallback is intended and cleaning up models.providers.* registrations. This will prevent unintended fallbacks and timeout cascades until a more permanent fix is implemented.
