openclaw - 💡(How to fix) Fix [Bug]: Default fallback chain includes models that fail the minimum context-window check, making them effectively unusable [2 comments, 2 participants]

GitHub stats
openclaw/openclaw#73687 · Fetched 2026-04-29 06:16:21
Comments: 2 · Participants: 2 · Timeline events: 3 (commented ×2, closed ×1) · Reactions: 0

Summary

The default agents.defaults.model.fallbacks chain ships with google/gemma-4-31b-it (contextWindow: 8192 per modelsConfig), but the agent refuses to use any model whose context window is below 16000 tokens. So when fallback fires (e.g. the primary returns 429 RESOURCE_EXHAUSTED), the chain advances to gemma-4-31b-it, which is immediately rejected. The chain is then exhausted, and the user sees a generic "All models failed" error with no usable recovery.

Reproduction

Easy to reproduce on any container running the default config when the primary key is rate-limited:

$ docker exec openclaw-demo-typhon openclaw agent --to "+177..." --message "Show me my holdings" --timeout 90
[model-fallback/decision] decision=candidate_failed requested=google/gemini-flash-latest \
  candidate=google/gemini-flash-latest reason=rate_limit \
  next=google/gemma-4-31b-it \
  detail=Resource has been exhausted (...). RESOURCE_EXHAUSTED

[agent/embedded] low context window: google/gemma-4-31b-it ctx=8192 (warn<32000) source=modelsConfig
[agent/embedded] blocked model (context window too small): google/gemma-4-31b-it ctx=8192 (min=16000) source=modelsConfig

[model-fallback/decision] decision=candidate_failed requested=google/gemini-flash-latest \
  candidate=google/gemma-4-31b-it reason=unknown \
  next=none \
  detail=Model context window too small (8192 tokens; source=modelsConfig). Minimum is 16000.

FallbackSummaryError: All models failed (2): ...

So the user-visible behavior on hitting the primary rate limit is the agent dying immediately rather than recovering on the configured fallback.
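
The sequence in the log can be modelled with a small simulation. This is a minimal sketch, not OpenClaw's implementation: Candidate, run_with_fallback and AllModelsFailed are hypothetical names, the primary's context window is a placeholder because the issue does not state it, and only the 8192 and 16000 figures come from the log.

```python
from dataclasses import dataclass

MIN_CONTEXT_WINDOW = 16000  # minimum reported in the log above


@dataclass
class Candidate:
    name: str
    context_window: int
    rate_limited: bool = False  # hypothetical stand-in for a 429 from the provider


class AllModelsFailed(Exception):
    """Hypothetical stand-in for FallbackSummaryError."""

    def __init__(self, failures):
        super().__init__(f"All models failed ({len(failures)})")
        self.failures = failures


def run_with_fallback(chain):
    """Return the first usable model name, or raise once the chain is exhausted."""
    failures = []
    for candidate in chain:
        # The context-window gate runs before any request is attempted.
        if candidate.context_window < MIN_CONTEXT_WINDOW:
            failures.append((candidate.name, "context window too small"))
            continue
        if candidate.rate_limited:
            failures.append((candidate.name, "rate_limit"))
            continue
        return candidate.name
    raise AllModelsFailed(failures)


default_chain = [
    # The primary's context window is not stated in the issue; 1_000_000 is a placeholder.
    Candidate("google/gemini-flash-latest", 1_000_000, rate_limited=True),
    Candidate("google/gemma-4-31b-it", 8192),
]

try:
    run_with_fallback(default_chain)
except AllModelsFailed as exc:
    print(exc, exc.failures)
```

With a rate-limited primary, the only remaining candidate is rejected by the context-window gate before any request is made, so the chain can never recover.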

Why this matters

The fallback chain exists to absorb provider failures. Right now the default chain has a single fallback (gemma-4-31b-it), and the context-window enforcement means it can never be used. Effectively the agent has no fallback at all, and the configured fallback creates a misleading log trail that obscures the real root cause: the primary's rate-limit error ends up buried two levels down in the FallbackSummaryError aggregate.
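
Until the error reporting improves, the primary's failure can be pulled out of the log trail by hand or with a small script. The following is a rough sketch that assumes each [model-fallback/decision] record has been joined onto a single line (the reproduction log shows them wrapped with backslashes) and that the key=value layout matches the sample above. parse_decision and primary_failure are hypothetical helpers, not part of the OpenClaw CLI.

```python
import re

# Keys that appear before "detail=" in the sample decision records.
FIELD_RE = re.compile(r"\b(decision|requested|candidate|reason|next)=(\S+)")


def parse_decision(line):
    """Parse one joined [model-fallback/decision] record into a dict, or return None."""
    if not line.startswith("[model-fallback/decision]"):
        return None
    # Everything after the first " detail=" is free text and may contain "=".
    head, sep, detail = line.partition(" detail=")
    fields = dict(FIELD_RE.findall(head))
    fields["detail"] = detail if sep else ""
    return fields


def primary_failure(lines):
    """Return the record in which the requested (primary) model itself failed."""
    for line in lines:
        record = parse_decision(line)
        if record and record.get("candidate") == record.get("requested"):
            return record
    return None


log_lines = [
    "[model-fallback/decision] decision=candidate_failed "
    "requested=google/gemini-flash-latest candidate=google/gemini-flash-latest "
    "reason=rate_limit next=google/gemma-4-31b-it "
    "detail=Resource has been exhausted (...). RESOURCE_EXHAUSTED",
    "[agent/embedded] blocked model (context window too small): "
    "google/gemma-4-31b-it ctx=8192 (min=16000) source=modelsConfig",
    "[model-fallback/decision] decision=candidate_failed "
    "requested=google/gemini-flash-latest candidate=google/gemma-4-31b-it "
    "reason=unknown next=none "
    "detail=Model context window too small (8192 tokens; source=modelsConfig). Minimum is 16000.",
]

root = primary_failure(log_lines)
print(root["candidate"], root["reason"], root["detail"])
```

Matching on candidate == requested picks the record for the primary model, which is where the rate-limit reason lives.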

Suggested fix (any of)

  1. Filter the default fallback chain at config-load time: drop any model whose contextWindow is below the agent's enforced minimum, with a clear log line so operators know.
  2. Replace gemma-4-31b-it in the shipped default with a fallback that has ≥16000 ctx. groq/llama-3.3-70b-versatile is already a configured provider in the same default config and has 131K context, so it would be a drop-in.
  3. Soften the minimum to match the smallest-ctx model in the default chain, with a per-call check that skips that model for prompts too large for its window.

(2) is probably cleanest for ergonomics; (1) is the most defensible / future-proof.
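
Option (1) could look roughly like the sketch below. It is an illustration under stated assumptions rather than OpenClaw code: filter_fallbacks is a hypothetical helper, the context-window map stands in for whatever modelsConfig exposes, models with no known window are kept, and 131_072 is an assumed exact value for the "131K" quoted above.

```python
import logging

logger = logging.getLogger("fallback-filter")  # hypothetical logger name


def filter_fallbacks(fallbacks, context_windows, min_context):
    """Drop fallbacks whose known context window is below min_context.

    Models missing from context_windows are kept, on the assumption that an
    unknown window should not silently remove an operator's choice.
    """
    usable = []
    for model in fallbacks:
        ctx = context_windows.get(model)
        if ctx is not None and ctx < min_context:
            # Clear log line so operators know why the chain got shorter.
            logger.warning(
                "dropping fallback %s: contextWindow=%d is below minimum %d",
                model, ctx, min_context,
            )
            continue
        usable.append(model)
    return usable


context_windows = {
    "google/gemma-4-31b-it": 8192,            # from the log above
    "groq/llama-3.3-70b-versatile": 131_072,  # issue says 131K; exact value assumed
}

chain = filter_fallbacks(
    ["google/gemma-4-31b-it", "groq/llama-3.3-70b-versatile"],
    context_windows,
    min_context=16000,
)
print(chain)
```

Filtering at load time means an unusable model never appears as a next= candidate, so a later FallbackSummaryError would only aggregate failures from models that were actually attempted.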

Adjacent

  • #9986 covers the triggering of fallback on context overflow — this issue is about executability of an already-triggered fallback, so they're complementary.
  • #66646 is the session-lock cascade I just commented on — when both this and #66646 happen on the same call, the user-visible error is genuinely confusing because both contribute to the FallbackSummaryError without either being the root cause.

Versions

  • openclaw-demo:latest container, OpenClaw 2026.4.24
  • Default config shipped with the container (no operator override)

extent analysis

TL;DR

The most likely fix is to replace the default fallback model gemma-4-31b-it with a model that has a context window of at least 16000 tokens, such as groq/llama-3.3-70b-versatile.

Guidance

  • Filter the default fallback chain at config-load time to drop any model whose contextWindow is below the agent's enforced minimum.
  • Replace gemma-4-31b-it with a fallback model that has a sufficient context window, such as groq/llama-3.3-70b-versatile.
  • Consider softening the minimum context window to match the smallest-ctx model in the default chain, with a per-call check to skip ctx-bound prompts to that model.
  • Verify the fix by testing the fallback chain with a rate-limited primary key and checking that the agent recovers correctly.

Example

No code snippet is provided as it is not necessary for this issue.

Notes

The suggested fix assumes that the groq/llama-3.3-70b-versatile model is a suitable replacement for gemma-4-31b-it. Additionally, the fix may not apply if the agent's configuration is overridden by an operator.

Recommendation

Apply the workaround of replacing gemma-4-31b-it with groq/llama-3.3-70b-versatile in the fallback chain; the issue author rates this option the cleanest for ergonomics.
