openclaw - Feature: cache-aware sticky fallback to prevent prompt cache bouncing across providers

openclaw/openclaw#62974 (fetched 2026-04-09 07:59:56)


Summary

When the primary provider (e.g. Anthropic direct) intermittently times out and falls back to a secondary provider (e.g. OpenRouter/Bedrock), the per-request fallback design causes prompt cache bouncing: each successful primary request warms cache on Provider A, while each fallback request needs a full cache reload on Provider B. The result is near-zero cache hit rates on both providers simultaneously.

Observed impact

In a real session with ~160K context on Opus 4.6:

  • Primary (Anthropic) intermittently times out
  • Fallback (OpenRouter/Bedrock) handles those requests
  • Bedrock caches 22,935 tokens (14%) instead of 160K+ (99%)
  • Every fallback turn costs ~$0.90 instead of ~$0.09
  • 36 Opus requests in one session cost $37.39 — estimated $5-7 with proper caching
  • Pattern confirmed: the OpenRouter (OR) CSV shows time gaps (turns where the primary succeeded) followed by immediate cache drops on the fallback
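
To make the figures above easier to compare, here is a small arithmetic sketch. It only restates numbers quoted in this issue; the variable names are illustrative and nothing here is measured independently.

```typescript
// Figures quoted in the issue (not independently measured).
const contextTokens = 160_000;    // approximate context size
const cachedOnFallback = 22_935;  // tokens Bedrock reported as cached
const sessionCostUsd = 37.39;     // total for the session
const requests = 36;              // Opus requests in that session
const coldTurnUsd = 0.9;          // quoted cost of a fallback turn
const warmTurnUsd = 0.09;         // quoted cost of a well-cached turn

// Share of the context that was served from cache on the fallback.
const cachedSharePct = (cachedOnFallback / contextTokens) * 100;
// Average cost per request over the whole session.
const avgCostPerRequestUsd = sessionCostUsd / requests;
// How much more a cold turn costs than a warm one.
const coldToWarmRatio = coldTurnUsd / warmTurnUsd;

console.log(cachedSharePct.toFixed(1));       // 14.3
console.log(avgCostPerRequestUsd.toFixed(2)); // 1.04
console.log(coldToWarmRatio.toFixed(0));      // 10
```

The session average of about $1.04 per request sits in the same range as the quoted uncached per-turn cost, which is consistent with most turns missing the cache.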

Root cause

Model fallback is per-request: each new turn independently tries the primary provider first. There is no mechanism to "stick" to the fallback provider long enough for its prompt cache to warm and persist across turns.

  • overloadedBackoffMs only controls delay within a single request's retry chain, not across requests
  • Auth profile cooldown (1min) is too short and not cache-aware
  • No config exists for "stay on fallback for N minutes across future requests"

Proposed solution

Add a sticky fallback window that keeps subsequent requests on the fallback provider after a failover event:

{
  auth: {
    cooldowns: {
      // After failover, stay on fallback provider for this duration
      // before probing primary again. Aligns with prompt cache TTL.
      stickyFallbackMs: 300000,  // 5 minutes (matches cache TTL)
      // Or: "cache-ttl" to auto-derive from cacheRetention setting
      stickyFallbackMode: "fixed",  // "fixed" | "cache-ttl"
    }
  }
}
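
One way the two proposed modes could resolve to a concrete window length is sketched below. This is an assumption-laden illustration, not OpenClaw code: `resolveStickyWindowMs` is a hypothetical helper, the TTL values are the ones listed under Environment in this issue (5 min for "short", 1 hr for "long"), and using the fallback provider's `cacheRetention` for "cache-ttl" mode is a guess, since the issue does not say which provider's setting should apply.

```typescript
type CacheRetention = "short" | "long";
type StickyFallbackMode = "fixed" | "cache-ttl";

// Shape of the proposed config keys (not an existing OpenClaw option).
interface StickyFallbackConfig {
  stickyFallbackMs: number;
  stickyFallbackMode: StickyFallbackMode;
}

// Cache TTLs as listed in the issue's Environment section.
const CACHE_TTL_MS: Record<CacheRetention, number> = {
  short: 5 * 60 * 1000,
  long: 60 * 60 * 1000,
};

// Hypothetical helper: pick the sticky window length.
// Assumption: "cache-ttl" follows the fallback provider's cacheRetention.
function resolveStickyWindowMs(
  cfg: StickyFallbackConfig,
  fallbackRetention: CacheRetention,
): number {
  return cfg.stickyFallbackMode === "cache-ttl"
    ? CACHE_TTL_MS[fallbackRetention]
    : cfg.stickyFallbackMs;
}

const fixedCfg: StickyFallbackConfig = { stickyFallbackMs: 300000, stickyFallbackMode: "fixed" };
console.log(resolveStickyWindowMs(fixedCfg, "long")); // 300000
```

With the environment described in this issue (fallback on `cacheRetention: "short"`), both modes would yield the same 5-minute window.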

Ideal behavior:

  1. Primary times out → fall back to secondary
  2. Secondary succeeds → lock to secondary for stickyFallbackMs
  3. During the sticky window, all new requests go directly to secondary, skipping the primary (cache builds)
  4. After the window expires, probe primary once
  5. If primary succeeds → switch back (cache will warm on primary over the next turns)
  6. If primary fails → fall back again and start a new sticky window

This ensures the prompt cache on whichever provider is active stays warm.
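
The steps above can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not the OpenClaw implementation: `StickyFallbackPolicy` and `route` are hypothetical names, time is passed in explicitly as epoch milliseconds, any primary failure is treated as a failover trigger, and the secondary is assumed to succeed whenever it is used.

```typescript
type Provider = "primary" | "secondary";

// Hypothetical policy object holding the sticky window state.
class StickyFallbackPolicy {
  private stickyUntil = 0; // epoch ms; 0 means no window is active
  private readonly stickyFallbackMs: number;

  constructor(stickyFallbackMs: number) {
    this.stickyFallbackMs = stickyFallbackMs;
  }

  // Step 3: inside the window, new requests go straight to the secondary.
  firstChoice(now: number): Provider {
    return now < this.stickyUntil ? "secondary" : "primary";
  }

  // Steps 2 and 6: (re)start the window after a failover.
  recordFailover(now: number): void {
    this.stickyUntil = now + this.stickyFallbackMs;
  }

  // Step 5: a successful primary probe clears the window.
  recordPrimarySuccess(): void {
    this.stickyUntil = 0;
  }
}

// Route one request; `primaryOk` stands in for the real primary call's outcome.
function route(policy: StickyFallbackPolicy, now: number, primaryOk: boolean): Provider {
  if (policy.firstChoice(now) === "secondary") return "secondary";
  if (primaryOk) {
    policy.recordPrimarySuccess(); // steps 4-5: probe succeeded
    return "primary";
  }
  policy.recordFailover(now);      // steps 1-2 (or 6): fail over and lock
  return "secondary";
}

const policy = new StickyFallbackPolicy(300000);
console.log(route(policy, 0, false));     // secondary (failover starts the window)
console.log(route(policy, 60000, true));  // secondary (still inside the window)
console.log(route(policy, 300000, true)); // primary (window expired, probe succeeded)
```

Keeping the policy separate from the network call makes the window logic testable without real providers; a real integration would also need to decide what happens when the secondary itself fails during the window, which the issue does not cover.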

Environment

  • OpenClaw 4.5 (3e72c03)
  • Primary: Anthropic direct (Opus 4.6, cacheRetention: "long")
  • Fallback: OpenRouter Anthropic (cacheRetention: "short")
  • Context size: ~160-190K tokens
  • Cache TTL: 5min (short) / 1hr (long)

Workaround

Currently using manual provider switching — disable fallback entirely and switch providers manually during outages. Works but not automated.

extent analysis

TL;DR

Implement a "sticky fallback window" to keep subsequent requests on the fallback provider after a failover event, ensuring the prompt cache stays warm.

Guidance

  • Introduce a stickyFallbackMs configuration to specify the duration for which the fallback provider should be used after a failover.
  • Set stickyFallbackMode to "fixed" to use the stickyFallbackMs value as the window length, or to "cache-ttl" to derive the window from the cacheRetention setting.
  • Implement a mechanism to probe the primary provider after the sticky window expires and switch back if it succeeds.
  • Consider aligning the stickyFallbackMs value with the cache TTL to optimize cache warming.
  • Test the new configuration with different fallback scenarios to ensure the prompt cache hit rates improve.
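
A rough way to sanity-check the guidance above before touching real providers is a toy simulation. Everything here is an assumption made for illustration: the function names are hypothetical, the outage trace is invented, and the cache model is deliberately crude (a turn counts as warm only when the same provider served the previous turn).

```typescript
type Provider = "primary" | "fallback";

interface Turn {
  atMs: number;       // when the turn starts
  primaryOk: boolean; // whether the primary would answer this turn
}

// Current behavior described in the issue: every turn tries the primary first.
function servePerRequest(turns: Turn[]): Provider[] {
  return turns.map((t) => (t.primaryOk ? "primary" : "fallback"));
}

// Proposed behavior: stay on the fallback for stickyFallbackMs after a failover.
function serveSticky(turns: Turn[], stickyFallbackMs: number): Provider[] {
  let stickyUntil = 0;
  return turns.map((t) => {
    if (t.atMs < stickyUntil) return "fallback";
    if (t.primaryOk) return "primary";
    stickyUntil = t.atMs + stickyFallbackMs;
    return "fallback";
  });
}

// Crude cache model (assumption): warm only if the previous turn used the same provider.
function warmTurnShare(served: Provider[]): number {
  if (served.length < 2) return 0;
  let warm = 0;
  for (let i = 1; i < served.length; i++) {
    if (served[i] === served[i - 1]) warm++;
  }
  return warm / (served.length - 1);
}

// Invented trace: one turn per minute, primary times out on every other turn.
const turns: Turn[] = Array.from({ length: 10 }, (_, i) => ({
  atMs: i * 60_000,
  primaryOk: i % 2 === 0,
}));

console.log(warmTurnShare(servePerRequest(turns)).toFixed(2));     // 0.00
console.log(warmTurnShare(serveSticky(turns, 300000)).toFixed(2)); // 0.67
```

Under this toy model the per-request policy never serves two consecutive turns from the same provider, while the 5-minute sticky window keeps six of nine turn transitions on the same provider. Real hit rates depend on provider-side cache behavior that this sketch does not model.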

Example

{
  auth: {
    cooldowns: {
      // Proposed keys from this issue; not an existing OpenClaw option
      stickyFallbackMs: 300000, // 5 minutes
      stickyFallbackMode: "fixed"
    }
  }
}

Notes

The proposed solution assumes that the cache TTL is known and can be used to determine the sticky fallback window. If the cache TTL is dynamic or unknown, an alternative approach may be needed.

Recommendation

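The "sticky fallback window" is a proposed feature, not an existing setting, so it cannot be applied as a workaround today. If implemented, it would address the root cause and should improve cache hit rates and reduce costs. Until then, the only workaround reported in the issue is to disable fallback and switch providers manually during outages.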
