openclaw - Feature: cache-aware sticky fallback to prevent prompt cache bouncing across providers

openclaw • 2026-04-08 06:20:06

GitHub stats

openclaw/openclaw#62974 • Fetched 2026-04-09 07:59:56

Author

liu51115

When the primary provider (e.g. Anthropic direct) intermittently times out and falls back to a secondary provider (e.g. OpenRouter/Bedrock), the per-request fallback design causes prompt cache bouncing: each successful primary request warms cache on Provider A, while each fallback request needs a full cache reload on Provider B. The result is near-zero cache hit rates on both providers simultaneously.

Root Cause

Model fallback is per-request: each new turn independently tries the primary provider first. There is no mechanism to "stick" to the fallback provider long enough for its prompt cache to warm and persist across turns.

overloadedBackoffMs only controls delay within a single request's retry chain, not across requests
Auth profile cooldown (1min) is too short and not cache-aware
No config exists for "stay on fallback for N minutes across future requests"
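
To make the per-request behavior concrete, here is a minimal sketch of a router that keeps no memory between turns. The function names and the sample session are hypothetical and are not taken from OpenClaw's code; the sketch only illustrates how often consecutive turns land on different providers when the primary fails intermittently.

```typescript
type Provider = "primary" | "fallback";

// Per-request fallback: every turn tries the primary first and remembers
// nothing about earlier failures. `primaryOk` stands in for "the primary
// call succeeded on this turn".
function routePerRequest(primaryOk: boolean): Provider {
  return primaryOk ? "primary" : "fallback";
}

// Count how many times consecutive turns were served by different providers.
function countSwitches(served: Provider[]): number {
  let switches = 0;
  for (let i = 1; i < served.length; i++) {
    if (served[i] !== served[i - 1]) switches++;
  }
  return switches;
}

// Hypothetical session: the primary times out on every other turn.
const primaryOutcomes = [true, false, true, false, true, false];
const served = primaryOutcomes.map((ok) => routePerRequest(ok));
const switches = countSwitches(served);

console.log(served.join(" -> "), `(${switches} provider switches)`);
```

Every switch sends the conversation to a provider that did not see the previous turn, which is the bouncing pattern described above.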

Fix Action

Workaround

Currently using manual provider switching — disable fallback entirely and switch providers manually during outages. Works but not automated.

Summary

Observed impact

In a real session with ~160K context on Opus 4.6:

Primary (Anthropic) intermittently times out
Fallback (OpenRouter/Bedrock) handles those requests
Bedrock caches 22,935 tokens (14%) instead of 160K+ (99%)
Every fallback turn costs ~$0.90 instead of ~$0.09
36 Opus requests in one session cost $37.39 — estimated $5-7 with proper caching
Pattern confirmed: the OpenRouter (OR) CSV shows time gaps (turns where the primary succeeded) followed by immediate cache drops on fallback
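
The figures above can be sanity-checked with a little arithmetic. This sketch uses only the numbers quoted in the issue; treating 160,000 as the context size and the two per-turn costs as exact are simplifying assumptions, so the results are rough bounds rather than a billing reconstruction.

```typescript
// Figures quoted in the issue (all approximate).
const contextTokens = 160_000;   // "~160K context"
const cachedOnFallback = 22_935; // tokens Bedrock reported as cached
const costUncachedTurn = 0.9;    // USD per poorly cached fallback turn
const costCachedTurn = 0.09;     // USD per well-cached turn
const requests = 36;             // Opus requests in the session

// Share of the context that was actually served from cache on the fallback.
const hitRatio = cachedOnFallback / contextTokens;

// How much more a poorly cached turn costs than a well-cached one.
const perTurnMultiplier = costUncachedTurn / costCachedTurn;

// Session cost if every turn were well cached, or if none were.
const allCachedCost = requests * costCachedTurn;
const allUncachedCost = requests * costUncachedTurn;

console.log(`cache hit ratio: ${(hitRatio * 100).toFixed(1)}%`);
console.log(`per-turn cost multiplier: ${perTurnMultiplier.toFixed(1)}x`);
console.log(`all turns cached: $${allCachedCost.toFixed(2)}`);
console.log(`no turns cached: $${allUncachedCost.toFixed(2)}`);
```

The hit ratio comes out at roughly 14%, matching the reported figure. The reported session total of $37.39 is somewhat above 36 × $0.90, which suggests the per-turn costs are rounded or that some turns cost more than the typical fallback turn.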

Proposed solution

Add a sticky fallback window that keeps subsequent requests on the fallback provider after a failover event:

{
  auth: {
    cooldowns: {
      // After failover, stay on fallback provider for this duration
      // before probing primary again. Aligns with prompt cache TTL.
      stickyFallbackMs: 300000,  // 5 minutes (matches cache TTL)
      // Or: "cache-ttl" to auto-derive from cacheRetention setting
      stickyFallbackMode: "fixed",  // "fixed" | "cache-ttl"
    }
  }
}

Ideal behavior:

1. Primary times out → fall to secondary
2. Secondary succeeds → lock to secondary for stickyFallbackMs
3. During sticky window, all new requests go directly to secondary (cache builds)
4. After window expires, probe primary once
   - If primary succeeds → switch back (cache will warm on primary over next turns)
   - If primary fails → extend sticky window

This ensures the prompt cache on whichever provider is active stays warm.
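
The ideal behavior can be sketched as a small state machine. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: `StickyState`, `TurnOutcome`, and `handleTurn` are hypothetical names, provider calls are replaced by booleans so the logic can run with a fake clock, and the case where the fallback itself fails (which the proposal does not cover) simply returns `null` and leaves the window unchanged.

```typescript
type Provider = "primary" | "fallback";

// Hypothetical router state; not an OpenClaw type.
interface StickyState {
  stickyUntil: number; // epoch ms; requests before this time skip the primary
}

interface TurnOutcome {
  primaryOk: boolean;  // would the primary answer this turn?
  fallbackOk: boolean; // would the fallback answer this turn?
}

interface TurnResult {
  servedBy: Provider | null; // null means no provider answered
  state: StickyState;
}

function handleTurn(
  state: StickyState,
  now: number,
  outcome: TurnOutcome,
  stickyFallbackMs: number,
): TurnResult {
  // Inside the sticky window: go directly to the fallback, never probe primary.
  if (now < state.stickyUntil) {
    return { servedBy: outcome.fallbackOk ? "fallback" : null, state };
  }
  // No window, or window expired: probe the primary once.
  if (outcome.primaryOk) {
    return { servedBy: "primary", state: { stickyUntil: 0 } };
  }
  // Primary failed and fallback succeeded: start (or extend) the window.
  if (outcome.fallbackOk) {
    return { servedBy: "fallback", state: { stickyUntil: now + stickyFallbackMs } };
  }
  // Assumption: if both fail, keep the state as it was.
  return { servedBy: null, state };
}

// Usage with a fake clock and a 5-minute window.
const STICKY_MS = 300_000;
let routerState: StickyState = { stickyUntil: 0 };
const servedLog: Array<Provider | null> = [];
const turns: Array<[number, TurnOutcome]> = [
  [0, { primaryOk: false, fallbackOk: true }],       // failover, window starts
  [60_000, { primaryOk: true, fallbackOk: true }],   // in window, primary not probed
  [300_000, { primaryOk: false, fallbackOk: true }], // probe fails, window extended
  [600_000, { primaryOk: true, fallbackOk: true }],  // probe succeeds, switch back
];
for (const [now, outcome] of turns) {
  const result = handleTurn(routerState, now, outcome, STICKY_MS);
  routerState = result.state;
  servedLog.push(result.servedBy);
}
console.log(servedLog.join(", "));
```

Note that the second turn stays on the fallback even though the primary would have succeeded. That is the trade-off the proposal accepts in exchange for a warm cache on one provider.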

Environment

OpenClaw 4.5 (3e72c03)
Primary: Anthropic direct (Opus 4.6, cacheRetention: "long")
Fallback: OpenRouter Anthropic (cacheRetention: "short")
Context size: ~160-190K tokens
Cache TTL: 5min (short) / 1hr (long)

extent analysis

TL;DR

Implement a "sticky fallback window" to keep subsequent requests on the fallback provider after a failover event, ensuring the prompt cache stays warm.

Guidance

Introduce a stickyFallbackMs configuration to specify the duration for which the fallback provider should be used after a failover.
Set stickyFallbackMode to "fixed" to use the stickyFallbackMs value as-is, or to "cache-ttl" to derive the window from the cacheRetention setting.
Implement a mechanism to probe the primary provider after the sticky window expires and switch back if it succeeds.
Consider aligning the stickyFallbackMs value with the cache TTL to optimize cache warming.
Test the new configuration with different fallback scenarios to ensure the prompt cache hit rates improve.
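
One way the two modes might resolve to a concrete window is sketched below. The type and function names are hypothetical, the TTL values come from the Environment section (5 min for "short", 1 hr for "long"), and using the fallback provider's cacheRetention rather than the primary's is an assumption the issue does not spell out.

```typescript
type StickyFallbackMode = "fixed" | "cache-ttl";
type CacheRetention = "short" | "long";

// Mirrors the proposed auth.cooldowns keys; not an existing OpenClaw type.
interface CooldownConfig {
  stickyFallbackMs: number;
  stickyFallbackMode: StickyFallbackMode;
}

// Cache TTLs as listed in the issue's Environment section.
const CACHE_TTL_MS: Record<CacheRetention, number> = {
  short: 5 * 60 * 1000,  // 5 minutes
  long: 60 * 60 * 1000,  // 1 hour
};

// Resolve the sticky window length in milliseconds.
// Assumption: "cache-ttl" follows the fallback provider's cacheRetention,
// since that is the cache the sticky window is meant to keep warm.
function resolveStickyWindowMs(
  cooldowns: CooldownConfig,
  fallbackRetention: CacheRetention,
): number {
  return cooldowns.stickyFallbackMode === "cache-ttl"
    ? CACHE_TTL_MS[fallbackRetention]
    : cooldowns.stickyFallbackMs;
}

const fixedWindow = resolveStickyWindowMs(
  { stickyFallbackMs: 120_000, stickyFallbackMode: "fixed" },
  "short",
);
const derivedWindow = resolveStickyWindowMs(
  { stickyFallbackMs: 120_000, stickyFallbackMode: "cache-ttl" },
  "short",
);
console.log(fixedWindow, derivedWindow);
```

With the environment described in the issue (fallback on cacheRetention: "short"), "cache-ttl" mode would yield the same 5-minute window as the example's fixed value of 300000.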

Example

{
  auth: {
    cooldowns: {
      stickyFallbackMs: 300000, // 5 minutes
      stickyFallbackMode: "fixed"
    }
  }
}

Notes

The proposed solution assumes that the cache TTL is known and can be used to determine the sticky fallback window. If the cache TTL is dynamic or unknown, an alternative approach may be needed.

Recommendation

Implement the proposed "sticky fallback window" configuration: it addresses the root cause and gives a clear path to better cache hit rates and lower costs. Until it ships, manual provider switching remains the available stopgap.
