litellm - 💡(How to fix) Fix [Bug]: Enable cache-control for qwen models [1 pull request]

litellm · 2026-05-30 09:18:04

Fix Action

Fixed

Fixed by PR: fix(openrouter): support cache_control for qwen models (#29322) (https://github.com/BerriAI/litellm/pull/29335)

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

cache_control doesn't work in LiteLLM for https://openrouter.ai/qwen/qwen3.6-flash . When I connect to OpenRouter directly, I can set cache_control and it works. I think CacheControlSupportedModels should also support "qwen" models.
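
To illustrate why a model-family allowlist could produce this behaviour, here is a minimal sketch of substring-based gating. It is not LiteLLM's source: the `CacheControlFamilies` enum, its members, and both helper functions are hypothetical names chosen for illustration. The only assumption taken from the report is that support is decided by an allowlist in which "qwen" is missing.

```python
from enum import Enum


class CacheControlFamilies(str, Enum):
    # Hypothetical allowlist; the members are illustrative, not LiteLLM's.
    CLAUDE = "claude"
    GEMINI = "gemini"


def supports_cache_control(model: str, families=CacheControlFamilies) -> bool:
    """Return True if any allowlisted family name appears in the model id."""
    lowered = model.lower()
    return any(family.value in lowered for family in families)


def strip_cache_control(messages: list, model: str) -> list:
    """Drop cache_control markers from content blocks for unsupported models."""
    if supports_cache_control(model):
        return messages
    cleaned = []
    for message in messages:
        content = message.get("content")
        if not isinstance(content, list):
            # Plain string content carries no cache_control marker.
            cleaned.append(message)
            continue
        blocks = [
            {k: v for k, v in block.items() if k != "cache_control"}
            if isinstance(block, dict) else block
            for block in content
        ]
        cleaned.append({**message, "content": blocks})
    return cleaned
```

Under this sketch, a model id such as openrouter/qwen/qwen3.6-flash matches no allowlisted family, so the marker is removed before the request is forwarded and the upstream provider never sees it. Adding a "qwen" member to the allowlist would let the marker pass through.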

Steps to Reproduce

1. Configure LiteLLM proxy with this OpenRouter model:

   openrouter/qwen/qwen3.6-flash

2. Send the same chat completion request twice through LiteLLM /v1/chat/completions with a long stable system prompt and the OpenRouter prompt-cache marker:

   {
     "model": "openrouter/qwen/qwen3.6-flash",
     "messages": [
       {
         "role": "system",
         "content": [
           {
             "type": "text",
             "text": "<long repeated stable prompt>",
             "cache_control": { "type": "ephemeral" }
           }
         ]
       },
       {
         "role": "user",
         "content": "Reply with one short sentence."
       }
     ]
   }

3. Check the usage fields in both responses.
4. Result: cached_tokens / cache_read_tokens stays 0.
5. Call OpenRouter directly with the same payload and model.
6. Result: OpenRouter direct call reports prompt-cache usage, but the same request through LiteLLM does not.
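
The reproduction steps can be scripted roughly as follows. This is a sketch under stated assumptions, not a verified test: the location of the cached-token counter in the response (`usage.prompt_tokens_details.cached_tokens`, with a fallback to a top-level `cache_read_tokens`) is assumed from the field names mentioned in the report, and `build_payload`, `cached_token_count`, `post_chat`, and `compare_runs` are hypothetical helper names. Base URL and API key are supplied by the caller.

```python
import json
import urllib.request


def build_payload(model: str, stable_prompt: str) -> dict:
    """Build the chat completion request with an ephemeral cache marker."""
    return {
        "model": model,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": stable_prompt,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": "Reply with one short sentence."},
        ],
    }


def cached_token_count(usage: dict) -> int:
    """Read the cached-token counter; field locations are assumptions."""
    details = usage.get("prompt_tokens_details") or {}
    return int(details.get("cached_tokens") or usage.get("cache_read_tokens") or 0)


def post_chat(base_url: str, api_key: str, payload: dict) -> dict:
    """POST the payload to <base_url>/v1/chat/completions and parse the JSON reply."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def compare_runs(base_url: str, api_key: str, payload: dict, post=post_chat) -> list:
    """Send the same request twice and return the cached-token count of each reply."""
    counts = []
    for _ in range(2):
        reply = post(base_url, api_key, payload)
        counts.append(cached_token_count(reply.get("usage") or {}))
    return counts
```

Calling `compare_runs` once against the LiteLLM proxy and once against OpenRouter's API base, with the same payload, gives two pairs of counts to compare. Per the report, the second count through LiteLLM stays 0 while the direct call shows a non-zero value.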

Relevant log output

What part of LiteLLM is this about?

No response

What LiteLLM version are you on?

v1.86.0

Twitter / LinkedIn details

No response
