litellm - 💡(How to fix) Fix [Bug]: Enable cache-control for qwen models [1 pull requests]


Fix Action

Fixed


Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

cache_control doesn't work in LiteLLM for https://openrouter.ai/qwen/qwen3.6-flash. When I connect to OpenRouter directly, I can set cache_control and it works. I think CacheControlSupportedModels should also support "qwen" models.
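To illustrate the kind of behavior suspected here, below is a minimal sketch of a substring-based model allowlist that drops cache_control markers for any model family not listed. This is not LiteLLM's actual code: the enum name `HypotheticalCacheFamilies`, its members, and both helper functions are made up for illustration. The only name taken from the report is CacheControlSupportedModels, and its real contents are not shown in this issue.

```python
import copy
from enum import Enum


class HypotheticalCacheFamilies(str, Enum):
    """Illustrative allowlist only; the members are assumptions, not LiteLLM's list."""
    CLAUDE = "claude"
    GEMINI = "gemini"


def supports_cache_control(model, extra_families=()):
    """Return True if the model name contains any allowlisted family substring."""
    name = model.lower()
    families = [f.value for f in HypotheticalCacheFamilies] + list(extra_families)
    return any(family in name for family in families)


def prepare_messages(messages, model, extra_families=()):
    """Strip cache_control from content parts when the model is not allowlisted."""
    if supports_cache_control(model, extra_families):
        return messages
    cleaned = copy.deepcopy(messages)  # leave the caller's messages untouched
    for message in cleaned:
        content = message.get("content")
        if isinstance(content, list):
            for part in content:
                if isinstance(part, dict):
                    part.pop("cache_control", None)
    return cleaned
```

Under this sketch, `prepare_messages(messages, "openrouter/qwen/qwen3.6-flash")` would silently remove the marker, while passing `extra_families=("qwen",)` would keep it. If LiteLLM gates the marker in a similar way, that would explain why the cache never registers a hit.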

Steps to Reproduce

  1. Configure LiteLLM proxy with this OpenRouter model:

    openrouter/qwen/qwen3.6-flash

  2. Send the same chat completion request twice through LiteLLM /v1/chat/completions with a long stable system prompt and an OpenRouter prompt-cache marker:

    {
      "model": "openrouter/qwen/qwen3.6-flash",
      "messages": [
        {
          "role": "system",
          "content": [
            {
              "type": "text",
              "text": "<long repeated stable prompt>",
              "cache_control": { "type": "ephemeral" }
            }
          ]
        },
        {
          "role": "user",
          "content": "Reply with one short sentence."
        }
      ]
    }
  3. Check the usage fields in both responses.

  4. Result: cached_tokens / cache_read_tokens stays 0.

  5. Call OpenRouter directly with the same payload and model.

  6. Result: OpenRouter direct call reports prompt-cache usage, but the same request through LiteLLM does not.
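The steps above can be scripted roughly as follows. This is a sketch using only the Python standard library. The base URL and API key are values you supply, and the usage-field locations checked in `cached_token_count` are assumptions based on the field names mentioned in this report (cached_tokens / cache_read_tokens). The real response shape may differ by version and provider.

```python
import json
import urllib.request

MODEL = "openrouter/qwen/qwen3.6-flash"


def build_payload(stable_prompt):
    """Build the chat completion body from the report, with the cache marker."""
    return {
        "model": MODEL,
        "messages": [
            {
                "role": "system",
                "content": [
                    {
                        "type": "text",
                        "text": stable_prompt,
                        "cache_control": {"type": "ephemeral"},
                    }
                ],
            },
            {"role": "user", "content": "Reply with one short sentence."},
        ],
    }


def cached_token_count(usage):
    """Best-effort read of cached tokens; the field locations are assumptions."""
    details = usage.get("prompt_tokens_details") or {}
    candidates = [
        details.get("cached_tokens"),
        usage.get("cached_tokens"),
        usage.get("cache_read_tokens"),
    ]
    return max((c for c in candidates if isinstance(c, int)), default=0)


def post_chat(base_url, api_key, payload):
    """POST the payload to <base_url>/v1/chat/completions and return parsed JSON."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def run_twice(base_url, api_key, stable_prompt):
    """Send the identical request twice and return the cached-token count of each."""
    payload = build_payload(stable_prompt)
    counts = []
    for _ in range(2):
        body = post_chat(base_url, api_key, payload)
        counts.append(cached_token_count(body.get("usage") or {}))
    return counts
```

Calling `run_twice("<your LiteLLM proxy URL>", "<your key>", long_prompt)` should return a nonzero second count if caching works. According to this report it stays at 0 through LiteLLM. For the direct comparison in step 5, the model name sent to OpenRouter would likely need the `openrouter/` prefix removed, and the base URL adjusted so the path matches OpenRouter's API.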

Relevant log output

What part of LiteLLM is this about?

No response

What LiteLLM version are you on?

v1.86.0

Twitter / LinkedIn details

No response
