litellm - 💡(How to fix) Fix [Bug]: Prompt Cache Not Working with GPT-5.4 via litellm proxy, but works with direct OpenAI API [1 participants]

Official PRs (…)
ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

Helpful · Quick feedback

Loading…
GitHub stats
BerriAI/litellm#24987Fetched 2026-04-08 02:35:13
View on GitHub
Comments
0
Participants
1
Timeline
4
Reactions
1
Participants
Timeline (top)
labeled ×3subscribed ×1
RAW_BUFFERClick to expand / collapse

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

Prompt caching does not work when calling GPT-5.4 through litellm proxy, while the same requests work correctly when using OpenCode/ClaudeCode CLI directly with sub2api or other third platform.

Steps to Reproduce

  1. Configure litellm proxy with GPT-5.4 model
  2. Send requests with cache_control parameter in messages
  3. Observe that cache is not being used on subsequent requests

Expected Behavior

When sending identical requests with cache_control, the second request should hit the prompt cache and return faster with reduced token costs.

Actual Behavior

  • Direct API calls via OpenCode/ClaudeCode CLI → Cache works ✅
  • Same requests through litellm proxy → Cache not triggered ❌

Environment

  • litellm version: v1.82.3
  • Model: gpt-5.4 (also tested with gpt-5.4)
  • Provider: OpenAI (via sub2api)
  • Configuration:
  • use_in_pass_through: true
  • use_litellm_proxy: false

Use Litellm: <img width="2221" height="798" alt="Image" src="https://github.com/user-attachments/assets/e25765a8-83d8-49bf-b525-b7ab4cfb2ed4" />

Not Use Litellm

<img width="2242" height="795" alt="Image" src="https://github.com/user-attachments/assets/f5c0ab5e-db28-4f9f-a2cd-ffb1e78b391a" />

Steps to Reproduce

  1. Configure litellm proxy with GPT-5.4 model
  2. Send requests with cache_control parameter in messages
  3. Observe that cache is not being used on subsequent requests

Relevant log output

What part of LiteLLM is this about?

UI Dashboard

What LiteLLM version are you on ?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

  • The issue may be resolved by adjusting the use_litellm_proxy configuration to true and verifying that the cache_control parameter is correctly passed through the litellm proxy to the GPT-5.4 model.

Guidance

  • Review the litellm proxy configuration to ensure it is correctly set up to handle cache control parameters.
  • Verify that the cache_control parameter is being passed correctly from the client to the litellm proxy and then to the GPT-5.4 model.
  • Check the litellm proxy documentation to see if there are any specific settings or configurations needed to enable caching with the GPT-5.4 model.
  • Test the caching behavior with a simplified setup to isolate if the issue is specific to the litellm proxy or the GPT-5.4 model integration.

Example

No specific code example can be provided without more details on the litellm proxy and GPT-5.4 model integration, but ensuring the use_litellm_proxy is set to true and the cache_control parameter is correctly formatted and passed through might look something like adjusting the configuration to include these settings.

Notes

The provided issue lacks specific log output or detailed configuration settings, making it challenging to provide a precise solution. The suggestions are based on the information given and may require further investigation into the litellm proxy and GPT-5.4 model documentation.

Recommendation

  • Apply workaround: Adjust the use_litellm_proxy configuration to true and verify the cache_control parameter handling, as this seems to be a configuration or integration issue rather than a version-specific problem.

Vote matrix · Quick signals

Works
Did the solution work? Tap to confirm.
Easy Fix
Was it a quick fix?
Time Saver
Did it save you time?
Blocking
Was it severely blocking?
Common Issue
Are others likely hitting this too?
Flaky / Intermittent
Is it intermittent?
Verified / Reproducible
Can you reproduce it reliably?
Loading…

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Back to top recommendations

TRENDING