litellm - 💡(How to fix) Fix [Bug]: Prompt Cache Not Working with GPT-5.4 via litellm proxy, but works with direct OpenAI API [1 participants]

litellm2026-04-02 10:39:28

ON THIS PAGE

Recommended Tools

×6

Utilities matched from this issue’s tags and category — try them while you read without losing context.

GitHub issue graph ai analysis

Paste a GitHub issue URL. We fetch that issue, discover linked issues from bodies/comments/timeline, collect linked pull requests, and produce a structured English report.

The report is written in English Markdown for sharing and archival.

GitHub issue URL

Helpful · Quick feedback

GitHub stats

BerriAI/litellm#24987•Fetched 2026-04-08 02:35:13

View on GitHub

Comments

Participants

Timeline

Reactions

Author

s906903912

Participants

s906903912

Timeline (top)

labeled ×3subscribed ×1

RAW_BUFFERClick to expand / collapse

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Bug Description

Prompt caching does not work when calling GPT-5.4 through litellm proxy, while the same requests work correctly when using OpenCode/ClaudeCode CLI directly with sub2api or other third platform.

Steps to Reproduce

Configure litellm proxy with GPT-5.4 model
Send requests with cache_control parameter in messages
Observe that cache is not being used on subsequent requests

Expected Behavior

When sending identical requests with cache_control, the second request should hit the prompt cache and return faster with reduced token costs.

Actual Behavior

Direct API calls via OpenCode/ClaudeCode CLI → Cache works ✅
Same requests through litellm proxy → Cache not triggered ❌

Environment

litellm version: v1.82.3
Model: gpt-5.4 (also tested with gpt-5.4)
Provider: OpenAI (via sub2api)
Configuration:
use_in_pass_through: true
use_litellm_proxy: false

Use Litellm: <img width="2221" height="798" alt="Image" src="https://github.com/user-attachments/assets/e25765a8-83d8-49bf-b525-b7ab4cfb2ed4" />

Not Use Litellm

Steps to Reproduce

Configure litellm proxy with GPT-5.4 model
Send requests with cache_control parameter in messages
Observe that cache is not being used on subsequent requests

Relevant log output

What part of LiteLLM is this about?

UI Dashboard

What LiteLLM version are you on ?

v1.82.3

Twitter / LinkedIn details

No response

extent analysis

TL;DR

The issue may be resolved by adjusting the use_litellm_proxy configuration to true and verifying that the cache_control parameter is correctly passed through the litellm proxy to the GPT-5.4 model.

Guidance

Review the litellm proxy configuration to ensure it is correctly set up to handle cache control parameters.
Verify that the cache_control parameter is being passed correctly from the client to the litellm proxy and then to the GPT-5.4 model.
Check the litellm proxy documentation to see if there are any specific settings or configurations needed to enable caching with the GPT-5.4 model.
Test the caching behavior with a simplified setup to isolate if the issue is specific to the litellm proxy or the GPT-5.4 model integration.

Example

No specific code example can be provided without more details on the litellm proxy and GPT-5.4 model integration, but ensuring the use_litellm_proxy is set to true and the cache_control parameter is correctly formatted and passed through might look something like adjusting the configuration to include these settings.

Notes

The provided issue lacks specific log output or detailed configuration settings, making it challenging to provide a precise solution. The suggestions are based on the information given and may require further investigation into the litellm proxy and GPT-5.4 model documentation.

Recommendation

Apply workaround: Adjust the use_litellm_proxy configuration to true and verify the cache_control parameter handling, as this seems to be a configuration or integration issue rather than a version-specific problem.

Vote matrix · Quick signals

Works

Did the solution work? Tap to confirm.

Easy Fix

Was it a quick fix?

Time Saver

Did it save you time?

Blocking

Was it severely blocking?

Common Issue

Are others likely hitting this too?

Flaky / Intermittent

Is it intermittent?

Verified / Reproducible

Can you reproduce it reliably?

#api #permission error #memory optimization #batch processing #GPU compatibility

Still need to ship something?

×6

Another batch ranked right after the header list — different links, same matching logic.

Data

Security

Network

Code

UI/UX

Text

System

Multimedia

Protocol

API

Engineering

litellm - 💡(How to fix) Fix [Bug]: Prompt Cache Not Working with GPT-5.4 via litellm proxy, but works with direct OpenAI API [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Check for existing issues

What happened?

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Environment

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

TRENDING

litellm - 💡(How to fix) Fix [Bug]: Prompt Cache Not Working with GPT-5.4 via litellm proxy, but works with direct OpenAI API [1 participants]

Recommended Tools

GitHub issue graph ai analysis

Check for existing issues

What happened?

Bug Description

Steps to Reproduce

Expected Behavior

Actual Behavior

Environment

Steps to Reproduce

Relevant log output

What part of LiteLLM is this about?

What LiteLLM version are you on ?

Twitter / LinkedIn details

extent analysis

TL;DR

Guidance

Example

Notes

Recommendation

Still need to ship something?

RELATED_DISCOVERY

TRENDING