litellm - [Feature]: Add native /v1beta/cachedContents CRUD routes to google_endpoints (parity with /v1beta/interactions)

litellm · 2026-05-29 02:58:31

Fix / Workaround

The same file already implements this pattern for the /v1beta/interactions CRUD routes (POST/GET/DELETE plus cancel), so this is a parity request rather than new architecture. Copy the interactions block, replace interactions with cachedContents, and add PATCH. Then forward requests to the existing VertexAIContextCaching client in litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py, which already builds the correct URLs for both Google AI Studio and Vertex AI.
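To illustrate the two URL shapes that forwarding must produce, the sketch below builds the upstream cachedContents URL for each backend. It is not LiteLLM's VertexAIContextCaching code. The helper name build_cached_contents_url is hypothetical, and three details are assumptions to check against that client: the v1beta Gemini base, the v1 Vertex API version, and the special-casing of the "global" location.

```python
from typing import Optional

# Assumption: Gemini API (Google AI Studio) cachedContents live under v1beta.
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"


def build_cached_contents_url(
    backend: str,
    cache_id: Optional[str] = None,
    project: Optional[str] = None,
    location: Optional[str] = None,
) -> str:
    """Hypothetical helper: upstream URL for the collection or for one cache."""
    if backend == "gemini":
        # Google AI Studio uses flat resource names: cachedContents/{id}.
        base = f"{GEMINI_BASE}/cachedContents"
    elif backend == "vertex":
        if not project or not location:
            raise ValueError("Vertex AI requires project and location")
        # Assumption: "global" uses the non-regional host; every other
        # location uses the {location}-aiplatform regional host.
        host = (
            "aiplatform.googleapis.com"
            if location == "global"
            else f"{location}-aiplatform.googleapis.com"
        )
        base = f"https://{host}/v1/projects/{project}/locations/{location}/cachedContents"
    else:
        raise ValueError(f"unknown backend: {backend!r}")
    # Without an id this is the collection URL used by create and list.
    return f"{base}/{cache_id}" if cache_id else base


print(build_cached_contents_url("gemini", "abc123"))
print(build_cached_contents_url("vertex", "abc123", "my-project", "us-central1"))
```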

Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Add native Gemini-API cachedContents CRUD routes to litellm/proxy/google_endpoints/endpoints.py, mirroring the pattern already shipped for interactions in the same file:

POST /v1beta/cachedContents + POST /cachedContents
GET /v1beta/cachedContents + GET /cachedContents (list)
GET /v1beta/cachedContents/{name:path} + GET /cachedContents/{name:path}
PATCH /v1beta/cachedContents/{name:path} + PATCH /cachedContents/{name:path} (TTL extension)
DELETE /v1beta/cachedContents/{name:path} + DELETE /cachedContents/{name:path}
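Written out as data, the route list above is five operations mounted under two prefixes. The stdlib-only sketch below expands that table. In endpoints.py, each pair would instead become a FastAPI route decorator like the interactions routes, so OPERATIONS, PREFIXES, and cached_contents_routes are illustrative names, not existing LiteLLM code.

```python
from typing import List, Tuple

# The five requested operations. "{name:path}" is the Starlette path converter
# the issue uses, so a captured name may itself contain slashes.
OPERATIONS: List[Tuple[str, str]] = [
    ("POST", "/cachedContents"),                # create
    ("GET", "/cachedContents"),                 # list
    ("GET", "/cachedContents/{name:path}"),     # get one cache
    ("PATCH", "/cachedContents/{name:path}"),   # update (TTL extension)
    ("DELETE", "/cachedContents/{name:path}"),  # delete
]

# Every operation is mounted twice: with the /v1beta prefix and without it.
PREFIXES: Tuple[str, ...] = ("/v1beta", "")


def cached_contents_routes() -> List[Tuple[str, str]]:
    """Expand the operation table into (method, path) pairs for both prefixes."""
    return [(method, prefix + path) for prefix in PREFIXES for method, path in OPERATIONS]


for method, path in cached_contents_routes():
    print(f"{method:6} {path}")
```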

These routes should authenticate with the configured Vertex/Google credentials the same way google_generate_content already does. Where needed, they should translate between Gemini-API resource names (cachedContents/{id}) and Vertex resource paths (projects/{P}/locations/{L}/cachedContents/{id}). They should also apply user_api_key_auth and pre_call_hook like the surrounding routes.
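The name translation described above can be sketched with two regular expressions, one per naming scheme. The function names are hypothetical. Two behaviours are assumptions rather than anything the issue specifies: accepting a bare cache id, and rejecting a full Vertex path that points at a project or location other than the configured one.

```python
import re

# Gemini API name: cachedContents/{id}
_GEMINI_NAME = re.compile(r"^cachedContents/([^/]+)$")
# Vertex AI name: projects/{P}/locations/{L}/cachedContents/{id}
_VERTEX_NAME = re.compile(r"^projects/([^/]+)/locations/([^/]+)/cachedContents/([^/]+)$")


def gemini_to_vertex_name(name: str, project: str, location: str) -> str:
    """Map a Gemini-API name (or bare id) onto the configured Vertex project/location."""
    vertex = _VERTEX_NAME.match(name)
    if vertex:
        # Refuse Vertex paths aimed at a project/location other than the configured one.
        if (vertex.group(1), vertex.group(2)) != (project, location):
            raise ValueError(f"{name!r} targets a different project/location")
        return name
    gemini = _GEMINI_NAME.match(name)
    cache_id = gemini.group(1) if gemini else name  # assumption: a bare id is accepted
    if not cache_id or "/" in cache_id:
        raise ValueError(f"unrecognized cachedContents name: {name!r}")
    return f"projects/{project}/locations/{location}/cachedContents/{cache_id}"


def vertex_to_gemini_name(name: str) -> str:
    """Map a Vertex resource path back to the Gemini-API form returned to clients."""
    vertex = _VERTEX_NAME.match(name)
    if vertex:
        return f"cachedContents/{vertex.group(3)}"
    if _GEMINI_NAME.match(name):
        return name
    raise ValueError(f"unrecognized cachedContents name: {name!r}")
```

Rewriting names in both directions lets Gemini-style clients keep seeing cachedContents/{id} in responses, even when the upstream call goes to Vertex.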

Motivation, pitch

Today, google_endpoints/endpoints.py registers generateContent, streamGenerateContent, countTokens, and the full interactions CRUD. However, the Gemini API also exposes cachedContents for server-side prompt caching, and there is no corresponding handler. Concretely, on v1.83.14-stable:

$ curl -s -o /dev/null -w '%{http_code}\n' -X POST $PROXY/v1beta/cachedContents -H "Authorization: Bearer ..." -d '{...}'
404

$ curl -s $PROXY/routes | jq '.[] | select(.path|test("v1beta"))'
"/v1beta/models/{model_name:path}:generateContent"
"/v1beta/models/{model_name:path}:streamGenerateContent"
"/v1beta/models/{model_name:path}:countTokens"

The Vertex passthrough at /vertex-ai/{endpoint:path} works against the Vertex resource shape but (a) requires default_vertex_config to be set and (b) exposes the entire Vertex surface to any key holder, which is a non-starter for proxy deployments that want to scope access to the Gemini API surface only.
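The scoping argument can be made concrete by contrasting a catch-all passthrough with an explicit allowlist of Gemini-API paths. The sketch below is a hypothetical allowlist, not how LiteLLM enforces routes. It includes the models routes the issue shows as registered and the proposed cachedContents routes. It is deliberately stricter than {name:path}, because it accepts only a single-segment cache id.

```python
import re
from typing import Pattern, Tuple

# Hypothetical allowlist of the Gemini-API surface a scoped proxy would expose.
GEMINI_SURFACE: Tuple[Pattern[str], ...] = (
    # Routes the issue shows as already registered.
    re.compile(r"/v1beta/models/[^:]+:(generateContent|streamGenerateContent|countTokens)"),
    # Proposed cachedContents routes, with and without the /v1beta prefix.
    re.compile(r"(/v1beta)?/cachedContents(/[^/]+)?"),
)


def in_gemini_surface(path: str) -> bool:
    """Return True only if the whole path matches one allowlisted pattern."""
    return any(pattern.fullmatch(path) for pattern in GEMINI_SURFACE)


# A catch-all passthrough like /vertex-ai/{endpoint:path} accepts this path; the allowlist does not.
print(in_gemini_surface("/vertex-ai/v1/projects/p/locations/l/endpoints"))  # False
```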

Without native cachedContents routes:

Clients using genai-style SDKs that target generativelanguage.googleapis.com paths cannot reach the cache CRUD operations through the LiteLLM proxy at all.
Server-side caching (a significant cost reduction for workflows with long shared prompt prefixes) is not reachable through the same endpoints as generateContent / streamGenerateContent.
There is no PATCH route to extend a cache's TTL, so a cache's lifetime is bounded by the TTL set at creation time. Long-running sessions must re-create the cache every cycle and pay the full prompt-token cost again.
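A TTL extension is a small request once a PATCH route exists. The sketch below only builds the request and does not send it. It assumes the Gemini-API shape: an updateMask=ttl query parameter, so only the TTL changes, and a ttl duration string such as "3600s". The helper name build_ttl_patch is hypothetical, and the Vertex request body may need a different duration encoding.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode


def build_ttl_patch(base_url: str, name: str, ttl_seconds: int) -> Tuple[str, str, Dict[str, str], bytes]:
    """Hypothetical helper: build (method, url, headers, body) for a TTL extension."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    # updateMask restricts the update to the ttl field (assumed Gemini-API semantics).
    url = f"{base_url}/{name}?{urlencode({'updateMask': 'ttl'})}"
    # Durations are sent as a seconds string with an "s" suffix, e.g. "3600s".
    body = json.dumps({"ttl": f"{ttl_seconds}s"}).encode("utf-8")
    return "PATCH", url, {"Content-Type": "application/json"}, body


method, url, headers, body = build_ttl_patch(
    "https://generativelanguage.googleapis.com/v1beta", "cachedContents/abc123", 3600
)
print(method, url)
```

Because only the TTL is updated, a long-running session could refresh the cache on each cycle instead of re-creating it and re-paying for the cached prompt tokens.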

Related (downstream symptoms / adjacent caching bugs):

#17696 — Gemini request fails when cache size is too small
#16995 — Vertex AI context caching fails with "global" location
#26014 — Vertex AI sends cached_content + system_instruction in same request
#27853 — _remove_strict_from_schema deletes user-defined strict (breaks Vertex AI cachedContents)

What part of LiteLLM is this about?

Proxy
