litellm - [Feature]: Add native /v1beta/cachedContents CRUD routes to google_endpoints (parity with /v1beta/interactions)

Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

The Feature

Add native Gemini-API cachedContents CRUD routes to litellm/proxy/google_endpoints/endpoints.py, mirroring the pattern already shipped for interactions in the same file:

  • POST /v1beta/cachedContents + POST /cachedContents
  • GET /v1beta/cachedContents + GET /cachedContents (list)
  • GET /v1beta/cachedContents/{name:path} + GET /cachedContents/{name:path}
  • PATCH /v1beta/cachedContents/{name:path} + PATCH /cachedContents/{name:path} (TTL extension)
  • DELETE /v1beta/cachedContents/{name:path} + DELETE /cachedContents/{name:path}

These should sign into the configured Vertex/Google credentials the same way google_generate_content already does, translate between Gemini-API resource names (cachedContents/{id}) and Vertex resource paths (projects/{P}/locations/{L}/cachedContents/{id}) as needed, and apply user_api_key_auth + pre_call_hook the same as the surrounding routes.
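The resource-name translation described above could look roughly like the following. This is a minimal sketch, not LiteLLM code: the helper names `to_vertex_name` and `to_gemini_name` are hypothetical, and it assumes cache IDs contain no slashes.

```python
import re

# Hypothetical helpers for illustration only; these are not LiteLLM APIs.
_VERTEX_RE = re.compile(
    r"^projects/(?P<project>[^/]+)/locations/(?P<location>[^/]+)"
    r"/cachedContents/(?P<cache_id>[^/]+)$"
)
_GEMINI_RE = re.compile(r"^cachedContents/(?P<cache_id>[^/]+)$")


def to_vertex_name(name: str, project: str, location: str) -> str:
    """Expand a Gemini-API name (cachedContents/{id}) to a Vertex resource path."""
    if _VERTEX_RE.match(name):
        return name  # already in Vertex form, pass through unchanged
    match = _GEMINI_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized cachedContents name: {name!r}")
    cache_id = match.group("cache_id")
    return f"projects/{project}/locations/{location}/cachedContents/{cache_id}"


def to_gemini_name(name: str) -> str:
    """Collapse a Vertex resource path to the Gemini-API form cachedContents/{id}."""
    if _GEMINI_RE.match(name):
        return name  # already in Gemini-API form
    match = _VERTEX_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized cachedContents name: {name!r}")
    return f"cachedContents/{match.group('cache_id')}"
```

A handler would presumably call `to_vertex_name` on the incoming `{name:path}` before forwarding to Vertex AI, and `to_gemini_name` on the `name` field of the response, so that clients only ever see Gemini-API names.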

Motivation, pitch

Today google_endpoints/endpoints.py registers generateContent, streamGenerateContent, countTokens, and the full interactions CRUD — but the Gemini API also exposes cachedContents for server-side prompt caching, and there is no corresponding handler. Concretely, on v1.83.14-stable:

$ curl -s -o /dev/null -w '%{http_code}\n' -X POST $PROXY/v1beta/cachedContents -H "Authorization: Bearer ..." -d '{...}'
404

$ curl -s $PROXY/routes | jq '.[] | select(.path|test("v1beta"))'
"/v1beta/models/{model_name:path}:generateContent"
"/v1beta/models/{model_name:path}:streamGenerateContent"
"/v1beta/models/{model_name:path}:countTokens"

The Vertex passthrough at /vertex-ai/{endpoint:path} works against the Vertex resource shape but (a) requires default_vertex_config to be set and (b) exposes the entire Vertex surface to any key holder, which is a non-starter for proxy deployments that want to scope access to the Gemini API surface only.

Without native cachedContents routes:

  1. Clients using genai-style SDKs that target generativelanguage.googleapis.com paths can't reach the cache CRUD at all through LiteLLM proxy.
  2. Server-side caching (a significant cost reduction for long-prefix workflows) is unreachable on the same wire as generateContent / streamGenerateContent.
  3. There is no PATCH path to extend a cache's TTL, so a cache's lifetime is bounded by whatever TTL was set at create time. Long-running sessions must re-create the cache every cycle, paying the full prompt-token cost again.
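For illustration, the TTL-extension call that point 3 asks for might be built like this once a PATCH route exists. This is a sketch under assumptions: `build_ttl_patch` is a hypothetical helper, the proxy route is the one proposed in this issue (it does not exist today), and the `updateMask=ttl` query parameter and `"3600s"` duration string follow the Gemini API's cachedContents patch shape as I understand it. The request is only constructed, not sent.

```python
import json
import urllib.parse
import urllib.request


def build_ttl_patch(
    proxy_base: str, cache_name: str, ttl_seconds: int, api_key: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH that extends a cache's TTL.

    Assumes cache_name is in Gemini-API form, e.g. "cachedContents/abc123".
    """
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    # Only the ttl field is being updated, so name it in the update mask.
    query = urllib.parse.urlencode({"updateMask": "ttl"})
    url = f"{proxy_base.rstrip('/')}/v1beta/{cache_name}?{query}"
    # Durations are serialized as a number of seconds followed by "s".
    body = json.dumps({"ttl": f"{ttl_seconds}s"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Against a proxy without the route, that call would be expected to fail with the same 404 shown in the curl output above.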

The pattern already exists in the same file for the /v1beta/interactions CRUD routes (POST/GET/DELETE + cancel), so this is a parity request, not a new architecture: copy the interactions block, replace interactions with cachedContents, add PATCH, and forward to the existing VertexAIContextCaching client at litellm/llms/vertex_ai/context_caching/vertex_ai_context_caching.py, which already constructs the right URLs for both Google AI Studio and Vertex AI.
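As a rough picture of the resulting surface, the requested routes can be enumerated from a small table. This sketch only generates (method, path) pairs with the standard library; `PREFIXES`, `OPERATIONS`, and `cached_contents_routes` are hypothetical names, and actual registration would presumably go through the same router decorators the interactions block uses.

```python
# Both path prefixes requested in this issue: versioned and bare.
PREFIXES = ("/v1beta", "")

# (HTTP method, path suffix) pairs taken from the route list in this issue.
OPERATIONS = (
    ("POST", "/cachedContents"),                # create
    ("GET", "/cachedContents"),                 # list
    ("GET", "/cachedContents/{name:path}"),     # get one
    ("PATCH", "/cachedContents/{name:path}"),   # TTL extension
    ("DELETE", "/cachedContents/{name:path}"),  # delete
)


def cached_contents_routes() -> list[tuple[str, str]]:
    """Return every (method, path) pair: each operation under each prefix."""
    return [
        (method, prefix + suffix)
        for method, suffix in OPERATIONS
        for prefix in PREFIXES
    ]


if __name__ == "__main__":
    for method, path in cached_contents_routes():
        print(f"{method:6} {path}")
```

Keeping the table separate from the handlers makes it easy to check that each operation is exposed under both prefixes, which is the parity the issue asks for.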

Related (downstream symptoms / adjacent caching bugs):

  • #17696 — Gemini request fails when cache size is too small
  • #16995 — Vertex AI context caching fails with "global" location
  • #26014 — Vertex AI sends cached_content + system_instruction in same request
  • #27853 — _remove_strict_from_schema deletes user-defined strict (breaks Vertex AI cachedContents)

What part of LiteLLM is this about?

Proxy

LiteLLM is hiring a founding backend engineer, are you interested in joining us and shipping to all our users?

No
