litellm - [Bug]: `cache_control_injection_points` in litellm_params causes `vertex_project` to leak as unexpected kwarg for Vertex AI gemini-3.5-flash and gemini-3.1-flash-lite (GA) models

litellm, 2026-05-21 16:29:45


Error Message

litellm.InternalServerError: Vertex_aiException InternalServerError - AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'


Check for existing issues

I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hello @ishaan-jaff, @krrish-berri-2! When cache_control_injection_points is set in litellm_params for the Vertex AI Gemini models gemini-3.5-flash and gemini-3.1-flash-lite (GA), and a request includes a large system prompt that triggers prompt caching, LiteLLM raises an InternalServerError:

litellm.InternalServerError: Vertex_aiException InternalServerError -
    AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

This does NOT happen with the same models when cache_control_injection_points is absent from their litellm_params.
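To confirm that the key alone makes the difference, it can help to derive a control configuration that is identical except for cache_control_injection_points. The following is a minimal sketch, assuming the proxy config's model list has already been parsed into Python dicts; the helper name `without_cache_injection` is hypothetical and not part of LiteLLM.

```python
import copy


def without_cache_injection(model_list):
    """Return a deep copy of model_list with cache_control_injection_points removed.

    Hypothetical helper for A/B testing; it does not call LiteLLM.
    """
    stripped = copy.deepcopy(model_list)
    for entry in stripped:
        # Drop only the caching key; leave every other litellm_params field as-is.
        entry.get("litellm_params", {}).pop("cache_control_injection_points", None)
    return stripped


# Mirrors the model entry from this report, expressed as a Python dict.
model_list = [
    {
        "model_name": "vertex_ai/gemini-3.5-flash",
        "litellm_params": {
            "model": "vertex_ai/gemini-3.5-flash",
            "vertex_project": "os.environ/VERTEXAI_PROJECT",
            "vertex_location": "global",
            "cache_control_injection_points": [
                {"location": "message", "role": "system"}
            ],
        },
    }
]

# Control configuration: same entry, no cache injection.
control = without_cache_injection(model_list)
```

Running the proxy once with each variant should show whether the error follows the key.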

Steps to Reproduce

1. Configure a model in the proxy with cache_control_injection_points:

   - model_name: "vertex_ai/gemini-3.5-flash"
     litellm_params:
       model: "vertex_ai/gemini-3.5-flash"
       vertex_project: os.environ/VERTEXAI_PROJECT
       vertex_location: "global"
       cache_control_injection_points:
         - location: message
           role: system

2. Send a request with a long system prompt:

   curl http://localhost:4000/chat/completions \
     -H 'Authorization: Bearer ...' \
     -H 'Content-Type: application/json' \
     -d '{
       "model": "vertex_ai/gemini-3.5-flash",
       "messages": [
         {"role": "system", "content": "<long_text>"},
         {"role": "user", "content": "who are you?"}
       ]
     }'

3. Observe the error.
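The same request can be scripted for repeated runs. This is a sketch using only the Python standard library; the URL, model name, and message shape come from the curl command above, while the function names and the `api_key` parameter are assumptions made for illustration. The length at which a system prompt triggers prompt caching is not stated in the report, so the prompt is left as a parameter.

```python
import json
import urllib.request

# Proxy endpoint taken from the curl command in this report.
PROXY_URL = "http://localhost:4000/chat/completions"


def build_payload(system_prompt, user_prompt="who are you?",
                  model="vertex_ai/gemini-3.5-flash"):
    """Build the chat completion body with a system and a user message."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }


def build_request(payload, api_key, url=PROXY_URL):
    """Wrap the payload in a POST request with the same headers as the curl call."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and decode the JSON response.

    urllib raises urllib.error.HTTPError on non-2xx responses, so a proxy
    error surfaces as an exception whose body can be read via .read().
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A run would look like `send(build_request(build_payload(long_text), api_key))`, with `long_text` and `api_key` supplied by the caller.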

Expected behavior:

The system prompt is cached and the request completes successfully, matching the behavior of other Vertex AI Gemini models that have cache_control_injection_points via the vertex_ai/gemini* wildcard.

Actual behavior:

litellm.InternalServerError: Vertex_aiException InternalServerError -
AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

Root cause hypothesis

When cache_control_injection_points triggers cache injection on a Gemini model, the request appears to be routed through a code path (possibly the async OpenAI-compatible client) that does not accept Vertex-specific kwargs (vertex_project, vertex_location), so the call fails on the unexpected keyword argument. The wildcard vertex_ai/gemini* entry with the same config does NOT reproduce the issue, suggesting the bug may be specific to the combination of exact model-name matching and cache injection.
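The failure mode in this hypothesis can be illustrated in isolation. The sketch below is not LiteLLM's code: `create` is a stand-in for any client method with a fixed keyword signature, and `strip_vertex_kwargs` is a hypothetical filter showing the kind of step that would keep provider-specific keys out of such a call.

```python
def create(*, model, messages):
    """Stand-in for a client method that accepts only a fixed set of keywords."""
    return {"model": model, "n_messages": len(messages)}


# Vertex-specific keys named in the report.
VERTEX_ONLY_KEYS = ("vertex_project", "vertex_location")


def strip_vertex_kwargs(kwargs):
    """Hypothetical filter: drop Vertex-only keys before forwarding kwargs."""
    return {k: v for k, v in kwargs.items() if k not in VERTEX_ONLY_KEYS}


call_kwargs = {
    "model": "gemini-3.5-flash",
    "messages": [{"role": "user", "content": "who are you?"}],
    "vertex_project": "my-project",   # placeholder value
    "vertex_location": "global",
}

leaked_error = None
try:
    # Forwarding every key unchanged reproduces the class of error in the report.
    create(**call_kwargs)
except TypeError as exc:
    leaked_error = str(exc)

# Filtering first lets the same call succeed.
result = create(**strip_vertex_kwargs(call_kwargs))
```

Whether LiteLLM's actual fix belongs in kwarg filtering or in routing the request to a different handler cannot be determined from the report alone.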

Relevant log output

litellm.InternalServerError: Vertex_aiException InternalServerError -
AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.83.10

Twitter / LinkedIn details

No response
