litellm [Bug]: `cache_control_injection_points` in `litellm_params` causes `vertex_project` to leak as an unexpected kwarg for Vertex AI gemini-3.5-flash and gemini-3.1-flash-lite (GA) models



Check for existing issues

  • I have searched the existing issues and checked that my issue is not a duplicate.

What happened?

Hello @ishaan-jaff, @krrish-berri-2! When `cache_control_injection_points` is set in `litellm_params` for the Vertex AI Gemini models gemini-3.5-flash and gemini-3.1-flash-lite (GA), and a request includes a large system prompt that triggers prompt caching, LiteLLM raises an `InternalServerError`:

litellm.InternalServerError: Vertex_aiException InternalServerError -
    AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

This does NOT happen with the same models when cache_control_injection_points is absent from their litellm_params.

Steps to Reproduce

  1. Configure a model in the proxy with `cache_control_injection_points`:
    - model_name: "vertex_ai/gemini-3.5-flash"
      litellm_params:
        model: "vertex_ai/gemini-3.5-flash"
        vertex_project: os.environ/VERTEXAI_PROJECT
        vertex_location: "global"
        cache_control_injection_points:
          - location: message
            role: system
  2. Send a request with a long system prompt:
    curl http://localhost:4000/chat/completions \
      -H 'Authorization: Bearer ...' \
      -H 'Content-Type: application/json' \
      -d '{
        "model": "vertex_ai/gemini-3.5-flash",
        "messages": [
          {"role": "system", "content": "<long_text>"},
          {"role": "user", "content": "who are you?"}
        ]
      }'
  3. Observe the error.
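The request from step 2 can also be built in Python, which makes it easy to generate a system prompt long enough to trigger caching. This is a minimal standard-library sketch: the proxy URL and model name come from the report, while `API_KEY`, the filler text, and `build_request` are placeholders chosen for illustration. The repeat count is an arbitrary guess, not a documented caching threshold.

```python
import json
import urllib.request

PROXY_URL = "http://localhost:4000/chat/completions"  # from the curl example
API_KEY = "sk-placeholder"  # placeholder: substitute your proxy key


def build_request(model: str, system_text: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) the chat completion request from step 2."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_text},
            {"role": "user", "content": user_text},
        ],
    }
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Arbitrary filler text; the repeat count is a guess, not a cache threshold.
long_system_prompt = "You are a helpful assistant. " * 2000
req = build_request("vertex_ai/gemini-3.5-flash", long_system_prompt, "who are you?")

if __name__ == "__main__":
    # Only inspects the request; nothing is sent to the proxy here.
    print(req.get_method(), req.full_url, len(req.data), "bytes")
```

Passing `req` to `urllib.request.urlopen` against a proxy configured as in step 1 should then reproduce the error, if the bug is present in your build.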

Expected behavior:

The system prompt is cached and the request completes successfully, matching the behavior of other Vertex AI Gemini models that receive `cache_control_injection_points` through the `vertex_ai/gemini*` wildcard entry.

Actual behavior:

litellm.InternalServerError: Vertex_aiException InternalServerError -
AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

Root cause hypothesis

When `cache_control_injection_points` triggers cache injection on a Gemini model, the request appears to be routed through a code path (possibly the async OpenAI-compatible client) that does not expect Vertex-specific kwargs (`vertex_project`, `vertex_location`). Those kwargs are forwarded unchanged, and `AsyncCompletions.create()` rejects them. The `vertex_ai/gemini*` wildcard entry with the same config does NOT reproduce the issue, which suggests the bug may be specific to the combination of an exact model-name match and cache injection.
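The hypothesized failure mode can be illustrated without LiteLLM. This sketch is not LiteLLM's code: `FakeAsyncCompletions`, `VERTEX_ONLY_KWARGS`, `strip_provider_kwargs`, and `forward` are hypothetical names. It shows why forwarding Vertex routing parameters to a client method with a fixed signature produces this kind of `TypeError`, and what a filter applied just before that call could look like, assuming the fix belongs at the point where the request is handed to the OpenAI-compatible client.

```python
import asyncio
from typing import Optional


class FakeAsyncCompletions:
    """Hypothetical stand-in for an OpenAI-compatible async client.

    Like a typical SDK method, create() declares its keyword arguments
    explicitly, so an undeclared kwarg raises TypeError at call time.
    """

    async def create(self, *, model: str, messages: list,
                     temperature: Optional[float] = None) -> dict:
        return {"model": model, "n_messages": len(messages)}


# Hypothetical set of Vertex routing parameters that should be consumed by
# the Vertex AI integration and never reach a generic client.
VERTEX_ONLY_KWARGS = {"vertex_project", "vertex_location"}


def strip_provider_kwargs(kwargs: dict) -> dict:
    """Return a copy of kwargs without the Vertex-specific routing parameters."""
    return {k: v for k, v in kwargs.items() if k not in VERTEX_ONLY_KWARGS}


async def forward(kwargs: dict) -> dict:
    # Passes every kwarg through unchanged, as the suspected code path would.
    return await FakeAsyncCompletions().create(**kwargs)


def try_call(kwargs: dict) -> str:
    try:
        result = asyncio.run(forward(kwargs))
    except TypeError as exc:
        return f"TypeError: {exc}"
    return f"ok: {result}"


request = {
    "model": "gemini-3.5-flash",
    "messages": [
        {"role": "system", "content": "<long_text>"},
        {"role": "user", "content": "who are you?"},
    ],
    "vertex_project": "example-project",  # placeholder value
    "vertex_location": "global",
}

if __name__ == "__main__":
    print(try_call(request))                         # fails like the report
    print(try_call(strip_provider_kwargs(request)))  # succeeds after filtering
```

Filtering at the forwarding point is only one possible fix. If the real cause is that cache injection sends an exact-match Vertex deployment down the wrong provider route, the better fix would be to keep the request on the Vertex AI path so the kwargs are consumed where they belong.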

Relevant log output

litellm.InternalServerError: Vertex_aiException InternalServerError -
AsyncCompletions.create() got an unexpected keyword argument 'vertex_project'

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on ?

v1.83.10

Twitter / LinkedIn details

No response
