litellm - ✅(Solved) Fix [Bug]: `vertex_ai/gemini-embedding-2-preview` routes to `:predict` endpoint, returns `FAILED_PRECONDITION` [4 pull requests, 2 comments, 2 participants]

GitHub stats
BerriAI/litellm#23508 · Fetched 2026-04-08 00:43:55
Comments: 2 · Participants: 2 · Timeline events: 13 · Reactions: 0
Timeline (top): referenced ×4, cross-referenced ×3, labeled ×3, commented ×2

Error Message

Error: Vertex_aiException BadRequestError - {"error":{"code":400,"message":"Precondition check failed.","status":"FAILED_PRECONDITION"}}

Root Cause

PR #23322 added GoogleBatchEmbeddings handler for this model, but only wired it up for custom_llm_provider == "gemini" (line 5140 in main.py). The vertex_ai path (line 5193) still falls through to vertex_embedding.embedding() which calls the legacy :predict endpoint:

POST .../models/gemini-embedding-2-preview:predict  → 400 FAILED_PRECONDITION

Note: vertex_ai/gemini-embedding-001 works with :predict, but Google dropped :predict support for the newer gemini-embedding-2-preview. The vertex_ai/ prefix needs to route through GoogleBatchEmbeddings as well.
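
To make the difference between the two endpoints concrete, here is a minimal sketch of how the request would differ depending on the route taken. The function name `build_embedding_request`, the `models_base_url` parameter, and the exact body shapes are assumptions for illustration; the issue only states that the legacy handler posts to `:predict` and that PR #23620 distinguishes the two formats by an `instances` key versus a `content` key.

```python
def build_embedding_request(models_base_url, model, text, use_embed_content):
    """Return (url, body) for a single text embedding request.

    Hypothetical helper: the body layouts below are assumed shapes that
    only illustrate the `content` vs `instances` distinction.
    """
    model_url = f"{models_base_url}/{model}"
    if use_embed_content:
        # Route used by the GoogleBatchEmbeddings handler
        url = f"{model_url}:embedContent"
        body = {"content": {"parts": [{"text": text}]}}
    else:
        # Legacy route used by vertex_embedding.embedding()
        url = f"{model_url}:predict"
        body = {"instances": [{"content": text}]}
    return url, body


if __name__ == "__main__":
    for flag in (False, True):
        url, body = build_embedding_request(
            "https://example.invalid/models",  # placeholder base URL
            "gemini-embedding-2-preview",
            "Hello, world!",
            use_embed_content=flag,
        )
        print(url, body)
```

In this sketch the bug corresponds to calling the helper with `use_embed_content=False` for `gemini-embedding-2-preview`.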

Fix Action

Fixed

PR fix notes

PR #23518: fix: route vertex_ai gemini-embedding-2-preview to correct endpoint

Description (problem / solution / changelog)

Summary

  • Adds a model-name fallback so that vertex_ai/gemini-embedding-* models always route to the embedContent endpoint (via GoogleBatchEmbeddings) instead of the legacy :predict endpoint.
  • When get_model_info() cannot resolve the uses_embed_content flag (e.g. stale or missing model cost map), the code now checks if the model name starts with gemini-embedding and sets uses_embed_content = True as a fallback.
  • Fix applied in both litellm/main.py (embedding dispatch) and litellm/llms/vertex_ai/common_utils.py (URL construction).
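
The fallback described in the bullets above can be sketched as follows. This is not the PR's actual diff: `lookup_uses_embed_content` is a hypothetical stand-in for the `get_model_info()` lookup, and the plain dictionary stands in for the model cost map.

```python
def lookup_uses_embed_content(model, cost_map):
    """Hypothetical stand-in for get_model_info(); raises KeyError if the
    model or the flag is missing from the (possibly stale) cost map."""
    return bool(cost_map[model]["uses_embed_content"])


def resolve_uses_embed_content(model, cost_map):
    """Prefer the cost map; fall back to a model-name prefix check."""
    try:
        return lookup_uses_embed_content(model, cost_map)
    except Exception:
        # Defensive fallback: the lookup failed, so decide by model name
        return model.startswith("gemini-embedding")


if __name__ == "__main__":
    stale_map = {}  # simulates a missing model cost map entry
    print(resolve_uses_embed_content("gemini-embedding-2-preview", stale_map))
    print(resolve_uses_embed_content("text-embedding-005", stale_map))
```

Note that in this sketch the prefix check only runs when the lookup fails; a model whose entry explicitly sets the flag keeps whatever the cost map says.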

Fixes #23508

Motivation

vertex_ai/gemini-embedding-2-preview returns 400 FAILED_PRECONDITION because the legacy :predict endpoint is not supported for newer gemini-embedding-* models. PR #23322 added routing via uses_embed_content in the model cost map, but when model info lookup fails (exception path), requests still fall through to :predict. This adds a defensive name-based check so routing is correct regardless of model cost map state.

Test plan

  • Existing test test_vertex_ai_text_only_embedding_uses_embed_content continues to pass
  • Verify litellm.embedding(model="vertex_ai/gemini-embedding-2-preview", ...) uses :embedContent endpoint
  • Verify vertex_ai/gemini-embedding-001 (older model) still works via its existing path

Changed files

  • litellm/llms/vertex_ai/common_utils.py (modified, +5/-0)
  • litellm/main.py (modified, +5/-0)

PR #23520: fix(vertex_ai): fix embedding routing for gemini-embedding models

Description (problem / solution / changelog)

Summary

Fixes #23508 — vertex_ai/gemini-embedding-2-preview routes to :predict endpoint and returns FAILED_PRECONDITION.

Two bugs fixed:

  1. Hardcoded /v1/ in _get_embedding_url (common_utils.py line 273): The non-digit model branch ignored the vertex_api_version parameter and always used /v1/, while the digit branch correctly used {vertex_api_version}. Fixed to use the parameter consistently.

  2. Silent fallthrough when get_model_info fails: Both _get_embedding_url and main.py routing wrap get_model_info in a bare except Exception that defaults uses_embed_content=False, causing the request to fall through to the legacy :predict handler. Added a model-name fallback: any model starting with gemini-embedding is routed to the embedContent endpoint and GoogleBatchEmbeddings handler, matching Google's API design where these models only support embedContent, not :predict.
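
The first bug can be illustrated with a small sketch in which both branches honor the API version. The function name `get_embedding_url`, the host pattern, and the path layout for each branch are assumptions for illustration, not a copy of `_get_embedding_url`; in particular, treating an all-digit model as a deployed endpoint ID is an assumption.

```python
def get_embedding_url(model, project, location, endpoint, vertex_api_version="v1"):
    """Build an embedding URL, using vertex_api_version in every branch.

    Hypothetical sketch: path layouts are assumed for illustration.
    """
    prefix = (
        f"https://{location}-aiplatform.googleapis.com/{vertex_api_version}"
        f"/projects/{project}/locations/{location}"
    )
    if model.isdigit():
        # Digit branch: assumed to address a numeric endpoint ID
        return f"{prefix}/endpoints/{model}:{endpoint}"
    # Non-digit branch: before the fix this path hardcoded /v1/
    return f"{prefix}/publishers/google/models/{model}:{endpoint}"


if __name__ == "__main__":
    print(
        get_embedding_url(
            "gemini-embedding-2-preview",
            "your-project-id",
            "us-central1",
            "embedContent",
            vertex_api_version="v1beta1",
        )
    )
```

Building the shared prefix once keeps the two branches from drifting apart again, which is the class of mistake the hardcoded `/v1/` represented.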

Changes

  • litellm/llms/vertex_ai/common_utils.py: Fix _get_embedding_url to use vertex_api_version instead of hardcoded /v1/; add gemini-embedding prefix fallback for uses_embed_content
  • litellm/main.py: Add gemini-embedding prefix fallback in vertex_ai embedding routing so models route to GoogleBatchEmbeddings even if model info lookup fails

Test plan

  • Verify vertex_ai/gemini-embedding-2-preview text-only embedding calls use :embedContent endpoint
  • Verify vertex_ai/gemini-embedding-001 still uses :predict endpoint (no regression)
  • Verify gemini/gemini-embedding-2-preview still works via Gemini API path
  • Existing tests in tests/litellm/llms/vertex_ai/test_gemini_batch_embeddings.py pass

🤖 Generated with Claude Code

Changed files

  • litellm/llms/vertex_ai/common_utils.py (modified, +7/-1)
  • litellm/main.py (modified, +6/-0)

PR #23620: test: add regression test for vertex_ai embedding-2 routing

Description (problem / solution / changelog)

Summary

Adds a regression test for #23508 that verifies vertex_ai/gemini-embedding-2-preview is routed through GoogleBatchEmbeddings (:embedContent endpoint) and does not fall through to the legacy vertex_embedding handler that uses :predict.

The core routing fix was already merged in #23322, but there was no test specifically asserting:

  • The legacy vertex_embedding.embedding() handler is not called
  • The :predict endpoint is not used in the URL
  • The request body uses embedContent format (content key) rather than predict format (instances key)

This test catches the exact failure mode described in the issue: Google dropped :predict support for gemini-embedding-2-preview, so the legacy handler returns 400 FAILED_PRECONDITION.
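
The three assertions listed above can be sketched with `unittest.mock`. This is not the test added in the PR: `dispatch_embedding` is a hypothetical stand-in for the routing in `litellm/main.py`, and the URL and body shapes are assumed for illustration.

```python
from unittest.mock import MagicMock


def dispatch_embedding(model, text, uses_embed_content, legacy_handler, batch_handler):
    """Hypothetical dispatcher standing in for the real embedding routing."""
    if uses_embed_content:
        return batch_handler(
            url=f".../models/{model}:embedContent",
            body={"content": {"parts": [{"text": text}]}},
        )
    return legacy_handler(
        url=f".../models/{model}:predict",
        body={"instances": [{"content": text}]},
    )


def test_routes_to_batch_handler_not_predict():
    legacy, batch = MagicMock(), MagicMock()
    dispatch_embedding("gemini-embedding-2-preview", "Hello, world!", True, legacy, batch)

    # 1. The legacy handler is never called
    legacy.assert_not_called()
    batch.assert_called_once()

    kwargs = batch.call_args.kwargs
    # 2. The :predict endpoint is not used in the URL
    assert ":predict" not in kwargs["url"]
    # 3. The body uses the embedContent format, not the predict format
    assert "content" in kwargs["body"]
    assert "instances" not in kwargs["body"]


if __name__ == "__main__":
    test_routes_to_batch_handler_not_predict()
```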

Test plan

  • New test test_vertex_ai_gemini_embedding_2_routes_to_batch_handler_not_predict added
  • Run pytest tests/litellm/llms/vertex_ai/test_gemini_batch_embeddings.py -v

Closes #23508

Changed files

  • tests/litellm/llms/vertex_ai/test_gemini_batch_embeddings.py (modified, +78/-0)

PR #23322: [Feat]: Add support for gemini embedding 2 preview

Description (problem / solution / changelog)

Type

🆕 New Feature

Changes

[Image: screenshot attached to the PR (1049×523); not reproduced here]

Original issue body

What happened?

Version

  • SDK: 1.82.1 (latest on PyPI)
  • Proxy: 1.81.0

Both SDK and Proxy are affected. The proxy health check also fails, marking the model as permanently unhealthy.

Reproduce

import litellm

litellm.embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=["Hello, world!"],
    vertex_project="your-project-id",
    vertex_location="us-central1",
)

Error: Vertex_aiException BadRequestError - {"error":{"code":400,"message":"Precondition check failed.","status":"FAILED_PRECONDITION"}}

The same model works fine via google-genai SDK directly:

from google import genai
client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")
result = client.models.embed_content(model="gemini-embedding-2-preview", contents="Hello, world!")
# works fine

What part of LiteLLM is this about?

Proxy

What LiteLLM version are you on?

v1.81.0

extent analysis

Fix Plan

To fix the issue, litellm must route vertex_ai requests for gemini-embedding-2-preview through the GoogleBatchEmbeddings handler, which calls :embedContent, instead of vertex_embedding.embedding(), which calls the legacy :predict endpoint.

Step-by-Step Solution

  1. Update main.py: In the embedding dispatch, extend the branch that selects GoogleBatchEmbeddings so that it also covers the vertex_ai provider.
  2. Add a condition: Check whether the provider is vertex_ai and the model is gemini-embedding-2-preview.
  3. Use the GoogleBatchEmbeddings handler: When the condition holds, call GoogleBatchEmbeddings rather than the legacy vertex_embedding handler that posts to :predict.

Example Code

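# Illustrative pseudocode for the dispatch in main.py. The argument lists are
# simplified placeholders, not the handlers' real signatures.
is_embed_content_model = model == "gemini-embedding-2-preview"
if custom_llm_provider == "gemini" or (
    custom_llm_provider == "vertex_ai" and is_embed_content_model
):
    # Route to :embedContent through the GoogleBatchEmbeddings handler
    return GoogleBatchEmbeddings(model, input, vertex_project, vertex_location)
else:
    # Fall back to the legacy handler, which posts to :predict
    return vertex_embedding.embedding(model, input, vertex_project, vertex_location)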

Verification

To verify that the fix worked, run the following code:

import litellm

litellm.embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=["Hello, world!"],
    vertex_project="your-project-id",
    vertex_location="us-central1",
)

If the fix is successful, the call returns an embedding response instead of raising the 400 FAILED_PRECONDITION error.
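
When verifying, it can help to tell this specific failure apart from unrelated errors such as bad credentials. The helper below is a minimal sketch that inspects an error string of the shape quoted in this issue; the function name `is_failed_precondition` is hypothetical, and it assumes the JSON payload starts at the first `{` of the message.

```python
import json


def is_failed_precondition(error_text):
    """Return True if the message carries a 400 FAILED_PRECONDITION payload.

    Assumes the JSON payload starts at the first '{' in the message,
    as in the error string quoted in this issue.
    """
    start = error_text.find("{")
    if start == -1:
        return False
    try:
        payload = json.loads(error_text[start:])
    except json.JSONDecodeError:
        return False
    error = payload.get("error") if isinstance(payload, dict) else None
    if not isinstance(error, dict):
        return False
    return error.get("code") == 400 and error.get("status") == "FAILED_PRECONDITION"


if __name__ == "__main__":
    message = (
        'Vertex_aiException BadRequestError - {"error":{"code":400,'
        '"message":"Precondition check failed.","status":"FAILED_PRECONDITION"}}'
    )
    print(is_failed_precondition(message))
```

A caller could wrap the `litellm.embedding(...)` call in a `try`/`except Exception as exc` block and pass `str(exc)` to this helper to confirm whether the request is still being sent to :predict.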

Extra Tips

  • Once the fix is released, upgrade litellm to a version that includes it.
  • If the error persists, check the LiteLLM documentation and GitHub repository for updates or known issues related to the gemini-embedding-2-preview model.
