LlamaIndex - [Bug]: GoogleGenAIEmbedding with "gemini-embedding-2" produces aggregated embeddings instead of individual ones with google-genai SDK v1.71.0+ [1 pull request]


Fix Action

Fixed


Bug Description

Description: With google-genai SDK version 1.71.0 or higher and the gemini-embedding-2 model, passing a list of strings directly to the contents parameter now aggregates the inputs into a single embedding result instead of returning an individual embedding for each input in the list.

This behaviour causes a mismatch in llama-index where a list of embeddings is expected for a list of input texts (e.g., during batch embedding of documents).
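
Until the fix is installed, a caller can at least detect the mismatch early. The following is a minimal sketch of such a guard; the helper name `check_one_embedding_per_input` is hypothetical and not part of llama-index.

```python
from typing import List, Sequence


def check_one_embedding_per_input(
    texts: Sequence[str], embeddings: Sequence[List[float]]
) -> None:
    """Raise ValueError unless there is exactly one embedding per input text.

    Hypothetical helper for illustration; it is not part of llama-index.
    """
    if len(embeddings) != len(texts):
        raise ValueError(
            f"Expected {len(texts)} embeddings, got {len(embeddings)}; "
            "the inputs may have been aggregated into a single result."
        )
```

It would be called right after `embed_model.get_text_embedding_batch(texts)`, passing the same `texts` list and the returned embeddings, so that a silent aggregation fails loudly instead of corrupting an index.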

Environment:

  • llama-index-embeddings-google-genai: 0.5.0
  • google-genai: 1.71.0 or higher

Possible Fix: To restore the expected behaviour, the input strings could be wrapped in explicit types.Content objects when passed to the embedding API. This will ensure that each input string is treated as a distinct entity, allowing the API to return one embedding per input string.
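
The suggestion above could look roughly like the sketch below when calling the google-genai SDK directly. This is an illustration under assumptions, not the code from the merged pull request: the function name `embed_individually` is hypothetical, `client` is assumed to be a `google.genai.Client`, and whether wrapping alone restores per-input results is taken from the report rather than verified here.

```python
from typing import List, Sequence


def embed_individually(client, model: str, texts: Sequence[str]) -> List[List[float]]:
    """Embed each text as its own Content object (hypothetical helper).

    `client` is assumed to be a google.genai.Client instance.
    """
    # Imported lazily so the module can be loaded without the SDK installed.
    from google.genai import types

    # One Content per input string, instead of a bare list of strings.
    contents = [types.Content(parts=[types.Part(text=text)]) for text in texts]

    result = client.models.embed_content(model=model, contents=contents)

    # Each returned embedding exposes its vector as `.values`.
    return [embedding.values for embedding in result.embeddings]
```

A call such as `embed_individually(client, "gemini-embedding-2", ["Hello world", "This is a test"])` is then expected to return two vectors, assuming the API behaves as the report describes.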

Relevant Documentation: Gemini API Embedding Aggregation

Version

0.14.21 (llama-index-embeddings-google-genai: 0.5.0)

Steps to Reproduce

  1. Install google-genai SDK version 1.71.0 or higher.
  2. Create a GoogleGenAIEmbedding instance and call get_text_embedding_batch with a list of strings.
  3. Observe that the returned list of embeddings does not match the input length (it returns a single aggregated embedding).

from llama_index.embeddings.google_genai import GoogleGenAIEmbedding

embed_model = GoogleGenAIEmbedding(model_name="gemini-embedding-2")

embeddings = embed_model.get_text_embedding_batch(
    ["Hello world", "This is a test", "Embeddings are useful"],
)
print(len(embeddings))  # Expected: 3, Actual: 1

Relevant Logs/Tracebacks
