langchain: how to fix `embed_documents` returning 1 vector for N texts with `gemini-embedding-2`

GoogleGenerativeAIEmbeddings.embed_documents(["t1", "t2", ..., "tN"]) always returns a list of 1 vector regardless of how many texts are passed. embed_query("t1") works correctly and returns 1 vector for 1 text.

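Error Message

Error Message and Stack Trace (if applicable)

None reported. The reproduction runs to completion; the only symptom is the wrong number of returned vectors.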

Root Cause

Root cause (investigated)

embed_documents calls batchEmbedContents with all texts as contents. gemini-embedding-2 treats a list of contents as parts of one multimodal document (by design for cross-modal retrieval), returning 1 combined vector instead of 1 vector per text.

Fix Action

Workaround

Embed each text individually via aembed_query using asyncio.gather:

import asyncio

async def embed_texts(texts):
    # One aembed_query call per text, run concurrently; gather preserves input order.
    tasks = [embeddings.aembed_query(t) for t in texts]
    return list(await asyncio.gather(*tasks))
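
For long lists, launching every request at once may run into rate limits. The following is a minimal sketch of the same workaround with bounded concurrency. `embed_texts_bounded` and `max_concurrency` are hypothetical names introduced for illustration, not part of langchain-google-genai; the only method assumed on `embeddings` is `aembed_query`, which the workaround already uses.

```python
import asyncio
from typing import Any, List, Sequence


async def embed_texts_bounded(
    embeddings: Any,
    texts: Sequence[str],
    max_concurrency: int = 8,  # hypothetical default; tune for your quota
) -> List[List[float]]:
    """Embed each text with its own aembed_query call, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def embed_one(text: str) -> List[float]:
        # The semaphore caps how many requests are in flight simultaneously.
        async with semaphore:
            return await embeddings.aembed_query(text)

    # asyncio.gather returns results in the order the awaitables were passed,
    # so vectors[i] corresponds to texts[i].
    return list(await asyncio.gather(*(embed_one(t) for t in texts)))
```

From synchronous code this could be called as `vectors = asyncio.run(embed_texts_bounded(embeddings, texts))`.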

Code Example

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(
    model="models/gemini-embedding-2",
    output_dimensionality=768,
)
texts = ["hello world", "foo bar", "lorem ipsum"]
result = embeddings.embed_documents(texts)

print(len(result))   # prints 1  ← expected 3
print(len(result[0])) # prints 768
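
Because the call fails silently, a caller may want to guard against the count mismatch. The sketch below is a stopgap under stated assumptions, not library code: `embed_documents_checked` is a hypothetical helper name, and it relies only on the `embed_documents` and `embed_query` methods shown in this report. When the counts disagree it falls back to one `embed_query` call per text. Query and document embeddings may be produced with different task types depending on the integration version, so the fallback vectors are not guaranteed to match what a working `embed_documents` would return.

```python
from typing import Any, List, Sequence


def embed_documents_checked(embeddings: Any, texts: Sequence[str]) -> List[List[float]]:
    """Return one vector per text, falling back to per-text calls if the batch comes back short."""
    texts = list(texts)
    vectors = embeddings.embed_documents(texts)
    if len(vectors) == len(texts):
        return vectors
    # Count mismatch (the symptom reported here): embed each text on its own.
    return [embeddings.embed_query(t) for t in texts]
```

With the reproduction above, `len(embed_documents_checked(embeddings, texts))` would equal `len(texts)` regardless of which path is taken.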

---
System Info

OS: Windows
OS Version: 10.0.26200
Python Version: 3.14.4 (tags/v3.14.4:23116f9, Apr 7 2026, 14:10:54) [MSC v.1944 64 bit (AMD64)]

langchain_core: 1.4.0
langchain: 1.3.1
langchain_community: 0.4.2
langsmith: 0.8.5
langchain_classic: 1.0.7
langchain_google_genai: 4.2.3
langchain_protocol: 0.0.15
langchain_pymupdf4llm: 0.5.0
langchain_qdrant: 1.1.0
langchain_text_splitters: 1.1.2
langgraph_sdk: 0.3.15
