langchain - ✅(Solved) Fix `InMemoryVectorStore.add_documents` silently drops documents on embedding count mismatch [2 pull requests, 3 comments, 4 participants]

GitHub stats
langchain-ai/langchain#36745 (fetched 2026-04-17 08:23:09)
Comments: 3
Participants: 4
Timeline events: 10
Reactions: 0
Timeline (top): commented ×3, cross-referenced ×2, labeled ×2, mentioned ×1

add_documents and aadd_documents in InMemoryVectorStore zip documents against embedding vectors using strict=False (the default). If the embedding model returns a different number of vectors than documents passed in, the zip silently truncates to the shorter list. Documents get lost without any error.

This can happen when a batched embedding provider silently truncates input or when a custom embeddings implementation has an off-by-one bug. The fix is to use strict=True so that a ValueError is raised immediately on mismatch.

Error Message

No error raised. The third document is silently dropped.

Root Cause

add_documents and aadd_documents in InMemoryVectorStore pair each document with its embedding vector using zip(documents, vectors, strict=False), which matches zip's default behaviour. With strict=False, zip stops at the end of the shorter input, so when the embedding model returns fewer vectors than the documents passed in, the trailing documents are never stored and no error is raised.

This can happen when a batched embedding provider silently truncates its input or when a custom embeddings implementation has an off-by-one bug. The fix is to use strict=True so that a ValueError is raised on mismatch.

Fix Action

Fix / Workaround

Two pull requests address the issue in libs/core/langchain_core/vectorstores/in_memory.py:

  • PR #36767 adds an explicit length check to add_documents and aadd_documents and raises a descriptive ValueError if len(documents) != len(vectors).
  • PR #36772 changes the zip call in both methods to strict=True, so that zip itself raises a ValueError on mismatch.

PR fix notes

PR #36767: fix(core): raise ValueError in InMemoryVectorStore on embedding count…

Description (problem / solution / changelog)

Problem

Currently, InMemoryVectorStore suffers from a silent data loss bug. If an embedding provider (or a custom implementation) returns a different number of vectors than the provided documents, the internal zip(documents, vectors, strict=False) operation silently truncates the output. Documents are dropped without raising any errors, corrupting the downstream agent's context.

Solution

This PR introduces explicit length validation to intercept mismatches before the database processes them.

Key Changes:

  • in_memory.py: Added strict length validation in both add_documents and aadd_documents. If len(documents) != len(vectors), it now immediately raises a descriptive ValueError.
  • test_in_memory.py: Updated test_inmemory_call_embeddings_async. The new strict validation successfully caught an unconfigured AsyncMock in the test suite, so the mock was updated to return the correct number of dummy vectors to match the document count.
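
The PR's diff is not reproduced on this page, so the following is only a sketch of what such an up-front check could look like. validate_vector_count is a hypothetical helper name, not a function in langchain_core, and the error wording is made up.

```python
def validate_vector_count(documents: list, vectors: list) -> None:
    """Raise ValueError if the embedding count differs from the document count.

    Hypothetical helper for illustration; not part of langchain_core.
    """
    if len(documents) != len(vectors):
        msg = (
            f"Embedding model returned {len(vectors)} vectors "
            f"for {len(documents)} documents."
        )
        raise ValueError(msg)


# Matching lengths pass silently.
validate_vector_count(["a", "b"], [[0.1], [0.2]])

# A mismatch fails before anything is written to the store.
try:
    validate_vector_count(["a", "b", "c"], [[0.1], [0.2]])
    message = None
except ValueError as exc:
    message = str(exc)
print(message)  # Embedding model returned 2 vectors for 3 documents.
```

Checking the lengths before the loop lets the error message name both counts, which the built-in zip error does not.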

Fixes #36745

Changed files

  • libs/core/langchain_core/vectorstores/in_memory.py (modified, +14/-0)
  • libs/core/tests/unit_tests/vectorstores/test_in_memory.py (modified, +4/-2)
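
As background for the test change: awaiting an unconfigured AsyncMock yields another mock object rather than a list of vectors, so it cannot match the document count. The sketch below is not the test from the PR. It only shows the difference between an unconfigured and a configured mock, using aembed_documents (the async method of the Embeddings interface) and made-up dummy vectors.

```python
import asyncio
from unittest.mock import AsyncMock

texts = ["foo", "bar"]


async def main():
    # Unconfigured: the awaited result is a mock object, not a list of vectors.
    unconfigured = AsyncMock()
    raw = await unconfigured.aembed_documents(texts)

    # Configured: return one dummy vector per input text.
    configured = AsyncMock()
    configured.aembed_documents.return_value = [[0.0, 0.0] for _ in texts]
    vectors = await configured.aembed_documents(texts)
    return raw, vectors


raw, vectors = asyncio.run(main())
print(isinstance(raw, list))       # False
print(len(vectors) == len(texts))  # True
```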

PR #36772: fix(core): raise ValueError on embedding count mismatch in InMemoryVectorStore

Description (problem / solution / changelog)

Summary

Previously, InMemoryVectorStore.add_documents and aadd_documents used zip(documents, vectors, strict=False), which silently truncated documents when the embedding model returned fewer vectors than documents. This could lead to silent data loss.

Changed to strict=True so that a ValueError is raised immediately on mismatch, making debugging easier and preventing silent data loss.

Fixes #36745

Test plan

Tested with a custom Embeddings class that returns fewer vectors than documents:

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import InMemoryVectorStore


class BrokenEmbeddings(Embeddings):
    """Returns one vector fewer than the number of input texts."""

    def embed_documents(self, texts):
        return [[0.1, 0.2, 0.3] for _ in texts[:-1]]

    def embed_query(self, text):
        return [0.1, 0.2, 0.3]


store = InMemoryVectorStore(embedding=BrokenEmbeddings())
docs = [Document(page_content="first"), Document(page_content="second"), Document(page_content="third")]
# Previously: silently returned 2 ids. Now: raises ValueError
store.add_documents(docs)

  • Change is backward compatible (only affects broken usage that was already silently losing data)
  • Existing tests pass
  • Fix applies to both sync add_documents and async aadd_documents

Changed files

  • libs/core/langchain_core/vectorstores/in_memory.py (modified, +2/-2)

Code Example

from langchain_core.vectorstores import InMemoryVectorStore
from langchain_core.embeddings import Embeddings
from langchain_core.documents import Document


class BrokenEmbeddings(Embeddings):
    """Returns fewer vectors than documents."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        # Silently drops the last document
        return [[0.1, 0.2, 0.3] for _ in texts[:-1]]

    def embed_query(self, text: str) -> list[float]:
        return [0.1, 0.2, 0.3]


store = InMemoryVectorStore(embedding=BrokenEmbeddings())
docs = [
    Document(page_content="first"),
    Document(page_content="second"),
    Document(page_content="third"),
]

# This should raise but silently drops the last document
ids = store.add_documents(docs)
print(f"Sent 3 docs, got {len(ids)} ids back")
# Output: Sent 3 docs, got 2 ids back

---

System Information
------------------
> OS:  Windows
> OS Version:  10.0.26200
> Python Version:  3.13.7

Package Information
-------------------
> langchain_core: 1.3.0a1
> langsmith: 0.7.13

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain-core

extent analysis

TL;DR

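strict is an argument of zip, not of add_documents, so the fix belongs inside InMemoryVectorStore. To prevent documents from being silently dropped, the zip call in add_documents and aadd_documents must use strict=True (or be preceded by an explicit length check) so that a ValueError is raised when the number of documents and the number of embedding vectors do not match.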

Guidance

  • The issue arises from zip's behavior with strict=False: when documents and embedding vectors differ in length, zip stops at the shorter input and silently discards the surplus items of the longer one.
  • To fix this, modify both add_documents and aadd_documents in InMemoryVectorStore to use strict=True when zipping documents against embedding vectors.
  • Verify the fix by intentionally creating a mismatch (for example, with the BrokenEmbeddings class from the reproduction) and checking that a ValueError is raised.
  • Consider adding a length check to custom embedding implementations so that they never return fewer vectors than inputs.
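
The last point can also be handled at the embedding layer. A minimal sketch: CheckedEmbeddings and DropsLast are hypothetical names, and the wrapper relies only on the wrapped object exposing embed_documents and embed_query, as in the reproduction code. It does not subclass Embeddings, so adapt it if a strict type is required.

```python
class CheckedEmbeddings:
    """Hypothetical wrapper that rejects a vector count mismatch."""

    def __init__(self, inner):
        self.inner = inner

    def embed_documents(self, texts):
        vectors = self.inner.embed_documents(texts)
        if len(vectors) != len(texts):
            raise ValueError(
                f"Expected {len(texts)} vectors, got {len(vectors)}."
            )
        return vectors

    def embed_query(self, text):
        # Single-text queries are passed through unchanged.
        return self.inner.embed_query(text)


class DropsLast:
    """Stand-in for a buggy model: returns one vector too few."""

    def embed_documents(self, texts):
        return [[0.1, 0.2, 0.3] for _ in texts[:-1]]

    def embed_query(self, text):
        return [0.1, 0.2, 0.3]


checked = CheckedEmbeddings(DropsLast())
try:
    checked.embed_documents(["first", "second", "third"])
    caught = False
except ValueError:
    caught = True
print(caught)  # True
```

Wrapping the model this way surfaces the mismatch regardless of which vector store consumes the vectors.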

Example

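from typing import Any

from langchain_core.documents import Document
from langchain_core.embeddings import Embeddings
from langchain_core.vectorstores import InMemoryVectorStore


class BrokenEmbeddings(Embeddings):
    """Returns fewer vectors than documents (same class as in the reproduction)."""

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [[0.1, 0.2, 0.3] for _ in texts[:-1]]

    def embed_query(self, text: str) -> list[float]:
        return [0.1, 0.2, 0.3]


class FixedInMemoryVectorStore(InMemoryVectorStore):
    def add_documents(
        self, documents: list[Document], ids: list[str] | None = None, **kwargs: Any
    ) -> list[str]:
        # Embed once up front purely to compare counts. The parent method
        # embeds the texts again, so this check doubles the embedding cost.
        texts = [doc.page_content for doc in documents]
        vectors = self.embedding.embed_documents(texts)
        if len(vectors) != len(texts):
            raise ValueError(
                "Mismatch between documents and vectors: "
                f"{len(texts)} documents, {len(vectors)} vectors"
            )
        return super().add_documents(documents, ids=ids, **kwargs)


store = FixedInMemoryVectorStore(embedding=BrokenEmbeddings())
docs = [
    Document(page_content="first"),
    Document(page_content="second"),
    Document(page_content="third"),
]

try:
    ids = store.add_documents(docs)
except ValueError as e:
    print(e)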

Notes

The example above is a workaround for installations where in_memory.py itself cannot be modified: it subclasses InMemoryVectorStore and overrides add_documents. aadd_documents needs the same treatment if the async path is used.

Recommendation

Apply the workaround by modifying the add_documents method to use strict=True or a similar approach to detect and raise an error on mismatches, as this directly addresses the root cause of the issue.
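
On versions without the fix, a caller can at least detect the loss after the fact, because the reproduction shows fewer ids coming back than documents going in. ensure_all_stored is a hypothetical helper, and it assumes add_documents returns one id per stored document, as in the reproduction output.

```python
def ensure_all_stored(documents: list, ids: list[str]) -> list[str]:
    """Raise if fewer (or more) ids came back than documents were sent.

    Hypothetical caller-side check; not part of langchain_core.
    """
    if len(ids) != len(documents):
        raise RuntimeError(
            f"Sent {len(documents)} documents but got {len(ids)} ids back."
        )
    return ids


# Stand-in values mirroring the reproduction: 3 documents, 2 ids.
try:
    ensure_all_stored(["first", "second", "third"], ["id-1", "id-2"])
    message = None
except RuntimeError as exc:
    message = str(exc)
print(message)  # Sent 3 documents but got 2 ids back.

# Intended use with a real store (not executed here):
# ids = ensure_all_stored(docs, store.add_documents(docs))
```

This does not prevent the partial write; it only turns the silent loss into a visible failure.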
