langchain - 💡(How to fix) Fix VectorStore.add_texts exhausts generator inputs before creating documents

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and did not find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain or the specific integration package.
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package

  • langchain-core

Reproduction Steps / Example Code

VectorStore.add_texts and VectorStore.aadd_texts accept Iterable[str], but generator inputs are consumed before documents are created. A minimal subclass that implements add_documents demonstrates the issue:

from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore


class RecordingVectorStore(VectorStore):
    def __init__(self):
        self.documents = []

    def add_documents(self, documents: list[Document], **kwargs):
        self.documents.extend(documents)
        return [str(i) for i, _ in enumerate(documents)]

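    def similarity_search(self, query: str, k: int = 4, **kwargs):
        return []

    # VectorStore declares from_texts as an abstract classmethod, so it must be
    # defined before RecordingVectorStore() can be instantiated.
    @classmethod
    def from_texts(cls, texts, embedding, metadatas=None, **kwargs):
        store = cls()
        store.add_texts(texts, metadatas=metadatas, **kwargs)
        return store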


store = RecordingVectorStore()
store.add_texts((text for text in ["alpha", "beta"]))

assert [doc.page_content for doc in store.documents] == ["alpha", "beta"]

Expected behavior

Generator inputs should behave the same as list inputs. The example above should add two documents.

Actual behavior

No documents are added because the generator is exhausted while materializing texts_ for validation, and the document list is then built by iterating over the original exhausted texts iterable.

The same pattern appears in both add_texts and aadd_texts.
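The failure mode can be reproduced without LangChain. The sketch below is a simplified mirror of the pattern quoted under Additional context. The helper name `buggy_pairs` is hypothetical and this is not LangChain code. The validation step calls `list(texts)`, which drains a generator, so the later `zip` over `texts` yields nothing.

```python
from itertools import cycle
from typing import Iterable, Optional


def buggy_pairs(
    texts: Iterable[str], metadatas: Optional[list[dict]] = None
) -> list[tuple[str, dict]]:
    # Materialize non-list/tuple inputs so their length can be validated.
    texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)
    if metadatas and len(metadatas) != len(texts_):
        raise ValueError("metadatas must match the number of texts")
    metadatas_ = iter(metadatas) if metadatas else cycle([{}])
    # Bug: zips over the original iterable, which list() above has already
    # exhausted when `texts` is a generator.
    return [(text, meta) for text, meta in zip(texts, metadatas_)]


print(buggy_pairs(["alpha", "beta"]))                  # [('alpha', {}), ('beta', {})]
print(buggy_pairs(text for text in ["alpha", "beta"]))  # []
```

A list passes through `texts_ = texts` unchanged, so only one-shot iterables such as generators are affected.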

Suggested fix

Build Document objects from the already-materialized texts_ sequence instead of the original texts iterable in both methods. Add sync and async regression tests with generator inputs.
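Here is a minimal sketch of the suggested fix. It uses hypothetical stand-ins rather than LangChain's real classes: `Doc` substitutes for `Document`, and `build_documents` / `abuild_documents` stand in for the document-building step inside `add_texts` / `aadd_texts`. The only substantive change is zipping over `texts_`. The two test functions show the shape of the sync and async generator regression tests. Real tests would call `add_texts` and `aadd_texts` on a store such as `RecordingVectorStore` above.

```python
import asyncio
from dataclasses import dataclass, field
from itertools import cycle
from typing import Iterable, Optional


@dataclass
class Doc:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)
    id: Optional[str] = None


def build_documents(
    texts: Iterable[str],
    metadatas: Optional[list[dict]] = None,
    ids: Optional[list[str]] = None,
) -> list[Doc]:
    texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)
    if metadatas and len(metadatas) != len(texts_):
        raise ValueError("metadatas must match the number of texts")
    metadatas_ = iter(metadatas) if metadatas else cycle([{}])
    ids_ = iter(ids) if ids else cycle([None])
    # Fix: iterate the materialized sequence, not the original iterable.
    return [
        Doc(page_content=text, metadata=meta, id=id_)
        for text, meta, id_ in zip(texts_, metadatas_, ids_)
    ]


async def abuild_documents(texts, metadatas=None, ids=None) -> list[Doc]:
    # The async path needs the same change; this sketch just reuses the sync helper.
    return build_documents(texts, metadatas, ids)


def test_generator_input_sync() -> None:
    docs = build_documents(text for text in ["alpha", "beta"])
    assert [d.page_content for d in docs] == ["alpha", "beta"]


def test_generator_input_async() -> None:
    docs = asyncio.run(abuild_documents(text for text in ["alpha", "beta"]))
    assert [d.page_content for d in docs] == ["alpha", "beta"]


test_generator_input_sync()
test_generator_input_async()
```

Zipping over `texts_` is safe for list and tuple inputs too, because in that case `texts_` is the same object as `texts`.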

Additional context

The relevant code materializes non-list/tuple iterables:

texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)

But then uses texts rather than texts_ here:

for text, metadata_, id_ in zip(texts, metadatas_, ids_, strict=False)
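The `isinstance(texts, (list, tuple))` check quoted above implies a caller-side workaround for affected versions. Passing a list takes the branch where `texts_` is `texts` itself, so nothing is drained. With a real store, the equivalent is `store.add_texts(list(generator))`. The sketch below uses a hypothetical wrapper, `add_texts_safe`, and a hypothetical `BuggyStore` that mimics only the affected logic.

```python
from typing import Any, Iterable


class BuggyStore:
    """Hypothetical stand-in that reproduces only the affected add_texts logic."""

    def __init__(self) -> None:
        self.texts: list[str] = []

    def add_texts(self, texts: Iterable[str], **kwargs: Any) -> list[str]:
        # Materialize non-list/tuple inputs (this drains a generator).
        texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)
        if not texts_:
            return []
        # Mirrors the bug: builds results from `texts`, not `texts_`.
        added = [text for text in texts]
        self.texts.extend(added)
        return [str(i) for i in range(len(added))]


def add_texts_safe(store: Any, texts: Iterable[str], **kwargs: Any) -> list[str]:
    """Hypothetical workaround: always hand the store a list, never a one-shot iterator."""
    if not isinstance(texts, (list, tuple)):
        texts = list(texts)
    return store.add_texts(texts, **kwargs)


store = BuggyStore()
store.add_texts(text for text in ["alpha", "beta"])  # drained during validation, adds nothing
add_texts_safe(store, (text for text in ["alpha", "beta"]))
print(store.texts)  # ['alpha', 'beta']
```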

I can open a small PR with tests if maintainers agree this should be fixed.
