langchain: VectorStore.add_texts exhausts generator inputs before creating documents

langchain · 2026-05-24 18:20:06

Root Cause

No documents are added because the generator is exhausted while materializing texts_ for validation, and the document list is then built by iterating over the original exhausted texts iterable.

Fix / Workaround

Build Document objects from the already-materialized texts_ sequence instead of the original texts iterable, in both add_texts and aadd_texts. Until that change lands, passing a list or tuple instead of a generator avoids the problem, because list and tuple inputs are not consumed by the materialization step.


Submission checklist

This is a bug, not a usage question.
I added a clear and descriptive title that summarizes this issue.
I used the GitHub search to find a similar question and did not find it.
I am sure that this is a bug in LangChain rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain or the specific integration package.
This is not related to the langchain-community package.
I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package

langchain-core

Reproduction Steps / Example Code

VectorStore.add_texts and VectorStore.aadd_texts accept Iterable[str], but generator inputs are consumed before documents are created. A minimal subclass that implements add_documents demonstrates the issue:

from langchain_core.documents import Document
from langchain_core.vectorstores import VectorStore


class RecordingVectorStore(VectorStore):
    def __init__(self):
        self.documents = []

    def add_documents(self, documents: list[Document], **kwargs):
        self.documents.extend(documents)
        return [str(i) for i, _ in enumerate(documents)]

    def similarity_search(self, query: str, k: int = 4, **kwargs):
        return []


store = RecordingVectorStore()
store.add_texts((text for text in ["alpha", "beta"]))

assert [doc.page_content for doc in store.documents] == ["alpha", "beta"]

Expected behavior

Generator inputs should behave the same as list inputs. The example above should add two documents.

Actual behavior

No documents are added because the generator is exhausted while materializing texts_ for validation, and the document list is then built by iterating over the original exhausted texts iterable.

The same pattern appears in both add_texts and aadd_texts.

Suggested fix

Build Document objects from the already-materialized texts_ sequence instead of the original texts iterable in both methods. Add sync and async regression tests with generator inputs.

Additional context

The relevant code materializes non-list/tuple iterables:

texts_ = texts if isinstance(texts, (list, tuple)) else list(texts)

But then uses texts rather than texts_ here:

for text, metadata_, id_ in zip(texts, metadatas_, ids_, strict=False)

I can open a small PR with tests if maintainers agree this should be fixed.
