langchain - ✅(Solved) Fix bug(core): _batch and _abatch lack batch_size validation causing infinite loop [5 pull requests, 4 comments, 4 participants]

GitHub stats
langchain-ai/langchain#36647 · Fetched 2026-04-11 06:12:24
Comments: 4 · Participants: 4 · Timeline events: 13 · Reactions: 0
Timeline (top): commented ×4, labeled ×3, cross-referenced ×2, issue_type_added ×1

I am using the aindex and index functions from langchain_core.indexing. When the batch_size parameter is mistakenly set to 0 (or any negative value), the internal utility functions _batch and _abatch exhibit harmful behavior:

  • Synchronous _batch: Enters an infinite loop (the while True loop never exits, because islice(it, 0) yields no items on every pass, so no progress is made).
  • Asynchronous _abatch: Yields an empty list for every element in the input iterable, causing silent performance degradation and wasted CPU.
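The islice behavior mentioned above can be checked in isolation. The snippet below uses only the standard library and makes no assumption about how _batch itself is written. It shows that a zero-length slice is always empty and leaves the underlying iterator untouched, and that islice rejects a negative stop on its own.

```python
from itertools import islice

it = iter([1, 2, 3])

# A zero-length slice yields nothing, however many times it is requested.
chunks = [list(islice(it, 0)) for _ in range(3)]

# The underlying iterator was never advanced, so every item is still there.
remaining = list(it)

# A negative stop is rejected by islice itself with a ValueError.
try:
    islice(iter([1, 2, 3]), -1)
    negative_error = None
except ValueError as exc:
    negative_error = exc
```

Whether a negative size ever reaches islice inside _batch depends on the implementation, so an explicit check at the top of the function remains the clearer contract.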

Expected behavior:
The functions should raise a ValueError immediately when size <= 0, failing fast and alerting the user to the invalid configuration.

Actual behavior:
_batch hangs the process; _abatch produces a flood of empty batches without any error.

Proposed fix:
Add a simple validation at the beginning of both functions:

if size <= 0:
    raise ValueError("Batch size must be a positive integer.")
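As a minimal sketch of where such a guard could sit, the example below defines a stand-in batching helper. The name batch_checked and its body are hypothetical and written for illustration only; they are not copied from langchain_core.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batch_checked(size: int, iterable: Iterable[T]) -> Iterator[List[T]]:
    """Hypothetical stand-in for a batching helper, with the proposed guard."""
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    it = iter(iterable)
    while True:
        # Pull up to `size` items; an empty chunk means the input is exhausted.
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


# A valid size splits the input into consecutive chunks.
batches = list(batch_checked(2, [1, 2, 3, 4, 5]))
```

With a size of 0 or below, iterating over batch_checked(...) raises the ValueError instead of making no progress.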

PR fix notes

PR #36648: fix(core): validate batch_size > 0 in _batch and _abatch

Description (problem / solution / changelog)

Fixes #36647

Add input validation to _batch and _abatch to raise ValueError when size <= 0, preventing infinite loop in synchronous indexing and empty-batch spam in async indexing.

Verification

  • Added unit tests test_batch_size_zero_raises_error and test_abatch_size_zero_raises_error.
  • Verified locally with manual test script and confirmed existing tests pass.

Changed files

  • libs/core/langchain_core/indexing/api.py (modified, +6/-0)
  • libs/core/tests/unit_tests/indexing/test_public_api.py (modified, +56/-0)

PR #36663: fix(core): validate batch_size in _batch and _abatch to prevent infinite loop

Description (problem / solution / changelog)

Summary

  • Add ValueError for non-positive batch_size in _batch and _abatch utility functions
  • Previously, batch_size=0 caused _batch to loop forever and _abatch to yield endless empty lists

Fixes #36647

Test plan

  • Added test_batch_validation and test_abatch_validation covering 0 and negative sizes
  • Existing test_abatch still passes
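A test plan like this one might look roughly as follows. This is a sketch under stated assumptions: the test names are taken from the PR description, but the test bodies and the helpers batch_checked, abatch_checked, and aiter_from are hypothetical stand-ins so the example runs without LangChain installed.

```python
import asyncio
from itertools import islice


def batch_checked(size, iterable):
    """Hypothetical sync helper with the guard."""
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


async def abatch_checked(size, iterable):
    """Hypothetical async helper with the guard."""
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    batch = []
    async for element in iterable:
        batch.append(element)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:
        yield batch


async def aiter_from(items):
    """Wrap a regular iterable as an async generator."""
    for item in items:
        yield item


def raises_value_error(fn):
    """Return True if calling fn() raises ValueError."""
    try:
        fn()
    except ValueError:
        return True
    return False


def test_batch_validation():
    for size in (0, -1, -5):
        assert raises_value_error(lambda: list(batch_checked(size, [1, 2, 3])))


def test_abatch_validation():
    async def drain(size):
        return [b async for b in abatch_checked(size, aiter_from([1, 2, 3]))]

    for size in (0, -1, -5):
        assert raises_value_error(lambda: asyncio.run(drain(size)))


test_batch_validation()
test_abatch_validation()
```

In a real test suite the same checks would more likely use pytest.raises against the actual _batch and _abatch functions.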

AI assistance (Claude Code) was used. All changes reviewed and validated by the submitting human.

🤖 Generated with Claude Code

Changed files

  • libs/core/langchain_core/indexing/api.py (modified, +6/-0)
  • libs/core/tests/unit_tests/indexing/test_indexing.py (modified, +21/-0)

PR #36673: fix(core): validate batch_size > 0 in _batch and _abatch to prevent infinite loop

Description (problem / solution / changelog)

_batch(0, items) and _abatch(0, items) hang forever — zero batch size means no progress per iteration. Added a ValueError guard at the top of both generators. Fixes #36647
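One detail worth keeping in mind when a guard sits at the top of a generator: Python does not run any of a generator function's body until the first item is requested. The sketch below illustrates this with a hypothetical generator named lazy_batch (not the library's code); the ValueError appears on the first next() call, not when the generator object is created.

```python
from itertools import islice


def lazy_batch(size, iterable):
    """Hypothetical generator with the guard as its first statement."""
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


# Creating the generator does not execute the guard yet.
gen = lazy_batch(0, [1, 2, 3])

try:
    next(gen)  # the body starts running here, so the guard fires now
    raised_on_first_next = False
except ValueError:
    raised_on_first_next = True
```

For a caller that iterates the helper straight away, the practical effect is the same as an immediate error.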

Changed files

  • libs/core/langchain_core/indexing/api.py (modified, +4/-0)

PR #36676: fix(core): validate batch_size > 0 in _batch and _abatch

Description (problem / solution / changelog)

Fixes #36647

Add input validation to _batch and _abatch in langchain_core.indexing.api to raise ValueError when size <= 0, preventing the infinite loop in _batch and the empty-batch flood in _abatch described in the issue.

Changed files

  • libs/core/langchain_core/indexing/api.py (modified, +4/-0)

PR #36723: fix: typo in Visitor error msg, Blob.as_bytes_io string support, batch_size validation

Description (problem / solution / changelog)

Summary

Fix 1 — Typo in Visitor._validate_func error message (fixes #36701)

The error message for disallowed operators said 'Allowed comparators are...' instead of 'Allowed operators are...'. Fixed the incorrect word.

Fix 2 — Blob.as_bytes_io() raises NotImplementedError for string data (fixes #36667)

When blob.data is a str, the method raised NotImplementedError. Now encodes the string to UTF-8 bytes and yields a BytesIO object, consistent with the bytes branch.
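The string branch described here can be sketched with the standard library alone. The free function as_bytes_io below is a hypothetical illustration of the described behavior, assuming a context-manager style API; it is not the actual Blob method.

```python
from contextlib import contextmanager
from io import BytesIO
from typing import Iterator, Union


@contextmanager
def as_bytes_io(data: Union[bytes, str]) -> Iterator[BytesIO]:
    """Hypothetical helper: expose bytes or str data as a binary stream."""
    if isinstance(data, bytes):
        yield BytesIO(data)
    elif isinstance(data, str):
        # Strings are encoded to UTF-8 first, mirroring the bytes branch.
        yield BytesIO(data.encode("utf-8"))
    else:
        raise NotImplementedError(f"Unsupported data type: {type(data)!r}")


with as_bytes_io("héllo") as buffer:
    payload = buffer.read()
```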

Fix 3 — _batch/_abatch missing batch_size validation (fixes #36647)

When batch_size <= 0 was passed, both functions would loop infinitely. Added a guard at the top of each:

if size <= 0:
    raise ValueError("batch_size must be a positive integer")

Changed files

  • libs/core/langchain_core/documents/base.py (modified, +2/-0)
  • libs/core/langchain_core/indexing/api.py (modified, +4/-0)
  • libs/core/langchain_core/structured_query.py (modified, +1/-1)

Code Example

import asyncio

from langchain_core.indexing.api import _abatch, _batch

# Synchronous _batch with size=0: reported to loop forever.
# Comment this call out to reach the async case below.
list(_batch(0, [1, 2, 3]))


async def async_iter(data):
    """Wrap a regular iterable as an async generator."""
    for item in data:
        yield item


async def test_abatch():
    # Asynchronous _abatch with size=0: reported to yield an empty batch
    # for each input element, with no error raised.
    async for batch in _abatch(0, async_iter([1, 2, 3])):
        print(batch)


asyncio.run(test_abatch())

---

Checked other resources

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain
  • langchain-openai
  • langchain-anthropic
  • langchain-classic
  • langchain-core
  • langchain-model-profiles
  • langchain-tests
  • langchain-text-splitters
  • langchain-chroma
  • langchain-deepseek
  • langchain-exa
  • langchain-fireworks
  • langchain-groq
  • langchain-huggingface
  • langchain-mistralai
  • langchain-nomic
  • langchain-ollama
  • langchain-openrouter
  • langchain-perplexity
  • langchain-qdrant
  • langchain-xai
  • Other / not sure / general

Related Issues / PRs

No response

System Info

System Information
------------------
> OS:  Linux
> OS Version:  #1 SMP Tue Nov 5 00:21:55 UTC 2024
> Python Version:  3.10.20 (main, Mar 11 2026, 17:46:40) [GCC 14.3.0]

Package Information
-------------------
> langchain_core: 1.2.19
> langchain: 1.2.13
> langchain_community: 0.4.1
> langsmith: 0.7.17
> langchain_chroma: 1.1.0
> langchain_classic: 1.0.3
> langchain_deepseek: 1.0.1
> langchain_huggingface: 1.2.1
> langchain_openai: 1.1.11
> langchain_text_splitters: 1.1.1
> langgraph_sdk: 0.3.11

Optional packages not installed
-------------------------------
> deepagents
> deepagents-cli

Other Dependencies
------------------
> aiohttp: 3.13.3
> async-timeout: 4.0.3
> chromadb: 1.5.5
> dataclasses-json: 0.6.7
> httpx: 0.28.1
> httpx-sse: 0.4.3
> huggingface-hub: 1.7.1
> jsonpatch: 1.33
> langgraph: 1.1.2
> numpy: 2.2.6
> openai: 2.28.0
> opentelemetry-api: 1.40.0
> opentelemetry-sdk: 1.40.0
> orjson: 3.11.7
> packaging: 25.0
> pydantic: 2.12.5
> pydantic-settings: 2.13.1
> pytest: 9.0.3
> pyyaml: 6.0.3
> PyYAML: 6.0.3
> requests: 2.32.5
> requests-toolbelt: 1.0.0
> rich: 14.3.3
> sentence-transformers: 5.3.0
> SQLAlchemy: 2.0.48
> sqlalchemy: 2.0.48
> tenacity: 9.1.4
> tiktoken: 0.12.0
> tokenizers: 0.22.2
> transformers: 5.3.0
> typing-extensions: 4.15.0
> uuid-utils: 0.14.1
> websockets: 16.0
> xxhash: 3.6.0
> zstandard: 0.25.0

extent analysis

TL;DR

The proposed fix is to add a validation check at the beginning of the _batch and _abatch functions to raise a ValueError when the size parameter is less than or equal to 0.

Guidance

  • Add a simple validation at the beginning of both _batch and _abatch functions to check if the size parameter is less than or equal to 0.
  • If the condition is met, raise a ValueError with a descriptive message indicating that the batch size must be a positive integer.
  • Verify the fix by testing the functions with invalid size parameters (e.g., 0, -1) and ensure that a ValueError is raised as expected.
  • Consider validating batch_size where it enters the public API (index and aindex) as well, so that an invalid value is rejected before any documents are processed.
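One way to act on the last point is to validate in a plain function before any generator is created, so the error surfaces at call time. The sketch below is illustrative only; eager_batch and _chunks are hypothetical names, not part of langchain_core.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def _chunks(size: int, iterable: Iterable[T]) -> Iterator[List[T]]:
    """Hypothetical inner generator; assumes size was already validated."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def eager_batch(size: int, iterable: Iterable[T]) -> Iterator[List[T]]:
    """Hypothetical wrapper: a plain function, so the check runs at call time."""
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    return _chunks(size, iterable)


try:
    eager_batch(0, [1, 2, 3])  # raises here, before any iteration
    raised_at_call = False
except ValueError:
    raised_at_call = True
```

The trade-off is one extra function per helper in exchange for an error that points at the call site.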

Example

def _batch(size, iterable):
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    # existing implementation

async def _abatch(size, async_iterable):
    if size <= 0:
        raise ValueError("Batch size must be a positive integer.")
    # existing implementation

Notes

The proposed fix is a simple and effective solution to prevent the infinite loop and silent performance degradation issues. However, it is essential to thoroughly test the updated functions to ensure that they behave as expected with valid and invalid inputs.

Recommendation

Apply the proposed fix: add the validation check at the top of both _batch and _abatch. It is a small change that directly addresses the reported behavior.
