langchain - ✅(Solved) Fix Validation order in NLTKTextSplitter raises misleading error when NLTK is missing [1 pull requests, 1 comments, 1 participants]

GitHub stats
langchain-ai/langchain#37075 (fetched 2026-04-30 06:18:38)
Comments: 1
Participants: 1
Timeline events: 5 (labeled ×3, cross-referenced ×1, issue_type_added ×1)
Reactions: 0

In NLTKTextSplitter.__init__, argument validation is performed before checking whether NLTK is installed.

Because of this, when use_span_tokenize=True and NLTK is not installed, a ValueError about the separator is raised before the actual ImportError. This can mislead users into fixing arguments before discovering the real issue.

Expected behavior: The dependency check (_HAS_NLTK) should run first, so that an ImportError is raised immediately if NLTK is not available.

Proposed change: Reorder the checks in __init__ so that NLTK availability is checked first and argument combinations are validated second.
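The effect of the reordering can be sketched without LangChain installed. The two functions below are hypothetical stand-ins, not the library's code: they assume the separator check has the form "use_span_tokenize and a non-empty separator", that the default separator is non-empty, and the error messages are illustrative only.

```python
def checks_before_fix(has_nltk: bool, separator: str, use_span_tokenize: bool) -> None:
    """Stand-in for the old order: argument validation runs first."""
    if use_span_tokenize and separator:
        # Assumed form of the separator check; message is illustrative.
        raise ValueError("separator must be '' when use_span_tokenize=True")
    if not has_nltk:
        raise ImportError("NLTK is not installed")


def checks_after_fix(has_nltk: bool, separator: str, use_span_tokenize: bool) -> None:
    """Stand-in for the proposed order: dependency check runs first."""
    if not has_nltk:
        raise ImportError("NLTK is not installed")
    if use_span_tokenize and separator:
        raise ValueError("separator must be '' when use_span_tokenize=True")


def raised_by(check, **kwargs) -> str:
    """Return the name of the exception a check raises, or 'no error'."""
    try:
        check(**kwargs)
    except (ImportError, ValueError) as exc:
        return type(exc).__name__
    return "no error"


# Scenario from the issue: NLTK missing, use_span_tokenize=True,
# separator left at an assumed non-empty default.
scenario = {"has_nltk": False, "separator": "\n\n", "use_span_tokenize": True}
before = raised_by(checks_before_fix, **scenario)
after = raised_by(checks_after_fix, **scenario)
print(before)  # ValueError
print(after)   # ImportError
```

Both orderings raise the same errors once NLTK is available, which is consistent with the claim that the change affects only which error surfaces first.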

Impact: Improves error clarity and developer experience without changing functionality.

I’ve already opened a PR implementing this fix. Could you please assign this issue to me?

Error Message

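With use_span_tokenize=True and NLTK not installed, a ValueError about the separator is raised instead of the expected ImportError. The issue does not quote the exact message text.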

Root Cause

NLTKTextSplitter.__init__ validates the argument combination before it checks whether NLTK is installed. When use_span_tokenize=True and NLTK is missing, the ValueError about the separator therefore preempts the ImportError, which can lead users to change their arguments before they discover the missing dependency.

Fix / Workaround

PR #37076 reorders the checks in NLTKTextSplitter.__init__ so that the NLTK availability check (_HAS_NLTK) runs before argument validation.

PR fix notes

PR #37076: fix(text-splitters): raise ImportError before argument validation in NLTKTextSplitter

Description (problem / solution / changelog)

Fixes #37075


NLTKTextSplitter.__init__ currently checks the argument combination before checking if NLTK is installed. Because of this, users without NLTK may see a ValueError about the separator instead of the actual ImportError.

This is misleading, since the real issue is the missing dependency.

This PR fixes the order by checking for NLTK first, so the correct error is shown immediately.

It also adds two unit tests to ensure ImportError is raised when NLTK is not available.
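The PR's test file is not reproduced on this page. As a rough illustration of what two such tests might check, here is a self-contained sketch using unittest. StandInSplitter and its module-level _HAS_NLTK flag are hypothetical stand-ins that mirror only the post-fix check order, and the error messages are illustrative.

```python
import unittest

# Hypothetical stand-in for the module-level flag named in the issue;
# False simulates an environment without NLTK.
_HAS_NLTK = False


class StandInSplitter:
    """Simplified stand-in that mirrors only the post-fix check order."""

    def __init__(self, separator: str = "\n\n", *, use_span_tokenize: bool = False) -> None:
        # Dependency check first, so a missing package is reported immediately.
        if not _HAS_NLTK:
            raise ImportError("NLTK is not installed")
        # Argument validation second (assumed form of the check).
        if use_span_tokenize and separator:
            raise ValueError("separator must be '' when use_span_tokenize=True")
        self._separator = separator
        self._use_span_tokenize = use_span_tokenize


class MissingNLTKTests(unittest.TestCase):
    def test_import_error_with_default_arguments(self) -> None:
        with self.assertRaises(ImportError):
            StandInSplitter()

    def test_import_error_wins_over_separator_validation(self) -> None:
        # This combination would raise ValueError if validation ran first.
        with self.assertRaises(ImportError):
            StandInSplitter(use_span_tokenize=True)
```

Saved as a module, the tests can be run with python -m unittest. A test against the real class would instead need to override the library's own flag, for example with a pytest monkeypatch fixture.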


Twitter: @abhishekbuild LinkedIn: https://linkedin.com/in/abhishekbuild

Changed files

  • libs/text-splitters/langchain_text_splitters/nltk.py (modified, +3/-3)
  • libs/text-splitters/tests/unit_tests/test_nltk.py (added, +36/-0)

Code Example

# Ensure NLTK is NOT installed:
# pip uninstall nltk
# Run the following code:

from langchain_text_splitters import NLTKTextSplitter

splitter = NLTKTextSplitter(use_span_tokenize=True)
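
Uninstalling NLTK is not always convenient. As a hedged alternative, the standard library can simulate a missing package: a None entry in sys.modules makes a later import of that name raise ImportError. The helper nltk_is_importable below is hypothetical and not part of LangChain. The sketch also assumes the import is attempted while the patch is active; a flag such as _HAS_NLTK that was already computed at import time would have to be patched directly or the module reloaded.

```python
import sys
from unittest import mock


def nltk_is_importable() -> bool:
    """Hypothetical helper: report whether `import nltk` succeeds right now."""
    try:
        import nltk  # noqa: F401
    except ImportError:
        return False
    return True


# A None entry in sys.modules makes `import nltk` raise ImportError,
# whether or not the package is actually installed.
with mock.patch.dict(sys.modules, {"nltk": None}):
    simulated = nltk_is_importable()

print(simulated)  # False
```

mock.patch.dict restores sys.modules when the with block exits, so the simulation does not leak into later code.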

Submission checklist

  • This is a bug, not a usage question.
  • I added a clear and descriptive title that summarizes this issue.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
  • This is not related to the langchain-community package.
  • I posted a self-contained, minimal, reproducible example. A maintainer can copy it and run it AS IS.

Package (Required)

  • langchain-text-splitters

Related Issues / PRs

No response

System Info

System Information

OS: Windows
OS Version: 10.0.26200
Python Version: 3.14.0 (tags/v3.14.0:ebf955d, Oct 7 2025, 10:15:03) [MSC v.1944 64 bit (AMD64)]

Package Information

langchain_core: 1.3.2
langsmith: 0.7.38
langchain_protocol: 0.0.13

Optional packages not installed

deepagents
deepagents-cli

Other Dependencies

httpx: 0.28.1
jsonpatch: 1.33
orjson: 3.11.8
packaging: 26.0
pydantic: 2.13.3
pyyaml: 6.0.3
requests: 2.32.5
requests-toolbelt: 1.0.0
tenacity: 9.1.4
typing-extensions: 4.15.0
uuid-utils: 0.14.1
xxhash: 3.7.0
zstandard: 0.25.0

extent analysis

TL;DR

Reorder the checks in __init__ to prioritize the NLTK availability check before argument validation to improve error clarity.

Guidance

  • Verify that the issue is resolved by checking if the ImportError is raised immediately when NLTK is not installed and use_span_tokenize=True.
  • Review the proposed change in the PR to ensure it correctly reorders the checks in __init__.
  • Test the fix with different argument combinations to ensure it does not introduce any new issues.
  • Consider adding a test case to cover this specific scenario to prevent similar issues in the future.

Example

No code snippet is provided as the issue is related to the internal implementation of the NLTKTextSplitter class.

Notes

The fix is confined to NLTKTextSplitter.__init__ and does not affect other parts of the langchain packages. According to the reporter, updating LangChain does not resolve the behavior; the only dependency involved is the optional NLTK package.

Recommendation

Apply the fix proposed in PR #37076, which reorders the checks in __init__ so that the dependency check runs first. It improves error clarity and developer experience without changing functionality.
